From gnb@melbourne.sgi.com Wed Jun 1 01:17:29 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 01:17:32 -0700 (PDT) Received: from larry.melbourne.sgi.com (mverd138.asia.info.net [61.14.31.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j518HRXq022082 for ; Wed, 1 Jun 2005 01:17:28 -0700 Received: from [134.14.55.176] (hole.melbourne.sgi.com [134.14.55.176]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA15672; Wed, 1 Jun 2005 18:16:26 +1000 Subject: Re: Locking model for NAPI drivers From: Greg Banks To: "David S. Miller" Cc: Linux Network Development list In-Reply-To: <20050531.154847.63995530.davem@davemloft.net> References: <20050531.154847.63995530.davem@davemloft.net> Content-Type: text/plain Organization: Silicon Graphics Inc, Australian Software Group. Message-Id: <1117613796.26331.2479.camel@hole.melbourne.sgi.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6-1mdk Date: Wed, 01 Jun 2005 18:16:36 +1000 Content-Transfer-Encoding: 7bit X-archive-position: 1940 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gnb@melbourne.sgi.com Precedence: bulk X-list: netdev On Wed, 2005-06-01 at 08:48, David S. Miller wrote: > So the idea is, if we can make all of the spinlocks BH locks we'll > solve a whole bunch of problems: > [...] > 2) the driver will actually produce useful profiling data > via oprofile and friends since timer interrupts will run > even while holding the locks That would be really, really nice. Greg. -- Greg Banks, R&D Software Engineer, SGI Australian Software Group. I don't speak for SGI. 
From herbert@gondor.apana.org.au Wed Jun 1 01:44:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 01:44:09 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j518i1Xq023834 for ; Wed, 1 Jun 2005 01:44:03 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DdOoK-0005CB-00; Wed, 01 Jun 2005 18:42:48 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DdOoF-00082p-00; Wed, 01 Jun 2005 18:42:43 +1000 From: Herbert Xu To: ak@muc.de (Andi Kleen) Subject: Re: Locking model for NAPI drivers Cc: davem@davemloft.net, netdev@oss.sgi.com Organization: Core In-Reply-To: X-Newsgroups: apana.lists.os.linux.netdev User-Agent: tin/1.7.4-20040225 ("Benbecula") (UNIX) (Linux/2.4.27-hx-1-686-smp (i686)) Message-Id: Date: Wed, 01 Jun 2005 18:42:43 +1000 X-archive-position: 1941 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Andi Kleen wrote: > > That is because of the kmap_atomic it does right? At least in the i386 > highmem implementation I don't see any code that would be less safe in > hard interrupt context compared to BHs. And FRV and mips look like they > allow it too. To make it safe we'll have to allocate another precious km_type entry. 
-- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu 许志壬 Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From raghunathan.venkatesan@wipro.com Wed Jun 1 04:40:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 04:41:12 -0700 (PDT) Received: from wip-ec-wd.wipro.com (wip-ec-wd.wipro.com [203.101.113.39]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51BepXq004532 for ; Wed, 1 Jun 2005 04:40:54 -0700 Received: from wip-ec-wd.wipro.com (localhost.wipro.com [127.0.0.1]) by localhost (Postfix) with ESMTP id 5CA9D205E4; Wed, 1 Jun 2005 17:00:54 +0530 (IST) Received: from blr-ec-bh01.wipro.com (unknown [10.201.50.91]) by wip-ec-wd.wipro.com (Postfix) with ESMTP id 3B276205E1; Wed, 1 Jun 2005 17:00:54 +0530 (IST) Received: from chn-snr-bh2.wipro.com ([10.145.50.92]) by blr-ec-bh01.wipro.com with Microsoft SMTPSVC(6.0.3790.211); Wed, 1 Jun 2005 17:09:48 +0530 Received: from CHN-SNR-MBX01.wipro.com ([10.145.50.181]) by chn-snr-bh2.wipro.com with Microsoft SMTPSVC(6.0.3790.0); Wed, 1 Jun 2005 17:04:44 +0530 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----_=_NextPart_001_01C5669D.E951F95B" Subject: Unable to handle kernel paging request at virtual address 04000460 Date: Wed, 1 Jun 2005 17:01:23 +0530 Message-ID: <438662DA48DCAA41B1DF648BD4BD76C0E45DF1@CHN-SNR-MBX01.wipro.com> X-MS-Has-Attach: yes X-MS-TNEF-Correlator: Thread-Topic: Unable to handle kernel paging request at virtual address 04000460 Thread-Index: AcVgd+bKyjXc1BZZTzOXOhBhBJ2c9wAwfS0wAFOR2UABBPMC8A== From: To: , , X-OriginalArrivalTime: 01 Jun 2005 11:34:44.0715 (UTC) FILETIME=[E8897BB0:01C5669D] X-archive-position: 1942 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: raghunathan.venkatesan@wipro.com Precedence: bulk 
X-list: netdev

This is a multi-part message in MIME format.

------_=_NextPart_001_01C5669D.E951F95B
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hi Everyone,

We are facing the following crash in a custom Linux 2.4.26 kernel when we run a netperf TCP Stream test (sizes varying from 64 to 32586 bytes) over an IPSEC tunnel created between a host and a VPN server through our box. This is an Au1550 MIPS32-based board (DB1550 Cabernet board from AMD).

We observe that the crash happens randomly (the PrId keeps changing at each crash), because of the burstiness of the traffic generated by the netperf tool. Please look at the captures below. I'd like some help in debugging this issue.

The same set of IPSEC drivers (not from Linux) works fine on a custom Linux 2.4.25-based kernel. We debugged the Oops traces and found that all the problems arise in skbuff (we don't know where in skbuff). Is there a patch that needs to be applied for Linux 2.4.26?

Thanks & Regards,
Raghu Venkatesan
Project Manager (E & PE, Semiconductor & Access),
CDC2, Sozhanganallur, Chennai - 600 119, INDIA
+91 -44-24500200 Ext.
2643 raghunathan.venkatesan@wipro.com =20 ------_=_NextPart_001_01C5669D.E951F95B Content-Type: application/octet-stream; name="recent.cap_send1.oops" Content-Transfer-Encoding: base64 Content-Description: recent.cap_send1.oops Content-Disposition: attachment; filename="recent.cap_send1.oops" a3N5bW9vcHMgMi40Ljkgb24gaTY4NiAyLjQuMjItMS4yMTE1Lm5wdGwuICBPcHRpb25zIHVzZWQK ICAgICAtdiAvaG9tZS9hbWQvcHJvamVjdC9hbWQva2VybmVsL3ZtbGludXggKGRlZmF1bHQpCiAg ICAgLUsgKHNwZWNpZmllZCkKICAgICAtbCAvcHJvYy9tb2R1bGVzIChkZWZhdWx0KQogICAgIC1v IC9ob21lL2FtZC9wcm9qZWN0L2FtZC9maWxlc3lzdGVtL3Vzci9saWIvbW9kdWxlcy8gKGRlZmF1 bHQpCiAgICAgLW0gL2hvbWUvYW1kL3Byb2plY3QvYW1kL2tlcm5lbC9TeXN0ZW0ubWFwIChkZWZh dWx0KQogICAgIC10IGVsZjMyLWxpdHRsZW1pcHMgLWEgbWlwczo0NjAwCgpObyBtb2R1bGVzIGlu IGtzeW1zLCBza2lwcGluZyBvYmplY3RzCk5vIGtzeW1zLCBza2lwcGluZyBsc21vZApVbmFibGUg dG8gaGFuZGxlIGtlcm5lbCBwYWdpbmcgcmVxdWVzdCBhdCB2aXJ0dWFsIGFkZHJlc3MgMDIwMDA0 ZDQsIGVwYyA9PSA4MDI0YWY2YywgcmEgPT0gODAyNGIwOTQKT29wcyBpbiBmYXVsdC5jOjpkb19w YWdlX2ZhdWx0LCBsaW5lIDIwNjoKJDAgOiAwMDAwMDAwMCAxMDAwZmMwMCA4YWJiYjYwMCAwMjAw MDQ2MCAwMjAwMDQ2MCA4YWJiYjVlYyAwMDAwMDAwMCAwMDAwMDVlYwokOCA6IDVhZDM0MzZlIDhh YmJiZGVjIGIzZGU1ZDcxIDU2NzM2OTg4IDA3ODNmZGZiIDgwMzIzODU4IDgwMzIzODA0IDI0ZTEy YWU1CiQxNjogMDIwMDA0NjAgMDAwMDAwMDEgOGFiYmI4MDAgMDAwMDA2MDAgMDAwMDBjZmMgMDAw MDA1ZGMgMDAwMDAwMTQgMDAwMDM0MDgKJDI0OiAwMDAwMDAwMCAyYWVhM2M3MCAgICAgICAgICAg ICAgICAgICA4MDMyMjAwMCA4MDMyM2EyOCAwMDAwMzQxYyA4MDI0YjA5NApIaSA6IDAwMDAwMDAw CkxvIDogMDAwMDA4MDAKZXBjICAgOiA4MDI0YWY2YyAgICBOb3QgdGFpbnRlZApTdGF0dXM6IDEw MDBmYzAzCkNhdXNlIDogMDA4MDAwMDgKUHJvY2VzcyBzd2FwcGVyIChwaWQ6IDAsIHN0YWNrcGFn ZT04MDMyMjAwMCkKU3RhY2s6ICAgIDhiOTYyNDgwIDAwMDAwMDAwIDAwMDAwMDAwIDAwMDAwMDAw IDAwMDAwODAwIDhiNmFmNDYwIDgwMjRiMDk0CiA4YjZhZjQ2MCA4YWJiYjgwMCAwMDAwMDYwMCAw MDAwMGNmYyAwMDAwMDVkYyAwMDAwMDgwMCA4YjZhZjQ2MCA4MDI0YjdjYwogODAyNGI3YjAgMjRl MTJhZTUgODAzMjM4NTggODAzMjM4MDQgYzAxYzIwNTAgOGI2YWY0NjAgODAzYTA0MDAgMDAwMDA1 YzgKIDgxMmJlMzAwIDgwMjUwMWQ0IDAwMDAwNWRjIDAwMDAwMDE0IDAwMDAyZTQwIDAwMDAwMDAw 
IDJhZWEzYzcwIDhiNmFmNDYwCiA4YWVhMTE2MCAwMDAwMDVjOCA4MDI2YTllOCAwMDAwMmU1NCA4 MDI2YTE4NCAxMDAwZmMwMyAwMDAwMDAwMCA4YjZhZjQ2MAogOGFiYmIwMTAgLi4uCkNhbGwgVHJh Y2U6ICAgWzw4MDI0YjA5ND5dIFs8ODAyNGI3Y2M+XSBbPDgwMjRiN2IwPl0gWzw4MDI1MDFkND5d IFs8ODAyNmE5ZTg+XQogWzw4MDI2YTE4ND5dIFs8ODAyNmEzMGM+XSBbPDgwMjZhMWRjPl0gWzw4 MDI2YTkwYz5dIFs8ODAyNmE5MGM+XSBbPDgwMjljNDE4Pl0KIFs8ODAyNmE5MGM+XSBbPDgwMjZh OTBjPl0gWzw4MDI1YTQ4ND5dIFs8ODAyNmE5MGM+XSBbPDgwMjZhOTBjPl0gWzw4MDI1YTk0OD5d CiBbPDgwMmRhMGUwPl0gWzw4MDI2YTkwYz5dIFs8ODAyNmE4ZDQ+XSBbPDgwMjZhOTBjPl0gWzw4 MDI2YTMwYz5dIFs8ODAyNmExODQ+XQogWzw4MDI2NzEzMD5dIFs8ODAyNjcxYjA+XSBbPDgwMjZh NzQ0Pl0gWzw4MDI1YTk4Yz5dIFs8ODAyOWVkODg+XSBbPDgwMjY3MTMwPl0KIFs8ODAyOWZkMzQ+ XSBbPDgwMjY3MDZjPl0gWzw4MDI2NzEzMD5dIFs8ODAyNjU3Zjg+XSBbPDgwMjY1YTIwPl0gWzw4 MDI1YTQ4ND5dCiBbPGMwMWNlMmE4Pl0gWzw4MDI2NTdmOD5dIFs8ODAyNjU3Zjg+XSBbPDgwMjVh OThjPl0gWzw4MDI1YTk0OD5dIC4uLgpXYXJuaW5nIChPb3BzX3RyYWNlX2xpbmUpOiBnYXJiYWdl ICcuLi4nIGF0IGVuZCBvZiB0cmFjZSBsaW5lIGlnbm9yZWQKQ29kZTogOGM1MDAwMDggIGFjNDAw MDA4ICAwMjAwMjAyMSA8OGM4MjAwNzQ+IDEwNTEwMDA5ICA4ZTEwMDAwMCAgYzA4MzAwNzQgIDAw NzExMDIzICBlMDgyMDA3NAoKCj4+UkE7ICAwMDAwMDAwMDgwMjRiMDk0IDxza2JfcmVsZWFzZV9k YXRhK2IwL2JjPgo+PiQxMzsgMDAwMDAwMDA4MDMyMzg1OCA8aW5pdF90YXNrX3VuaW9uKzE4NTgv MjAwMD4KPj4kMTQ7IDAwMDAwMDAwODAzMjM4MDQgPGluaXRfdGFza191bmlvbisxODA0LzIwMDA+ Cj4+JDI4OyAwMDAwMDAwMDgwMzIyMDAwIDxpbml0X3Rhc2tfdW5pb24rMC8yMDAwPgo+PiQyOTsg MDAwMDAwMDA4MDMyM2EyOCA8aW5pdF90YXNrX3VuaW9uKzFhMjgvMjAwMD4KPj4kMzE7IDAwMDAw MDAwODAyNGIwOTQgPHNrYl9yZWxlYXNlX2RhdGErYjAvYmM+Cgo+PlBDOyAgMDAwMDAwMDA4MDI0 YWY2YyA8c2tiX2Ryb3BfZnJhZ2xpc3QrMzQvNzQ+ICAgPD09PT09CgpUcmFjZTsgMDAwMDAwMDA4 MDI0YjA5NCA8c2tiX3JlbGVhc2VfZGF0YStiMC9iYz4KVHJhY2U7IDAwMDAwMDAwODAyNGI3Y2Mg PHNrYl9saW5lYXJpemUrYzQvMTRjPgpUcmFjZTsgMDAwMDAwMDA4MDI0YjdiMCA8c2tiX2xpbmVh cml6ZSthOC8xNGM+ClRyYWNlOyAwMDAwMDAwMDgwMjUwMWQ0IDxkZXZfcXVldWVfeG1pdCs1MC8z Yjg+ClRyYWNlOyAwMDAwMDAwMDgwMjZhOWU4IDxpcF9maW5pc2hfb3V0cHV0MitlYy8xNTA+ClRy 
YWNlOyAwMDAwMDAwMDgwMjZhMTg0IDxpcF9mcmFnbWVudCsyNDAvNTAwPgpUcmFjZTsgMDAwMDAw MDA4MDI2YTMwYyA8aXBfZnJhZ21lbnQrM2M4LzUwMD4KVHJhY2U7IDAwMDAwMDAwODAyNmExZGMg PGlwX2ZyYWdtZW50KzI5OC81MDA+ClRyYWNlOyAwMDAwMDAwMDgwMjZhOTBjIDxpcF9maW5pc2hf b3V0cHV0MisxMC8xNTA+ClRyYWNlOyAwMDAwMDAwMDgwMjZhOTBjIDxpcF9maW5pc2hfb3V0cHV0 MisxMC8xNTA+ClRyYWNlOyAwMDAwMDAwMDgwMjljNDE4IDxpcF9yZWZyYWcrNjgvNzQ+ClRyYWNl OyAwMDAwMDAwMDgwMjZhOTBjIDxpcF9maW5pc2hfb3V0cHV0MisxMC8xNTA+ClRyYWNlOyAwMDAw MDAwMDgwMjZhOTBjIDxpcF9maW5pc2hfb3V0cHV0MisxMC8xNTA+ClRyYWNlOyAwMDAwMDAwMDgw MjVhNDg0IDxuZl9pdGVyYXRlKzk0LzExND4KVHJhY2U7IDAwMDAwMDAwODAyNmE5MGMgPGlwX2Zp bmlzaF9vdXRwdXQyKzEwLzE1MD4KVHJhY2U7IDAwMDAwMDAwODAyNmE5MGMgPGlwX2ZpbmlzaF9v dXRwdXQyKzEwLzE1MD4KVHJhY2U7IDAwMDAwMDAwODAyNWE5NDggPG5mX2hvb2tfc2xvdysxMjgv MWY4PgpUcmFjZTsgMDAwMDAwMDA4MDJkYTBlMCA8bWVtc2V0KzAvMWM+ClRyYWNlOyAwMDAwMDAw MDgwMjZhOTBjIDxpcF9maW5pc2hfb3V0cHV0MisxMC8xNTA+ClRyYWNlOyAwMDAwMDAwMDgwMjZh OGQ0IDxpcF9maW5pc2hfb3V0cHV0KzFhMC8xYTQ+ClRyYWNlOyAwMDAwMDAwMDgwMjZhOTBjIDxp cF9maW5pc2hfb3V0cHV0MisxMC8xNTA+ClRyYWNlOyAwMDAwMDAwMDgwMjZhMzBjIDxpcF9mcmFn bWVudCszYzgvNTAwPgpUcmFjZTsgMDAwMDAwMDA4MDI2YTE4NCA8aXBfZnJhZ21lbnQrMjQwLzUw MD4KVHJhY2U7IDAwMDAwMDAwODAyNjcxMzAgPGlwX2ZvcndhcmRfZmluaXNoKzEwL2EwPgpUcmFj ZTsgMDAwMDAwMDA4MDI2NzFiMCA8aXBfZm9yd2FyZF9maW5pc2grOTAvYTA+ClRyYWNlOyAwMDAw MDAwMDgwMjZhNzQ0IDxpcF9maW5pc2hfb3V0cHV0KzEwLzFhND4KVHJhY2U7IDAwMDAwMDAwODAy NWE5OGMgPG5mX2hvb2tfc2xvdysxNmMvMWY4PgpUcmFjZTsgMDAwMDAwMDA4MDI5ZWQ4OCA8aXBf Y3RfcmVmcmVzaCs4NC9iOD4KVHJhY2U7IDAwMDAwMDAwODAyNjcxMzAgPGlwX2ZvcndhcmRfZmlu aXNoKzEwL2EwPgpUcmFjZTsgMDAwMDAwMDA4MDI5ZmQzNCA8aWNtcF9wYWNrZXQrOTgvOWM+ClRy YWNlOyAwMDAwMDAwMDgwMjY3MDZjIDxfX2dudV9jb21waWxlZF9jKzI2Yy8zMjA+ClRyYWNlOyAw MDAwMDAwMDgwMjY3MTMwIDxpcF9mb3J3YXJkX2ZpbmlzaCsxMC9hMD4KVHJhY2U7IDAwMDAwMDAw ODAyNjU3ZjggPGlwX3Jjdl9maW5pc2grMTAvMmE4PgpUcmFjZTsgMDAwMDAwMDA4MDI2NWEyMCA8 aXBfcmN2X2ZpbmlzaCsyMzgvMmE4PgpUcmFjZTsgMDAwMDAwMDA4MDI1YTQ4NCA8bmZfaXRlcmF0 
ZSs5NC8xMTQ+ClRyYWNlOyAwMDAwMDAwMGMwMWNlMmE4IDxFTkRfT0ZfQ09ERSszZmUzYmFhOC8/ Pz8/PgpUcmFjZTsgMDAwMDAwMDA4MDI2NTdmOCA8aXBfcmN2X2ZpbmlzaCsxMC8yYTg+ClRyYWNl OyAwMDAwMDAwMDgwMjY1N2Y4IDxpcF9yY3ZfZmluaXNoKzEwLzJhOD4KVHJhY2U7IDAwMDAwMDAw ODAyNWE5OGMgPG5mX2hvb2tfc2xvdysxNmMvMWY4PgpUcmFjZTsgMDAwMDAwMDA4MDI1YTk0OCA8 bmZfaG9va19zbG93KzEyOC8xZjg+CgpDb2RlOyAgMDAwMDAwMDA4MDI0YWY2MCA8c2tiX2Ryb3Bf ZnJhZ2xpc3QrMjgvNzQ+CjAwMDAwMDAwIDxfUEM+OgpDb2RlOyAgMDAwMDAwMDA4MDI0YWY2MCA8 c2tiX2Ryb3BfZnJhZ2xpc3QrMjgvNzQ+CiAgIDA6ICAgOGM1MDAwMDggIGx3ICAgICAgczAsOCh2 MCkKQ29kZTsgIDAwMDAwMDAwODAyNGFmNjQgPHNrYl9kcm9wX2ZyYWdsaXN0KzJjLzc0PgogICA0 OiAgIGFjNDAwMDA4ICBzdyAgICAgIHplcm8sOCh2MCkKQ29kZTsgIDAwMDAwMDAwODAyNGFmNjgg PHNrYl9kcm9wX2ZyYWdsaXN0KzMwLzc0PgogICA4OiAgIDAyMDAyMDIxICBtb3ZlICAgIGEwLHMw CkNvZGU7ICAwMDAwMDAwMDgwMjRhZjZjIDxza2JfZHJvcF9mcmFnbGlzdCszNC83ND4gICA8PT09 PT0KICAgYzogICA4YzgyMDA3NCAgbHcgICAgICB2MCwxMTYoYTApICAgPD09PT09CkNvZGU7ICAw MDAwMDAwMDgwMjRhZjcwIDxza2JfZHJvcF9mcmFnbGlzdCszOC83ND4KICAxMDogICAxMDUxMDAw OSAgYmVxICAgICB2MCxzMSwzOCA8X1BDKzB4Mzg+CkNvZGU7ICAwMDAwMDAwMDgwMjRhZjc0IDxz a2JfZHJvcF9mcmFnbGlzdCszYy83ND4KICAxNDogICA4ZTEwMDAwMCAgbHcgICAgICBzMCwwKHMw KQpDb2RlOyAgMDAwMDAwMDA4MDI0YWY3OCA8c2tiX2Ryb3BfZnJhZ2xpc3QrNDAvNzQ+CiAgMTg6 ICAgYzA4MzAwNzQgIGxsICAgICAgdjEsMTE2KGEwKQpDb2RlOyAgMDAwMDAwMDA4MDI0YWY3YyA8 c2tiX2Ryb3BfZnJhZ2xpc3QrNDQvNzQ+CiAgMWM6ICAgMDA3MTEwMjMgIHN1YnUgICAgdjAsdjEs czEKQ29kZTsgIDAwMDAwMDAwODAyNGFmODAgPHNrYl9kcm9wX2ZyYWdsaXN0KzQ4Lzc0PgogIDIw OiAgIGUwODIwMDc0ICBzYyAgICAgIHYwLDExNihhMCkKCktlcm5lbCBwYW5pYzogQWllZSwga2ls bGluZyBpbnRlcnJ1cHQgaGFuZGxlciEKCjEgd2FybmluZyBpc3N1ZWQuICBSZXN1bHRzIG1heSBu b3QgYmUgcmVsaWFibGUuCg== ------_=_NextPart_001_01C5669D.E951F95B Content-Type: application/octet-stream; name="recent.cap.oops" Content-Transfer-Encoding: base64 Content-Description: recent.cap.oops Content-Disposition: attachment; filename="recent.cap.oops" a3N5bW9vcHMgMi40Ljkgb24gaTY4NiAyLjQuMjItMS4yMTE1Lm5wdGwuICBPcHRpb25zIHVzZWQK 
ICAgICAtdiAvaG9tZS9hbWQvcHJvamVjdC9hbWQva2VybmVsL3ZtbGludXggKGRlZmF1bHQpCiAg ICAgLUsgKHNwZWNpZmllZCkKICAgICAtbCAvcHJvYy9tb2R1bGVzIChkZWZhdWx0KQogICAgIC1v IC9ob21lL2FtZC9wcm9qZWN0L2FtZC9maWxlc3lzdGVtL3Vzci9saWIvbW9kdWxlcy8gKGRlZmF1 bHQpCiAgICAgLW0gL2hvbWUvYW1kL3Byb2plY3QvYW1kL2tlcm5lbC9TeXN0ZW0ubWFwIChkZWZh dWx0KQogICAgIC10IGVsZjMyLWxpdHRsZW1pcHMgLWEgbWlwczo0NjAwCgpObyBtb2R1bGVzIGlu IGtzeW1zLCBza2lwcGluZyBvYmplY3RzCk5vIGtzeW1zLCBza2lwcGluZyBsc21vZApVbmFibGUg dG8gaGFuZGxlIGtlcm5lbCBwYWdpbmcgcmVxdWVzdCBhdCB2aXJ0dWFsIGFkZHJlc3MgMDQwMDA0 NjAsIGVwYyA9PSA4MDI0YjIwYywgcmEgPT0gODAyYzQ5ZjgKT29wcyBpbiBmYXVsdC5jOjpkb19w YWdlX2ZhdWx0LCBsaW5lIDIwNjoKJDAgOiAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAw MDAwMSA4Yjc4MzU4MCAwMDAwMDAwMCAwNDAwMDQ2MCAwMDAwMDAwMQokOCA6IDAwMDAwMDAwIDAw MDAwMDAwIDAwMDAwMDAyIGQzZDBiMDAwIDgwMzIzYjY4IDAwMDAwMDAwIDgwMzIzZDYwIDdiN2E3 OTc4CiQxNjogODEyYmViMjAgODEyYmViMjAgZmZmZmZmZmYgOGJiMGQ4MDAgODAzYTA4MDQgMDAw MDAwMDAgMDAwMDAwMDIgODAzMjNlMTAKJDI0OiAwMDAwMDAwMCAyYjAwYWM3MCAgICAgICAgICAg ICAgICAgICA4MDMyMjAwMCA4MDMyM2FkMCAwMDAwMjQwMSA4MDJjNDlmOApIaSA6IDAwMDAyMDkx CkxvIDogZDY5MTI4NWUKZXBjICAgOiA4MDI0YjIwYyAgICBOb3QgdGFpbnRlZApTdGF0dXM6IDEw MDBmYzAzCkNhdXNlIDogMDA4MDAwMDgKUHJvY2VzcyBzd2FwcGVyIChwaWQ6IDAsIHN0YWNrcGFn ZT04MDMyMjAwMCkKU3RhY2s6ICAgIDAwMDAwMDAwIDhiYjBkODAwIDgwM2EwODA0IDAwMDAwMDAw IDgxMmJlYjIwIDgwMmM0OWY4IDgwMTA3YzI4CiAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCA4 MTJiZWIyMCA4MTI0ZmM2OCA4YjZhZjVhMCA4MDNhMDgwMCAwMDAwMDAwNAogODAyNTAwODggMDAw MDAwMDAgMDAwMDAwMDAgMDAwMDAwMDAgMDAwMDAwMDAgODEyYjY1NjAgODAzYTA4MDAgOGI2YWY1 YTAKIDgwM2EwODAwIDAwMDAwMDAwIDgwMjVjM2UwIDAwMDAwMDAwIDAwMDAwMDAwIDgwMzIzYzE4 IDgwMzY5YmYwIDgwMzRkN2U4CiA4MDNhMDgwMCAwMDAwMDAwMCA4MDI1MDM3YyA4MDI5YzNlYyAw MDAwMDAwMCA4Yjc4MzU4MCA4YjZhZjVhMCAwMDAwMDAwZQogOGI2YWY1YTAgLi4uCkNhbGwgVHJh Y2U6ICAgWzw4MDJjNDlmOD5dIFs8ODAxMDdjMjg+XSBbPDgwMjUwMDg4Pl0gWzw4MDI1YzNlMD5d IFs8ODAyNTAzN2M+XQogWzw4MDI5YzNlYz5dIFs8ODAyNTczYTg+XSBbPDgwMjVhNDg0Pl0gWzw4 
MDI2YTkwYz5dIFs8ODAyNmE5ZTg+XSBbPDgwMjZhOTBjPl0KIFs8ODAyNWE5OGM+XSBbPDgwMjVh OTQ4Pl0gWzw4MDI2YTkwYz5dIFs8ODAyYTNkOTg+XSBbPDgwMjY3MTMwPl0gWzw4MDI2YThkND5d CiBbPDgwMjZhOTBjPl0gWzw4MDI2NzFjMD5dIFs8ODAyNjcxMzA+XSBbPDgwMjVhOThjPl0gWzw4 MDI5Y2Y1MD5dIFs8ODAyNjcxMzA+XQogWzw4MDI5ZmQwND5dIFs8ODAyNjcwNmM+XSBbPDgwMjY3 MTMwPl0gWzw4MDI2NTdmOD5dIFs8ODAyNjVhMjA+XSBbPDgwMjVhNDg0Pl0KIFs8YzAxY2UyYTg+ XSBbPDgwMjY1N2Y4Pl0gWzw4MDI2NTdmOD5dIFs8ODAyNWE5OGM+XSBbPDgwMjVhOTQ4Pl0gWzw4 MDI2NTdmOD5dCiBbPDgwMjY1NWEwPl0gWzw4MDI2NTdmOD5dIFs8ODAyNTBkNDg+XSBbPDgwMmUw MWY0Pl0gWzw4MDEwN2MyOD5dIC4uLgpXYXJuaW5nIChPb3BzX3RyYWNlX2xpbmUpOiBnYXJiYWdl ICcuLi4nIGF0IGVuZCBvZiB0cmFjZSBsaW5lIGlnbm9yZWQKQ29kZTogOGUwNjAwOWMgIDEwYzAw MDBlICAyNDAzMDAwMSA8OGNjMjAwMDA+IGMwNDUwMDAwICAwMGEzMjAyMyAgZTA0NDAwMDAgIDEw ODBmZmZjICAwMGEzMjAyMwoKCj4+UkE7ICAwMDAwMDAwMDgwMmM0OWY4IDxwYWNrZXRfcmN2X3Nw a3QrMjljLzJiMD4KPj4kMTI7IDAwMDAwMDAwODAzMjNiNjggPGluaXRfdGFza191bmlvbisxYjY4 LzIwMDA+Cj4+JDE0OyAwMDAwMDAwMDgwMzIzZDYwIDxpbml0X3Rhc2tfdW5pb24rMWQ2MC8yMDAw Pgo+PiQyMzsgMDAwMDAwMDA4MDMyM2UxMCA8aW5pdF90YXNrX3VuaW9uKzFlMTAvMjAwMD4KPj4k Mjg7IDAwMDAwMDAwODAzMjIwMDAgPGluaXRfdGFza191bmlvbiswLzIwMDA+Cj4+JDI5OyAwMDAw MDAwMDgwMzIzYWQwIDxpbml0X3Rhc2tfdW5pb24rMWFkMC8yMDAwPgo+PiQzMTsgMDAwMDAwMDA4 MDJjNDlmOCA8cGFja2V0X3Jjdl9zcGt0KzI5Yy8yYjA+Cgo+PlBDOyAgMDAwMDAwMDA4MDI0YjIw YyA8X19rZnJlZV9za2IrYTQvMTMwPiAgIDw9PT09PQoKVHJhY2U7IDAwMDAwMDAwODAyYzQ5Zjgg PHBhY2tldF9yY3Zfc3BrdCsyOWMvMmIwPgpUcmFjZTsgMDAwMDAwMDA4MDEwN2MyOCA8ZG9fZ2V0 dGltZW9mZGF5KzU4LzExND4KVHJhY2U7IDAwMDAwMDAwODAyNTAwODggPGRldl9xdWV1ZV94bWl0 X25pdCtiYy8xMTA+ClRyYWNlOyAwMDAwMDAwMDgwMjVjM2UwIDxfX2dudV9jb21waWxlZF9jKzcw LzE0Yz4KVHJhY2U7IDAwMDAwMDAwODAyNTAzN2MgPGRldl9xdWV1ZV94bWl0KzFmOC8zYjg+ClRy YWNlOyAwMDAwMDAwMDgwMjljM2VjIDxpcF9yZWZyYWcrM2MvNzQ+ClRyYWNlOyAwMDAwMDAwMDgw MjU3M2E4IDxuZWlnaF9yZXNvbHZlX291dHB1dCsxZmMvMjljPgpUcmFjZTsgMDAwMDAwMDA4MDI1 YTQ4NCA8bmZfaXRlcmF0ZSs5NC8xMTQ+ClRyYWNlOyAwMDAwMDAwMDgwMjZhOTBjIDxpcF9maW5p 
c2hfb3V0cHV0MisxMC8xNTA+ClRyYWNlOyAwMDAwMDAwMDgwMjZhOWU4IDxpcF9maW5pc2hfb3V0 cHV0MitlYy8xNTA+ClRyYWNlOyAwMDAwMDAwMDgwMjZhOTBjIDxpcF9maW5pc2hfb3V0cHV0Misx MC8xNTA+ClRyYWNlOyAwMDAwMDAwMDgwMjVhOThjIDxuZl9ob29rX3Nsb3crMTZjLzFmOD4KVHJh Y2U7IDAwMDAwMDAwODAyNWE5NDggPG5mX2hvb2tfc2xvdysxMjgvMWY4PgpUcmFjZTsgMDAwMDAw MDA4MDI2YTkwYyA8aXBfZmluaXNoX291dHB1dDIrMTAvMTUwPgpUcmFjZTsgMDAwMDAwMDA4MDJh M2Q5OCA8aXB0X2xvY2FsX291dF9ob29rKzQvOGM+ClRyYWNlOyAwMDAwMDAwMDgwMjY3MTMwIDxp cF9mb3J3YXJkX2ZpbmlzaCsxMC9hMD4KVHJhY2U7IDAwMDAwMDAwODAyNmE4ZDQgPGlwX2Zpbmlz aF9vdXRwdXQrMWEwLzFhND4KVHJhY2U7IDAwMDAwMDAwODAyNmE5MGMgPGlwX2ZpbmlzaF9vdXRw dXQyKzEwLzE1MD4KVHJhY2U7IDAwMDAwMDAwODAyNjcxYzAgPGlwX29wdGlvbnNfYnVpbGQrMC8w PgpUcmFjZTsgMDAwMDAwMDA4MDI2NzEzMCA8aXBfZm9yd2FyZF9maW5pc2grMTAvYTA+ClRyYWNl OyAwMDAwMDAwMDgwMjVhOThjIDxuZl9ob29rX3Nsb3crMTZjLzFmOD4KVHJhY2U7IDAwMDAwMDAw ODAyOWNmNTAgPGRlYXRoX2J5X3RpbWVvdXQrM2MvYTg+ClRyYWNlOyAwMDAwMDAwMDgwMjY3MTMw IDxpcF9mb3J3YXJkX2ZpbmlzaCsxMC9hMD4KVHJhY2U7IDAwMDAwMDAwODAyOWZkMDQgPGljbXBf cGFja2V0KzY4LzljPgpUcmFjZTsgMDAwMDAwMDA4MDI2NzA2YyA8X19nbnVfY29tcGlsZWRfYysy NmMvMzIwPgpUcmFjZTsgMDAwMDAwMDA4MDI2NzEzMCA8aXBfZm9yd2FyZF9maW5pc2grMTAvYTA+ ClRyYWNlOyAwMDAwMDAwMDgwMjY1N2Y4IDxpcF9yY3ZfZmluaXNoKzEwLzJhOD4KVHJhY2U7IDAw MDAwMDAwODAyNjVhMjAgPGlwX3Jjdl9maW5pc2grMjM4LzJhOD4KVHJhY2U7IDAwMDAwMDAwODAy NWE0ODQgPG5mX2l0ZXJhdGUrOTQvMTE0PgpUcmFjZTsgMDAwMDAwMDBjMDFjZTJhOCA8RU5EX09G X0NPREUrM2ZlM2JhYTgvPz8/Pz4KVHJhY2U7IDAwMDAwMDAwODAyNjU3ZjggPGlwX3Jjdl9maW5p c2grMTAvMmE4PgpUcmFjZTsgMDAwMDAwMDA4MDI2NTdmOCA8aXBfcmN2X2ZpbmlzaCsxMC8yYTg+ ClRyYWNlOyAwMDAwMDAwMDgwMjVhOThjIDxuZl9ob29rX3Nsb3crMTZjLzFmOD4KVHJhY2U7IDAw MDAwMDAwODAyNWE5NDggPG5mX2hvb2tfc2xvdysxMjgvMWY4PgpUcmFjZTsgMDAwMDAwMDA4MDI2 NTdmOCA8aXBfcmN2X2ZpbmlzaCsxMC8yYTg+ClRyYWNlOyAwMDAwMDAwMDgwMjY1NWEwIDxpcF9y Y3YrNTEwLzU3OD4KVHJhY2U7IDAwMDAwMDAwODAyNjU3ZjggPGlwX3Jjdl9maW5pc2grMTAvMmE4 PgpUcmFjZTsgMDAwMDAwMDA4MDI1MGQ0OCA8bmV0aWZfcmVjZWl2ZV9za2IrMjcwLzJjMD4KVHJh 
Y2U7IDAwMDAwMDAwODAyZTAxZjQgPGF1MTAwMF9JUlErMTM0LzFhMD4KVHJhY2U7IDAwMDAwMDAw ODAxMDdjMjggPGRvX2dldHRpbWVvZmRheSs1OC8xMTQ+CgpDb2RlOyAgMDAwMDAwMDA4MDI0YjIw MCA8X19rZnJlZV9za2IrOTgvMTMwPgowMDAwMDAwMCA8X1BDPjoKQ29kZTsgIDAwMDAwMDAwODAy NGIyMDAgPF9fa2ZyZWVfc2tiKzk4LzEzMD4KICAgMDogICA4ZTA2MDA5YyAgbHcgICAgICBhMiwx NTYoczApCkNvZGU7ICAwMDAwMDAwMDgwMjRiMjA0IDxfX2tmcmVlX3NrYis5Yy8xMzA+CiAgIDQ6 ICAgMTBjMDAwMGUgIGJlcXogICAgYTIsNDAgPF9QQysweDQwPgpDb2RlOyAgMDAwMDAwMDA4MDI0 YjIwOCA8X19rZnJlZV9za2IrYTAvMTMwPgogICA4OiAgIDI0MDMwMDAxICBsaSAgICAgIHYxLDEK Q29kZTsgIDAwMDAwMDAwODAyNGIyMGMgPF9fa2ZyZWVfc2tiK2E0LzEzMD4gICA8PT09PT0KICAg YzogICA4Y2MyMDAwMCAgbHcgICAgICB2MCwwKGEyKSAgIDw9PT09PQpDb2RlOyAgMDAwMDAwMDA4 MDI0YjIxMCA8X19rZnJlZV9za2IrYTgvMTMwPgogIDEwOiAgIGMwNDUwMDAwICBsbCAgICAgIGEx LDAodjApCkNvZGU7ICAwMDAwMDAwMDgwMjRiMjE0IDxfX2tmcmVlX3NrYithYy8xMzA+CiAgMTQ6 ICAgMDBhMzIwMjMgIHN1YnUgICAgYTAsYTEsdjEKQ29kZTsgIDAwMDAwMDAwODAyNGIyMTggPF9f a2ZyZWVfc2tiK2IwLzEzMD4KICAxODogICBlMDQ0MDAwMCAgc2MgICAgICBhMCwwKHYwKQpDb2Rl OyAgMDAwMDAwMDA4MDI0YjIxYyA8X19rZnJlZV9za2IrYjQvMTMwPgogIDFjOiAgIDEwODBmZmZj ICBiZXF6ICAgIGEwLDEwIDxfUEMrMHgxMD4KQ29kZTsgIDAwMDAwMDAwODAyNGIyMjAgPF9fa2Zy ZWVfc2tiK2I4LzEzMD4KICAyMDogICAwMGEzMjAyMyAgc3VidSAgICBhMCxhMSx2MQoKS2VybmVs IHBhbmljOiBBaWVlLCBraWxsaW5nIGludGVycnVwdCBoYW5kbGVyIQoKMSB3YXJuaW5nIGlzc3Vl ZC4gIFJlc3VsdHMgbWF5IG5vdCBiZSByZWxpYWJsZS4K ------_=_NextPart_001_01C5669D.E951F95B Content-Type: application/octet-stream; name="recent.cap_recv.oops" Content-Transfer-Encoding: base64 Content-Description: recent.cap_recv.oops Content-Disposition: attachment; filename="recent.cap_recv.oops" a3N5bW9vcHMgMi40Ljkgb24gaTY4NiAyLjQuMjItMS4yMTE1Lm5wdGwuICBPcHRpb25zIHVzZWQK ICAgICAtdiAvaG9tZS9hbWQvcHJvamVjdC9hbWQva2VybmVsL3ZtbGludXggKGRlZmF1bHQpCiAg ICAgLUsgKHNwZWNpZmllZCkKICAgICAtbCAvcHJvYy9tb2R1bGVzIChkZWZhdWx0KQogICAgIC1v IC9ob21lL2FtZC9wcm9qZWN0L2FtZC9maWxlc3lzdGVtL3Vzci9saWIvbW9kdWxlcy8gKGRlZmF1 bHQpCiAgICAgLW0gL2hvbWUvYW1kL3Byb2plY3QvYW1kL2tlcm5lbC9TeXN0ZW0ubWFwIChkZWZh 
dWx0KQogICAgIC10IGVsZjMyLWxpdHRsZW1pcHMgLWEgbWlwczo0NjAwCgpObyBtb2R1bGVzIGlu IGtzeW1zLCBza2lwcGluZyBvYmplY3RzCk5vIGtzeW1zLCBza2lwcGluZyBsc21vZApVbmFibGUg dG8gaGFuZGxlIGtlcm5lbCBwYWdpbmcgcmVxdWVzdCBhdCB2aXJ0dWFsIGFkZHJlc3MgMDAwMDMy NjAsIGVwYyA9PSA4MDI0YjIwYywgcmEgPT0gODAyYzQ5ZjgKT29wcyBpbiBmYXVsdC5jOjpkb19w YWdlX2ZhdWx0LCBsaW5lIDIwNjoKJDAgOiAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAw MDAwMSA4Yjc4MDc2MCAwMDAwMDAwMCAwMDAwMzI2MCAwMDAwMDAwMQokOCA6IDAwMDAwMDAwIDAw MDAwMDAwIDAwMDAwMDAyIGQzZDBiMDAwIGMwMTE1MDAwIDAwMDAxNGI4IDhiOWJmZDI4IDdiN2E3 OTc4CiQxNjogOGI2YjU0NjAgOGI2YjU0NjAgZmZmZmZmZmYgOGI5MGY4MDAgODAzYTA4MDQgMDAw MDAwMDAgMDAwMDAwMDIgOGI5YmZkZDgKJDI0OiAwMDAwMDAwMCAyYWNhZDU1MCAgICAgICAgICAg ICAgICAgICA4YjliZTAwMCA4YjliZmE5OCAwMDAwNDc5ZCA4MDJjNDlmOApIaSA6IDAwMDAyMzYx CkxvIDogNzY1MGYxMDgKZXBjICAgOiA4MDI0YjIwYyAgICBOb3QgdGFpbnRlZApTdGF0dXM6IDEw MDBmYzAzCkNhdXNlIDogMDA4MDAwMDgKUHJvY2VzcyB2b3Nsb2cgKHBpZDogMTM0LCBzdGFja3Bh Z2U9OGI5YmUwMDApClN0YWNrOiAgICAwMDAwMDAwMCA4YjkwZjgwMCA4MDNhMDgwNCAwMDAwMDAw MCA4YjZiNTQ2MCA4MDJjNDlmOCA4MDEwN2MyOAogMDAwMDAwMDAgMDAwMDAwMDAgMDAwMDAwMDAg OGI2YjU0NjAgODEyNGZjNjggODEyYmVkMDAgODAzYTA4MDAgMDAwMDAwMDQKIDgwMjUwMDg4IDAw MDAwMDAwIDAwMDAwMDAwIDgwMjlkMzgwIDAwMDAwMDAwIDgxMmI2NTYwIDgwM2EwODAwIDgxMmJl ZDAwCiA4MDNhMDgwMCAwMDAwMDAwMCA4MDI1YzNlMCA4MDI2YTkwYyAwMDAwMDAwMyAwMDAwMDAw MiA4MDI5YzNhYyA4MDM0ZDdlOAogODAzYTA4MDAgMDAwMDAwMDAgODAyNTAzN2MgODAyOWMzZWMg MDAwMDAwMDAgOGI3ODA3NjAgODEyYmVkMDAgMDAwMDAwMGUKIDgxMmJlZDAwIC4uLgpDYWxsIFRy YWNlOiAgIFs8ODAyYzQ5Zjg+XSBbPDgwMTA3YzI4Pl0gWzw4MDI1MDA4OD5dIFs8ODAyOWQzODA+ XSBbPDgwMjVjM2UwPl0KIFs8ODAyNmE5MGM+XSBbPDgwMjljM2FjPl0gWzw4MDI1MDM3Yz5dIFs8 ODAyOWMzZWM+XSBbPDgwMjU3M2E4Pl0gWzw4MDI1YTQ4ND5dCiBbPDgwMjZhOTBjPl0gWzw4MDI2 YTllOD5dIFs8ODAyNmE5MGM+XSBbPDgwMjVhOThjPl0gWzw4MDI1YTk0OD5dIFs8ODAyNmE5MGM+ XQogWzw4MDJhM2Q5OD5dIFs8ODAyNjcxMzA+XSBbPDgwMjZhOGQ0Pl0gWzw4MDI2YTkwYz5dIFs8 ODAyNjcxYzA+XSBbPDgwMjY3MTMwPl0KIFs8ODAyNWE5OGM+XSBbPDgwMjY3MTMwPl0gWzw4MDI5 
ZmQzND5dIFs8ODAyNjcwNmM+XSBbPDgwMjY3MTMwPl0gWzw4MDI2NTdmOD5dCiBbPDgwMjY1YTIw Pl0gWzw4MDI1YTQ4ND5dIFs8YzAxY2UyYTg+XSBbPDgwMjY1N2Y4Pl0gWzw4MDI2NTdmOD5dIFs8 ODAyNWE5OGM+XQogWzw4MDI1YTk0OD5dIFs8ODAyNjU3Zjg+XSBbPDgwMjY1NWEwPl0gWzw4MDI2 NTdmOD5dIFs8ODAxMDEzM2M+XSAuLi4KV2FybmluZyAoT29wc190cmFjZV9saW5lKTogZ2FyYmFn ZSAnLi4uJyBhdCBlbmQgb2YgdHJhY2UgbGluZSBpZ25vcmVkCkNvZGU6IDhlMDYwMDljICAxMGMw MDAwZSAgMjQwMzAwMDEgPDhjYzIwMDAwPiBjMDQ1MDAwMCAgMDBhMzIwMjMgIGUwNDQwMDAwICAx MDgwZmZmYyAgMDBhMzIwMjMKCgo+PlJBOyAgMDAwMDAwMDA4MDJjNDlmOCA8cGFja2V0X3Jjdl9z cGt0KzI5Yy8yYjA+Cj4+JDMxOyAwMDAwMDAwMDgwMmM0OWY4IDxwYWNrZXRfcmN2X3Nwa3QrMjlj LzJiMD4KCj4+UEM7ICAwMDAwMDAwMDgwMjRiMjBjIDxfX2tmcmVlX3NrYithNC8xMzA+ICAgPD09 PT09CgpUcmFjZTsgMDAwMDAwMDA4MDJjNDlmOCA8cGFja2V0X3Jjdl9zcGt0KzI5Yy8yYjA+ClRy YWNlOyAwMDAwMDAwMDgwMTA3YzI4IDxkb19nZXR0aW1lb2ZkYXkrNTgvMTE0PgpUcmFjZTsgMDAw MDAwMDA4MDI1MDA4OCA8ZGV2X3F1ZXVlX3htaXRfbml0K2JjLzExMD4KVHJhY2U7IDAwMDAwMDAw ODAyOWQzODAgPF9faXBfY29ubnRyYWNrX2NvbmZpcm0rMjM4LzJjOD4KVHJhY2U7IDAwMDAwMDAw ODAyNWMzZTAgPF9fZ251X2NvbXBpbGVkX2MrNzAvMTRjPgpUcmFjZTsgMDAwMDAwMDA4MDI2YTkw YyA8aXBfZmluaXNoX291dHB1dDIrMTAvMTUwPgpUcmFjZTsgMDAwMDAwMDA4MDI5YzNhYyA8aXBf Y29uZmlybSs0OC80Yz4KVHJhY2U7IDAwMDAwMDAwODAyNTAzN2MgPGRldl9xdWV1ZV94bWl0KzFm OC8zYjg+ClRyYWNlOyAwMDAwMDAwMDgwMjljM2VjIDxpcF9yZWZyYWcrM2MvNzQ+ClRyYWNlOyAw MDAwMDAwMDgwMjU3M2E4IDxuZWlnaF9yZXNvbHZlX291dHB1dCsxZmMvMjljPgpUcmFjZTsgMDAw MDAwMDA4MDI1YTQ4NCA8bmZfaXRlcmF0ZSs5NC8xMTQ+ClRyYWNlOyAwMDAwMDAwMDgwMjZhOTBj IDxpcF9maW5pc2hfb3V0cHV0MisxMC8xNTA+ClRyYWNlOyAwMDAwMDAwMDgwMjZhOWU4IDxpcF9m aW5pc2hfb3V0cHV0MitlYy8xNTA+ClRyYWNlOyAwMDAwMDAwMDgwMjZhOTBjIDxpcF9maW5pc2hf b3V0cHV0MisxMC8xNTA+ClRyYWNlOyAwMDAwMDAwMDgwMjVhOThjIDxuZl9ob29rX3Nsb3crMTZj LzFmOD4KVHJhY2U7IDAwMDAwMDAwODAyNWE5NDggPG5mX2hvb2tfc2xvdysxMjgvMWY4PgpUcmFj ZTsgMDAwMDAwMDA4MDI2YTkwYyA8aXBfZmluaXNoX291dHB1dDIrMTAvMTUwPgpUcmFjZTsgMDAw MDAwMDA4MDJhM2Q5OCA8aXB0X2xvY2FsX291dF9ob29rKzQvOGM+ClRyYWNlOyAwMDAwMDAwMDgw 
MjY3MTMwIDxpcF9mb3J3YXJkX2ZpbmlzaCsxMC9hMD4KVHJhY2U7IDAwMDAwMDAwODAyNmE4ZDQg PGlwX2ZpbmlzaF9vdXRwdXQrMWEwLzFhND4KVHJhY2U7IDAwMDAwMDAwODAyNmE5MGMgPGlwX2Zp bmlzaF9vdXRwdXQyKzEwLzE1MD4KVHJhY2U7IDAwMDAwMDAwODAyNjcxYzAgPGlwX29wdGlvbnNf YnVpbGQrMC8wPgpUcmFjZTsgMDAwMDAwMDA4MDI2NzEzMCA8aXBfZm9yd2FyZF9maW5pc2grMTAv YTA+ClRyYWNlOyAwMDAwMDAwMDgwMjVhOThjIDxuZl9ob29rX3Nsb3crMTZjLzFmOD4KVHJhY2U7 IDAwMDAwMDAwODAyNjcxMzAgPGlwX2ZvcndhcmRfZmluaXNoKzEwL2EwPgpUcmFjZTsgMDAwMDAw MDA4MDI5ZmQzNCA8aWNtcF9wYWNrZXQrOTgvOWM+ClRyYWNlOyAwMDAwMDAwMDgwMjY3MDZjIDxf X2dudV9jb21waWxlZF9jKzI2Yy8zMjA+ClRyYWNlOyAwMDAwMDAwMDgwMjY3MTMwIDxpcF9mb3J3 YXJkX2ZpbmlzaCsxMC9hMD4KVHJhY2U7IDAwMDAwMDAwODAyNjU3ZjggPGlwX3Jjdl9maW5pc2gr MTAvMmE4PgpUcmFjZTsgMDAwMDAwMDA4MDI2NWEyMCA8aXBfcmN2X2ZpbmlzaCsyMzgvMmE4PgpU cmFjZTsgMDAwMDAwMDA4MDI1YTQ4NCA8bmZfaXRlcmF0ZSs5NC8xMTQ+ClRyYWNlOyAwMDAwMDAw MGMwMWNlMmE4IDxFTkRfT0ZfQ09ERSszZmUzYmFhOC8/Pz8/PgpUcmFjZTsgMDAwMDAwMDA4MDI2 NTdmOCA8aXBfcmN2X2ZpbmlzaCsxMC8yYTg+ClRyYWNlOyAwMDAwMDAwMDgwMjY1N2Y4IDxpcF9y Y3ZfZmluaXNoKzEwLzJhOD4KVHJhY2U7IDAwMDAwMDAwODAyNWE5OGMgPG5mX2hvb2tfc2xvdysx NmMvMWY4PgpUcmFjZTsgMDAwMDAwMDA4MDI1YTk0OCA8bmZfaG9va19zbG93KzEyOC8xZjg+ClRy YWNlOyAwMDAwMDAwMDgwMjY1N2Y4IDxpcF9yY3ZfZmluaXNoKzEwLzJhOD4KVHJhY2U7IDAwMDAw MDAwODAyNjU1YTAgPGlwX3Jjdis1MTAvNTc4PgpUcmFjZTsgMDAwMDAwMDA4MDI2NTdmOCA8aXBf cmN2X2ZpbmlzaCsxMC8yYTg+ClRyYWNlOyAwMDAwMDAwMDgwMTAxMzNjIDxkb19JUlErZjQvMTE4 PgoKQ29kZTsgIDAwMDAwMDAwODAyNGIyMDAgPF9fa2ZyZWVfc2tiKzk4LzEzMD4KMDAwMDAwMDAg PF9QQz46CkNvZGU7ICAwMDAwMDAwMDgwMjRiMjAwIDxfX2tmcmVlX3NrYis5OC8xMzA+CiAgIDA6 ICAgOGUwNjAwOWMgIGx3ICAgICAgYTIsMTU2KHMwKQpDb2RlOyAgMDAwMDAwMDA4MDI0YjIwNCA8 X19rZnJlZV9za2IrOWMvMTMwPgogICA0OiAgIDEwYzAwMDBlICBiZXF6ICAgIGEyLDQwIDxfUEMr MHg0MD4KQ29kZTsgIDAwMDAwMDAwODAyNGIyMDggPF9fa2ZyZWVfc2tiK2EwLzEzMD4KICAgODog ICAyNDAzMDAwMSAgbGkgICAgICB2MSwxCkNvZGU7ICAwMDAwMDAwMDgwMjRiMjBjIDxfX2tmcmVl X3NrYithNC8xMzA+ICAgPD09PT09CiAgIGM6ICAgOGNjMjAwMDAgIGx3ICAgICAgdjAsMChhMikg 
ICA8PT09PT0KQ29kZTsgIDAwMDAwMDAwODAyNGIyMTAgPF9fa2ZyZWVfc2tiK2E4LzEzMD4KICAx MDogICBjMDQ1MDAwMCAgbGwgICAgICBhMSwwKHYwKQpDb2RlOyAgMDAwMDAwMDA4MDI0YjIxNCA8 X19rZnJlZV9za2IrYWMvMTMwPgogIDE0OiAgIDAwYTMyMDIzICBzdWJ1ICAgIGEwLGExLHYxCkNv ZGU7ICAwMDAwMDAwMDgwMjRiMjE4IDxfX2tmcmVlX3NrYitiMC8xMzA+CiAgMTg6ICAgZTA0NDAw MDAgIHNjICAgICAgYTAsMCh2MCkKQ29kZTsgIDAwMDAwMDAwODAyNGIyMWMgPF9fa2ZyZWVfc2ti K2I0LzEzMD4KICAxYzogICAxMDgwZmZmYyAgYmVxeiAgICBhMCwxMCA8X1BDKzB4MTA+CkNvZGU7 ICAwMDAwMDAwMDgwMjRiMjIwIDxfX2tmcmVlX3NrYitiOC8xMzA+CiAgMjA6ICAgMDBhMzIwMjMg IHN1YnUgICAgYTAsYTEsdjEKCktlcm5lbCBwYW5pYzogQWllZSwga2lsbGluZyBpbnRlcnJ1cHQg aGFuZGxlciEKCjEgd2FybmluZyBpc3N1ZWQuICBSZXN1bHRzIG1heSBub3QgYmUgcmVsaWFibGUu Cg== ------_=_NextPart_001_01C5669D.E951F95B Content-Type: application/octet-stream; name="recent.cap_send.oops" Content-Transfer-Encoding: base64 Content-Description: recent.cap_send.oops Content-Disposition: attachment; filename="recent.cap_send.oops" a3N5bW9vcHMgMi40Ljkgb24gaTY4NiAyLjQuMjItMS4yMTE1Lm5wdGwuICBPcHRpb25zIHVzZWQK ICAgICAtdiAvaG9tZS9hbWQvcHJvamVjdC9hbWQva2VybmVsL3ZtbGludXggKGRlZmF1bHQpCiAg ICAgLUsgKHNwZWNpZmllZCkKICAgICAtbCAvcHJvYy9tb2R1bGVzIChkZWZhdWx0KQogICAgIC1v IC9ob21lL2FtZC9wcm9qZWN0L2FtZC9maWxlc3lzdGVtL3Vzci9saWIvbW9kdWxlcy8gKGRlZmF1 bHQpCiAgICAgLW0gL2hvbWUvYW1kL3Byb2plY3QvYW1kL2tlcm5lbC9TeXN0ZW0ubWFwIChkZWZh dWx0KQogICAgIC10IGVsZjMyLWxpdHRsZW1pcHMgLWEgbWlwczo0NjAwCgpObyBtb2R1bGVzIGlu IGtzeW1zLCBza2lwcGluZyBvYmplY3RzCk5vIGtzeW1zLCBza2lwcGluZyBsc21vZApVbmFibGUg dG8gaGFuZGxlIGtlcm5lbCBwYWdpbmcgcmVxdWVzdCBhdCB2aXJ0dWFsIGFkZHJlc3MgMDAwMDMy ZDQsIGVwYyA9PSA4MDI0YWY2YywgcmEgPT0gODAyNGIwOTQKT29wcyBpbiBmYXVsdC5jOjpkb19w YWdlX2ZhdWx0LCBsaW5lIDIwNjoKJDAgOiAwMDAwMDAwMCAxMDAwZmMwMCA4YWM4MWUwMCAwMDAw MzI2MCAwMDAwMzI2MCAwMDAwMDAwMCAwMDAwMDAwMCA4YjM4YjM0MAokOCA6IDAwMDAwMDMwIDgw MmRhMWEwIDAwMDAwMDEwIGJmYmViZGJjIGEzYTJhMWEwIDAwMDAwMDAwIDhhYjc5ZGU4IGE3YTZh NWE0CiQxNjogMDAwMDMyNjAgMDAwMDAwMDEgOGFlYTgyNjAgYzAxNzI5NGMgMDAwMDAwMGYgODAy 
NGIxNzggYzAxNjdhYjggYzAxNzI5NTAKJDI0OiAwMDAwMDAxMCAwMDQwZTBmMCAgICAgICAgICAg ICAgICAgICA4YWI3ODAwMCA4YWI3OWE2OCBjMDE3MjdkOCA4MDI0YjA5NApIaSA6IDAwMDAwMDAw CkxvIDogMDAwMDAwMGIKZXBjICAgOiA4MDI0YWY2YyAgICBOb3QgdGFpbnRlZApTdGF0dXM6IDEw MDBmYzAzCkNhdXNlIDogMDA4MDAwMDgKUHJvY2VzcyBtZG0td2lwcm8tbm8tZGUgKHBpZDogNDEw LCBzdGFja3BhZ2U9OGFiNzgwMDApClN0YWNrOiAgICA4YWI3OWFkOCA4MDM2OWJmMCAwMDAwMDAw NCA4MDI1YTQ4NCA4YjZiNTQ2MCA4YjZiNTQ2MCA4MDI0YjA5NAogZmZmYmM0NzMgODAyNmE5MGMg ODAzYTA0MDAgODEyYmVhODAgODAzYTA0MDAgOGI2YjU0NjAgOGIzOGIzNjAgODAyNGIwYzQKIDAw MDAwMDAwIDAwMDAwMDAyIDAwMDA0MGQyIDgwMmRhMGUwIDhhYjc5YzU4IDhiNmI1NDYwIDgwMjRi Mjk4IDgxMmI2NDYwCiA4MDNhMDQwMCA4MDNhMDQwMCA4YWI3OWFkOCA4YjZiNTQ2MCBjMDE3MWY1 OCA4MTJiZWJjMCA4MDM5MDZhOCAwMDAwMDAyMAogODAyNGFlMzggOGI2YjU3ODAgOGFhYzQwZjYg OGI0MjhkNjAgMDAwMDAwMDAgODEyYmViYzAgOGFhYzAwMTAgOGFlYTgyNjAKIDAwMDA0MGQyIC4u LgpDYWxsIFRyYWNlOiAgIFs8ODAyNWE0ODQ+XSBbPDgwMjRiMDk0Pl0gWzw4MDI2YTkwYz5dIFs8 ODAyNGIwYzQ+XSBbPDgwMmRhMGUwPl0KIFs8ODAyNGIyOTg+XSBbPGMwMTcxZjU4Pl0gWzw4MDI0 YWUzOD5dIFs8ODAyZDlkODA+XSBbPGMwMTcxZTEwPl0gWzw4MDI2YTkwYz5dCiBbPGMwMTcyNDE0 Pl0gWzxjMDE3NDBlOD5dIFs8ODAyNWE0ODQ+XSBbPDgwMjZhOTBjPl0gWzw4MDI2YTkwYz5dIFs8 ODAyNWE5NDg+XQogWzw4MDJkYTBlMD5dIFs8ODAyNmE5MGM+XSBbPGMwMTc1MWRjPl0gWzw4MDI2 YThkND5dIFs8ODAyNmE5MGM+XSBbPDgwMjZhMzBjPl0KIFs8ODAyNmExODQ+XSBbPDgwMjY3MTMw Pl0gWzw4MDI2NzFiMD5dIFs8ODAyNmE3NDQ+XSBbPDgwMjVhOThjPl0gWzw4MDI2NzEzMD5dCiBb PDgwMjY3MDZjPl0gWzw4MDI2NzEzMD5dIFs8ODAyNjU3Zjg+XSBbPDgwMjY1YTIwPl0gWzw4MDI1 YTQ4ND5dIFs8YzAxY2UyYTg+XQogWzw4MDI2NTdmOD5dIFs8ODAyNjU3Zjg+XSBbPDgwMjVhOThj Pl0gWzw4MDI1YTk0OD5dIFs8ODAyNjU3Zjg+XSAuLi4KV2FybmluZyAoT29wc190cmFjZV9saW5l KTogZ2FyYmFnZSAnLi4uJyBhdCBlbmQgb2YgdHJhY2UgbGluZSBpZ25vcmVkCkNvZGU6IDhjNTAw MDA4ICBhYzQwMDAwOCAgMDIwMDIwMjEgPDhjODIwMDc0PiAxMDUxMDAwOSAgOGUxMDAwMDAgIGMw ODMwMDc0ICAwMDcxMTAyMyAgZTA4MjAwNzQKCgo+PlJBOyAgMDAwMDAwMDA4MDI0YjA5NCA8c2ti X3JlbGVhc2VfZGF0YStiMC9iYz4KPj4kOTsgMDAwMDAwMDA4MDJkYTFhMCA8bWVtc2V0X3BhcnRp 
YWwrMjQvNmM+Cj4+JDIxOyAwMDAwMDAwMDgwMjRiMTc4IDxfX2tmcmVlX3NrYisxMC8xMzA+Cj4+ JDMxOyAwMDAwMDAwMDgwMjRiMDk0IDxza2JfcmVsZWFzZV9kYXRhK2IwL2JjPgoKPj5QQzsgIDAw MDAwMDAwODAyNGFmNmMgPHNrYl9kcm9wX2ZyYWdsaXN0KzM0Lzc0PiAgIDw9PT09PQoKVHJhY2U7 IDAwMDAwMDAwODAyNWE0ODQgPG5mX2l0ZXJhdGUrOTQvMTE0PgpUcmFjZTsgMDAwMDAwMDA4MDI0 YjA5NCA8c2tiX3JlbGVhc2VfZGF0YStiMC9iYz4KVHJhY2U7IDAwMDAwMDAwODAyNmE5MGMgPGlw X2ZpbmlzaF9vdXRwdXQyKzEwLzE1MD4KVHJhY2U7IDAwMDAwMDAwODAyNGIwYzQgPGtmcmVlX3Nr Ym1lbSsyNC9jOD4KVHJhY2U7IDAwMDAwMDAwODAyZGEwZTAgPG1lbXNldCswLzFjPgpUcmFjZTsg MDAwMDAwMDA4MDI0YjI5OCA8c2tiX2Nsb25lKzAvMjUwPgpUcmFjZTsgMDAwMDAwMDBjMDE3MWY1 OCA8RU5EX09GX0NPREUrM2ZkZGY3NTgvPz8/Pz4KVHJhY2U7IDAwMDAwMDAwODAyNGFlMzggPGFs bG9jX3NrYisxNjAvMjYwPgpUcmFjZTsgMDAwMDAwMDA4MDJkOWQ4MCA8bWVtY3B5KzAvND4KVHJh Y2U7IDAwMDAwMDAwYzAxNzFlMTAgPEVORF9PRl9DT0RFKzNmZGRmNjEwLz8/Pz8+ClRyYWNlOyAw MDAwMDAwMDgwMjZhOTBjIDxpcF9maW5pc2hfb3V0cHV0MisxMC8xNTA+ClRyYWNlOyAwMDAwMDAw MGMwMTcyNDE0IDxFTkRfT0ZfQ09ERSszZmRkZmMxNC8/Pz8/PgpUcmFjZTsgMDAwMDAwMDBjMDE3 NDBlOCA8RU5EX09GX0NPREUrM2ZkZTE4ZTgvPz8/Pz4KVHJhY2U7IDAwMDAwMDAwODAyNWE0ODQg PG5mX2l0ZXJhdGUrOTQvMTE0PgpUcmFjZTsgMDAwMDAwMDA4MDI2YTkwYyA8aXBfZmluaXNoX291 dHB1dDIrMTAvMTUwPgpUcmFjZTsgMDAwMDAwMDA4MDI2YTkwYyA8aXBfZmluaXNoX291dHB1dDIr MTAvMTUwPgpUcmFjZTsgMDAwMDAwMDA4MDI1YTk0OCA8bmZfaG9va19zbG93KzEyOC8xZjg+ClRy YWNlOyAwMDAwMDAwMDgwMmRhMGUwIDxtZW1zZXQrMC8xYz4KVHJhY2U7IDAwMDAwMDAwODAyNmE5 MGMgPGlwX2ZpbmlzaF9vdXRwdXQyKzEwLzE1MD4KVHJhY2U7IDAwMDAwMDAwYzAxNzUxZGMgPEVO RF9PRl9DT0RFKzNmZGUyOWRjLz8/Pz8+ClRyYWNlOyAwMDAwMDAwMDgwMjZhOGQ0IDxpcF9maW5p c2hfb3V0cHV0KzFhMC8xYTQ+ClRyYWNlOyAwMDAwMDAwMDgwMjZhOTBjIDxpcF9maW5pc2hfb3V0 cHV0MisxMC8xNTA+ClRyYWNlOyAwMDAwMDAwMDgwMjZhMzBjIDxpcF9mcmFnbWVudCszYzgvNTAw PgpUcmFjZTsgMDAwMDAwMDA4MDI2YTE4NCA8aXBfZnJhZ21lbnQrMjQwLzUwMD4KVHJhY2U7IDAw MDAwMDAwODAyNjcxMzAgPGlwX2ZvcndhcmRfZmluaXNoKzEwL2EwPgpUcmFjZTsgMDAwMDAwMDA4 MDI2NzFiMCA8aXBfZm9yd2FyZF9maW5pc2grOTAvYTA+ClRyYWNlOyAwMDAwMDAwMDgwMjZhNzQ0 
IDxpcF9maW5pc2hfb3V0cHV0KzEwLzFhND4KVHJhY2U7IDAwMDAwMDAwODAyNWE5OGMgPG5mX2hv b2tfc2xvdysxNmMvMWY4PgpUcmFjZTsgMDAwMDAwMDA4MDI2NzEzMCA8aXBfZm9yd2FyZF9maW5p c2grMTAvYTA+ClRyYWNlOyAwMDAwMDAwMDgwMjY3MDZjIDxfX2dudV9jb21waWxlZF9jKzI2Yy8z MjA+ClRyYWNlOyAwMDAwMDAwMDgwMjY3MTMwIDxpcF9mb3J3YXJkX2ZpbmlzaCsxMC9hMD4KVHJh Y2U7IDAwMDAwMDAwODAyNjU3ZjggPGlwX3Jjdl9maW5pc2grMTAvMmE4PgpUcmFjZTsgMDAwMDAw MDA4MDI2NWEyMCA8aXBfcmN2X2ZpbmlzaCsyMzgvMmE4PgpUcmFjZTsgMDAwMDAwMDA4MDI1YTQ4 NCA8bmZfaXRlcmF0ZSs5NC8xMTQ+ClRyYWNlOyAwMDAwMDAwMGMwMWNlMmE4IDxFTkRfT0ZfQ09E RSszZmUzYmFhOC8/Pz8/PgpUcmFjZTsgMDAwMDAwMDA4MDI2NTdmOCA8aXBfcmN2X2ZpbmlzaCsx MC8yYTg+ClRyYWNlOyAwMDAwMDAwMDgwMjY1N2Y4IDxpcF9yY3ZfZmluaXNoKzEwLzJhOD4KVHJh Y2U7IDAwMDAwMDAwODAyNWE5OGMgPG5mX2hvb2tfc2xvdysxNmMvMWY4PgpUcmFjZTsgMDAwMDAw MDA4MDI1YTk0OCA8bmZfaG9va19zbG93KzEyOC8xZjg+ClRyYWNlOyAwMDAwMDAwMDgwMjY1N2Y4 IDxpcF9yY3ZfZmluaXNoKzEwLzJhOD4KCkNvZGU7ICAwMDAwMDAwMDgwMjRhZjYwIDxza2JfZHJv cF9mcmFnbGlzdCsyOC83ND4KMDAwMDAwMDAgPF9QQz46CkNvZGU7ICAwMDAwMDAwMDgwMjRhZjYw IDxza2JfZHJvcF9mcmFnbGlzdCsyOC83ND4KICAgMDogICA4YzUwMDAwOCAgbHcgICAgICBzMCw4 KHYwKQpDb2RlOyAgMDAwMDAwMDA4MDI0YWY2NCA8c2tiX2Ryb3BfZnJhZ2xpc3QrMmMvNzQ+CiAg IDQ6ICAgYWM0MDAwMDggIHN3ICAgICAgemVybyw4KHYwKQpDb2RlOyAgMDAwMDAwMDA4MDI0YWY2 OCA8c2tiX2Ryb3BfZnJhZ2xpc3QrMzAvNzQ+CiAgIDg6ICAgMDIwMDIwMjEgIG1vdmUgICAgYTAs czAKQ29kZTsgIDAwMDAwMDAwODAyNGFmNmMgPHNrYl9kcm9wX2ZyYWdsaXN0KzM0Lzc0PiAgIDw9 PT09PQogICBjOiAgIDhjODIwMDc0ICBsdyAgICAgIHYwLDExNihhMCkgICA8PT09PT0KQ29kZTsg IDAwMDAwMDAwODAyNGFmNzAgPHNrYl9kcm9wX2ZyYWdsaXN0KzM4Lzc0PgogIDEwOiAgIDEwNTEw MDA5ICBiZXEgICAgIHYwLHMxLDM4IDxfUEMrMHgzOD4KQ29kZTsgIDAwMDAwMDAwODAyNGFmNzQg PHNrYl9kcm9wX2ZyYWdsaXN0KzNjLzc0PgogIDE0OiAgIDhlMTAwMDAwICBsdyAgICAgIHMwLDAo czApCkNvZGU7ICAwMDAwMDAwMDgwMjRhZjc4IDxza2JfZHJvcF9mcmFnbGlzdCs0MC83ND4KICAx ODogICBjMDgzMDA3NCAgbGwgICAgICB2MSwxMTYoYTApCkNvZGU7ICAwMDAwMDAwMDgwMjRhZjdj IDxza2JfZHJvcF9mcmFnbGlzdCs0NC83ND4KICAxYzogICAwMDcxMTAyMyAgc3VidSAgICB2MCx2 
MSxzMQpDb2RlOyAgMDAwMDAwMDA4MDI0YWY4MCA8c2tiX2Ryb3BfZnJhZ2xpc3QrNDgvNzQ+CiAg MjA6ICAgZTA4MjAwNzQgIHNjICAgICAgdjAsMTE2KGEwKQoKS2VybmVsIHBhbmljOiBBaWVlLCBr aWxsaW5nIGludGVycnVwdCBoYW5kbGVyIQoKMSB3YXJuaW5nIGlzc3VlZC4gIFJlc3VsdHMgbWF5 IG5vdCBiZSByZWxpYWJsZS4K ------_=_NextPart_001_01C5669D.E951F95B-- From jaegert@us.ibm.com Wed Jun 1 07:00:44 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 07:00:54 -0700 (PDT) Received: from e3.ny.us.ibm.com (e3.ny.us.ibm.com [32.97.182.143]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51E0bXq012794 for ; Wed, 1 Jun 2005 07:00:44 -0700 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e3.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j51Dxf89024306 for ; Wed, 1 Jun 2005 09:59:41 -0400 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay02.pok.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j51DxftR261752 for ; Wed, 1 Jun 2005 09:59:41 -0400 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j51DxfQI017299 for ; Wed, 1 Jun 2005 09:59:41 -0400 Received: from d01ml605.pok.ibm.com (d01ml605.pok.ibm.com [9.56.227.91]) by d01av02.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j51DxfnV017282; Wed, 1 Jun 2005 09:59:41 -0400 In-Reply-To: To: James Morris Cc: chrisw@osdl.org, latten@austin.ibm.com, netdev@oss.sgi.com, sds@tycho.nsa.gov, serue@us.ibm.com MIME-Version: 1.0 Subject: Re: [PATCH 2/2] Resend: LSM-IPSec Networking Hooks X-Mailer: Lotus Notes Release 6.0.2CF1 June 9, 2003 From: Trent Jaeger Message-ID: Date: Wed, 1 Jun 2005 09:59:40 -0400 X-MIMETrack: Serialize by Router on D01ML605/01/M/IBM(Build V70_M4_01112005 Beta 3|January 11, 2005) at 06/01/2005 09:59:40, Serialize complete at 06/01/2005 09:59:40 Content-Type: multipart/alternative; boundary="=_alternative 004CDF1985257013_=" X-archive-position: 1943 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: 
netdev-bounce@oss.sgi.com X-original-sender: jaegert@us.ibm.com Precedence: bulk X-list: netdev This is a multipart message in MIME format. --=_alternative 004CDF1985257013_= Content-Type: text/plain; charset="US-ASCII" OK. Thanks for the detailed comments. I will review and get back with comments and mods (probably next week). Regards, Trent. ------------------------------------------------------------ Trent Jaeger IBM T.J. Watson Research Center 19 Skyline Drive, Hawthorne, NY 10532 (914) 784-7225, FAX (914) 784-7225 James Morris 05/31/2005 12:15 AM To: Trent Jaeger/Watson/IBM@IBMUS cc: netdev@oss.sgi.com, , serue@us.ltcfwd.linux.ibm.com, , Subject: Re: [PATCH 2/2] Resend: LSM-IPSec Networking Hooks On Tue, 17 May 2005, jaegert wrote: Ok, my last review in this iteration. > @@ -984,6 +1029,13 @@ static struct xfrm_state * pfkey_msg2xfr > x->lft.soft_add_expires_seconds = lifetime->sadb_lifetime_addtime; > x->lft.soft_use_expires_seconds = lifetime->sadb_lifetime_usetime; > } > + > + sec_ctx = (struct sadb_x_sec_ctx *) ext_hdrs[SADB_X_EXT_SEC_CTX-1]; > + if (sec_ctx != NULL) { > + if (security_xfrm_state_alloc(x, sec_ctx)) > + goto out; You should propagate the return value of security_xfrm_state_alloc() here by assigning it to err. > -selinux-y := avc.o hooks.o selinuxfs.o netlink.o nlmsgtab.o > +selinux-y := avc.o hooks.o selinuxfs.o netlink.o nlmsgtab.o nethooks.o What about making nethooks.o (or whatever it'll be called) conditionally compiled via CONFIG_SECURITY_NETWORK_XFRM ? (see netif.o) > + * ISSUES: > + * 1. Caching packets, so they are not dropped during negotiation This needs to be done for IPsec in general, not sure what the status is. > + * 2. Emulating a reasonable SO_PEERSEC across machines This may not be too difficult if we limit this to connected TCP sockets. > + * 3. Testing sk_policy setting with context What does this mean? Overall, this looks like a really good approach to the problem. 
- James -- James Morris --=_alternative 004CDF1985257013_= Content-Type: text/html; charset="US-ASCII"
OK.

Thanks for the detailed comments.  

I will review and get back with comments and mods (probably next week).

Regards,
Trent.
------------------------------------------------------------
Trent Jaeger
IBM T.J. Watson Research Center
19 Skyline Drive, Hawthorne, NY 10532
(914) 784-7225, FAX (914) 784-7225



James Morris <jmorris@redhat.com>

05/31/2005 12:15 AM

       
To: Trent Jaeger/Watson/IBM@IBMUS
cc: netdev@oss.sgi.com, <chrisw@osdl.org>, serue@us.ltcfwd.linux.ibm.com, <latten@austin.ibm.com>, <sds@tycho.nsa.gov>
Subject: Re: [PATCH 2/2] Resend: LSM-IPSec Networking Hooks



On Tue, 17 May 2005, jaegert wrote:

Ok, my last review in this iteration.

> @@ -984,6 +1029,13 @@ static struct xfrm_state * pfkey_msg2xfr
>                x->lft.soft_add_expires_seconds = lifetime->sadb_lifetime_addtime;
>                x->lft.soft_use_expires_seconds = lifetime->sadb_lifetime_usetime;
>        }
> +
> +       sec_ctx = (struct sadb_x_sec_ctx *) ext_hdrs[SADB_X_EXT_SEC_CTX-1];
> +       if (sec_ctx != NULL) {
> +               if (security_xfrm_state_alloc(x, sec_ctx))
> +                       goto out;

You should propagate the return value of security_xfrm_state_alloc() here
by assigning it to err.
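
A minimal sketch of that pattern, written as standalone C rather than the real af_key.c code. fake_state_alloc() and build_state() are hypothetical stand-ins for the hook and its caller, and the -ENOBUFS default is an assumption:

```c
#include <errno.h>

/* Hypothetical stand-in for security_xfrm_state_alloc(): returns 0 on
 * success, or whatever negative errno the caller asks it to simulate. */
static int fake_state_alloc(int simulated_result)
{
	return simulated_result;
}

/* Same shape as the quoted hunk: one err variable, one exit label. */
static int build_state(int have_sec_ctx, int simulated_result)
{
	int err = -ENOBUFS;	/* assumed default error, overwritten below */

	if (have_sec_ctx) {
		/* Keep the hook's own return value instead of discarding it. */
		err = fake_state_alloc(simulated_result);
		if (err)
			goto out;
	}

	err = 0;
out:
	return err;
}
```

With this shape the caller sees the hook's errno (for example -EACCES) rather than a generic failure code.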

> -selinux-y := avc.o hooks.o selinuxfs.o netlink.o nlmsgtab.o
> +selinux-y := avc.o hooks.o selinuxfs.o netlink.o nlmsgtab.o nethooks.o

What about making nethooks.o (or whatever it'll be called) conditionally
compiled via CONFIG_SECURITY_NETWORK_XFRM ? (see netif.o)


> + * ISSUES:
> + *   1. Caching packets, so they are not dropped during negotiation

This needs to be done for IPsec in general, not sure what the status is.

> + *   2. Emulating a reasonable SO_PEERSEC across machines

This may not be too difficult if we limit this to connected TCP sockets.
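
For context, a rough userspace sketch of how an application might read a peer label through SO_PEERSEC on a connected socket. The helper name peer_security_context() is made up for illustration, the fallback #define is an assumption, and whether a TCP socket returns anything depends on the work being discussed here:

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

#ifndef SO_PEERSEC
#define SO_PEERSEC 31	/* assumption: value used on most Linux architectures */
#endif

/*
 * Copy the peer's security context for a connected socket into buf.
 * Returns 0 on success, -1 if the arguments are bad, the descriptor is
 * invalid, or the kernel does not support the option for this socket.
 */
static int peer_security_context(int fd, char *buf, size_t buflen)
{
	socklen_t len;

	if (buf == NULL || buflen == 0)
		return -1;

	len = (socklen_t)(buflen - 1);	/* keep room for a terminating NUL */
	if (getsockopt(fd, SOL_SOCKET, SO_PEERSEC, buf, &len) < 0)
		return -1;

	buf[len] = '\0';	/* len never exceeds buflen - 1 here */
	return 0;
}
```

An application would call this right after accept() or connect() and treat a failure as "no label available".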

> + *   3. Testing sk_policy setting with context

What does this mean?


Overall, this looks like a really good approach to the problem.


- James
--
James Morris
<jmorris@redhat.com>



--=_alternative 004CDF1985257013_=--

From kernel@linuxace.com Wed Jun 1 10:01:54 2005
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 10:01:59 -0700 (PDT)
Received: from linuxace.com (adsl-67-120-171-161.dsl.lsan03.pacbell.net [67.120.171.161]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j51H1sXq026236 for ; Wed, 1 Jun 2005 10:01:54 -0700
Received: (qmail 20132 invoked by uid 0); 1 Jun 2005 17:00:58 -0000
Date: Wed, 1 Jun 2005 10:00:58 -0700
From: Phil Oester
To: Herbert Xu
Cc: netdev@oss.sgi.com, akpm@osdl.org
Subject: Re: 2.6.12-rcx networking oops
Message-ID: <20050601170058.GA20112@linuxace.com>
References: <20050531224012.GA16789@linuxace.com> <20050601054955.GA2625@gondor.apana.org.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050601054955.GA2625@gondor.apana.org.au>
User-Agent: Mutt/1.4.1i
X-archive-position: 1944
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: kernel@linuxace.com
Precedence: bulk
X-list: netdev

On Wed, Jun 01, 2005 at 03:49:55PM +1000, Herbert Xu wrote:
> This looks like stack overflow. %esi is meant to be "res" which is
> a local variable. As you can see, it's pointing below %esp and
> threadinfo.

Ok, so I enabled DEBUG_STACKOVERFLOW in addition to CONFIG_DEBUG_SLAB
and CONFIG_DEBUG_PAGEALLOC, and got the below today...so maybe it is
a slab issue?

0xc0238cdd is in dst_alloc (net/core/dst.c:124).
119             if (ops->gc && atomic_read(&ops->entries) > ops->gc_thresh) {
120                     if (ops->gc())
121                             return NULL;
122             }
123             dst = kmem_cache_alloc(ops->kmem_cachep, SLAB_ATOMIC);

0xc013912b is at mm/slab.c:3077.
3072                    size = kmem_cache_size(c);
3073                    local_irq_restore(flags);
3074            }
3075
3076            return size;
3077    }

Phil

invalid operand: 0000 [#1]
SMP DEBUG_PAGEALLOC
CPU: 1
EIP: 0060:[] Not tainted VLI
EFLAGS: 00016292 (2.6.12-rc5-git5)
EIP is at ksize+0x7b/0x100
eax: c0238cdd ebx: f7ba9c20 ecx: f7babf78 edx: dcc59000
esi: 00000020 edi: 0000e3ba ebp: c0338d98 esp: c0338d88
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0338000 task=c1989b00)
Stack: 00000000 04000000 c02d1a00 ffffff97 c0338db0 c0238cdd c0338e58 04000000
       00000000 ffffff97 c0338eb4 c0245cb7 00000002 f7b01000 c0338dec c0338df0
       f7318ef8 00000000 00000000 00000001 f72dbef8 0000a704 103c243b f27ceec0
Call Trace:
 [] show_stack+0x7a/0x90
 [] show_registers+0x14d/0x1b0
 [] die+0xf9/0x180
 [] do_trap+0xa0/0xb0
 [] do_invalid_op+0xa9/0xc0
 [] error_code+0x4f/0x54
 [] dst_alloc+0x2d/0xa0
 [] ip_route_input_slow+0x4a7/0x840
 [] ip_route_input+0x9a/0x160
 [] ip_rcv+0x3b0/0x4d0
 [] netif_receive_skb+0x13a/0x1a0
 [] e1000_clean_rx_irq+0x180/0x4d0
 [] e1000_clean+0x40/0xe0
 [] net_rx_action+0x90/0x130
 [] __do_softirq+0xd4/0xf0
 [] do_softirq+0x52/0x70
 =======================
 [] irq_exit+0x3a/0x40
 [] do_IRQ+0x68/0xa0
 [] common_interrupt+0x1a/0x20
 [] cpu_idle+0x7b/0x80
 [] start_secondary+0x73/0x90
 [<00000000>] stext+0x3feffd6c/0xc
 [] 0xc198afb4
Code: 8d 05 0c e2 34 c0 e8 e9 25 15 00 e9 96 dd ff ff 8d 05 0c e2 34 c0 e8 a9 25 15 00 e9 00 e2 ff ff 8d 05 0c e2 34 c0 e8 c9 25 15 00 23 e2 ff ff 8d 05 0c e2 34 c0 e8 89 25 15 00 e9 84 e2 ff ff
 <0>Kernel panic - not syncing: Fatal exception in interrupt

From mmporter@cox.net Wed Jun 1 11:26:34 2005
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 11:26:39 -0700 (PDT)
Received: from fed1rmmtao09.cox.net (fed1rmmtao09.cox.net [68.230.241.30]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51IQXXq032754 for ; Wed, 1 Jun 2005 11:26:34 -0700
Received: from liberty.homelinux.org ([68.2.41.86]) by fed1rmmtao09.cox.net (InterMail vM.6.01.04.00 201-2131-118-20041027)
with ESMTP id <20050601182536.OPJC7275.fed1rmmtao09.cox.net@liberty.homelinux.org>; Wed, 1 Jun 2005 14:25:36 -0400 Received: (from mmporter@localhost) by liberty.homelinux.org (8.9.3/8.9.3/Debian 8.9.3-21) id LAA16886; Wed, 1 Jun 2005 11:25:34 -0700 Date: Wed, 1 Jun 2005 11:25:34 -0700 From: Matt Porter To: torvalds@osdl.org, akpm@osdl.org, jgarzik@pobox.com Cc: linux-kernel@vger.kernel.org, linuxppc-embedded@ozlabs.org, netdev@oss.sgi.com Subject: [PATCH][3/3] RapidIO support: net driver over messaging Message-ID: <20050601112534.C16559@cox.net> References: <20050601110836.A16559@cox.net> <20050601111516.B16559@cox.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20050601111516.B16559@cox.net>; from mporter@kernel.crashing.org on Wed, Jun 01, 2005 at 11:15:17AM -0700 X-archive-position: 1945 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mporter@kernel.crashing.org Precedence: bulk X-list: netdev Adds an "Ethernet" driver which sends Ethernet packets over the standard RapidIO messaging. This depends on the core RIO patch for mailbox/doorbell access. 
Signed-off-by: Matt Porter Index: drivers/net/Kconfig =================================================================== --- f0bf7810dbe8c4073832d6c3785364084e9523a7/drivers/net/Kconfig (mode:100644) +++ 4ed27b6e30a69f314a2ca131e80ac45e2111f245/drivers/net/Kconfig (mode:100644) @@ -2185,6 +2185,20 @@ tristate "iSeries Virtual Ethernet driver support" depends on NETDEVICES && PPC_ISERIES +config RIONET + tristate "RapidIO Ethernet over messaging driver support" + depends on NETDEVICES && RAPIDIO + +config RIONET_TX_SIZE + int "Number of outbound queue entries" + depends on RIONET + default "128" + +config RIONET_RX_SIZE + int "Number of inbound queue entries" + depends on RIONET + default "128" + config FDDI bool "FDDI driver support" depends on NETDEVICES && (PCI || EISA) Index: drivers/net/Makefile =================================================================== --- f0bf7810dbe8c4073832d6c3785364084e9523a7/drivers/net/Makefile (mode:100644) +++ 4ed27b6e30a69f314a2ca131e80ac45e2111f245/drivers/net/Makefile (mode:100644) @@ -58,6 +58,7 @@ obj-$(CONFIG_VIA_RHINE) += via-rhine.o obj-$(CONFIG_VIA_VELOCITY) += via-velocity.o obj-$(CONFIG_ADAPTEC_STARFIRE) += starfire.o +obj-$(CONFIG_RIONET) += rionet.o # # end link order section Index: drivers/net/rionet.c =================================================================== --- /dev/null (tree:f0bf7810dbe8c4073832d6c3785364084e9523a7) +++ 4ed27b6e30a69f314a2ca131e80ac45e2111f245/drivers/net/rionet.c (mode:100644) @@ -0,0 +1,622 @@ +/* + * rionet - Ethernet driver over RapidIO messaging services + * + * Copyright 2005 MontaVista Software, Inc. + * Matt Porter + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#define DRV_NAME "rionet" +#define DRV_VERSION "0.1" +#define DRV_AUTHOR "Matt Porter " +#define DRV_DESC "Ethernet over RapidIO" + +MODULE_AUTHOR(DRV_AUTHOR); +MODULE_DESCRIPTION(DRV_DESC); +MODULE_LICENSE("GPL"); + +#define RIONET_DEFAULT_MSGLEVEL 0 +#define RIONET_DOORBELL_JOIN 0x1000 +#define RIONET_DOORBELL_LEAVE 0x1001 + +#define RIONET_MAILBOX 0 + +#define RIONET_TX_RING_SIZE CONFIG_RIONET_TX_SIZE +#define RIONET_RX_RING_SIZE CONFIG_RIONET_RX_SIZE + +LIST_HEAD(rionet_peers); + +struct rionet_private { + struct rio_mport *mport; + struct sk_buff *rx_skb[RIONET_RX_RING_SIZE]; + struct sk_buff *tx_skb[RIONET_TX_RING_SIZE]; + struct net_device_stats stats; + int rx_slot; + int tx_slot; + int tx_cnt; + int ack_slot; + spinlock_t lock; + u32 msg_enable; +}; + +struct rionet_peer { + struct list_head node; + struct rio_dev *rdev; + struct resource *res; +}; + +static int rionet_check = 0; +static int rionet_capable = 1; +static struct net_device *sndev = NULL; + +/* + * This is a fast lookup table for translating TX + * Ethernet packets into a destination RIO device. It + * could be made into a hash table to save memory depending + * on system trade-offs.
+ */ +static struct rio_dev *rionet_active[RIO_MAX_ROUTE_ENTRIES]; + +#define is_rionet_capable(pef, src_ops, dst_ops) \ + ((pef & RIO_PEF_INB_MBOX) && \ + (pef & RIO_PEF_INB_DOORBELL) && \ + (src_ops & RIO_SRC_OPS_DOORBELL) && \ + (dst_ops & RIO_DST_OPS_DOORBELL)) +#define dev_rionet_capable(dev) \ + is_rionet_capable(dev->pef, dev->src_ops, dev->dst_ops) + +#define RIONET_MAC_MATCH(x) (*(u32 *)x == 0x00010001) +#define RIONET_GET_DESTID(x) (*(u16 *)(x + 4)) + +static struct net_device_stats *rionet_stats(struct net_device *ndev) +{ + struct rionet_private *rnet = ndev->priv; + return &rnet->stats; +} + +static int rionet_rx_clean(struct net_device *ndev) +{ + int i; + int error = 0; + struct rionet_private *rnet = ndev->priv; + void *data; + + i = rnet->rx_slot; + + do { + if (!rnet->rx_skb[i]) { + rnet->stats.rx_dropped++; + continue; + } + + if (!(data = rio_get_inb_message(rnet->mport, RIONET_MAILBOX))) + break; + + rnet->rx_skb[i]->data = data; + skb_put(rnet->rx_skb[i], RIO_MAX_MSG_SIZE); + rnet->rx_skb[i]->dev = sndev; + rnet->rx_skb[i]->protocol = + eth_type_trans(rnet->rx_skb[i], sndev); + error = netif_rx(rnet->rx_skb[i]); + + if (error == NET_RX_DROP) { + rnet->stats.rx_dropped++; + } else if (error == NET_RX_BAD) { + if (netif_msg_rx_err(rnet)) + printk(KERN_WARNING "%s: bad rx packet\n", + DRV_NAME); + rnet->stats.rx_errors++; + } else { + rnet->stats.rx_packets++; + rnet->stats.rx_bytes += RIO_MAX_MSG_SIZE; + } + + } while ((i = (i + 1) % RIONET_RX_RING_SIZE) != rnet->rx_slot); + + return i; +} + +static void rionet_rx_fill(struct net_device *ndev, int end) +{ + int i; + struct rionet_private *rnet = ndev->priv; + + i = rnet->rx_slot; + do { + rnet->rx_skb[i] = dev_alloc_skb(RIO_MAX_MSG_SIZE); + + if (!rnet->rx_skb[i]) + break; + + rio_add_inb_buffer(rnet->mport, RIONET_MAILBOX, + rnet->rx_skb[i]->data); + } while ((i = (i + 1) % RIONET_RX_RING_SIZE) != end); + + rnet->rx_slot = i; +} + +static int rionet_queue_tx_msg(struct sk_buff *skb, struct 
net_device *ndev, + struct rio_dev *rdev) +{ + struct rionet_private *rnet = ndev->priv; + + rio_add_outb_message(rnet->mport, rdev, 0, skb->data, skb->len); + rnet->tx_skb[rnet->tx_slot] = skb; + + rnet->stats.tx_packets++; + rnet->stats.tx_bytes += skb->len; + + if (++rnet->tx_cnt == RIONET_TX_RING_SIZE) + netif_stop_queue(ndev); + + if (++rnet->tx_slot == RIONET_TX_RING_SIZE) + rnet->tx_slot = 0; + + if (netif_msg_tx_queued(rnet)) + printk(KERN_INFO "%s: queued skb %8.8x len %8.8x\n", DRV_NAME, + (u32) skb, skb->len); + + return 0; +} + +static int rionet_start_xmit(struct sk_buff *skb, struct net_device *ndev) +{ + int i; + struct rionet_private *rnet = ndev->priv; + struct ethhdr *eth = (struct ethhdr *)skb->data; + u16 destid; + + spin_lock_irq(&rnet->lock); + + if ((rnet->tx_cnt + 1) > RIONET_TX_RING_SIZE) { + netif_stop_queue(ndev); + spin_unlock_irq(&rnet->lock); + return -EBUSY; + } + + if (eth->h_dest[0] & 0x01) { + /* + * XXX Need to delay queuing if ring max is reached, + * flush additional packets in tx_event() before + * awakening the queue. We can easily exceed ring + * size with a large number of nodes or even a + * small number where the ring is relatively full + * on entrance to hard_start_xmit. 
+ */ + for (i = 0; i < RIO_MAX_ROUTE_ENTRIES; i++) + if (rionet_active[i]) + rionet_queue_tx_msg(skb, ndev, + rionet_active[i]); + } else if (RIONET_MAC_MATCH(eth->h_dest)) { + destid = RIONET_GET_DESTID(eth->h_dest); + if (rionet_active[destid]) + rionet_queue_tx_msg(skb, ndev, rionet_active[destid]); + } + + spin_unlock_irq(&rnet->lock); + + return 0; +} + +static int rionet_set_mac_address(struct net_device *ndev, void *p) +{ + struct sockaddr *addr = p; + + if (!is_valid_ether_addr(addr->sa_data)) + return -EADDRNOTAVAIL; + + memcpy(ndev->dev_addr, addr->sa_data, ndev->addr_len); + + return 0; +} + +static int rionet_change_mtu(struct net_device *ndev, int new_mtu) +{ + struct rionet_private *rnet = ndev->priv; + + if (netif_msg_drv(rnet)) + printk(KERN_WARNING + "%s: rionet_change_mtu(): not implemented\n", DRV_NAME); + + return 0; +} + +static void rionet_set_multicast_list(struct net_device *ndev) +{ + struct rionet_private *rnet = ndev->priv; + + if (netif_msg_drv(rnet)) + printk(KERN_WARNING + "%s: rionet_set_multicast_list(): not implemented\n", + DRV_NAME); +} + +static void rionet_dbell_event(struct rio_mport *mport, u16 sid, u16 tid, + u16 info) +{ + struct net_device *ndev = sndev; + struct rionet_private *rnet = ndev->priv; + struct rionet_peer *peer; + + if (netif_msg_intr(rnet)) + printk(KERN_INFO "%s: doorbell sid %4.4x tid %4.4x info %4.4x", + DRV_NAME, sid, tid, info); + if (info == RIONET_DOORBELL_JOIN) { + if (!rionet_active[sid]) { + list_for_each_entry(peer, &rionet_peers, node) { + if (peer->rdev->destid == sid) + rionet_active[sid] = peer->rdev; + } + rio_mport_send_doorbell(mport, sid, + RIONET_DOORBELL_JOIN); + } + } else if (info == RIONET_DOORBELL_LEAVE) { + rionet_active[sid] = NULL; + } else { + if (netif_msg_intr(rnet)) + printk(KERN_WARNING "%s: unhandled doorbell\n", + DRV_NAME); + } +} + +static void rionet_inb_msg_event(struct rio_mport *mport, int mbox, int slot) +{ + int n; + struct net_device *ndev = sndev; + struct 
rionet_private *rnet = (struct rionet_private *)ndev->priv; + + if (netif_msg_intr(rnet)) + printk(KERN_INFO "%s: inbound message event, mbox %d slot %d\n", + DRV_NAME, mbox, slot); + + spin_lock(&rnet->lock); + if ((n = rionet_rx_clean(ndev)) != rnet->rx_slot) + rionet_rx_fill(ndev, n); + spin_unlock(&rnet->lock); +} + +static void rionet_outb_msg_event(struct rio_mport *mport, int mbox, int slot) +{ + struct net_device *ndev = sndev; + struct rionet_private *rnet = ndev->priv; + + spin_lock(&rnet->lock); + + if (netif_msg_intr(rnet)) + printk(KERN_INFO + "%s: outbound message event, mbox %d slot %d\n", + DRV_NAME, mbox, slot); + + while (rnet->tx_cnt && (rnet->ack_slot != slot)) { + /* dma unmap single */ + dev_kfree_skb_irq(rnet->tx_skb[rnet->ack_slot]); + rnet->tx_skb[rnet->ack_slot] = NULL; + if (++rnet->ack_slot == RIONET_TX_RING_SIZE) + rnet->ack_slot = 0; + rnet->tx_cnt--; + } + + if (rnet->tx_cnt < RIONET_TX_RING_SIZE) + netif_wake_queue(ndev); + + spin_unlock(&rnet->lock); +} + +static int rionet_open(struct net_device *ndev) +{ + int i, rc = 0; + struct rionet_peer *peer, *tmp; + u32 pwdcsr; + struct rionet_private *rnet = ndev->priv; + + if (netif_msg_ifup(rnet)) + printk(KERN_INFO "%s: open\n", DRV_NAME); + + if ((rc = rio_request_inb_dbell(rnet->mport, + RIONET_DOORBELL_JOIN, + RIONET_DOORBELL_LEAVE, + rionet_dbell_event)) < 0) + goto out; + + if ((rc = rio_request_inb_mbox(rnet->mport, + RIONET_MAILBOX, + RIONET_RX_RING_SIZE, + rionet_inb_msg_event)) < 0) + goto out; + + if ((rc = rio_request_outb_mbox(rnet->mport, + RIONET_MAILBOX, + RIONET_TX_RING_SIZE, + rionet_outb_msg_event)) < 0) + goto out; + + /* Initialize inbound message ring */ + for (i = 0; i < RIONET_RX_RING_SIZE; i++) + rnet->rx_skb[i] = NULL; + rnet->rx_slot = 0; + rionet_rx_fill(ndev, 0); + + rnet->tx_slot = 0; + rnet->tx_cnt = 0; + rnet->ack_slot = 0; + + spin_lock_init(&rnet->lock); + + rnet->msg_enable = RIONET_DEFAULT_MSGLEVEL; + + netif_carrier_on(ndev); + 
netif_start_queue(ndev); + + list_for_each_entry_safe(peer, tmp, &rionet_peers, node) { + if (!(peer->res = rio_request_outb_dbell(peer->rdev, + RIONET_DOORBELL_JOIN, + RIONET_DOORBELL_LEAVE))) + { + printk(KERN_ERR "%s: error requesting doorbells\n", + DRV_NAME); + continue; + } + + /* + * If device has initialized inbound doorbells, + * send a join message + */ + rio_read_config_32(peer->rdev, RIO_WRITE_PORT_CSR, &pwdcsr); + if (pwdcsr & RIO_DOORBELL_AVAIL) + rio_send_doorbell(peer->rdev, RIONET_DOORBELL_JOIN); + } + + out: + return rc; +} + +static int rionet_close(struct net_device *ndev) +{ + struct rionet_private *rnet = (struct rionet_private *)ndev->priv; + struct rionet_peer *peer, *tmp; + int i; + + if (netif_msg_ifup(rnet)) + printk(KERN_INFO "%s: close\n", DRV_NAME); + + netif_stop_queue(ndev); + netif_carrier_off(ndev); + + for (i = 0; i < RIONET_RX_RING_SIZE; i++) + if (rnet->rx_skb[i]) + kfree_skb(rnet->rx_skb[i]); + + list_for_each_entry_safe(peer, tmp, &rionet_peers, node) { + if (rionet_active[peer->rdev->destid]) { + rio_send_doorbell(peer->rdev, RIONET_DOORBELL_LEAVE); + rionet_active[peer->rdev->destid] = NULL; + } + rio_release_outb_dbell(peer->rdev, peer->res); + } + + rio_release_inb_dbell(rnet->mport, RIONET_DOORBELL_JOIN, + RIONET_DOORBELL_LEAVE); + rio_release_inb_mbox(rnet->mport, RIONET_MAILBOX); + rio_release_outb_mbox(rnet->mport, RIONET_MAILBOX); + + return 0; +} + +static void rionet_remove(struct rio_dev *rdev) +{ + struct net_device *ndev = NULL; + struct rionet_peer *peer, *tmp; + + unregister_netdev(ndev); + kfree(ndev); + + list_for_each_entry_safe(peer, tmp, &rionet_peers, node) { + list_del(&peer->node); + kfree(peer); + } +} + +static int rionet_ioctl(struct net_device *ndev, struct ifreq *rq, int cmd) +{ + return -EOPNOTSUPP; +} + +static void rionet_get_drvinfo(struct net_device *ndev, + struct ethtool_drvinfo *info) +{ + struct rionet_private *rnet = ndev->priv; + + strcpy(info->driver, DRV_NAME); + strcpy(info->version, 
DRV_VERSION); + strcpy(info->fw_version, "n/a"); + sprintf(info->bus_info, "RIO master port %d", rnet->mport->id); +} + +static u32 rionet_get_msglevel(struct net_device *ndev) +{ + struct rionet_private *rnet = ndev->priv; + + return rnet->msg_enable; +} + +static void rionet_set_msglevel(struct net_device *ndev, u32 value) +{ + struct rionet_private *rnet = ndev->priv; + + rnet->msg_enable = value; +} + +static u32 rionet_get_link(struct net_device *ndev) +{ + return netif_carrier_ok(ndev); +} + +static struct ethtool_ops rionet_ethtool_ops = { + .get_drvinfo = rionet_get_drvinfo, + .get_msglevel = rionet_get_msglevel, + .set_msglevel = rionet_set_msglevel, + .get_link = rionet_get_link, +}; + +static int rionet_setup_netdev(struct rio_mport *mport) +{ + int rc = 0; + struct net_device *ndev = NULL; + struct rionet_private *rnet; + u16 device_id; + + /* Allocate our net_device structure */ + ndev = alloc_etherdev(sizeof(struct rionet_private)); + if (ndev == NULL) { + printk(KERN_INFO "%s: could not allocate ethernet device.\n", + DRV_NAME); + rc = -ENOMEM; + goto out; + } + + /* + * XXX hack, store point a static at ndev so we can get it... + * Perhaps need an array of these that the handler can + * index via the mbox number. 
+ */ + sndev = ndev; + + /* Set up private area */ + rnet = (struct rionet_private *)ndev->priv; + rnet->mport = mport; + + /* Set the default MAC address */ + device_id = rio_local_get_device_id(mport); + ndev->dev_addr[0] = 0x00; + ndev->dev_addr[1] = 0x01; + ndev->dev_addr[2] = 0x00; + ndev->dev_addr[3] = 0x01; + ndev->dev_addr[4] = device_id >> 8; + ndev->dev_addr[5] = device_id & 0xff; + + /* Fill in the driver function table */ + ndev->open = &rionet_open; + ndev->hard_start_xmit = &rionet_start_xmit; + ndev->stop = &rionet_close; + ndev->get_stats = &rionet_stats; + ndev->change_mtu = &rionet_change_mtu; + ndev->set_mac_address = &rionet_set_mac_address; + ndev->set_multicast_list = &rionet_set_multicast_list; + ndev->do_ioctl = &rionet_ioctl; + SET_ETHTOOL_OPS(ndev, &rionet_ethtool_ops); + + ndev->mtu = RIO_MAX_MSG_SIZE - 14; + + SET_MODULE_OWNER(ndev); + + rc = register_netdev(ndev); + if (rc != 0) + goto out; + + printk("%s: %s %s Version %s, MAC %02x:%02x:%02x:%02x:%02x:%02x\n", + ndev->name, + DRV_NAME, + DRV_DESC, + DRV_VERSION, + ndev->dev_addr[0], ndev->dev_addr[1], ndev->dev_addr[2], + ndev->dev_addr[3], ndev->dev_addr[4], ndev->dev_addr[5]); + + out: + return rc; +} + +/* + * XXX Make multi-net safe + */ +static int rionet_probe(struct rio_dev *rdev, const struct rio_device_id *id) +{ + int rc = -ENODEV; + u32 lpef, lsrc_ops, ldst_ops; + struct rionet_peer *peer; + + /* If local device is not rionet capable, give up quickly */ + if (!rionet_capable) + goto out; + + /* + * First time through, make sure local device is rionet + * capable, setup netdev, and set flags so this is skipped + * on later probes + */ + if (!rionet_check) { + rio_local_read_config_32(rdev->net->hport, RIO_PEF_CAR, &lpef); + rio_local_read_config_32(rdev->net->hport, RIO_SRC_OPS_CAR, + &lsrc_ops); + rio_local_read_config_32(rdev->net->hport, RIO_DST_OPS_CAR, + &ldst_ops); + if (!is_rionet_capable(lpef, lsrc_ops, ldst_ops)) { + printk(KERN_ERR + "%s: local device is not network 
capable\n", + DRV_NAME); + rionet_check = 1; + rionet_capable = 0; + goto out; + } + + rc = rionet_setup_netdev(rdev->net->hport); + rionet_check = 1; + } + + /* + * If the remote device has mailbox/doorbell capabilities, + * add it to the peer list. + */ + if (dev_rionet_capable(rdev)) { + if (!(peer = kmalloc(sizeof(struct rionet_peer), GFP_KERNEL))) { + rc = -ENOMEM; + goto out; + } + peer->rdev = rdev; + list_add_tail(&peer->node, &rionet_peers); + } + + out: + return rc; +} + +static struct rio_device_id rionet_id_table[] = { + {RIO_DEVICE(RIO_ANY_ID, RIO_ANY_ID)} +}; + +static struct rio_driver rionet_driver = { + .name = "rionet", + .id_table = rionet_id_table, + .probe = rionet_probe, + .remove = rionet_remove, +}; + +static int __init rionet_init(void) +{ + return rio_register_driver(&rionet_driver); +} + +static void __exit rionet_exit(void) +{ + rio_unregister_driver(&rionet_driver); +} + +module_init(rionet_init); +module_exit(rionet_exit); From davem@davemloft.net Wed Jun 1 11:56:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 11:56:30 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51IuMXq001813 for ; Wed, 1 Jun 2005 11:56:28 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DdYMW-0003Ku-N6; Wed, 01 Jun 2005 11:54:44 -0700 Date: Wed, 01 Jun 2005 11:54:44 -0700 (PDT) Message-Id: <20050601.115444.68157121.davem@davemloft.net> To: raghunathan.venkatesan@wipro.com Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com, linux@der-keiler.de Subject: Re: Unable to handle kernel paging request at virtual address 04000460 From: "David S. 
Miller" In-Reply-To: <438662DA48DCAA41B1DF648BD4BD76C0E45DF1@CHN-SNR-MBX01.wipro.com> References: <438662DA48DCAA41B1DF648BD4BD76C0E45DF1@CHN-SNR-MBX01.wipro.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1946 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Please don't ask the community to debug your custom kernel with private VPN driver modules installed. From afleming@freescale.com Wed Jun 1 13:46:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 13:46:40 -0700 (PDT) Received: from az33egw01.freescale.net (az33egw01.freescale.net [192.88.158.102]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51KkZXq010355 for ; Wed, 1 Jun 2005 13:46:36 -0700 Received: from az33smr02.freescale.net (az33smr02.freescale.net [10.64.34.200]) by az33egw01.freescale.net (8.12.11/az33egw01) with ESMTP id j51KouYg020960; Wed, 1 Jun 2005 13:50:56 -0700 (MST) Received: from [10.82.17.56] ([10.82.17.56]) by az33smr02.freescale.net (8.13.1/8.13.0) with ESMTP id j51KmgBH016530; Wed, 1 Jun 2005 15:48:42 -0500 (CDT) In-Reply-To: <20050531105939.7486e071@dxpl.pdx.osdl.net> References: <1107b64b01fb8e9a6c84359bb56881a6@freescale.com> <20050531105939.7486e071@dxpl.pdx.osdl.net> Mime-Version: 1.0 (Apple Message framework v730) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <92F1428A-0B26-428B-8C06-35C7E5B9EEE3@freescale.com> Cc: Netdev , Embedded PPC Linux list , Kumar Gala Content-Transfer-Encoding: 7bit From: Andy Fleming Subject: Re: RFC: PHY Abstraction Layer II Date: Wed, 1 Jun 2005 15:45:26 -0500 To: Stephen Hemminger X-Mailer: Apple Mail (2.730) X-archive-position: 1947 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: 
netdev-bounce@oss.sgi.com
X-original-sender: afleming@freescale.com
Precedence: bulk
X-list: netdev

On May 31, 2005, at 12:59, Stephen Hemminger wrote:

> Here are some patches:
>     * allow phy's to be modules
>     * use driver owner for ref count
>     * make local functions static where ever possible

I agree with all these.

>     * get rid of bus read may sleep implication in comment.
>       since you are holding phy spin lock it better not!!

But not this one. The phy_read and phy_write functions are reading
from and writing to a bus. It is a reasonable implementation to have
the operation block in the bus driver, and be awoken when an
interrupt signals the operation is done. All of the phydev spinlocks
have been arranged so as to prevent the lock being taken during
interrupt time.

Unless I've misunderstood spinlocks (it wouldn't be the first time),
as long as the lock is never taken in interrupt time, it should be ok
to hold the lock, and wait for an interrupt before clearing the lock.

Andy Fleming

From gwingerde@home.nl Wed Jun 1 13:58:29 2005
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 13:58:33 -0700 (PDT)
Received: from smtpq3.home.nl (smtpq3.home.nl [213.51.128.198])
    by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51KwRXq011274
    for ; Wed, 1 Jun 2005 13:58:29 -0700
Received: from [213.51.128.134] (port=47200 helo=smtp3.home.nl)
    by smtpq3.home.nl with esmtp (Exim 4.30)
    id 1DdaHH-0007SM-4Z; Wed, 01 Jun 2005 22:57:27 +0200
Received: from cc10088-a.ensch1.ov.home.nl ([217.123.128.105]:58103 helo=[192.168.14.1])
    by smtp3.home.nl with esmtp (Exim 4.30)
    id 1DdaHF-0000hZ-Q1; Wed, 01 Jun 2005 22:57:25 +0200
Message-ID: <429E1FAB.6080503@home.nl>
Date: Wed, 01 Jun 2005 22:50:51 +0200
From: Gertjan van Wingerde
User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050322)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: netdev@oss.sgi.com, jgarzik@pobox.com
Subject: [PATCH] ieee80211: Update generic definitions to latest specs.
Content-Type: multipart/mixed; boundary="------------020800010603020503020809" X-AtHome-MailScanner-Information: Neem contact op met support@home.nl voor meer informatie X-AtHome-MailScanner: Found to be clean X-archive-position: 1948 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gwingerde@home.nl Precedence: bulk X-list: netdev This is a multi-part message in MIME format. --------------020800010603020503020809 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi, Attached patch updates the definitions of the generic ieee80211 stack to the latest versions of the published 802.11x specification suite. Please review and apply. Signed-off-by: Gertjan van Wingerde --------------020800010603020503020809 Content-Type: text/plain; name="ieee80211.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="ieee80211.diff" Index: include/net/ieee80211.h =================================================================== --- 4b4ba76aa81b3627142787262fd2f8049dd3662d/include/net/ieee80211.h (mode:100644) +++ uncommitted/include/net/ieee80211.h (mode:100644) @@ -103,7 +103,7 @@ #define MAX_FRAG_THRESHOLD 2346U /* Frame control field constants */ -#define IEEE80211_FCTL_VERS 0x0002 +#define IEEE80211_FCTL_VERS 0x0003 #define IEEE80211_FCTL_FTYPE 0x000c #define IEEE80211_FCTL_STYPE 0x00f0 #define IEEE80211_FCTL_TODS 0x0100 @@ -111,8 +111,8 @@ #define IEEE80211_FCTL_MOREFRAGS 0x0400 #define IEEE80211_FCTL_RETRY 0x0800 #define IEEE80211_FCTL_PM 0x1000 -#define IEEE80211_FCTL_MOREDATA 0x2000 -#define IEEE80211_FCTL_WEP 0x4000 +#define IEEE80211_FCTL_MOREDATA 0x2000 +#define IEEE80211_FCTL_PROTECTEDFRAME 0x4000 #define IEEE80211_FCTL_ORDER 0x8000 #define IEEE80211_FTYPE_MGMT 0x0000 @@ -131,6 +131,7 @@ #define IEEE80211_STYPE_DISASSOC 0x00A0 #define IEEE80211_STYPE_AUTH 0x00B0 #define IEEE80211_STYPE_DEAUTH 0x00C0 +#define IEEE80211_STYPE_ACTION 
0x00D0 /* control */ #define IEEE80211_STYPE_PSPOLL 0x00A0 @@ -251,6 +252,7 @@ #define SNAP_SIZE sizeof(struct ieee80211_snap_hdr) +#define WLAN_FC_GET_VERS(fc) ((fc) & IEEE80211_FCTL_VERS) #define WLAN_FC_GET_TYPE(fc) ((fc) & IEEE80211_FCTL_FTYPE) #define WLAN_FC_GET_STYPE(fc) ((fc) & IEEE80211_FCTL_STYPE) @@ -271,6 +273,9 @@ #define WLAN_CAPABILITY_SHORT_PREAMBLE (1<<5) #define WLAN_CAPABILITY_PBCC (1<<6) #define WLAN_CAPABILITY_CHANNEL_AGILITY (1<<7) +#define WLAN_CAPABILITY_SPECTRUM_MGMT (1<<8) +#define WLAN_CAPABILITY_SHORT_SLOT_TIME (1<<10) +#define WLAN_CAPABILITY_OSSS_OFDM (1<<13) /* Status codes */ #define WLAN_STATUS_SUCCESS 0 @@ -285,9 +290,24 @@ #define WLAN_STATUS_AP_UNABLE_TO_HANDLE_NEW_STA 17 #define WLAN_STATUS_ASSOC_DENIED_RATES 18 /* 802.11b */ -#define WLAN_STATUS_ASSOC_DENIED_NOSHORT 19 +#define WLAN_STATUS_ASSOC_DENIED_NOSHORTPREAMBLE 19 #define WLAN_STATUS_ASSOC_DENIED_NOPBCC 20 #define WLAN_STATUS_ASSOC_DENIED_NOAGILITY 21 +/* 802.11h */ +#define WLAN_STATUS_ASSOC_DENIED_SPECTRUM_MGMT_REQUIRED 22 +#define WLAN_STATUS_ASSOC_REJECTED_POWER_CAP_UNACCEPTABLE 23 +#define WLAN_STATUS_ASSOC_REJECTED_SUPP_CHANNELS_UNACCEPTABLE 24 +/* 802.11g */ +#define WLAN_STATUS_ASSOC_DENIED_NOSHORTTIME 25 +#define WLAN_STATUS_ASSOC_DENIED_NODSSSOFDM 26 +/* 802.11i */ +#define WLAN_STATUS_INVALID_IE 40 +#define WLAN_STATUS_INVALID_GROUP_CIPHER 41 +#define WLAN_STATUS_INVALID_PAIRWISE_CIPHER 42 +#define WLAN_STATUS_INVALID_AKMP 43 +#define WLAN_STATUS_UNSUPP_RSN_VERSION 44 +#define WLAN_STATUS_INVALID_RSN_IE_CAP 45 +#define WLAN_STATUS_CIPHER_SUITE_REJECTED 46 /* Reason codes */ #define WLAN_REASON_UNSPECIFIED 1 @@ -299,6 +319,22 @@ #define WLAN_REASON_CLASS3_FRAME_FROM_NONASSOC_STA 7 #define WLAN_REASON_DISASSOC_STA_HAS_LEFT 8 #define WLAN_REASON_STA_REQ_ASSOC_WITHOUT_AUTH 9 +/* 802.11h */ +#define WLAN_REASON_DISASSOC_POWER_CAP_UNACCEPTABLE 10 +#define WLAN_REASON_DISASSOC_SUPP_CHANNELS_UNACCEPTABLE 11 +/* 802.11i */ +#define WLAN_REASON_INVALID_IE 13 +#define 
WLAN_REASON_MIC_FAILURE 14 +#define WLAN_REASON_4WAY_HANDSHAKE_TIMEOUT 15 +#define WLAN_REASON_GROUP_KEY_HANDSHAKE_TIMEOUT 16 +#define WLAN_REASON_IE_DIFFERENT 17 +#define WLAN_REASON_INVALID_GROUP_CIPHER 18 +#define WLAN_REASON_INVALID_PAIRWISE_CIPHER 19 +#define WLAN_REASON_INVALID_AKMP 20 +#define WLAN_REASON_UNSUPP_RSN_VERSION 21 +#define WLAN_REASON_INVALID_RSN_IE_CAP 22 +#define WLAN_REASON_IEEE8021X_FAILED 23 +#define WLAN_REASON_CIPHER_SUITE_REJECTED 24 #define IEEE80211_STATMASK_SIGNAL (1<<0) @@ -477,17 +513,32 @@ #define BEACON_PROBE_SSID_ID_POSITION 12 /* Management Frame Information Element Types */ -#define MFIE_TYPE_SSID 0 -#define MFIE_TYPE_RATES 1 -#define MFIE_TYPE_FH_SET 2 -#define MFIE_TYPE_DS_SET 3 -#define MFIE_TYPE_CF_SET 4 -#define MFIE_TYPE_TIM 5 -#define MFIE_TYPE_IBSS_SET 6 -#define MFIE_TYPE_CHALLENGE 16 -#define MFIE_TYPE_RSN 48 -#define MFIE_TYPE_RATES_EX 50 -#define MFIE_TYPE_GENERIC 221 +#define MFIE_TYPE_SSID 0 +#define MFIE_TYPE_RATES 1 +#define MFIE_TYPE_FH_SET 2 +#define MFIE_TYPE_DS_SET 3 +#define MFIE_TYPE_CF_SET 4 +#define MFIE_TYPE_TIM 5 +#define MFIE_TYPE_IBSS_SET 6 +#define MFIE_TYPE_COUNTRY 7 +#define MFIE_TYPE_HOP_PARAMS 8 +#define MFIE_TYPE_HOP_TABLE 9 +#define MFIE_TYPE_REQUEST 10 +#define MFIE_TYPE_CHALLENGE 16 +#define MFIE_TYPE_POWER_CONSTRAINT 32 +#define MFIE_TYPE_POWER_CAPABILITY 33 +#define MFIE_TYPE_TPC_REQUEST 34 +#define MFIE_TYPE_TPC_REPORT 35 +#define MFIE_TYPE_SUPP_CHANNELS 36 +#define MFIE_TYPE_CSA 37 +#define MFIE_TYPE_MEASURE_REQUEST 38 +#define MFIE_TYPE_MEASURE_REPORT 39 +#define MFIE_TYPE_QUIET 40 +#define MFIE_TYPE_IBSS_DFS 41 +#define MFIE_TYPE_ERP_INFO 42 +#define MFIE_TYPE_RSN 48 +#define MFIE_TYPE_RATES_EX 50 +#define MFIE_TYPE_GENERIC 221 struct ieee80211_info_element_hdr { u8 id; Index: net/ieee80211/ieee80211_rx.c =================================================================== --- 4b4ba76aa81b3627142787262fd2f8049dd3662d/net/ieee80211/ieee80211_rx.c (mode:100644) +++ 
uncommitted/net/ieee80211/ieee80211_rx.c (mode:100644) @@ -440,7 +440,7 @@ crypt->ops->decrypt_mpdu == NULL)) crypt = NULL; - if (!crypt && (fc & IEEE80211_FCTL_WEP)) { + if (!crypt && (fc & IEEE80211_FCTL_PROTECTEDFRAME)) { /* This seems to be triggered by some (multicast?) * frames from other than current BSS, so just drop the * frames silently instead of filling system log with @@ -456,7 +456,7 @@ #ifdef NOT_YET if (type != WLAN_FC_TYPE_DATA) { if (type == WLAN_FC_TYPE_MGMT && stype == WLAN_FC_STYPE_AUTH && - fc & IEEE80211_FCTL_WEP && ieee->host_decrypt && + fc & IEEE80211_FCTL_PROTECTEDFRAME && ieee->host_decrypt && (keyidx = hostap_rx_frame_decrypt(ieee, skb, crypt)) < 0) { printk(KERN_DEBUG "%s: failed to decrypt mgmt::auth " @@ -557,7 +557,7 @@ /* skb: hdr + (possibly fragmented, possibly encrypted) payload */ - if (ieee->host_decrypt && (fc & IEEE80211_FCTL_WEP) && + if (ieee->host_decrypt && (fc & IEEE80211_FCTL_PROTECTEDFRAME) && (keyidx = ieee80211_rx_frame_decrypt(ieee, skb, crypt)) < 0) goto rx_dropped; @@ -565,7 +565,7 @@ /* skb: hdr + (possibly fragmented) plaintext payload */ // PR: FIXME: hostap has additional conditions in the "if" below: - // ieee->host_decrypt && (fc & IEEE80211_FCTL_WEP) && + // ieee->host_decrypt && (fc & IEEE80211_FCTL_PROTECTEDFRAME) && if ((frag != 0 || (fc & IEEE80211_FCTL_MOREFRAGS))) { int flen; struct sk_buff *frag_skb = ieee80211_frag_cache_get(ieee, hdr); @@ -621,12 +621,12 @@ /* skb: hdr + (possible reassembled) full MSDU payload; possibly still * encrypted/authenticated */ - if (ieee->host_decrypt && (fc & IEEE80211_FCTL_WEP) && + if (ieee->host_decrypt && (fc & IEEE80211_FCTL_PROTECTEDFRAME) && ieee80211_rx_frame_decrypt_msdu(ieee, skb, keyidx, crypt)) goto rx_dropped; hdr = (struct ieee80211_hdr *) skb->data; - if (crypt && !(fc & IEEE80211_FCTL_WEP) && !ieee->open_wep) { + if (crypt && !(fc & IEEE80211_FCTL_PROTECTEDFRAME) && !ieee->open_wep) { if (/*ieee->ieee802_1x &&*/ ieee80211_is_eapol_frame(ieee, skb)) { 
#ifdef CONFIG_IEEE80211_DEBUG @@ -647,7 +647,7 @@ } #ifdef CONFIG_IEEE80211_DEBUG - if (crypt && !(fc & IEEE80211_FCTL_WEP) && + if (crypt && !(fc & IEEE80211_FCTL_PROTECTEDFRAME) && ieee80211_is_eapol_frame(ieee, skb)) { struct eapol *eap = (struct eapol *)(skb->data + 24); @@ -656,7 +656,7 @@ } #endif - if (crypt && !(fc & IEEE80211_FCTL_WEP) && !ieee->open_wep && + if (crypt && !(fc & IEEE80211_FCTL_PROTECTEDFRAME) && !ieee->open_wep && !ieee80211_is_eapol_frame(ieee, skb)) { IEEE80211_DEBUG_DROP( "dropped unencrypted RX data " Index: net/ieee80211/ieee80211_tx.c =================================================================== --- 4b4ba76aa81b3627142787262fd2f8049dd3662d/net/ieee80211/ieee80211_tx.c (mode:100644) +++ uncommitted/net/ieee80211/ieee80211_tx.c (mode:100644) @@ -314,7 +314,7 @@ if (encrypt) fc = IEEE80211_FTYPE_DATA | IEEE80211_STYPE_DATA | - IEEE80211_FCTL_WEP; + IEEE80211_FCTL_PROTECTEDFRAME; else fc = IEEE80211_FTYPE_DATA | IEEE80211_STYPE_DATA; Index: drivers/net/wireless/atmel.c =================================================================== --- 4b4ba76aa81b3627142787262fd2f8049dd3662d/drivers/net/wireless/atmel.c (mode:100644) +++ uncommitted/drivers/net/wireless/atmel.c (mode:100644) @@ -867,7 +867,7 @@ header.duration_id = 0; header.seq_ctl = 0; if (priv->wep_is_on) - frame_ctl |= IEEE80211_FCTL_WEP; + frame_ctl |= IEEE80211_FCTL_PROTECTEDFRAME; if (priv->operating_mode == IW_MODE_ADHOC) { memcpy(&header.addr1, skb->data, 6); memcpy(&header.addr2, dev->dev_addr, 6); @@ -1117,7 +1117,7 @@ /* probe for CRC use here if needed once five packets have arrived with the same crc status, we assume we know what's happening and stop probing */ if (priv->probe_crc) { - if (!priv->wep_is_on || !(frame_ctl & IEEE80211_FCTL_WEP)) { + if (!priv->wep_is_on || !(frame_ctl & IEEE80211_FCTL_PROTECTEDFRAME)) { priv->do_rx_crc = probe_crc(priv, rx_packet_loc, msdu_size); } else { priv->do_rx_crc = probe_crc(priv, rx_packet_loc + 24, msdu_size - 24); @@ 
-1132,7 +1132,7 @@ } /* don't CRC header when WEP in use */ - if (priv->do_rx_crc && (!priv->wep_is_on || !(frame_ctl & IEEE80211_FCTL_WEP))) { + if (priv->do_rx_crc && (!priv->wep_is_on || !(frame_ctl & IEEE80211_FCTL_PROTECTEDFRAME))) { crc = crc32_le(0xffffffff, (unsigned char *)&header, 24); } msdu_size -= 24; /* header */ @@ -2677,7 +2677,7 @@ auth.alg = cpu_to_le16(C80211_MGMT_AAN_SHAREDKEY); /* no WEP for authentication frames with TrSeqNo 1 */ if (priv->CurrentAuthentTransactionSeqNum != 1) - header.frame_ctl |= cpu_to_le16(IEEE80211_FCTL_WEP); + header.frame_ctl |= cpu_to_le16(IEEE80211_FCTL_PROTECTEDFRAME); } else { auth.alg = cpu_to_le16(C80211_MGMT_AAN_OPENSYSTEM); } --------------020800010603020503020809-- From shemminger@osdl.org Wed Jun 1 14:20:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 14:20:21 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51LKIXq012488 for ; Wed, 1 Jun 2005 14:20:19 -0700 Received: from [10.8.0.74] (fw.osdl.org [65.172.181.6]) (authenticated bits=0) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j51LJFj9029727 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Wed, 1 Jun 2005 14:19:16 -0700 Message-ID: <429E2653.6010101@osdl.org> Date: Wed, 01 Jun 2005 14:19:15 -0700 From: Stephen Hemminger User-Agent: Mozilla Thunderbird 1.0.2-1.3.3 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Andy Fleming CC: Netdev , Embedded PPC Linux list , Kumar Gala Subject: Re: RFC: PHY Abstraction Layer II References: <1107b64b01fb8e9a6c84359bb56881a6@freescale.com> <20050531105939.7486e071@dxpl.pdx.osdl.net> <92F1428A-0B26-428B-8C06-35C7E5B9EEE3@freescale.com> In-Reply-To: <92F1428A-0B26-428B-8C06-35C7E5B9EEE3@freescale.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 1949 
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shemminger@osdl.org
Precedence: bulk
X-list: netdev

Andy Fleming wrote:

>
> On May 31, 2005, at 12:59, Stephen Hemminger wrote:
>
>> Here are some patches:
>>     * allow phy's to be modules
>>     * use driver owner for ref count
>>     * make local functions static where ever possible
>
>
> I agree with all these.
>
>>     * get rid of bus read may sleep implication in comment.
>>       since you are holding phy spin lock it better not!!
>
>
> But not this one. The phy_read and phy_write functions are reading
> from and writing to a bus. It is a reasonable implementation to have
> the operation block in the bus driver, and be awoken when an
> interrupt signals the operation is done. All of the phydev spinlocks
> have been arranged so as to prevent the lock being taken during
> interrupt time.
>
> Unless I've misunderstood spinlocks (it wouldn't be the first time),
> as long as the lock is never taken in interrupt time, it should be ok
> to hold the lock, and wait for an interrupt before clearing the lock.

The problem is that sleeping is defined in the linux kernel as meaning
waiting on a mutual exclusion primitive (like semaphore) that puts the
current thread to sleep. It is not legal to sleep with a spinlock held.
In the phy_read code you do:

    spin_lock_bh(&bus->mdio_lock);
    retval = bus->read(bus, phydev->addr, regnum);
    spin_unlock_bh(&bus->mdio_lock);

If the bus->read function were to do something like start a request and
wait on a semaphore, then you would be sleeping with a spin lock held.
So bus->read can not sleep! (as sleep is defined in the linux kernel).
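
The rule above can be illustrated with a toy userspace model. This is
only a sketch: every name in it (model_spin_lock, model_sleep,
sleeping_bus_read and so on) is made up for illustration and none of
them is a kernel API. It is loosely in the spirit of the kernel's
might_sleep() debugging check: a counter tracks how many locks are
held, and any attempt to block while the counter is non-zero is
recorded as a violation.

```c
/* Toy userspace model; none of these names are kernel APIs. */

static int model_lock_depth;   /* number of modelled spinlocks currently held */
static int model_violations;   /* times a sleep was attempted with a lock held */

static void model_spin_lock(void)   { model_lock_depth++; }
static void model_spin_unlock(void) { model_lock_depth--; }

/* Stand-in for any blocking wait: records a violation if a lock is held. */
static void model_sleep(void)
{
    if (model_lock_depth > 0)
        model_violations++;
}

/* Hypothetical bus read that blocks until the transfer completes. */
static int sleeping_bus_read(int regnum)
{
    model_sleep();
    return 0x100 + regnum;
}

/* Hypothetical bus read that never blocks. */
static int polling_bus_read(int regnum)
{
    return 0x100 + regnum;
}

/* Same shape as the phy_read() fragment quoted above. */
static int model_phy_read(int (*bus_read)(int), int regnum)
{
    int retval;

    model_spin_lock();
    retval = bus_read(regnum);   /* runs with the modelled lock held */
    model_spin_unlock();
    return retval;
}
```

Passing polling_bus_read to model_phy_read() leaves model_violations
untouched, while passing sleeping_bus_read increments it, which is the
situation described above for a bus->read that waits on a semaphore.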
From mchan@broadcom.com Wed Jun 1 14:32:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 14:32:36 -0700 (PDT) Received: from MMS2.broadcom.com (mms2.broadcom.com [216.31.210.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51LWXXq013485 for ; Wed, 1 Jun 2005 14:32:33 -0700 Received: from 10.10.64.121 by MMS2.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Wed, 01 Jun 2005 14:31:20 -0700 X-Server-Uuid: 1F20ACF3-9CAF-44F7-AB47-F294E2D5B4EA Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Wed, 1 Jun 2005 14:31:18 -0700 Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BBO09844; Wed, 1 Jun 2005 14:31:15 -0700 (PDT) Received: from nt-irva-0741.brcm.ad.broadcom.com ( nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with ESMTP id OAA04566; Wed, 1 Jun 2005 14:31:15 -0700 (PDT) Received: from 10.7.18.177 ([10.7.18.177]) by NT-IRVA-0741.brcm.ad.broadcom.com ([10.8.194.54]) with Microsoft Exchange Server HTTP-DAV ; Wed, 1 Jun 2005 21:31:14 +0000 Received: from rh4 by nt-irva-0741; 01 Jun 2005 13:33:39 -0700 Subject: Re: Locking model for NAPI drivers From: "Michael Chan" To: "David S. 
Miller"
cc: netdev@oss.sgi.com
In-Reply-To: <20050531.154847.63995530.davem@davemloft.net>
References: <20050531.154847.63995530.davem@davemloft.net>
Date: Wed, 01 Jun 2005 13:33:39 -0700
Message-ID: <1117658019.4310.58.camel@rh4>
MIME-Version: 1.0
X-Mailer: Evolution 2.0.2 (2.0.2-3)
X-WSS-ID: 6E80F6A21VO4407082-01-01
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-archive-position: 1950
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: mchan@broadcom.com
Precedence: bulk
X-list: netdev

On Tue, 2005-05-31 at 15:48 -0700, David S. Miller wrote:

> Once we make this transformation, we need some way to synchronize
> with the IRQ handler when shutting down the device or making major
> configuration changes to the chip.
>
> The idea I came up with is a two-bit atomic bitmask. When base
> level code wants to quiesce interrupt processing, it takes the
> necessary driver spinlocks, sets the "SYNC" bit in the bitmask,
> forces an IRQ to be asserted by the tg3 card, then waits for the
> COMPLETE bit to get set by the interrupt handler.
>

During light testing, I found a race condition that caused
tg3_irq_quiesce() to spin forever. The race condition is shown below.

CPU1                                   CPU2

tg3_interrupt_tagged()                 tg3_netif_stop()
                                       netif_poll_disable()
netif_rx_schedule() will do nothing
                                       tg3_full_lock()
                                       tg3_irq_quiesce()

Because netif_poll_disable() is called, netif_rx_schedule() will do
nothing in the interrupt handler. As a result, tg3_poll() will never be
called to re-enable interrupts. Since interrupts are disabled,
tg3_irq_quiesce() will not be able to set the interrupts and cause the
interrupt handler to be called again, and therefore will wait forever.

Even adding another call to tg3_irq_sync() at the end of the interrupt
handler does not eliminate the race condition.

I suppose we can enable interrupts in tg3_irq_quiesce() after setting
the SYNC bit.
From shemminger@osdl.org Wed Jun 1 14:38:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 14:38:41 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51LcbXq014435 for ; Wed, 1 Jun 2005 14:38:37 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j51LbZjA032260 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Wed, 1 Jun 2005 14:37:35 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j51LbYcg019087; Wed, 1 Jun 2005 14:37:35 -0700 Date: Wed, 1 Jun 2005 14:37:34 -0700 From: Stephen Hemminger To: Gertjan van Wingerde Cc: netdev@oss.sgi.com, jgarzik@pobox.com Subject: Re: [PATCH] ieee80211: Update generic definitions to latest specs. Message-ID: <20050601143734.3b7a49ca@dxpl.pdx.osdl.net> In-Reply-To: <429E1FAB.6080503@home.nl> References: <429E1FAB.6080503@home.nl> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 1951 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Wed, 01 Jun 2005 22:50:51 +0200 Gertjan van Wingerde wrote: > Hi, > > Attached patch updates the definitions of the generic ieee80211 stack to > the latest versions of the published 802.11x specification suite. > Please review and apply. 
>
> Signed-off-by: Gertjan van Wingerde
>

Could you change the elements that fix to be enum's instead of define's
example:

/* Management Frame Information Element Types */
enum ieee80211_mfie {
    MFIE_TYPE_SSID = 0,
    MFIE_TYPE_RATES = 1,
    MFIE_TYPE_FH_SET= 2,
...

From shemminger@osdl.org Wed Jun 1 14:42:24 2005
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 14:42:28 -0700 (PDT)
Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4])
    by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51LgOXq015034
    for ; Wed, 1 Jun 2005 14:42:24 -0700
Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6])
    by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j51LfNjA032676
    (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO);
    Wed, 1 Jun 2005 14:41:24 -0700
Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74])
    by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j51LfN66019413;
    Wed, 1 Jun 2005 14:41:23 -0700
Date: Wed, 1 Jun 2005 14:41:23 -0700
From: Stephen Hemminger
To: Andy Fleming
Cc: Netdev , Embedded PPC Linux list , Kumar Gala
Subject: Re: RFC: PHY Abstraction Layer II
Message-ID: <20050601144123.2bc11c06@dxpl.pdx.osdl.net>
In-Reply-To: <92F1428A-0B26-428B-8C06-35C7E5B9EEE3@freescale.com>
References: <1107b64b01fb8e9a6c84359bb56881a6@freescale.com>
    <20050531105939.7486e071@dxpl.pdx.osdl.net>
    <92F1428A-0B26-428B-8C06-35C7E5B9EEE3@freescale.com>
Organization: Open Source Development Lab
X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu)
X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$
    /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-MIMEDefang-Filter: osdl$Revision: 1.109 $
X-Scanned-By: MIMEDefang 2.36
X-archive-position: 1952
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender:
shemminger@osdl.org
Precedence: bulk
X-list: netdev

On Wed, 1 Jun 2005 15:45:26 -0500
Andy Fleming wrote:

>
> On May 31, 2005, at 12:59, Stephen Hemminger wrote:
>
> > Here are some patches:
> >     * allow phy's to be modules
> >     * use driver owner for ref count
> >     * make local functions static where ever possible
>
> I agree with all these.
>
> >     * get rid of bus read may sleep implication in comment.
> >       since you are holding phy spin lock it better not!!
>

On a different note, I am not sure that using sysfs/kobject bus object
is the right thing for this object. Isn't the phy instance really just
a kobject whose parent is the network device? I can't see a 1 to N
relationship between phy bus and phy objects existing.

The main use I can see for being a driver object is to catch
suspend/resume, and wouldn't you want that to be tied to the network
device?

From davem@davemloft.net Wed Jun 1 15:22:59 2005
Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 15:23:02 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168])
    by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51MMwXq017804
    for ; Wed, 1 Jun 2005 15:22:59 -0700
Received: from localhost ([127.0.0.1] ident=davem)
    by sunset.davemloft.net with esmtp (Exim 4.50)
    id 1Ddbag-0004kn-VP; Wed, 01 Jun 2005 15:21:35 -0700
Date: Wed, 01 Jun 2005 15:21:34 -0700 (PDT)
Message-Id: <20050601.152134.120445266.davem@davemloft.net>
To: mchan@broadcom.com
Cc: netdev@oss.sgi.com
Subject: Re: Locking model for NAPI drivers
From: "David S.
Miller"
In-Reply-To: <1117658019.4310.58.camel@rh4>
References: <20050531.154847.63995530.davem@davemloft.net>
    <1117658019.4310.58.camel@rh4>
X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 1953
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev

From: "Michael Chan"
Date: Wed, 01 Jun 2005 13:33:39 -0700

> I suppose we can enable interrupts in tg3_irq_quiesce() after setting
> the SYNC bit.

Since the caller shuts down NAPI ->poll(), after setting the SYNC bit
we can just check the MAILBOX register, and if a '1' is there just
return. Does one need to mask out the upper bits of the register in
order to avoid seeing the IRQ tag in such a comparison?

Another potential problem is if the chip is hung for some reason, and
even though an interrupt is asserted it does not send the interrupt.
We'd hang in this case as well. Therefore it may be wise to add a
timeout to the COMPLETE bit polling loop in order to handle such cases
properly.
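
A bounded version of the COMPLETE-bit polling loop might look roughly
like the userspace sketch below. It is not tg3 code. The names
(IRQ_SYNC, IRQ_COMPLETE, struct irq_sync_state, model_irq_handler,
model_irq_quiesce) and the polls_until_irq field are all assumptions
made for illustration, C11 atomics stand in for the kernel's atomic
bit operations, and the "interrupt" is simulated by calling the
handler after a fixed number of polls. A real loop would normally
pause between polls, for example with udelay().

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

#define IRQ_SYNC     (1u << 0)   /* base level asks the handler to acknowledge */
#define IRQ_COMPLETE (1u << 1)   /* handler has seen SYNC and acknowledged it */

struct irq_sync_state {
    atomic_uint flags;
    unsigned int polls_until_irq;   /* simulation only; 0 means "chip is hung" */
};

static void irq_sync_init(struct irq_sync_state *s, unsigned int polls_until_irq)
{
    atomic_init(&s->flags, 0);
    s->polls_until_irq = polls_until_irq;
}

/* Modelled interrupt handler: acknowledge a pending SYNC request.
 * Returns true when the handler should skip its normal work. */
static bool model_irq_handler(struct irq_sync_state *s)
{
    if (atomic_load(&s->flags) & IRQ_SYNC) {
        atomic_fetch_or(&s->flags, IRQ_COMPLETE);
        return true;
    }
    return false;
}

/* Set SYNC, then poll for COMPLETE at most max_polls times.
 * Returns 0 on success or -ETIMEDOUT if COMPLETE was never seen. */
static int model_irq_quiesce(struct irq_sync_state *s, unsigned int max_polls)
{
    unsigned int i;

    atomic_fetch_or(&s->flags, IRQ_SYNC);
    for (i = 0; i < max_polls; i++) {
        if (atomic_load(&s->flags) & IRQ_COMPLETE)
            return 0;
        /* Simulation: the "interrupt" arrives after polls_until_irq polls. */
        if (i + 1 == s->polls_until_irq)
            model_irq_handler(s);
    }
    return -ETIMEDOUT;
}
```

With a timeout the caller can tell a quiesced handler apart from a hung
chip and decide what to do next (reset, log, give up) instead of
spinning forever.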
From mchan@broadcom.com Wed Jun 1 15:32:55 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 15:32:58 -0700 (PDT) Received: from MMS2.broadcom.com (mms2.broadcom.com [216.31.210.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51MWsXq018653 for ; Wed, 1 Jun 2005 15:32:55 -0700 Received: from 10.10.64.121 by MMS2.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Wed, 01 Jun 2005 15:31:50 -0700 X-Server-Uuid: 1F20ACF3-9CAF-44F7-AB47-F294E2D5B4EA Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Wed, 1 Jun 2005 15:31:49 -0700 Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BBP11233; Wed, 1 Jun 2005 15:31:45 -0700 (PDT) Received: from nt-irva-0741.brcm.ad.broadcom.com ( nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with ESMTP id PAA24578; Wed, 1 Jun 2005 15:31:45 -0700 (PDT) Received: from 10.7.18.177 ([10.7.18.177]) by NT-IRVA-0741.brcm.ad.broadcom.com ([10.8.194.54]) with Microsoft Exchange Server HTTP-DAV ; Wed, 1 Jun 2005 22:31:45 +0000 Received: from rh4 by nt-irva-0741; 01 Jun 2005 14:34:10 -0700 Subject: Re: Locking model for NAPI drivers From: "Michael Chan" To: "David S. 
Miller"
cc: netdev@oss.sgi.com
In-Reply-To: <20050601.152134.120445266.davem@davemloft.net>
References: <20050531.154847.63995530.davem@davemloft.net>
    <1117658019.4310.58.camel@rh4>
    <20050601.152134.120445266.davem@davemloft.net>
Date: Wed, 01 Jun 2005 14:34:10 -0700
Message-ID: <1117661650.4310.62.camel@rh4>
MIME-Version: 1.0
X-Mailer: Evolution 2.0.2 (2.0.2-3)
X-WSS-ID: 6E80E8DC1VO4417184-01-01
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-archive-position: 1954
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: mchan@broadcom.com
Precedence: bulk
X-list: netdev

On Wed, 2005-06-01 at 15:21 -0700, David S. Miller wrote:

> From: "Michael Chan"
> Date: Wed, 01 Jun 2005 13:33:39 -0700
>
> > I suppose we can enable interrupts in tg3_irq_quiesce() after setting
> > the SYNC bit.
>
> Since the caller shuts down NAPI ->poll(), after setting the SYNC bit
> we can just check the MAILBOX register, and if a '1' is there just
> return. Does one need to mask out the upper bits of the register in
> order to avoid seeing the IRQ tag in such a comparison?
>

No, just check for the value 1 since that's the value we use to disable
interrupts. The value read back will always be 1 if 1 was previously
written to it.
From afleming@freescale.com Wed Jun 1 15:38:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 15:38:03 -0700 (PDT) Received: from az33egw02.freescale.net (az33egw02.freescale.net [192.88.158.103]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51MbxXq019309 for ; Wed, 1 Jun 2005 15:37:59 -0700 Received: from az33smr02.freescale.net (az33smr02.freescale.net [10.64.34.200]) by az33egw02.freescale.net (8.12.11/az33egw02) with ESMTP id j51Mf5oC009076; Wed, 1 Jun 2005 15:41:06 -0700 (MST) Received: from [10.82.17.56] ([10.82.17.56]) by az33smr02.freescale.net (8.13.1/8.13.0) with ESMTP id j51Me9Xo018231; Wed, 1 Jun 2005 17:40:10 -0500 (CDT) In-Reply-To: <20050601144123.2bc11c06@dxpl.pdx.osdl.net> References: <1107b64b01fb8e9a6c84359bb56881a6@freescale.com> <20050531105939.7486e071@dxpl.pdx.osdl.net> <92F1428A-0B26-428B-8C06-35C7E5B9EEE3@freescale.com> <20050601144123.2bc11c06@dxpl.pdx.osdl.net> Mime-Version: 1.0 (Apple Message framework v730) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <9A2D608A-D818-455B-96F4-ED42413556C0@freescale.com> Cc: Netdev , Embedded PPC Linux list , Kumar Gala Content-Transfer-Encoding: 7bit From: Andy Fleming Subject: Re: RFC: PHY Abstraction Layer II Date: Wed, 1 Jun 2005 17:36:54 -0500 To: Stephen Hemminger X-Mailer: Apple Mail (2.730) X-archive-position: 1955 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: afleming@freescale.com Precedence: bulk X-list: netdev On Jun 1, 2005, at 16:41, Stephen Hemminger wrote: > On Wed, 1 Jun 2005 15:45:26 -0500 > Andy Fleming wrote: >> >>> * get rid of bus read may sleep implication in comment. >>> since you are holding phy spin lock it better not!! >>> >> >> > > On a different note, I am not sure that using sysfs/kobject bus object > is the right thing for this object. Isn't the phy instance really > just > an kobject whose parent is the network device? 
I can't see a 1 to N
> relationship between phy bus and phy objects existing.

Well, the MII Management bus is, in fact, a bus. When a network driver
wants to modify a PHY, it must access that bus. Many ethernet
controllers have a 1 to 1 relationship, since a typical NIC is a PCI
card with 1 ethernet port (meaning one controller, and one PHY).
However, many systems have multiple ethernet controllers attached to
one bus, which configures multiple PHYs.

Currently, these systems have been relying on luck to prevent multiple
accesses to the same bus. This tends to work because all of the PHY
support is contained within the ethernet driver, so it is easy for such
drivers to ensure that only one PHY transaction is done at a time. This
system begins to fall apart, though, when the PHY drivers start
operating more independently to react to changing PHY state. It really
begins to fall apart if you have multiple drivers trying to access a
shared bus.

For instance, the 8560 ADS board has 2 gigabit ethernet ports
controlled by the gianfar driver, and 2 10/100 ports in the CPM
subsystem, controlled by the fcc_enet driver. These two drivers each
have an access point for the bus, which use different mechanisms (one
is a bit bang interface, and one is register based). Using the new
abstraction, it is possible for the FCC driver to use the gianfar
driver's bus, thus saving code, and reducing complexity.

>
> The main use I can see for being a driver object is to catch
> suspend/resume,
> and wouldn't you want that to be tied to the network device.

It would be quite easy for the network driver to suspend or resume the
PHY and bus objects under the new abstraction. However, if eth0 is
suspended, should it suspend the whole bus, and all the PHYs on it? By
making the MII bus an independent entity, eth0 can be suspended, and it
can choose to suspend its PHY, but eth1 can continue to access its PHY
over the bus, since those aren't suspended.
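
The 1 to N relationship described above can be sketched as a small
userspace data model. This is an illustration only and not the
proposed PHY layer's API: struct model_mii_bus, struct model_phy,
model_mii_read and shared_phy_read are hypothetical names, and the
register file is a plain array rather than real hardware. The one
hardware fact relied on is that MII management addresses are 5 bits
wide, so one bus can address up to 32 PHYs with 32 registers each.

```c
#define MII_MAX_ADDR 32   /* 5-bit PHY address on the management bus */
#define MII_MAX_REG  32   /* 5-bit register number */

/* One management bus, possibly shared by several ethernet drivers. */
struct model_mii_bus {
    unsigned short regs[MII_MAX_ADDR][MII_MAX_REG];   /* simulated PHY registers */
    unsigned long transactions;                       /* accesses seen by this bus */
};

/* One PHY: just a reference to its bus plus its address on that bus. */
struct model_phy {
    struct model_mii_bus *bus;
    int addr;
};

/* Single access point for the bus. A real implementation would take
 * the bus lock here so that only one transaction runs at a time. */
static int model_mii_read(struct model_mii_bus *bus, int addr, int regnum)
{
    if (addr < 0 || addr >= MII_MAX_ADDR ||
        regnum < 0 || regnum >= MII_MAX_REG)
        return -1;

    bus->transactions++;
    return bus->regs[addr][regnum];
}

/* PHY-level read: every PHY on the bus funnels through the same path. */
static int shared_phy_read(const struct model_phy *phy, int regnum)
{
    return model_mii_read(phy->bus, phy->addr, regnum);
}
```

Two model_phy objects that belong to different drivers but point at
the same model_mii_bus both go through model_mii_read(), which is the
one place where serialization has to happen.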
From afleming@freescale.com Wed Jun 1 15:43:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 15:44:01 -0700 (PDT) Received: from az33egw01.freescale.net (az33egw01.freescale.net [192.88.158.102]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j51MhwXq020018 for ; Wed, 1 Jun 2005 15:43:58 -0700 Received: from az33smr02.freescale.net (az33smr02.freescale.net [10.64.34.200]) by az33egw01.freescale.net (8.12.11/az33egw01) with ESMTP id j51MmPBu017065; Wed, 1 Jun 2005 15:48:25 -0700 (MST) Received: from [10.82.17.56] ([10.82.17.56]) by az33smr02.freescale.net (8.13.1/8.13.0) with ESMTP id j51MkCFY019534; Wed, 1 Jun 2005 17:46:12 -0500 (CDT) In-Reply-To: <429E2653.6010101@osdl.org> References: <1107b64b01fb8e9a6c84359bb56881a6@freescale.com> <20050531105939.7486e071@dxpl.pdx.osdl.net> <92F1428A-0B26-428B-8C06-35C7E5B9EEE3@freescale.com> <429E2653.6010101@osdl.org> Mime-Version: 1.0 (Apple Message framework v730) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Cc: Netdev , Embedded PPC Linux list , Kumar Gala Content-Transfer-Encoding: 7bit From: Andy Fleming Subject: Re: RFC: PHY Abstraction Layer II Date: Wed, 1 Jun 2005 17:42:56 -0500 To: Stephen Hemminger X-Mailer: Apple Mail (2.730) X-archive-position: 1956 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: afleming@freescale.com Precedence: bulk X-list: netdev On Jun 1, 2005, at 16:19, Stephen Hemminger wrote: > Andy Fleming wrote: >> >> But not this one. The phy_read and phy_write functions are >> reading from and writing to a bus. It is a reasonable >> implementation to have the operation block in the bus driver, and >> be awoken when an interrupt signals the operation is done. All >> of the phydev spinlocks have been arranged so as to prevent the >> lock being taken during interrupt time. 
>> >> Unless I've misunderstood spinlocks (it wouldn't be the first >> time), as long as the lock is never taken in interrupt time, it >> should be ok to hold the lock, and wait for an interrupt before >> clearing the lock. >> > > > The problem is that sleeping is defined in the linux kernel as > meaning waiting on a mutual exclusion > primitive (like semaphore) that puts the current thread to sleep. > It is not legal to sleep with a spinlock held. > In the phy_read code you do: > spin_lock_bh(&bus->mdio_lock); > retval = bus->read(bus, phydev->addr, regnum); > spin_unlock_bh(&bus->mdio_lock); > > If the bus->read function were to do something like start a request > and wait on a semaphore, then > you would be sleeping with a spin lock held. So bus->read can not > sleep! (as sleep is defined in the > linux kernel). Hmm... I understand this reasoning, but I still need a way for a bus read to wait for an interrupt before returning. I suppose I can just have the code spin while it waits, but that seems wrong, somehow. I'm open to any suggestions. 
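For illustration, the "spin while it waits" option mentioned above could look roughly like the following. This is a userspace sketch under stated assumptions, not kernel code and not a recommendation: fake_mdio, fake_mdio_busy and mdio_read_polled are hypothetical names, and the "hardware" is just a counter that reports busy for a fixed number of polls. The poll budget is what keeps the busy-wait bounded, so a caller holding a spinlock never sleeps and never spins forever.

```c
#include <stdint.h>

/* Hypothetical stand-in for an MDIO controller. */
struct fake_mdio {
    int polls_until_done;   /* how many more polls will still report "busy" */
    uint16_t data;          /* value the completed read returns */
};

/* Returns 1 while the simulated transaction is still in flight, 0 when done. */
static int fake_mdio_busy(struct fake_mdio *hw)
{
    if (hw->polls_until_done > 0) {
        hw->polls_until_done--;
        return 1;
    }
    return 0;
}

/*
 * Busy-wait (never sleep) until the read completes or the poll budget
 * runs out. Returns the 16-bit register value, or -1 on timeout.
 */
int mdio_read_polled(struct fake_mdio *hw, int max_polls)
{
    while (max_polls-- > 0) {
        if (!fake_mdio_busy(hw))
            return hw->data;
    }
    return -1;
}
```

The trade-off is the one already raised in the thread: polling keeps the locking rules intact but burns CPU for the duration of the bus transaction, whereas waiting on an interrupt would need a lock that is allowed to sleep.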
From gwingerde@home.nl Wed Jun 1 20:55:46 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 20:55:50 -0700 (PDT) Received: from smtpq1.home.nl (smtpq1.home.nl [213.51.128.196]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j523thXq012188 for ; Wed, 1 Jun 2005 20:55:46 -0700 Received: from [213.51.128.134] (port=59856 helo=smtp3.home.nl) by smtpq1.home.nl with esmtp (Exim 4.30) id 1Ddgn8-0007D7-Al; Thu, 02 Jun 2005 05:54:46 +0200 Received: from cc10088-a.ensch1.ov.home.nl ([217.123.128.105]:59933 helo=[192.168.14.1]) by smtp3.home.nl with esmtp (Exim 4.30) id 1Ddgn6-00071G-DC; Thu, 02 Jun 2005 05:54:44 +0200 Message-ID: <429E8175.7010609@home.nl> Date: Thu, 02 Jun 2005 05:48:05 +0200 From: Gertjan van Wingerde User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050322) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Stephen Hemminger CC: netdev@oss.sgi.com, jgarzik@pobox.com Subject: Re: [PATCH] ieee80211: Update generic definitions to latest specs. References: <429E1FAB.6080503@home.nl> <20050601143734.3b7a49ca@dxpl.pdx.osdl.net> In-Reply-To: <20050601143734.3b7a49ca@dxpl.pdx.osdl.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-AtHome-MailScanner-Information: Neem contact op met support@home.nl voor meer informatie X-AtHome-MailScanner: Found to be clean X-archive-position: 1958 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gwingerde@home.nl Precedence: bulk X-list: netdev Stephen Hemminger wrote: >On Wed, 01 Jun 2005 22:50:51 +0200 >Gertjan van Wingerde wrote: > > > >>Hi, >> >>Attached patch updates the definitions of the generic ieee80211 stack to >>the latest versions of the published 802.11x specification suite. >>Please review and apply. 
>> >>Signed-off-by: Gertjan van Wingerde >> >> >> >Could you change the elements that fix to be enum's instead of define's > >example: > >/* Management Frame Information Element Types */ >enum ieee80211_mfie { > MFIE_TYPE_SSID = 0, > MFIE_TYPE_RATES = 1, > MFIE_TYPE_FH_SET= 2, >... > Hi Stephen, Well, my patch is really just an add-on to the existing code. Converting to enums is really a follow-up patch that can be applied on top of this one. I'm happy to produce a patch if everybody agrees. Jeff, any opinions on this? Best regards, Gertjan. From raghunathan.venkatesan@wipro.com Wed Jun 1 20:54:49 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 01 Jun 2005 20:54:52 -0700 (PDT) Received: from wip-ec-wd.wipro.com (wip-ec-wd.wipro.com [203.101.113.39]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j523sjXq012029 for ; Wed, 1 Jun 2005 20:54:48 -0700 Received: from wip-ec-wd.wipro.com (localhost.wipro.com [127.0.0.1]) by localhost (Postfix) with ESMTP id CA4D8205E7; Thu, 2 Jun 2005 09:14:50 +0530 (IST) Received: from blr-ec-bh01.wipro.com (unknown [10.201.50.91]) by wip-ec-wd.wipro.com (Postfix) with ESMTP id B055A205E5; Thu, 2 Jun 2005 09:14:50 +0530 (IST) Received: from chn-snr-bh2.wipro.com ([10.145.50.92]) by blr-ec-bh01.wipro.com with Microsoft SMTPSVC(6.0.3790.211); Thu, 2 Jun 2005 09:23:44 +0530 Received: from CHN-SNR-MBX01.wipro.com ([10.145.50.181]) by chn-snr-bh2.wipro.com with Microsoft SMTPSVC(6.0.3790.0); Thu, 2 Jun 2005 09:23:43 +0530 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: RE: Unable to handle kernel paging request at virtual address 04000460 Date: Thu, 2 Jun 2005 09:20:21 +0530 Message-ID: <438662DA48DCAA41B1DF648BD4BD76C0E461B8@CHN-SNR-MBX01.wipro.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Unable to handle kernel paging request at virtual address 04000460 Thread-Index: 
AcVm23XgN0MUhNi8RfmehJZdjhz+YAASlBxQ From: To: Cc: , , X-OriginalArrivalTime: 02 Jun 2005 03:53:43.0508 (UTC) FILETIME=[AB960140:01C56726] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j523sjXq012029 X-archive-position: 1957 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: raghunathan.venkatesan@wipro.com Precedence: bulk X-list: netdev Hi David, I understand that the Linux community may not be able to debug this for me. All I ask is that, if people have seen similar problems (ours involve kfree_skb and skb_drop_fraglist crashing for some reason, which could be a memory management issue or something we are not aware of), they let us know of any patches so that we can try them out here. Thank you for your response. Regards, Raghu -----Original Message----- From: David S. Miller [mailto:davem@davemloft.net] Sent: Thursday, June 02, 2005 12:25 AM To: Raghunathan Venkatesan (WT01 - EMBEDDED & PRODUCT ENGINEERING SOLUTIONS) Cc: linux-net@vger.kernel.org; netdev@oss.sgi.com; linux@der-keiler.de Subject: Re: Unable to handle kernel paging request at virtual address 04000460 Please don't ask the community to debug your custom kernel with private VPN driver modules installed. 
From herbert@gondor.apana.org.au Thu Jun 2 02:45:17 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 02:45:22 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j529jEXq000794 for ; Thu, 2 Jun 2005 02:45:15 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DdmFD-0007y2-00; Thu, 02 Jun 2005 19:44:07 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DdmFA-0006o5-00; Thu, 02 Jun 2005 19:44:04 +1000 Date: Thu, 2 Jun 2005 19:44:04 +1000 To: "David S. Miller" , netdev@oss.sgi.com Subject: [IPV4/IPV6] Replace spin_lock_irq with spin_lock_bh Message-ID: <20050602094404.GA10316@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="cWoXeonUoKmBZSoM" Content-Disposition: inline User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 1959 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev --cWoXeonUoKmBZSoM Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi Dave: In light of my recent patch to net/ipv4/udp.c that replaced the spin_lock_irq calls on the receive queue lock with spin_lock_bh, here is a similar patch for all other occurrences of spin_lock_irq on receive/error queue locks in IPv4 and IPv6. In these stacks, we know that they can only be entered from user or softirq context. Therefore it's safe to disable BH only. Signed-off-by: Herbert Xu Since this patch simply improves the consistent use of locking primitives rather than fixing any real bugs, it should probably go into net-2.6.13. 
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --cWoXeonUoKmBZSoM Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=p diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -360,14 +360,14 @@ int ip_recv_error(struct sock *sk, struc err = copied; /* Reset and regenerate socket error */ - spin_lock_irq(&sk->sk_error_queue.lock); + spin_lock_bh(&sk->sk_error_queue.lock); sk->sk_err = 0; if ((skb2 = skb_peek(&sk->sk_error_queue)) != NULL) { sk->sk_err = SKB_EXT_ERR(skb2)->ee.ee_errno; - spin_unlock_irq(&sk->sk_error_queue.lock); + spin_unlock_bh(&sk->sk_error_queue.lock); sk->sk_error_report(sk); } else - spin_unlock_irq(&sk->sk_error_queue.lock); + spin_unlock_bh(&sk->sk_error_queue.lock); out_free_skb: kfree_skb(skb); diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -691,11 +691,11 @@ static int raw_ioctl(struct sock *sk, in struct sk_buff *skb; int amount = 0; - spin_lock_irq(&sk->sk_receive_queue.lock); + spin_lock_bh(&sk->sk_receive_queue.lock); skb = skb_peek(&sk->sk_receive_queue); if (skb != NULL) amount = skb->len; - spin_unlock_irq(&sk->sk_receive_queue.lock); + spin_unlock_bh(&sk->sk_receive_queue.lock); return put_user(amount, (int __user *)arg); } diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c --- a/net/ipv6/datagram.c +++ b/net/ipv6/datagram.c @@ -353,14 +353,14 @@ int ipv6_recv_error(struct sock *sk, str err = copied; /* Reset and regenerate socket error */ - spin_lock_irq(&sk->sk_error_queue.lock); + spin_lock_bh(&sk->sk_error_queue.lock); sk->sk_err = 0; if ((skb2 = skb_peek(&sk->sk_error_queue)) != NULL) { sk->sk_err = SKB_EXT_ERR(skb2)->ee.ee_errno; - spin_unlock_irq(&sk->sk_error_queue.lock); + spin_unlock_bh(&sk->sk_error_queue.lock); sk->sk_error_report(sk); } 
else { - spin_unlock_irq(&sk->sk_error_queue.lock); + spin_unlock_bh(&sk->sk_error_queue.lock); } out_free_skb: diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -434,12 +434,12 @@ csum_copy_err: /* Clear queue. */ if (flags&MSG_PEEK) { int clear = 0; - spin_lock_irq(&sk->sk_receive_queue.lock); + spin_lock_bh(&sk->sk_receive_queue.lock); if (skb == skb_peek(&sk->sk_receive_queue)) { __skb_unlink(skb, &sk->sk_receive_queue); clear = 1; } - spin_unlock_irq(&sk->sk_receive_queue.lock); + spin_unlock_bh(&sk->sk_receive_queue.lock); if (clear) kfree_skb(skb); } @@ -971,11 +971,11 @@ static int rawv6_ioctl(struct sock *sk, struct sk_buff *skb; int amount = 0; - spin_lock_irq(&sk->sk_receive_queue.lock); + spin_lock_bh(&sk->sk_receive_queue.lock); skb = skb_peek(&sk->sk_receive_queue); if (skb != NULL) amount = skb->tail - skb->h.raw; - spin_unlock_irq(&sk->sk_receive_queue.lock); + spin_unlock_bh(&sk->sk_receive_queue.lock); return put_user(amount, (int __user *)arg); } diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -300,12 +300,12 @@ csum_copy_err: /* Clear queue. 
*/ if (flags&MSG_PEEK) { int clear = 0; - spin_lock_irq(&sk->sk_receive_queue.lock); + spin_lock_bh(&sk->sk_receive_queue.lock); if (skb == skb_peek(&sk->sk_receive_queue)) { __skb_unlink(skb, &sk->sk_receive_queue); clear = 1; } - spin_unlock_irq(&sk->sk_receive_queue.lock); + spin_unlock_bh(&sk->sk_receive_queue.lock); if (clear) kfree_skb(skb); } --cWoXeonUoKmBZSoM-- From herbert@gondor.apana.org.au Thu Jun 2 02:56:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 02:56:10 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j529u2Xq001566 for ; Thu, 2 Jun 2005 02:56:03 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DdmPm-00084I-00; Thu, 02 Jun 2005 19:55:02 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DdmPj-0007pT-00; Thu, 02 Jun 2005 19:54:59 +1000 Date: Thu, 2 Jun 2005 19:54:59 +1000 To: "David S. Miller" , netdev@oss.sgi.com Subject: [SCTP] Replace spin_lock_irqsave with spin_lock_bh Message-ID: <20050602095459.GA26638@gondor.apana.org.au> References: <20050602094404.GA10316@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="2oS5YaxWCcQjTEyO" Content-Disposition: inline In-Reply-To: <20050602094404.GA10316@gondor.apana.org.au> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 1960 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev --2oS5YaxWCcQjTEyO Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi Dave: This patch replaces the spin_lock_irqsave call on the receive queue lock in SCTP with spin_lock_bh. 
Despite the proliferation of spin_lock_irqsave calls in this stack, it is only entered from the IPv4/IPv6 stack and user space. That is, it is never entered from hardirq context. The call in question is only called from recvmsg which means that IRQs aren't disabled. Therefore it is safe to replace it with spin_lock_bh. Signed-off-by: Herbert Xu As before, this should probably only go into net-2.6.13. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --2oS5YaxWCcQjTEyO Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=p diff --git a/net/sctp/socket.c b/net/sctp/socket.c --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4368,15 +4368,11 @@ static struct sk_buff *sctp_skb_recv_dat * However, this function was corrent in any case. 8) */ if (flags & MSG_PEEK) { - unsigned long cpu_flags; - - sctp_spin_lock_irqsave(&sk->sk_receive_queue.lock, - cpu_flags); + spin_lock_bh(&sk->sk_receive_queue.lock); skb = skb_peek(&sk->sk_receive_queue); if (skb) atomic_inc(&skb->users); - sctp_spin_unlock_irqrestore(&sk->sk_receive_queue.lock, - cpu_flags); + spin_unlock_bh(&sk->sk_receive_queue.lock); } else { skb = skb_dequeue(&sk->sk_receive_queue); } --2oS5YaxWCcQjTEyO-- From jtbbesaa@bipt106.bi.ehu.es Thu Jun 2 03:40:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 03:40:04 -0700 (PDT) Received: from bipt106.bi.ehu.es (bipt106.bi.ehu.es [158.227.67.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52AdsXq003451 for ; Thu, 2 Jun 2005 03:39:59 -0700 Received: from bipt54.bi.ehu.es ([158.227.75.54] helo=ibook.ziberghetto.dhis.org) by bipt106.bi.ehu.es with esmtp (Exim 3.35 #1 (Debian)) id 1Ddn6I-0002Yr-00; Thu, 02 Jun 2005 12:38:58 +0200 Received: by ibook.ziberghetto.dhis.org (Postfix, from userid 1000) id 1D9BB20F1F; Thu, 2 Jun 2005 12:38:26 +0200 (CEST) From: Alfredo 
Beaumont Sainz Organization: Euskal Herriko Unibertsitatea To: netdev@oss.sgi.com Subject: Problems with Broadcom and Intel PRO/1000 cards Date: Thu, 2 Jun 2005 12:38:19 +0200 User-Agent: KMail/1.8 MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart2076079.6N4Hu5pznk"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <200506021238.25615.jtbbesaa@aintel.bi.ehu.es> X-archive-position: 1961 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jtbbesaa@bipt106.bi.ehu.es Precedence: bulk X-list: netdev --nextPart2076079.6N4Hu5pznk Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Hi, I've a dual opteron machine with an integrated dual Broadcom 5704 10/100/1000 (tg3 driver) and an Intel PRO/1000 MT (e1000 driver). It seems that I cannot make them work at Gbps. I've a crossover cable connecting an interface of the Broadcom (eth1) with the Intel (eth2), but they connect at 100Mbps: # /sbin/mii-tool -v eth1: negotiated 100baseTx-FD, link ok product info: vendor 00:08:18, model 25 rev 0 basic mode: autonegotiation enabled basic status: autonegotiation complete, link ok capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD eth2: negotiated 100baseTx-FD, link ok product info: vendor 00:50:43, model 2 rev 5 basic mode: autonegotiation enabled basic status: autonegotiation complete, link ok capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control As you can see, there's no 1000 FD advertising. 
Forcing it with ethtool makes them lose link connection: # /usr/sbin/ethtool -s eth1 speed 1000 duplex full # /sbin/mii-tool -v eth1: no link product info: vendor 00:08:18, model 25 rev 0 basic mode: autonegotiation enabled basic status: no link capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD eth2: no link product info: vendor 00:50:43, model 2 rev 5 basic mode: autonegotiation enabled basic status: no link capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control After a few seconds the link recovers, at 100 again, and dmesg shows the following kernel messages: tg3: eth1: Link is down. e1000: eth2: e1000_watchdog: NIC Link is Down tg3: eth1: Link is up at 1000 Mbps, full duplex. tg3: eth1: Flow control is off for TX and off for RX. e1000: eth2: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex According to the messages the links should be at 1000, but they are not really. The same happens when forcing eth2. I'm using kernel version 2.6.11.11, but it also happened with previous versions of the kernel. Any hints? Thanks. -- Alfredo Beaumont. GPG: http://aintel.bi.ehu.es/~jtbbesaa/jtbbesaa.gpg.asc Elektronika eta Telekomunikazioak Saila (Ingeniaritza Telematikoa) Euskal Herriko Unibertsitatea, Bilbao (Basque Country). 
http://www.ehu.es --nextPart2076079.6N4Hu5pznk Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQBCnuGh6KTU/EgLc1ERAgOsAJ9Bs8oPJEelifI+GtiP62cMEfl8ZQCfXxc6 e2z/CGhpOy0qWoXNj22/SMQ= =4LuQ -----END PGP SIGNATURE----- --nextPart2076079.6N4Hu5pznk-- From postman@harrier.cohaesio.com Thu Jun 2 04:34:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 04:34:19 -0700 (PDT) Received: from harrier.cohaesio.com (harrier.cohaesio.com [212.97.128.50]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52BYFXq009978 for ; Thu, 2 Jun 2005 04:34:16 -0700 Received: by harrier.cohaesio.com (Postfix, from userid 88) id 7BF0647; Thu, 2 Jun 2005 13:33:14 +0200 (CEST) X-Original-To: news2mail@news.cohaesio.com Delivered-To: news2mail@news.cohaesio.com From: "Anders K. Pedersen" Subject: Re: Problems with Broadcom and Intel PRO/1000 cards Date: Thu, 02 Jun 2005 13:34:09 +0200 Organization: Cohaesio A/S Lines: 13 Message-ID: References: <200506021238.25615.jtbbesaa@aintel.bi.ehu.es> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Trace: harrier.cohaesio.com 1117711993 26359 212.97.128.136 (2 Jun 2005 11:33:13 GMT) X-Complaints-To: newsmaster@news.cohaesio.com X-Accept-Language: en-us, en To: netdev@oss.sgi.com X-archive-position: 1962 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akp@cohaesio.com Precedence: bulk X-list: netdev Alfredo Beaumont Sainz wrote: > I've a dual opteron machine with an integrated dual Broadcom 5704 10/100/1000 > (tg3 driver) and an Intel PRO/1000 MT (e1000 driver). It seems that I cannot > make them work a Gbps. 
I've a crossover cable connecting a interface of the > Broadcom (eth1) with the Intel (eth2), but they connect at 100Mbps: > > # /sbin/mii-tool -v mii-tool does not (yet) support more than 100 Mbit/s, so it will report a 1000 Mbit/s connection as only running 100 Mbit/s. Use ethtool for now. Regards, Anders K. Pedersen From bunk@stusta.de Thu Jun 2 05:16:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 05:16:26 -0700 (PDT) Received: from mailout.stusta.mhn.de (emailhub.stusta.mhn.de [141.84.69.5]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j52CGMXq011914 for ; Thu, 2 Jun 2005 05:16:23 -0700 Received: (qmail 16519 invoked from network); 2 Jun 2005 12:15:12 -0000 Received: from r063144.stusta.swh.mhn.de (10.150.63.144) by mailhub.stusta.mhn.de with SMTP; 2 Jun 2005 12:15:12 -0000 Received: by r063144.stusta.swh.mhn.de (Postfix, from userid 1000) id 6DB05BB5F8; Thu, 2 Jun 2005 14:15:11 +0200 (CEST) Date: Thu, 2 Jun 2005 14:15:11 +0200 From: Adrian Bunk To: Andrew Morton , shemminger@osdl.org Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: 2.6.12-rc5-mm2: "bic unavailable using TCP reno" messages Message-ID: <20050602121511.GE4992@stusta.de> References: <20050601022824.33c8206e.akpm@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050601022824.33c8206e.akpm@osdl.org> User-Agent: Mutt/1.5.9i X-archive-position: 1963 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bunk@stusta.de Precedence: bulk X-list: netdev On Wed, Jun 01, 2005 at 02:28:24AM -0700, Andrew Morton wrote: >... > Changes since 2.6.12-rc5-mm1: >... > +tcp-tcp_infra.patch >... > Steve Hemminger's TCP enhancements. >... I said "no" to CONFIG_TCP_CONG_BIC, and now my syslog is full of messages kernel: bic unavailable using TCP reno I have no problem with such a message being shown once - but once should be enough. 
cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed From hadi@cyberus.ca Thu Jun 2 05:27:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 05:27:55 -0700 (PDT) Received: from mx02.cybersurf.com (mx02.cybersurf.com [209.197.145.105]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52CRkXq012728 for ; Thu, 2 Jun 2005 05:27:47 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx02.cybersurf.com with esmtp (Exim 4.30) id 1Ddomh-0004kV-UR for netdev@oss.sgi.com; Thu, 02 Jun 2005 08:26:51 -0400 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1Ddomg-0006iO-KG; Thu, 02 Jun 2005 08:26:50 -0400 Subject: Re: RFC: NAPI packet weighting patch From: jamal Reply-To: hadi@cyberus.ca To: Jon Mason Cc: "David S. Miller" , mitch.a.williams@intel.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, john.ronciak@intel.com, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com In-Reply-To: <200505311828.44304.jdmason@us.ibm.com> References: <1117241786.6251.7.camel@localhost.localdomain> <200505311707.54487.jdmason@us.ibm.com> <20050531.151443.74564699.davem@davemloft.net> <200505311828.44304.jdmason@us.ibm.com> Content-Type: text/plain Organization: unknown Date: Thu, 02 Jun 2005 08:26:46 -0400 Message-Id: <1117715207.6050.21.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 1964 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Tue, 2005-31-05 at 18:28 -0500, Jon Mason wrote: > On Tuesday 31 May 2005 05:14 pm, David S. 
Miller wrote: > > From: Jon Mason > > Date: Tue, 31 May 2005 17:07:54 -0500 > > > > > Of course some performance analysis would have to be done to determine the > > > optimal numbers for each speed/duplexity setting per driver. > > > > per cpu speed, per memory bus speed, per I/O bus speed, and add in other > > complications such as NUMA > > > > My point is that whatever experimental number you come up with will be > > good for that driver on your systems, not necessarily for others. > > > > Even within a system, whatever number you select will be the wrong > > thing to use if one starts a continuous I/O stream to the SATA > > controller in the next PCI slot, for example. > > > > We keep getting bitten by this, as the Altix perf data continually shows, > > and we need to absolutely stop thinking this way. > > > > The way to go is to make selections based upon observed events and > > measurements. > > I'm not arguing against a /proc entry to tune dev->weight for those sysadmins > advanced enough to do that. I am arguing that we can make the driver smarter > (at little/no cost) for "out of the box" users. > What is the point of making the driver "smarter"? Recall, the algorithm used to schedule the netdevices is based on an extension of Weighted Round Robin from Varghese et al. known as DRR (ask Google for details). The idea is to provide fairness amongst many drivers. As an example, if you have a gige driver it shouldn't be taking all the resources at the expense of starving the fastether driver. If the admin wants one driver to be more "important" than the other, s/he will make sure it has a higher weight. 
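A rough sketch of the weighted round-robin idea, assuming a much simplified model: the unit of work is one packet, each device is credited its weight once per round, and a device with nothing queued gives up its credit. The names drr_dev and drr_run_rounds are hypothetical and this is not the actual net_rx_action code; it only shows how the weight bounds each device's share per round.

```c
#include <stddef.h>

/* Hypothetical per-device scheduling state. */
struct drr_dev {
    const char *name;
    int weight;      /* quantum: packets credited per round */
    int quota;       /* credit (deficit) carried into the current round */
    int backlog;     /* packets waiting to be processed */
    int processed;   /* packets handled so far */
};

/* Serve all devices in round-robin order for the given number of rounds. */
void drr_run_rounds(struct drr_dev *devs, size_t ndevs, int rounds)
{
    for (int r = 0; r < rounds; r++) {
        for (size_t i = 0; i < ndevs; i++) {
            struct drr_dev *d = &devs[i];
            int n;

            if (d->backlog == 0) {
                d->quota = 0;        /* idle devices do not hoard credit */
                continue;
            }

            d->quota += d->weight;   /* credit this round's quantum */
            n = d->quota < d->backlog ? d->quota : d->backlog;
            d->backlog -= n;
            d->processed += n;
            d->quota -= n;

            if (d->backlog == 0)
                d->quota = 0;        /* queue drained: drop leftover credit */
        }
    }
}
```

With a weight of 64 for the gige device and 16 for the fastether device, both heavily backlogged, each round lets the gige device handle 64 packets and the fastether device 16, so neither starves the other and the ratio follows the configured weights.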
cheers, jamal From hadi@cyberus.ca Thu Jun 2 06:05:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 06:06:04 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52D5rXq015814 for ; Thu, 2 Jun 2005 06:05:57 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DdpNa-0004ha-Tm for netdev@oss.sgi.com; Thu, 02 Jun 2005 09:04:58 -0400 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DdpNY-0004wq-08; Thu, 02 Jun 2005 09:04:56 -0400 Subject: PATCH: explicit typing WAS(Re: PATCH: rtnetlink explicit flags setting From: jamal Reply-To: hadi@cyberus.ca To: "David S. Miller" Cc: tgraf@suug.ch, netdev@oss.sgi.com In-Reply-To: <20050531.153125.95894437.davem@davemloft.net> References: <1117197157.6688.24.camel@localhost.localdomain> <20050531.144338.112623594.davem@davemloft.net> <20050531222646.GK15391@postel.suug.ch> <20050531.153125.95894437.davem@davemloft.net> Content-Type: multipart/mixed; boundary="=-MNGFh9ieSNAM2tZgwH9J" Organization: unknown Date: Thu, 02 Jun 2005 09:04:52 -0400 Message-Id: <1117717493.6050.29.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 X-archive-position: 1965 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev --=-MNGFh9ieSNAM2tZgwH9J Content-Type: text/plain Content-Transfer-Encoding: 7bit On Tue, 2005-31-05 at 15:31 -0700, David S. Miller wrote: > From: Thomas Graf > Date: Wed, 1 Jun 2005 00:26:46 +0200 > > > > Please use explicit "unsigned int flags" instead of "unsigned flags". > > > > I converted this already in the two patches later in the thread. > > I see, thanks for pointing this out. 
> If you want to do it right, it should be a u16 actually ;-> In any case, since we are being gracious, let's fix where I cut-and-pasted it from, using TheLinuxWay ;-> ------------- This patch converts "unsigned flags" to more explicit types such as u16 and incrementally introduces NLMSG_NEW(). Signed-off-by: Jamal Hadi Salim cheers, jamal --=-MNGFh9ieSNAM2tZgwH9J Content-Disposition: attachment; filename=expl_p Content-Type: text/plain; name=expl_p; charset=UTF-8 Content-Transfer-Encoding: 7bit net/ipv6/addrconf.c: needs update net/sched/act_api.c: needs update net/sched/cls_api.c: needs update net/sched/sch_api.c: needs update Index: net/ipv6/addrconf.c =================================================================== --- faa2ccd541211d62ece040534da95da9476d4f14/net/ipv6/addrconf.c (mode:100644) +++ uncommitted/net/ipv6/addrconf.c (mode:100644) @@ -131,7 +131,7 @@ static int addrconf_ifdown(struct net_device *dev, int how); -static void addrconf_dad_start(struct inet6_ifaddr *ifp, int flags); +static void addrconf_dad_start(struct inet6_ifaddr *ifp, u32 flags); static void addrconf_dad_timer(unsigned long data); static void addrconf_dad_completed(struct inet6_ifaddr *ifp); static void addrconf_rs_timer(unsigned long data); @@ -491,7 +491,7 @@ static struct inet6_ifaddr * ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr, int pfxlen, - int scope, unsigned flags) + int scope, u32 flags) { struct inet6_ifaddr *ifa = NULL; struct rt6_info *rt; @@ -1319,7 +1319,7 @@ static void addrconf_prefix_route(struct in6_addr *pfx, int plen, struct net_device *dev, - unsigned long expires, unsigned flags) + unsigned long expires, u32 flags) { struct in6_rtmsg rtmsg; @@ -2228,7 +2228,7 @@ /* * Duplicate Address Detection */ -static void addrconf_dad_start(struct inet6_ifaddr *ifp, int flags) +static void addrconf_dad_start(struct inet6_ifaddr *ifp, u32 flags) { struct inet6_dev *idev = ifp->idev; struct net_device *dev = idev->dev; @@ -2670,7 +2670,7 @@ } 
static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca, - u32 pid, u32 seq, int event, unsigned flags) + u32 pid, u32 seq, int event, u16 flags) { struct ifaddrmsg *ifm; struct nlmsghdr *nlh; Index: net/sched/act_api.c =================================================================== --- faa2ccd541211d62ece040534da95da9476d4f14/net/sched/act_api.c (mode:100644) +++ uncommitted/net/sched/act_api.c (mode:100644) @@ -428,15 +428,15 @@ static int tca_get_fill(struct sk_buff *skb, struct tc_action *a, u32 pid, u32 seq, - unsigned flags, int event, int bind, int ref) + u16 flags, int event, int bind, int ref) { struct tcamsg *t; struct nlmsghdr *nlh; unsigned char *b = skb->tail; struct rtattr *x; - nlh = NLMSG_PUT(skb, pid, seq, event, sizeof(*t)); - nlh->nlmsg_flags = flags; + nlh = NLMSG_NEW(skb, pid, seq, event, sizeof(*t), flags); + t = NLMSG_DATA(nlh); t->tca_family = AF_UNSPEC; @@ -669,7 +669,7 @@ } static int tcf_add_notify(struct tc_action *a, u32 pid, u32 seq, int event, - unsigned flags) + u16 flags) { struct tcamsg *t; struct nlmsghdr *nlh; @@ -684,8 +684,7 @@ b = (unsigned char *)skb->tail; - nlh = NLMSG_PUT(skb, pid, seq, event, sizeof(*t)); - nlh->nlmsg_flags = flags; + nlh = NLMSG_NEW(skb, pid, seq, event, sizeof(*t), flags); t = NLMSG_DATA(nlh); t->tca_family = AF_UNSPEC; Index: net/sched/cls_api.c =================================================================== --- faa2ccd541211d62ece040534da95da9476d4f14/net/sched/cls_api.c (mode:100644) +++ uncommitted/net/sched/cls_api.c (mode:100644) @@ -322,14 +322,13 @@ static int tcf_fill_node(struct sk_buff *skb, struct tcf_proto *tp, unsigned long fh, - u32 pid, u32 seq, unsigned flags, int event) + u32 pid, u32 seq, u16 flags, int event) { struct tcmsg *tcm; struct nlmsghdr *nlh; unsigned char *b = skb->tail; - nlh = NLMSG_PUT(skb, pid, seq, event, sizeof(*tcm)); - nlh->nlmsg_flags = flags; + nlh = NLMSG_NEW(skb, pid, seq, event, sizeof(*tcm), flags); tcm = NLMSG_DATA(nlh); 
tcm->tcm_family = AF_UNSPEC; tcm->tcm_ifindex = tp->q->dev->ifindex; Index: net/sched/sch_api.c =================================================================== --- faa2ccd541211d62ece040534da95da9476d4f14/net/sched/sch_api.c (mode:100644) +++ uncommitted/net/sched/sch_api.c (mode:100644) @@ -760,15 +760,14 @@ } static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid, - u32 pid, u32 seq, unsigned flags, int event) + u32 pid, u32 seq, u16 flags, int event) { struct tcmsg *tcm; struct nlmsghdr *nlh; unsigned char *b = skb->tail; struct gnet_dump d; - nlh = NLMSG_PUT(skb, pid, seq, event, sizeof(*tcm)); - nlh->nlmsg_flags = flags; + nlh = NLMSG_NEW(skb, pid, seq, event, sizeof(*tcm), flags); tcm = NLMSG_DATA(nlh); tcm->tcm_family = AF_UNSPEC; tcm->tcm_ifindex = q->dev->ifindex; @@ -997,7 +996,7 @@ static int tc_fill_tclass(struct sk_buff *skb, struct Qdisc *q, unsigned long cl, - u32 pid, u32 seq, unsigned flags, int event) + u32 pid, u32 seq, u16 flags, int event) { struct tcmsg *tcm; struct nlmsghdr *nlh; @@ -1005,8 +1004,7 @@ struct gnet_dump d; struct Qdisc_class_ops *cl_ops = q->ops->cl_ops; - nlh = NLMSG_PUT(skb, pid, seq, event, sizeof(*tcm)); - nlh->nlmsg_flags = flags; + nlh = NLMSG_NEW(skb, pid, seq, event, sizeof(*tcm), flags); tcm = NLMSG_DATA(nlh); tcm->tcm_family = AF_UNSPEC; tcm->tcm_ifindex = q->dev->ifindex; --=-MNGFh9ieSNAM2tZgwH9J-- From abonilla@linuxwireless.org Thu Jun 2 06:06:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 06:06:22 -0700 (PDT) Received: from linuxwireless.org.ve.carpathiahost.net (linuxwireless.org.ve.carpathiahost.net [66.117.45.234]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52D6EXq015848 for ; Thu, 2 Jun 2005 06:06:15 -0700 Received: from WCRSJO2KPAB047 ([200.9.49.66]) by linuxwireless.org.ve.carpathiahost.net (8.12.10/8.12.10) with SMTP id j52D4vgC001796; Thu, 2 Jun 2005 09:04:58 -0400 Reply-To: From: "Alejandro Bonilla" To: "'Alfredo Beaumont Sainz'" , Subject: 
RE: Problems with Broadcom and Intel PRO/1000 cards Date: Thu, 2 Jun 2005 07:04:43 -0600 Message-ID: <001c01c56773$a5684060$600cc60a@amer.sykes.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook CWS, Build 9.0.6604 (9.0.2911.0) In-Reply-To: <200506021238.25615.jtbbesaa@aintel.bi.ehu.es> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1478 Importance: Normal X-archive-position: 1966 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: abonilla@linuxwireless.org Precedence: bulk X-list: netdev > Hi, > > I've a dual opteron machine with an integrated dual Broadcom > 5704 10/100/1000 > (tg3 driver) and an Intel PRO/1000 MT (e1000 driver). It > seems that I cannot > make them work a Gbps. I've a crossover cable connecting a > interface of the > Broadcom (eth1) with the Intel (eth2), but they connect at 100Mbps: > Only time that I have seen this before, it was because I was using an incorrect cable. Make sure you have the _REAL_ Gb crossover cable. http://logout.sh/computers/net/gigabit/ Also, I would trust in dmesg and not in some other tool. 
.Alejandro From baruch@ev-en.org Thu Jun 2 06:59:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 06:59:26 -0700 (PDT) Received: from galon.ev-en.org (rrcs-24-123-59-149.central.biz.rr.com [24.123.59.149]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52DxNXq020273 for ; Thu, 2 Jun 2005 06:59:24 -0700 Received: by galon.ev-en.org (Postfix, from userid 105) id 9282711A953; Thu, 2 Jun 2005 16:58:24 +0300 (IDT) Received: from [10.220.3.66] (hamilton.nuim.ie [149.157.192.252]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by galon.ev-en.org (Postfix) with ESMTP id 9DB9D11A952; Thu, 2 Jun 2005 16:58:21 +0300 (IDT) Message-ID: <429F1079.5070701@ev-en.org> Date: Thu, 02 Jun 2005 14:58:17 +0100 From: Baruch Even User-Agent: Debian Thunderbird 1.0.2 (X11/20050331) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Adrian Bunk Cc: Andrew Morton , shemminger@osdl.org, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 2.6.12-rc5-mm2: "bic unavailable using TCP reno" messages References: <20050601022824.33c8206e.akpm@osdl.org> <20050602121511.GE4992@stusta.de> In-Reply-To: <20050602121511.GE4992@stusta.de> X-Enigmail-Version: 0.91.0.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-archive-position: 1967 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: baruch@ev-en.org Precedence: bulk X-list: netdev Adrian Bunk wrote: > On Wed, Jun 01, 2005 at 02:28:24AM -0700, Andrew Morton wrote: > >>... >>Changes since 2.6.12-rc5-mm1: >>... >>+tcp-tcp_infra.patch >>... >> Steve Hemminger's TCP enhancements. >>... > > > I said "no" to CONFIG_TCP_CONG_BIC, and now my syslog is full of messages > kernel: bic unavailable using TCP reno > > I have no problem with such a message being shown once - but once should > be enough. 
The best solution for this would be to check the available protocols at setup time and not at connection creation time. This would also provide a better feedback to the user, since he will either see that what he set was taken, or it wasn't. In the current mechanism you can set the protocol to 'foo' and it will show back as 'foo'. You'll get complaints only once a connection is attempted with this protocol. It does mean some extra work in the sysctl stage, but it's better IMO to do it there rather than at connection setup time. Baruch From hadi@cyberus.ca Thu Jun 2 07:13:59 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 07:14:02 -0700 (PDT) Received: from mx01.cybersurf.com (mx01.cybersurf.com [209.197.145.104]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52EDsXq021395 for ; Thu, 2 Jun 2005 07:13:59 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx01.cybersurf.com with esmtp (Exim 4.30) id 1DdqRN-0007RQ-AU for netdev@oss.sgi.com; Thu, 02 Jun 2005 08:12:57 -0600 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1Ddpp0-0002iM-B1; Thu, 02 Jun 2005 09:33:18 -0400 Subject: Re: [PATCH 3/4] [NEIGH] neighbour table configuration and statistics via rtnetlink From: jamal Reply-To: hadi@cyberus.ca To: Thomas Graf Cc: "David S. 
Miller" , netdev@oss.sgi.com In-Reply-To: <20050531161315.GH15391@postel.suug.ch> References: <20050527151608.GZ15391@postel.suug.ch> <1117209411.6383.104.camel@localhost.localdomain> <20050527163516.GB15391@postel.suug.ch> <1117244567.6251.34.camel@localhost.localdomain> <20050528120731.GP15391@postel.suug.ch> <1117533847.6134.32.camel@localhost.localdomain> <20050531114251.GC15391@postel.suug.ch> <1117543711.6134.48.camel@localhost.localdomain> <20050531131747.GF15391@postel.suug.ch> <1117551561.6279.2.camel@localhost.localdomain> <20050531161315.GH15391@postel.suug.ch> Content-Type: text/plain Organization: unknown Date: Thu, 02 Jun 2005 09:33:15 -0400 Message-Id: <1117719195.6050.54.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 1969 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev On Tue, 2005-31-05 at 18:13 +0200, Thomas Graf wrote: [..] > So what I propose is to have the neighbour table parameters, > e.g. everything in arp_tbl be distributed over RTM_NEIGHTBL > and put the device specific parameters into devconfig, > e.g. in_dev->arp_parms. > Right, this is what i am saying a well. The only caveat i was pointing out is that the devconfig piece is more than just the neighbor stuff - and of course it hasnt been written, yet;-> The major challenge will be events - some change via /proc, sysfs etc should generate event. I suggest something along usage of notifier_block with something like NETDEV_CONFIG to transport these things around. Damn, if only i can find my patch .... I had already started doing events based on changes from /proc or sysctl etc. > Absolutely, more specific: > > netdevice -> inet_device -> parameter set -> neighbour table > or: > neighbour table -> list of parameter sets -> netdevice > > both ways are possible right now. Sounds good to me. 
cheers, jamal From jtbbesaa@bipt106.bi.ehu.es Thu Jun 2 07:13:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 07:13:34 -0700 (PDT) Received: from bipt106.bi.ehu.es (bipt106.bi.ehu.es [158.227.67.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52EDPXq021320 for ; Thu, 2 Jun 2005 07:13:28 -0700 Received: from bipt54.bi.ehu.es ([158.227.75.54] helo=ibook.ziberghetto.dhis.org) by bipt106.bi.ehu.es with esmtp (Exim 3.35 #1 (Debian)) id 1DdqQu-0005HX-00 for ; Thu, 02 Jun 2005 16:12:28 +0200 Received: by ibook.ziberghetto.dhis.org (Postfix, from userid 1000) id 04FA121151; Thu, 2 Jun 2005 16:11:55 +0200 (CEST) From: Alfredo Beaumont Sainz Organization: Euskal Herriko Unibertsitatea To: netdev@oss.sgi.com Subject: Re: Problems with Broadcom and Intel PRO/1000 cards Date: Thu, 2 Jun 2005 16:11:42 +0200 User-Agent: KMail/1.8 References: <200506021238.25615.jtbbesaa@aintel.bi.ehu.es> In-Reply-To: MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart1279143.OyKeIErFOt"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <200506021611.54933.jtbbesaa@aintel.bi.ehu.es> X-archive-position: 1968 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jtbbesaa@bipt106.bi.ehu.es Precedence: bulk X-list: netdev --nextPart1279143.OyKeIErFOt Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Og, 2005eko Ekaren 02a 13:34(e)an, Anders K. Pedersen(e)k idatzi zuen: > Alfredo Beaumont Sainz wrote: > > I've a dual opteron machine with an integrated dual Broadcom 5704 > > 10/100/1000 (tg3 driver) and an Intel PRO/1000 MT (e1000 driver). It > > seems that I cannot make them work a Gbps. 
I've a crossover cable
> > connecting a interface of the Broadcom (eth1) with the Intel (eth2), but
> > they connect at 100Mbps:
> >
> > # /sbin/mii-tool -v
>
> mii-tool does not (yet) support more than 100 Mbit/s, so it will report
> a 1000 Mbit/s connection as only running 100 Mbit/s. Use ethtool for now.

Ouch, you are right. They are really working at 1000Mbit/s. I should have
checked that. They work with a crossover cable, but I still have problems with
the switch. I'll further investigate before posting again.

Thanks!
-- 
Alfredo Beaumont. GPG: http://aintel.bi.ehu.es/~jtbbesaa/jtbbesaa.gpg.asc
Elektronika eta Telekomunikazioak Saila (Ingeniaritza Telematikoa)
Euskal Herriko Unibertsitatea, Bilbao (Basque Country). http://www.ehu.es

--nextPart1279143.OyKeIErFOt
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQBCnxOq6KTU/EgLc1ERAsbKAJ9U+j2OiPemLbu1oNp/t/T1ijHWDQCeLRho
mxzLFdj20GxHxb4LXD7z5pM=
=drCD
-----END PGP SIGNATURE-----

--nextPart1279143.OyKeIErFOt--

From Peter.Kutschera@arcs.ac.at Thu Jun 2 08:51:32 2005
Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 08:51:41 -0700 (PDT)
Received: from s0ms2.arc.local (arcmail.arcs.ac.at [62.218.164.36]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52FpUXq031245 for ; Thu, 2 Jun 2005 08:51:31 -0700
Received: from s1ms3.D01.arc.local ([172.24.10.15]) by s0ms2.arc.local with Microsoft SMTPSVC(6.0.3790.0); Thu, 2 Jun 2005 17:50:28 +0200
X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Subject: R8169 from U.S.Robotics not found by driver
Date: Thu, 2 Jun 2005 17:50:28 +0200
Message-ID: <3BDD1137DBC16749ACF2C93F82FCA98DA107D2@s1ms3.D01.arc.local>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: R8169 from U.S.Robotics not found by driver
Thread-Index: AcVnisxe/CbtXBnaRKOcSTb93OxXtg==
From: "Kutschera
Peter"
To: "Linux r8169 crew"
X-OriginalArrivalTime: 02 Jun 2005 15:50:28.0291 (UTC) FILETIME=[CC6F2130:01C5678A]
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j52FpUXq031245
X-archive-position: 1970
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: Peter.Kutschera@arcs.ac.at
Precedence: bulk
X-list: netdev

Hello to whoever is out there!

I found your e-mail address in r8169.c:
MODULE_AUTHOR("Realtek and the Linux r8169 crew ");
MODULE_DESCRIPTION("RealTek RTL-8169 Gigabit Ethernet driver");

Maybe you are interested in the following problem?

I just bought a new 1000MB NIC from U.S.Robotics since I was thinking there is a driver in kernel 2.6.8.
There wasn't.
But there is a driver (on the CD and also downloadable from http://www.usr.com/support/product-template.asp?prod=7902 (see linux.exe :-))
And there is also a newer driver in 2.6.11.

The different results are:

Modprobe r8169 with the driver from 2.6.8 or 2.6.11 simply has no effect - the module is loaded but there is no error message, no eth1 (it's my 2nd network card, eth0 is onboard) and nothing in dmesg :-(

I was building and using the driver from U.S.Robotics with 2.6.8 and 2.6.11:

pinguc1:~# modprobe r8169
pinguc1:~# dmesg | tail
ACPI: PCI interrupt 0000:00:04.0[A] -> GSI 25 (level, low) -> IRQ 193
eth1: Identified chip type is 'RTL8169s/8110s'.
eth1: U.S. Robotics 10/100/1000 PCI NIC driver version 2.0 at 0xf89e8000, 00:c0:49:59:28:71, IRQ 193
eth1: Auto-negotiation Enabled.
eth1: 1000Mbps Full-duplex operation.
pinguc1:~# ifup eth1
pinguc1:~# ping cluster2
PING cluster2 (192.168.1.2) 56(84) bytes of data.
64 bytes from cluster2 (192.168.1.2): icmp_seq=1 ttl=64 time=0.069 ms

Fine, isn't it? NO IT IS NOT :-(

It works fine for a while, but when starting to push LOTS OF DATA over this interface:

pinguc1:~# dmesg | tail
irq 193: nobody cared!
[] __report_bad_irq+0x31/0x77
[] note_interrupt+0x4c/0x71
[] __do_IRQ+0xd9/0x121
[] do_IRQ+0x1b/0x28
[] common_interrupt+0x1a/0x20
[] default_idle+0x0/0x29
[] default_idle+0x23/0x29
[] cpu_idle+0x39/0x4e
[] start_kernel+0x178/0x17c
handlers:
[] (rtl8169_interrupt+0x0/0x7e [r8169])
Disabling IRQ #193

No interrupt - No data transfer

Maybe some of the following is useful for you?

pinguc1:~# lspci
0000:00:00.0 Host bridge: ServerWorks GCNB-LE Host Bridge (rev 32)
0000:00:00.1 Host bridge: ServerWorks GCNB-LE Host Bridge
0000:00:02.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
0000:00:04.0 Ethernet controller: U.S. Robotics: Unknown device 0116 (rev 10)
0000:00:0e.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
0000:00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
0000:00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05)
0000:00:0f.3 ISA bridge: ServerWorks CSB5 LPC bridge
0000:00:10.0 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
0000:00:10.2 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
0000:01:02.0 RAID bus controller: American Megatrends Inc. MegaRAID (rev 02)
0000:01:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)

pinguc1:~# hd /proc/bus/pci/01/04.0
00000000 00 10 30 00 17 01 30 02 07 00 00 01 10 48 00 00 |..0...0......H..|
00000010 01 dc 00 00 04 00 f1 fc 00 00 00 00 04 00 f0 fc |.Ü....ñü......ðü|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 28 10 35 01 |............(.5.|
00000030 00 00 e0 fc 50 00 00 00 00 00 00 00 0b 01 11 12 |..àüP...........|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 01 58 02 06 00 00 00 00 05 00 80 00 00 00 00 00 |.X..............|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000100

I am not sure if this is a problem of the USR driver or the hardware (Dell PowerEdge 1600).
I would like to test your driver but it seems to me that your driver can't find the card.

On another PC running the same software (Debian sarge with 2.6.8 kernel and USR driver, on the other end of the cable) the module from USR seems to work.

If you have any tips please let me know.
In the meantime I will try another PCI slot and, as I expect this will not help, an old 3C509. Not the best choice for a Linux cluster, I think.

Thanks
Peter

--
Dipl.-Ing.
Peter Kutschera tel: +43 664 620 7642 http://Peter.Kutschera.at/ mailto:Peter@Kutschera.at From jbenc@suse.cz Thu Jun 2 09:51:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 09:51:38 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52GpZXq001174 for ; Thu, 2 Jun 2005 09:51:36 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id 38BA662830C; Thu, 2 Jun 2005 18:50:38 +0200 (CEST) Date: Thu, 2 Jun 2005 18:50:38 +0200 From: Jiri Benc To: Gertjan van Wingerde Cc: netdev@oss.sgi.com, jgarzik@pobox.com, jbohac@suse.cz Subject: Re: [PATCH] ieee80211: Update generic definitions to latest specs. Message-ID: <20050602185038.4fd9dafb@griffin.suse.cz> In-Reply-To: <429E1FAB.6080503@home.nl> References: <429E1FAB.6080503@home.nl> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 1971 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev On Wed, 01 Jun 2005 22:50:51 +0200, Gertjan van Wingerde wrote: > +#define WLAN_STATUS_ASSOC_DENIED_SPECTRUM_MGMT_REQUIRED 22 > +#define WLAN_STATUS_ASSOC_REJECTED_POWER_CAP_UNACCEPTABLE 23 > +#define WLAN_STATUS_ASSOC_REJECTED_SUPP_CHANNELS_UNACCEPTABLE 24 > (...) > +/* 802.11h */ > +#define WLAN_REASON_DISASSOC_POWER_CAP_UNACCEPTABLE 10 > +#define WLAN_REASON_DISASSOC_SUPP_CHANNELS_UNACCEPTABLE 11 Aren't these identifiers a bit too long? It seems to be unpractical to use them. 
-- Jiri Benc SUSE Labs From shemminger@osdl.org Thu Jun 2 10:32:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 10:32:39 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52HWRXq003829 for ; Thu, 2 Jun 2005 10:32:27 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j52HUqjA028494 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 2 Jun 2005 10:30:53 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j52HUqCQ005577; Thu, 2 Jun 2005 10:30:52 -0700 Date: Thu, 2 Jun 2005 10:30:52 -0700 From: Stephen Hemminger To: hadi@cyberus.ca Cc: Jon Mason , "David S. Miller" , mitch.a.williams@intel.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, john.ronciak@intel.com, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch Message-ID: <20050602103052.66f12f21@dxpl.pdx.osdl.net> In-Reply-To: <1117715207.6050.21.camel@localhost.localdomain> References: <1117241786.6251.7.camel@localhost.localdomain> <200505311707.54487.jdmason@us.ibm.com> <20050531.151443.74564699.davem@davemloft.net> <200505311828.44304.jdmason@us.ibm.com> <1117715207.6050.21.camel@localhost.localdomain> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 1972 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org 
Precedence: bulk X-list: netdev On Thu, 02 Jun 2005 08:26:46 -0400 jamal wrote: > On Tue, 2005-31-05 at 18:28 -0500, Jon Mason wrote: > > On Tuesday 31 May 2005 05:14 pm, David S. Miller wrote: > > > From: Jon Mason > > > Date: Tue, 31 May 2005 17:07:54 -0500 > > > > > > > Of course some performace analysis would have to be done to determine the > > > > optimal numbers for each speed/duplexity setting per driver. > > > > > > per cpu speed, per memory bus speed, per I/O bus speed, and add in other > > > complications such as NUMA > > > > > > My point is that whatever experimental number you come up with will be > > > good for that driver on your systems, not necessarily for others. > > > > > > Even within a system, whatever number you select will be the wrong > > > thing to use if one starts a continuous I/O stream to the SATA > > > controller in the next PCI slot, for example. > > > > > > We keep getting bitten by this, as the Altix perf data continually shows, > > > and we need to absolutely stop thinking this way. > > > > > > The way to go is to make selections based upon observed events and > > > mesaurements. > > > > I'm not arguing against a /proc entry to tune dev->weight for those sysadmins > > advanced enough to do that. I am arguing that we can make the driver smarter > > (at little/no cost) for "out of the box" users. > > > > What is the point of making the driver "smarter"? > Recall, the algorithm used to schedule the netdevices is based on an > extension of Weighted Round Robin from Varghese et al known as DRR (ask > gooogle for details). > The idea is to provide fairness amongst many drivers. As an example, if > you have a gige driver it shouldnt be taking all the resources at the > expense of starving the fastether driver. > If the admin wants one driver to be more "important" than the other, > s/he will make sure it has a higher weight. 
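
For readers who do not have the deficit round robin (DRR) idea at hand, here is a rough user-space sketch of the fairness argument quoted above. It is not the kernel's poll loop: struct drr_dev, its fields and drr_round() are hypothetical names made up for illustration, and the per-frame "cost" is an assumed stand-in for CPU time per frame. Each device gets its quantum (weight) added once per round and may only process frames while it has enough credit left.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a polled device; not the kernel's struct net_device. */
struct drr_dev {
	const char *name;
	int quantum;	/* credit added once per round (the "weight") */
	int cost;	/* assumed CPU cost of one frame, arbitrary units */
	int deficit;	/* unused credit carried over to the next round */
	int backlog;	/* frames waiting to be processed */
	int served;	/* frames processed so far */
};

/* One DRR round over all devices. */
static void drr_round(struct drr_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct drr_dev *d = &devs[i];

		assert(d->cost > 0);
		if (d->backlog == 0) {
			/* Idle devices do not accumulate credit. */
			d->deficit = 0;
			continue;
		}
		d->deficit += d->quantum;
		/* Process frames only while the credit covers their cost. */
		while (d->backlog > 0 && d->deficit >= d->cost) {
			d->deficit -= d->cost;
			d->backlog--;
			d->served++;
		}
		if (d->backlog == 0)
			d->deficit = 0;
	}
}
```

With two always-busy devices of equal quantum 4, one costing 3 units per frame and one costing 1, three rounds serve 4 and 12 frames respectively, so both consume the same 12 units. Raising one device's quantum is what gives it a larger share.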
> In fact, since the default weighting should be based on the amount of cpu time expended per frame rather than link speed. The point is that a more "heavy weight" driver shouldn't starve out all the others. From shemminger@osdl.org Thu Jun 2 10:39:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 10:39:11 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52Hd6Xq004550 for ; Thu, 2 Jun 2005 10:39:06 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j52Hc6jA029209 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 2 Jun 2005 10:38:06 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j52Hc5BT006053; Thu, 2 Jun 2005 10:38:05 -0700 Date: Thu, 2 Jun 2005 10:38:05 -0700 From: Stephen Hemminger To: Baruch Even Cc: Adrian Bunk , Andrew Morton , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 2.6.12-rc5-mm2: "bic unavailable using TCP reno" messages Message-ID: <20050602103805.6beb4f4e@dxpl.pdx.osdl.net> In-Reply-To: <429F1079.5070701@ev-en.org> References: <20050601022824.33c8206e.akpm@osdl.org> <20050602121511.GE4992@stusta.de> <429F1079.5070701@ev-en.org> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 1973 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Thu, 02 Jun 2005 
14:58:17 +0100 Baruch Even wrote:

> Adrian Bunk wrote:
> > On Wed, Jun 01, 2005 at 02:28:24AM -0700, Andrew Morton wrote:
> >
> >>...
> >>Changes since 2.6.12-rc5-mm1:
> >>...
> >>+tcp-tcp_infra.patch
> >>...
> >> Steve Hemminger's TCP enhancements.
> >>...
> >
> >
> > I said "no" to CONFIG_TCP_CONG_BIC, and now my syslog is full of messages
> > kernel: bic unavailable using TCP reno
> >
> > I have no problem with such a message being shown once - but once should
> > be enough.
>
> The best solution for this would be to check the available protocols at
> setup time and not at connection creation time. This would also provide
> a better feedback to the user, since he will either see that what he set
> was taken, or it wasn't.
>
> In the current mechanism you can set the protocol to 'foo' and it will
> show back as 'foo'. You'll get complaints only once a connection is
> attempted with this protocol.
>
> It does mean some extra work in the sysctl stage, but it's better IMO to
> do it there rather than at connection setup time.
>
> Baruch

You're right, the sysctl handler should be smarter, but that is not the problem here.
The problem is that the default value is set to BIC to be compatible with
earlier kernels. Since 75% of the world isn't smart enough to figure out how
to use sysctl, there is a question of what the default should be, and what
to do if that default is missing.

One version had a messy ifdef chain to try to avoid the warning:

	char sysctl_tcp_congestion_control[] =
	#if defined(CONFIG_TCP_BIC)
		"bic"
	#elif defined(CONFIG_TCP_HTCP)
		"htcp"
	#else
		"reno"
	#endif
	;

but that was ugly. Another possibility is putting it in as yet another
config value at kernel build time.

To stop the warning from repeating, probably the best solution would be to
rewrite the string if we have to revert to reno, but carefully, to avoid SMP
issues. This also implies a smarter sysctl string handler for this value as well.
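
A minimal user-space sketch of the "rewrite the string on fallback" idea, to make the intended behaviour concrete. Everything here is an assumption for illustration: default_ca, built_in[], ca_available() and pick_ca() are hypothetical names, not the kernel's symbols, and the locking needed to make the rewrite SMP safe is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CA_NAME_MAX 16

/* Hypothetical stand-in for the sysctl string; defaults to "bic". */
static char default_ca[CA_NAME_MAX] = "bic";

/* Hypothetical list of algorithms that were actually built in. */
static const char *const built_in[] = { "reno", "htcp" };

/* Counts how often the fallback warning was emitted. */
static int fallback_warnings;

static int ca_available(const char *name)
{
	for (size_t i = 0; i < sizeof(built_in) / sizeof(built_in[0]); i++)
		if (strcmp(built_in[i], name) == 0)
			return 1;
	return 0;
}

/* Called once per new connection; returns the algorithm name to use. */
static const char *pick_ca(void)
{
	if (!ca_available(default_ca)) {
		fprintf(stderr, "%s unavailable using TCP reno\n", default_ca);
		fallback_warnings++;
		/* reno is assumed to be always present as the fallback. */
		assert(ca_available("reno"));
		/*
		 * Rewrite the default so later connections find "reno"
		 * directly and do not warn again. A real implementation
		 * would have to serialize this write against readers.
		 */
		snprintf(default_ca, sizeof(default_ca), "%s", "reno");
	}
	return default_ca;
}
```

Calling pick_ca() repeatedly warns only on the first call. The same ca_available() check could also be run when the string is written, so that an unknown name is rejected up front, which is the smarter handler Baruch asked for.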
P.s: saw your comparison paper, after a little more corroboration I would like to make H-TCP the default. From shemminger@osdl.org Thu Jun 2 10:45:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 10:45:34 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52HjRXq005497 for ; Thu, 2 Jun 2005 10:45:27 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j52HiOjA029583 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 2 Jun 2005 10:44:25 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j52HiNYe006311; Thu, 2 Jun 2005 10:44:23 -0700 Date: Thu, 2 Jun 2005 10:44:23 -0700 From: Stephen Hemminger To: Cc: , , , Subject: Re: Unable to handle kernel paging request at virtual address 04000460 Message-ID: <20050602104423.2c3825e5@dxpl.pdx.osdl.net> In-Reply-To: <438662DA48DCAA41B1DF648BD4BD76C0E461B8@CHN-SNR-MBX01.wipro.com> References: <438662DA48DCAA41B1DF648BD4BD76C0E461B8@CHN-SNR-MBX01.wipro.com> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 1974 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Thu, 2 Jun 2005 09:20:21 +0530 wrote: > Hi David, > I understand that the linux community may not be able to debug it for > me. 
All I require is if people have seen similar problems (the problems > we face are w.r.t to kfree_skb and skb_drop_fraglist crashing due to > some reason, which could be a Memory Management issue or some thing we > are not aware of), then let us know the patches, so that we can try them > out here. Turn on Debug memory allocations, spinlock debugging, sleep-inside-spinlock checking, and preempt, it will help your debugging. If you are not building your own kernel from source learn how. You are probably freeing memory twice, or not doing ref counting properly or other locking issues. Since it is your code, good luck debugging it, if you want the community help it needs to be open source code that is available for download or be in the kernel.org kernel. From shemminger@osdl.org Thu Jun 2 10:53:39 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 10:53:44 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52HrcXq006428 for ; Thu, 2 Jun 2005 10:53:38 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j52HqcjA030322 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 2 Jun 2005 10:52:39 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j52HqcMY006729; Thu, 2 Jun 2005 10:52:38 -0700 Date: Thu, 2 Jun 2005 10:52:38 -0700 From: Stephen Hemminger To: Andrew Morton Cc: John Heffner , netdev@oss.sgi.com Subject: [PATCH] Scalable TCP (cleaned) Message-ID: <20050602105238.69b6bcb3@dxpl.pdx.osdl.net> In-Reply-To: <200505251550.42252.jheffner@psc.edu> References: <200505251550.42252.jheffner@psc.edu> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ 
/`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-MIMEDefang-Filter: osdl$Revision: 1.109 $
X-Scanned-By: MIMEDefang 2.36
X-archive-position: 1975
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: shemminger@osdl.org
Precedence: bulk
X-list: netdev

Here is a whitespace-cleaned version of John's Scalable TCP patch
to go with the other TCP congestion algorithms, to put in -mm.

--------
This patch implements Tom Kelly's Scalable TCP congestion control algorithm
for the modular framework. The algorithm has some nice scaling properties,
and has been used a fair bit in research, though it is known to have
significant fairness issues, so it's not really suitable for general purpose use.

Signed-off-by: John Heffner

Index: 2.6.12-rc5-tcp3/net/ipv4/Makefile
===================================================================
--- 2.6.12-rc5-tcp3.orig/net/ipv4/Makefile
+++ 2.6.12-rc5-tcp3/net/ipv4/Makefile
@@ -35,6 +35,7 @@ obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_high
 obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o
 obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.o
 obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
+obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
 
 obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \
 		      xfrm4_output.o
Index: 2.6.12-rc5-tcp3/net/ipv4/tcp_scalable.c
===================================================================
--- /dev/null
+++ 2.6.12-rc5-tcp3/net/ipv4/tcp_scalable.c
@@ -0,0 +1,68 @@
+/* Tom Kelly's Scalable TCP
+ *
+ * See http://www-lce.eng.cam.ac.uk/~ctk21/scalable/
+ *
+ * John Heffner
+ */
+
+#include
+#include
+#include
+
+/* These factors derived from the recommended values in the paper:
+ * .01 and 7/8. We use 50 instead of 100 to account for
+ * delayed ack.
+ */ +#define TCP_SCALABLE_AI_CNT 50U +#define TCP_SCALABLE_MD_SCALE 3 + +static void tcp_scalable_cong_avoid(struct tcp_sock *tp, u32 ack, u32 rtt, + u32 in_flight, int flag) +{ + if (in_flight < tp->snd_cwnd) + return; + + if (tp->snd_cwnd <= tp->snd_ssthresh) { + tp->snd_cwnd++; + } else { + tp->snd_cwnd_cnt++; + if (tp->snd_cwnd_cnt > min(tp->snd_cwnd, TCP_SCALABLE_AI_CNT)){ + tp->snd_cwnd++; + tp->snd_cwnd_cnt = 0; + } + } + tp->snd_cwnd = min_t(u32, tp->snd_cwnd, tp->snd_cwnd_clamp); + tp->snd_cwnd_stamp = tcp_time_stamp; +} + +static u32 tcp_scalable_ssthresh(struct tcp_sock *tp) +{ + return max(tp->snd_cwnd - (tp->snd_cwnd>>TCP_SCALABLE_MD_SCALE), 2U); +} + + +static struct tcp_congestion_ops tcp_scalable = { + .ssthresh = tcp_scalable_ssthresh, + .cong_avoid = tcp_scalable_cong_avoid, + .min_cwnd = tcp_reno_min_cwnd, + + .owner = THIS_MODULE, + .name = "scalable", +}; + +static int __init tcp_scalable_register(void) +{ + return tcp_register_congestion_control(&tcp_scalable); +} + +static void __exit tcp_scalable_unregister(void) +{ + tcp_unregister_congestion_control(&tcp_scalable); +} + +module_init(tcp_scalable_register); +module_exit(tcp_scalable_unregister); + +MODULE_AUTHOR("John Heffner"); +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("Scalable TCP"); Index: 2.6.12-rc5-tcp3/net/ipv4/Kconfig =================================================================== --- 2.6.12-rc5-tcp3.orig/net/ipv4/Kconfig +++ 2.6.12-rc5-tcp3/net/ipv4/Kconfig @@ -481,6 +481,15 @@ config TCP_CONG_VEGAS window. TCP Vegas should provide less packet loss, but it is not as aggressive as TCP Reno. +config TCP_CONG_SCALABLE + tristate "Scalable TCP" + depends on EXPERIMENTAL + default n + ---help--- + Scalable TCP is a sender-side only change to TCP which uses a + MIMD congestion control algorithm which has some nice scaling + properties, though is known to have fairness issues. 
+ See http://www-lce.eng.cam.ac.uk/~ctk21/scalable/ endmenu From shemminger@osdl.org Thu Jun 2 11:15:43 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 11:15:47 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52IFgXq008328 for ; Thu, 2 Jun 2005 11:15:43 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j52IEbjA000394 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 2 Jun 2005 11:14:38 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j52IEbtP008613; Thu, 2 Jun 2005 11:14:37 -0700 Date: Thu, 2 Jun 2005 11:14:37 -0700 From: Stephen Hemminger To: "David S. Miller" Cc: Mitch Williams , netdev@oss.sgi.com, john.ronciak@intel.com, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: [PATCH] net: allow controlling NAPI weight with sysfs Message-ID: <20050602111437.1c492138@dxpl.pdx.osdl.net> In-Reply-To: References: Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 1976 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Simple interface to allow changing network device scheduling weight with sysfs. Please consider this for 2.6.12, since risk/impact is small. 
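As a rough illustration of how the new attribute might be driven from user space, here is a minimal C sketch. It assumes the attribute shows up as /sys/class/net/<ifname>/weight, next to the existing mtu and tx_queue_len entries; both helper names are hypothetical and not part of the patch.

```c
#include <stdio.h>

/* Build the assumed sysfs path of the weight attribute for one interface.
 * Returns 0 on success, -1 if the buffer is too small. */
static int weight_attr_path(char *buf, size_t len, const char *ifname)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/weight", ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a new weight as decimal text to the attribute file at path.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int write_weight(const char *path, unsigned long weight)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%lu\n", weight) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would build the path for, say, "eth0" and pass it to write_weight() with the desired value. Since the attribute is created with S_IRUGO | S_IWUSR, writing would normally require root.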
Signed-off-by: Stephen Hemminger Index: napi-sysfs/net/core/net-sysfs.c =================================================================== --- napi-sysfs.orig/net/core/net-sysfs.c +++ napi-sysfs/net/core/net-sysfs.c @@ -184,6 +184,22 @@ static ssize_t store_tx_queue_len(struct static CLASS_DEVICE_ATTR(tx_queue_len, S_IRUGO | S_IWUSR, show_tx_queue_len, store_tx_queue_len); +NETDEVICE_SHOW(weight, fmt_ulong); + +static int change_weight(struct net_device *net, unsigned long new_weight) +{ + net->weight = new_weight; + return 0; +} + +static ssize_t store_weight(struct class_device *dev, const char *buf, size_t len) +{ + return netdev_store(dev, buf, len, change_weight); +} + +static CLASS_DEVICE_ATTR(weight, S_IRUGO | S_IWUSR, show_weight, + store_weight); + static struct class_device_attribute *net_class_attributes[] = { &class_device_attr_ifindex, @@ -193,6 +209,7 @@ static struct class_device_attribute *ne &class_device_attr_features, &class_device_attr_mtu, &class_device_attr_flags, + &class_device_attr_weight, &class_device_attr_type, &class_device_attr_address, &class_device_attr_broadcast, From shemminger@osdl.org Thu Jun 2 11:20:10 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 11:20:13 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52IKAXq008963 for ; Thu, 2 Jun 2005 11:20:10 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j52IJ9jA001475 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 2 Jun 2005 11:19:10 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j52IJ97X009167; Thu, 2 Jun 2005 11:19:09 -0700 Date: Thu, 2 Jun 2005 11:19:09 -0700 From: Stephen Hemminger To: "David S. 
Miller" Cc: Mitch Williams , netdev@oss.sgi.com Subject: [PATCH] net: fix sysctl_ Message-ID: <20050602111909.63ef419a@dxpl.pdx.osdl.net> In-Reply-To: References: Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 1977 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Changing the sysctl net.core.dev_weight has no effect because the weight of the backlog devices is set during initialization and never changed. This patch propagates any changes to the global value affected by sysctl to the per-cpu devices. It is done every time the packet handler function is run. 
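The stale-copy problem described above can be modelled in a few lines of user-space C. This is only a sketch, not kernel code: weight_p stands in for the sysctl-backed global, the initial value 64 is chosen for illustration, and struct backlog_model and both functions are hypothetical names.

```c
/* Stand-in for the global that the net.core.dev_weight sysctl writes to. */
static int weight_p = 64;

/* Hypothetical, stripped-down model of a per-CPU backlog device. */
struct backlog_model {
	int weight;
};

/* Init-time copy: before the fix, the only assignment the weight ever got. */
static void backlog_model_init(struct backlog_model *dev)
{
	dev->weight = weight_p;
}

/* Handler with the fix applied: re-read the global on every run. */
static int backlog_model_poll(struct backlog_model *dev)
{
	dev->weight = weight_p;
	return dev->weight;
}
```

After backlog_model_init(), changing weight_p leaves dev->weight at its old value until backlog_model_poll() runs, which mirrors why the sysctl had no effect before the per-run assignment was added.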
Signed-off-by: Stephen Hemminger

Index: skge-0.8/net/core/dev.c
===================================================================
--- skge-0.8.orig/net/core/dev.c
+++ skge-0.8/net/core/dev.c
@@ -1732,6 +1732,7 @@ static int process_backlog(struct net_de
 	struct softnet_data *queue = &__get_cpu_var(softnet_data);
 	unsigned long start_time = jiffies;
 
+	backlog_dev->weight = weight_p;
 	for (;;) {
 		struct sk_buff *skb;
 		struct net_device *dev;

From romieu@fr.zoreil.com Thu Jun 2 11:36:06 2005
Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 11:36:10 -0700 (PDT)
Received: from fr.zoreil.com (electric-eye.fr.zoreil.com [213.41.134.224])
	by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52Ia4Xq010316
	for ; Thu, 2 Jun 2005 11:36:05 -0700
Received: from electric-eye.fr.zoreil.com (localhost.localdomain [127.0.0.1])
	by fr.zoreil.com (8.13.1/8.12.1) with ESMTP id j52IZ24i006169;
	Thu, 2 Jun 2005 20:35:02 +0200
Received: (from romieu@localhost)
	by electric-eye.fr.zoreil.com (8.13.1/8.13.1/Submit) id j52IYuZT006168;
	Thu, 2 Jun 2005 20:34:56 +0200
Date: Thu, 2 Jun 2005 20:34:56 +0200
From: Francois Romieu
To: Kutschera Peter
Cc: Linux r8169 crew , jgarzik@pobox.com
Subject: Re: R8169 from U.S.Robotics not found by driver
Message-ID: <20050602183456.GA5606@electric-eye.fr.zoreil.com>
References: <3BDD1137DBC16749ACF2C93F82FCA98DA107D2@s1ms3.D01.arc.local>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3BDD1137DBC16749ACF2C93F82FCA98DA107D2@s1ms3.D01.arc.local>
User-Agent: Mutt/1.4.1i
X-Organisation: Land of Sunshine Inc.
X-Subliminal-Message: Merge the r8169 driver in mainline
X-archive-position: 1978
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: romieu@fr.zoreil.com
Precedence: bulk
X-list: netdev

Kutschera Peter :
[...]
> If you have any tips please let me know.
Upgrade to (at your option): - 2.6.12-rc5 + Jeff Garzik's r8169 git branch; - 2.6.12-rc5-mm2. Both contain the latest r8169 driver. It will handle USR hardware. If you manage to kill it, please report it. Would your setup allow to test the driver in the Mpps range by any luck ? -- Ueimor From gwingerde@home.nl Thu Jun 2 12:03:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 12:03:29 -0700 (PDT) Received: from smtpq3.home.nl (smtpq3.home.nl [213.51.128.198]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52J3QXq012315 for ; Thu, 2 Jun 2005 12:03:27 -0700 Message-Id: <200506021903.j52J3QXq012315@oss.sgi.com> Received: from [213.51.128.134] (port=56855 helo=smtp3.home.nl) by smtpq3.home.nl with esmtp (Exim 4.30) id 1DduxW-0005WB-6W; Thu, 02 Jun 2005 21:02:26 +0200 Received: from [10.100.3.12] (port=33042 helo=mail.home.nl) by smtp3.home.nl with smtp (Exim 4.30) id 1DduxU-00011G-VC; Thu, 02 Jun 2005 21:02:24 +0200 X-Mailer: Openwave WebEngine, version 2.8.12 (webedge20-101-197-20030912) X-Originating-IP: [213.84.184.98] From: To: Jiri Benc CC: , , Subject: Antw: Re: [PATCH] ieee80211: Update generic definitions to latest specs. Date: Thu, 2 Jun 2005 21:02:24 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-AtHome-MailScanner-Information: Neem contact op met support@home.nl voor meer informatie X-AtHome-MailScanner: Found to be clean X-archive-position: 1979 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gwingerde@home.nl Precedence: bulk X-list: netdev Content-Length: 724 Lines: 22 On Thu, 02 Jun 2005, Jiri Benc wrote: > On Wed, 01 Jun 2005 22:50:51 +0200, Gertjan van Wingerde wrote: > > +#define WLAN_STATUS_ASSOC_DENIED_SPECTRUM_MGMT_REQUIRED 22 > > +#define WLAN_STATUS_ASSOC_REJECTED_POWER_CAP_UNACCEPTABLE 23 > > +#define WLAN_STATUS_ASSOC_REJECTED_SUPP_CHANNELS_UNACCEPTABLE 24 > > (...) 
> > +/* 802.11h */ > > +#define WLAN_REASON_DISASSOC_POWER_CAP_UNACCEPTABLE 10 > > +#define WLAN_REASON_DISASSOC_SUPP_CHANNELS_UNACCEPTABLE 11 > > Aren't these identifiers a bit too long? It seems to be unpractical to use > them. > I was thinking about that too, but couldn't find a proper shorter version without losing the descriptive meaning. Do you have any suggestions to shorten them? BR, Gertjan From davem@davemloft.net Thu Jun 2 13:07:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 13:07:58 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52K7sXq019659 for ; Thu, 2 Jun 2005 13:07:54 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Ddvxo-0001uE-Nv; Thu, 02 Jun 2005 13:06:48 -0700 Date: Thu, 02 Jun 2005 13:06:48 -0700 (PDT) Message-Id: <20050602.130648.75428139.davem@davemloft.net> To: bunk@stusta.de Cc: akpm@osdl.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, yoshfuji@linux-ipv6.org Subject: Re: [2.6 patch] net/ipv6/ipv6_syms.c: unexport fl6_sock_lookup From: "David S. Miller" In-Reply-To: <20050530205653.GZ10441@stusta.de> References: <20050530205653.GZ10441@stusta.de> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1981 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 254 Lines: 9 From: Adrian Bunk Date: Mon, 30 May 2005 22:56:53 +0200 > There is no usage of this EXPORT_SYMBOL in the kernel. > > Signed-off-by: Adrian Bunk > Acked-by: Hideaki YOSHIFUJI Applied, thanks. 
From davem@davemloft.net Thu Jun 2 13:03:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 13:03:51 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52K3lXq019130 for ; Thu, 2 Jun 2005 13:03:47 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Ddvth-0001s9-Sa; Thu, 02 Jun 2005 13:02:33 -0700 Date: Thu, 02 Jun 2005 13:02:33 -0700 (PDT) Message-Id: <20050602.130233.59653068.davem@davemloft.net> To: bunk@stusta.de Cc: ja@ssi.bg, wensong@LinuxVirtualServer.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [2.6 patch] remove net/ipv4/ipvs/ip_vs_proto_icmp.c? From: "David S. Miller" In-Reply-To: <20050515132906.GW16549@stusta.de> References: <20050513041622.GE3603@stusta.de> <20050515132906.GW16549@stusta.de> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1980 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 189 Lines: 8 From: Adrian Bunk Date: Sun, 15 May 2005 15:29:06 +0200 > ip_vs_proto_icmp.c was never finished. > > Signed-off-by: Adrian Bunk Applied, thanks Adrian. 
From bunk@stusta.de Thu Jun 2 13:08:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 13:08:13 -0700 (PDT) Received: from mailout.stusta.mhn.de (mailout.stusta.mhn.de [141.84.69.5]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j52K81Xq019698 for ; Thu, 2 Jun 2005 13:08:02 -0700 Received: (qmail 32434 invoked from network); 2 Jun 2005 20:07:04 -0000 Received: from r063144.stusta.swh.mhn.de (10.150.63.144) by mailout.stusta.mhn.de with SMTP; 2 Jun 2005 20:07:04 -0000 Received: by r063144.stusta.swh.mhn.de (Postfix, from userid 1000) id 67F5ABBFA9; Thu, 2 Jun 2005 22:07:02 +0200 (CEST) Date: Thu, 2 Jun 2005 22:07:02 +0200 From: Adrian Bunk To: Andrew Morton , jkmaline@cc.hut.fi, jgarzik@pobox.com Cc: linux-kernel@vger.kernel.org, hostap@shmoo.com, netdev@oss.sgi.com Subject: [-mm patch] fix recursive IPW2200 dependencies Message-ID: <20050602200701.GG4992@stusta.de> References: <20050601022824.33c8206e.akpm@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050601022824.33c8206e.akpm@osdl.org> User-Agent: Mutt/1.5.9i X-archive-position: 1982 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bunk@stusta.de Precedence: bulk X-list: netdev Content-Length: 999 Lines: 34 On Wed, Jun 01, 2005 at 02:28:24AM -0700, Andrew Morton wrote: >... > Changes since 2.6.12-rc5-mm1: >... > +git-netdev-we18-ieee80211-wifi.patch > > Various things added and merged in netdev land. >... This results in recursive dependencies: - IPW2200 depends on NET_RADIO - IPW2200 selects IEEE80211 - IEEE80211 selects NET_RADIO This patch fixes the IPW2200 dependencies in a way that they are similar to the IPW2100 dependencies. 
Signed-off-by: Adrian Bunk

--- linux-2.6.12-rc5-mm2-full/drivers/net/wireless/Kconfig.old	2005-06-02 22:04:02.000000000 +0200
+++ linux-2.6.12-rc5-mm2-full/drivers/net/wireless/Kconfig	2005-06-02 22:04:40.000000000 +0200
@@ -192,9 +192,8 @@
 
 config IPW2200
 	tristate "Intel PRO/Wireless 2200BG and 2915ABG Network Connection"
-	depends on NET_RADIO && PCI
+	depends on IEEE80211 && PCI
 	select FW_LOADER
-	select IEEE80211
 	---help---
 	  A driver for the Intel PRO/Wireless 2200BG and 2915ABG Network
 	  Connection adapters.

From davem@davemloft.net Thu Jun 2 13:14:20 2005
Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 13:14:22 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168])
	by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52KEJXq021078
	for ; Thu, 2 Jun 2005 13:14:20 -0700
Received: from localhost ([127.0.0.1] ident=davem)
	by sunset.davemloft.net with esmtp (Exim 4.50)
	id 1Ddw42-0001wh-Vk; Thu, 02 Jun 2005 13:13:15 -0700
Date: Thu, 02 Jun 2005 13:13:14 -0700 (PDT)
Message-Id: <20050602.131314.21926883.davem@davemloft.net>
To: bunk@stusta.de
Cc: akpm@osdl.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC: 2.6 patch] net/ipv4/: possible cleanups
From: "David S.
Miller" In-Reply-To: <20050530205651.GY10441@stusta.de> References: <20050530205651.GY10441@stusta.de> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1983 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 955 Lines: 28 From: Adrian Bunk Subject: [RFC: 2.6 patch] net/ipv4/: possible cleanups Date: Mon, 30 May 2005 22:56:51 +0200 > This patch contains the following possible cleanups: > - make needlessly global code static > - #if 0 the following unused global function: > - xfrm4_state.c: xfrm4_state_fini > - remove the following unneeded EXPORT_SYMBOL's: > - ip_output.c: ip_finish_output > - ip_output.c: sysctl_ip_default_ttl > - fib_frontend.c: ip_dev_find > - inetpeer.c: inet_peer_idlock > - ip_options.c: ip_options_compile > - ip_options.c: ip_options_undo > - tcp_ipv4.c: sysctl_max_syn_backlog > > Please review which of these changes are correct and which might > conflict with pending patches. Please keep all of the ECN implementation in the tcp_ecn.h header file, even if the routine is only called in one C file. And therefore, please do not remove the tcp_enter_quickack_mode() extern declaration from tcp.h Thanks. 
From davem@davemloft.net Thu Jun 2 13:15:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 13:15:35 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52KFWXq021474 for ; Thu, 2 Jun 2005 13:15:32 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Ddw5E-0001xY-Nx; Thu, 02 Jun 2005 13:14:28 -0700 Date: Thu, 02 Jun 2005 13:14:28 -0700 (PDT) Message-Id: <20050602.131428.28787855.davem@davemloft.net> To: bunk@stusta.de Cc: akpm@osdl.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [2.6 patch] net/socket.c: unexport move_addr_to_kernel From: "David S. Miller" In-Reply-To: <20050530205647.GW10441@stusta.de> References: <20050530205647.GW10441@stusta.de> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1984 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 329 Lines: 10 From: Adrian Bunk Date: Mon, 30 May 2005 22:56:47 +0200 > I didn't find any modular usage in the kernel. > > Signed-off-by: Adrian Bunk Yes, but as a part of the socket kernel API, I could definitely see some out-of-tree code legitimately using this interface. Let's keep it around for now. 
From abonilla@linuxwireless.org Thu Jun 2 13:20:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 13:20:19 -0700 (PDT) Received: from linuxwireless.org.ve.carpathiahost.net (linuxwireless.org.ve.carpathiahost.net [66.117.45.234]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52KKEXq022260 for ; Thu, 2 Jun 2005 13:20:15 -0700 Received: from WCRSJO2KPAB047 ([200.9.49.66]) by linuxwireless.org.ve.carpathiahost.net (8.12.10/8.12.10) with SMTP id j52KJEnE002565; Thu, 2 Jun 2005 16:19:14 -0400 Reply-To: From: "Alejandro Bonilla" To: "'Adrian Bunk'" , "'Andrew Morton'" , , Cc: , , Subject: RE: [-mm patch] fix recursive IPW2200 dependencies Date: Thu, 2 Jun 2005 14:19:10 -0600 Message-ID: <003a01c567b0$56bed860$600cc60a@amer.sykes.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook CWS, Build 9.0.6604 (9.0.2911.0) In-Reply-To: <20050602200701.GG4992@stusta.de> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1478 Importance: Normal X-archive-position: 1985 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: abonilla@linuxwireless.org Precedence: bulk X-list: netdev Content-Length: 1333 Lines: 49 > On Wed, Jun 01, 2005 at 02:28:24AM -0700, Andrew Morton wrote: > >... > > Changes since 2.6.12-rc5-mm1: > >... > > +git-netdev-we18-ieee80211-wifi.patch > > > > Various things added and merged in netdev land. > >... > > This results in recursive dependencies: > - IPW2200 depends on NET_RADIO > - IPW2200 selects IEEE80211 > - IEEE80211 selects NET_RADIO > > > This patch fixes the IPW2200 dependencies in a way that they > are similar > to the IPW2100 dependencies. 
> > Signed-off-by: Adrian Bunk > > --- > linux-2.6.12-rc5-mm2-full/drivers/net/wireless/Kconfig.old > 2005-06-02 22:04:02.000000000 +0200 > +++ linux-2.6.12-rc5-mm2-full/drivers/net/wireless/Kconfig > 2005-06-02 22:04:40.000000000 +0200 > @@ -192,9 +192,8 @@ > > config IPW2200 > tristate "Intel PRO/Wireless 2200BG and 2915ABG Network > Connection" > - depends on NET_RADIO && PCI > + depends on IEEE80211 && PCI > select FW_LOADER > - select IEEE80211 > ---help--- > A driver for the Intel PRO/Wireless 2200BG and > 2915ABG Network > Connection adapters. I think the normal usage of the name is Intel PRO/Wireless 2200BG/2915ABG Network Connection. I'm just saying this in case that you care about Intel Trademarking or about a more unified usage of the name of the Adapter. maybe this is something silly. .Alejandro From bunk@stusta.de Thu Jun 2 13:39:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 13:39:28 -0700 (PDT) Received: from mailout.stusta.mhn.de (emailhub.stusta.mhn.de [141.84.69.5]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j52KdMXq023360 for ; Thu, 2 Jun 2005 13:39:23 -0700 Received: (qmail 922 invoked from network); 2 Jun 2005 20:38:25 -0000 Received: from r063144.stusta.swh.mhn.de (10.150.63.144) by mailhub.stusta.mhn.de with SMTP; 2 Jun 2005 20:38:25 -0000 Received: by r063144.stusta.swh.mhn.de (Postfix, from userid 1000) id 942E4AFA78; Thu, 2 Jun 2005 22:38:23 +0200 (CEST) Date: Thu, 2 Jun 2005 22:38:23 +0200 From: Adrian Bunk To: Stephen Hemminger Cc: Baruch Even , Andrew Morton , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 2.6.12-rc5-mm2: "bic unavailable using TCP reno" messages Message-ID: <20050602203823.GI4992@stusta.de> References: <20050601022824.33c8206e.akpm@osdl.org> <20050602121511.GE4992@stusta.de> <429F1079.5070701@ev-en.org> <20050602103805.6beb4f4e@dxpl.pdx.osdl.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: 
 <20050602103805.6beb4f4e@dxpl.pdx.osdl.net>
User-Agent: Mutt/1.5.9i
X-archive-position: 1986
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: bunk@stusta.de
Precedence: bulk
X-list: netdev
Content-Length: 1619
Lines: 56

On Thu, Jun 02, 2005 at 10:38:05AM -0700, Stephen Hemminger wrote:
> On Thu, 02 Jun 2005 14:58:17 +0100
> Baruch Even wrote:
>
>
>...
>
> You're right, the sysctl handler should be smarter, but that is not the problem here.
> The problem is that the default value is set to be BIC to be compatible with earlier kernels.
> Since 75% of the world isn't smart enough to figure out how to use sysctl, there is a
> question of what the default should be, and what to do if that is missing.
>
> One version had a messy ifdef chain to try and avoid the warning:
>
> char sysctl_tcp_congestion_control[] =
> #if defined(CONFIG_TCP_BIC)
> "bic"
> #elif defined(CONFIG_TCP_HTCP)
> "htcp"
> #else
> "reno"
> #endif
> ;
>
> but that was ugly.
>
> Another possibility is putting it in as yet another config value at kernel build time.
>...

One thing that currently makes all solutions harder (and the #ifdef
example above not ugly but simply wrong) is that you allow modular
congestion control options for the always static net support.

Is this really required?

The IO schedulers have a similar problem, and they are using the #ifdef
approach for selecting the default.

One approach is to actually choose the default using #ifdef's.

You could also do any kind of runtime selection, but please don't print
the warning more than once.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S.
Buck - Dragon Seed From mchan@broadcom.com Thu Jun 2 13:55:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 13:55:44 -0700 (PDT) Received: from MMS1.broadcom.com (mms1.broadcom.com [216.31.210.17]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52KtfXq024547 for ; Thu, 2 Jun 2005 13:55:41 -0700 Received: from 10.10.64.121 by MMS1.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Thu, 02 Jun 2005 13:54:34 -0700 X-Server-Uuid: 146C3151-C1DE-4F71-9D02-C3BE503878DD Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Thu, 2 Jun 2005 13:54:32 -0700 Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BBU46339; Thu, 2 Jun 2005 13:54:28 -0700 (PDT) Received: from nt-irva-0741.brcm.ad.broadcom.com ( nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with ESMTP id NAA08432; Thu, 2 Jun 2005 13:54:28 -0700 (PDT) Received: from 10.7.18.177 ([10.7.18.177]) by NT-IRVA-0741.brcm.ad.broadcom.com ([10.8.194.54]) with Microsoft Exchange Server HTTP-DAV ; Thu, 2 Jun 2005 20:54:27 +0000 Received: from rh4 by nt-irva-0741; 02 Jun 2005 12:56:53 -0700 Subject: Re: Locking model for NAPI drivers From: "Michael Chan" To: "David S. 
Miller"
cc: netdev@oss.sgi.com
In-Reply-To: <1117661650.4310.62.camel@rh4>
References: <20050531.154847.63995530.davem@davemloft.net>
	<1117658019.4310.58.camel@rh4>
	<20050601.152134.120445266.davem@davemloft.net>
	<1117661650.4310.62.camel@rh4>
Date: Thu, 02 Jun 2005 12:56:52 -0700
Message-ID: <1117742212.22670.24.camel@rh4>
MIME-Version: 1.0
X-Mailer: Evolution 2.0.2 (2.0.2-3)
X-WSS-ID: 6E81AD802U44899064-01-01
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-archive-position: 1987
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: mchan@broadcom.com
Precedence: bulk
X-list: netdev
Content-Length: 1057
Lines: 27

On Wed, 2005-06-01 at 14:34 -0700, Michael Chan wrote:
> On Wed, 2005-06-01 at 15:21 -0700, David S. Miller wrote:
> > Since the caller shuts down NAPI ->poll(), after setting the SYNC bit
> > we can just check the MAILBOX register, and if a '1' is there just
> > return. Does one need to mask out the upper bits of the register in
> > order to avoid seeing the IRQ tag in such a comparison?
> >
> No, just check for the value 1 since that's the value we use to disable
> interrupts. The value read back will always be 1 if 1 was previously
> written to it.
>

One more race condition:

CPU1                            CPU2

tg3_poll()
    __netif_rx_complete()
                                tg3_netif_stop()
                                    netif_poll_disable()
                                tg3_full_lock()
                                    tg3_irq_quiesce()
    tg3_restart_ints()
        BUG_ON(tp->irq_state)

This race condition is somewhat harmless but I think we need to take
care of it for correctness. Any simple ways to fix it?
From john.ronciak@intel.com Thu Jun 2 14:22:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 14:22:46 -0700 (PDT) Received: from orsfmr004.jf.intel.com (fmr19.intel.com [134.134.136.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52LMRXq026133 for ; Thu, 2 Jun 2005 14:22:27 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr004.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j52LK7HM032086; Thu, 2 Jun 2005 21:20:07 GMT Received: from orsmsxvs041.jf.intel.com (orsmsxvs041.jf.intel.com [192.168.65.54]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j52LK5go030673; Thu, 2 Jun 2005 21:20:05 GMT Received: from orsmsx332.amr.corp.intel.com ([192.168.65.60]) by orsmsxvs041.jf.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005060214200507391 ; Thu, 02 Jun 2005 14:20:05 -0700 Received: from orsmsx408.amr.corp.intel.com ([192.168.65.52]) by orsmsx332.amr.corp.intel.com with Microsoft SMTPSVC(6.0.3790.211); Thu, 2 Jun 2005 14:19:56 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: RE: RFC: NAPI packet weighting patch Date: Thu, 2 Jun 2005 14:19:55 -0700 Message-ID: <468F3FDA28AA87429AD807992E22D07E0450BFD0@orsmsx408> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: RFC: NAPI packet weighting patch Thread-Index: AcVnbmGZxxYUID7BQ5qgE6xlxM/aIwASbefA From: "Ronciak, John" To: , "Jon Mason" Cc: "David S. 
Miller" , "Williams, Mitch A" , , , , "Venkatesan, Ganesh" , "Brandeburg, Jesse"
X-OriginalArrivalTime: 02 Jun 2005 21:19:56.0276 (UTC) FILETIME=[D3124340:01C567B8]
X-Scanned-By: MIMEDefang 2.44
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j52LMRXq026133
X-archive-position: 1988
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: john.ronciak@intel.com
Precedence: bulk
X-list: netdev
Content-Length: 4140
Lines: 100

The DRR algorithm assumes a perfect world, where hardware resources are
infinite, packets arrive continuously (or separated by very long
delays), there are no bus latencies, and CPU speed is infinite.

The real world is much messier: hardware starves for resources if it's
not serviced quickly enough, packets arrive at inconvenient intervals
(especially at 10 and 100 Mbps speeds), and buses and CPUs are slow.

Thus, the driver should have the intelligence built into it to make an
"intelligent" choice on what the weight should be for that
driver/hardware. The calculation in the driver should take into account
all the factors that the driver has access to. These include link
speed, bus type and speed, processor speed and some amount of actual
device FIFO size and latency smarts. The driver would use all of the
factors to come up with a weight to prevent it from dropping frames and
not to starve out other devices in the system or hinder performance.
It seems to us that the driver is the one that knows best and should
try to come up with a reasonable value for weight based on its own
knowledge of the hardware.

This has been showing up in our NAPI test data which Mitch is currently
scrubbing for release. It shows that there is a need for either better
default static weight numbers or for them to be calculated based on
some system dynamic variables.
We would like to see the latter tried but the only problem is that each driver would have to make its own calculations, and it may not have access to all of the system-wide data it would need to make a proper calculation. Even with a more intelligent driver, we still would like to see some mechanism for the weight to be changed at runtime, such as with Stephen's sysfs patch. This would allow a sysadmin (or user-space app) to tune the system based on statistical data that isn't available to the individual driver. Cheers, John > -----Original Message----- > From: jamal [mailto:hadi@cyberus.ca] > Sent: Thursday, June 02, 2005 5:27 AM > To: Jon Mason > Cc: David S. Miller; Williams, Mitch A; shemminger@osdl.org; > netdev@oss.sgi.com; Robert.Olsson@data.slu.se; Ronciak, John; > Venkatesan, Ganesh; Brandeburg, Jesse > Subject: Re: RFC: NAPI packet weighting patch > > > On Tue, 2005-31-05 at 18:28 -0500, Jon Mason wrote: > > On Tuesday 31 May 2005 05:14 pm, David S. Miller wrote: > > > From: Jon Mason > > > Date: Tue, 31 May 2005 17:07:54 -0500 > > > > > > > Of course some performace analysis would have to be > done to determine the > > > > optimal numbers for each speed/duplexity setting per driver. > > > > > > per cpu speed, per memory bus speed, per I/O bus speed, > and add in other > > > complications such as NUMA > > > > > > My point is that whatever experimental number you come up > with will be > > > good for that driver on your systems, not necessarily for others. > > > > > > Even within a system, whatever number you select will be the wrong > > > thing to use if one starts a continuous I/O stream to the SATA > > > controller in the next PCI slot, for example. > > > > > > We keep getting bitten by this, as the Altix perf data > continually shows, > > > and we need to absolutely stop thinking this way. > > > > > > The way to go is to make selections based upon observed events and > > > mesaurements. 
> > > > I'm not arguing against a /proc entry to tune dev->weight > for those sysadmins > > advanced enough to do that. I am arguing that we can make > the driver smarter > > (at little/no cost) for "out of the box" users. > > > > What is the point of making the driver "smarter"? > Recall, the algorithm used to schedule the netdevices is based on an > extension of Weighted Round Robin from Varghese et al known > as DRR (ask > gooogle for details). > The idea is to provide fairness amongst many drivers. As an > example, if > you have a gige driver it shouldnt be taking all the resources at the > expense of starving the fastether driver. > If the admin wants one driver to be more "important" than the other, > s/he will make sure it has a higher weight. > > cheers, > jamal > > From shemminger@osdl.org Thu Jun 2 14:33:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 14:33:14 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52LX1Xq027235 for ; Thu, 2 Jun 2005 14:33:01 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j52LVQjA018683 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 2 Jun 2005 14:31:26 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j52LVQL9019198; Thu, 2 Jun 2005 14:31:26 -0700 Date: Thu, 2 Jun 2005 14:31:26 -0700 From: Stephen Hemminger To: "Ronciak, John" Cc: , "Jon Mason" , "David S. 
Miller" , "Williams, Mitch A" , , , "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: Re: RFC: NAPI packet weighting patch Message-ID: <20050602143126.7c302cfd@dxpl.pdx.osdl.net> In-Reply-To: <468F3FDA28AA87429AD807992E22D07E0450BFD0@orsmsx408> References: <468F3FDA28AA87429AD807992E22D07E0450BFD0@orsmsx408> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 1989 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 2869 Lines: 57 On Thu, 2 Jun 2005 14:19:55 -0700 "Ronciak, John" wrote: > The DRR algorithm assumes a perfect world, where hardware resources are > infinite, packets arrive continuously (or separated by very long > delays), there are no bus latencies, and CPU speed is infinite. > > The real world is much messier: hardware starves for resources if it's > not serviced quickly enough, packets arrive at inconvenient intervals > (especially at 10 and 100 Mbps speeds), and buses and CPUs are slow. > > Thus, the driver should have the intelligence built into it to make an > "intelligent" choice on what the weight should be for that > driver/hardware. The calculation in the driver should take into account > all the factors that the driver has access to. These include link > speed, bus type and speed, processor speed and some amount of actual > device FIFO size and latency smarts. 
The driver would use all of the > factors to come up with a weight to prevent it from dropping frames and > not to starve out other devices in the system or hinder performance. It > seems to us that the driver is the one that know best and should try to > come up with a reasonable value for weight based on its own knowledge of > the hardware. This is like saying each CPU vendor should write their own process scheduler for Linux. Now with NUMA and HT, it is getting almost that bad but we still try and keep it CPU neutral. For networking the problem is worse, the "right" choice depends on workload and relationship between components in the system. I can't see how you could ever expect a driver specific solution. > This has been showing up in our NAPI test data which Mitch is currently > scrubbing for release. It shows that there is a need for either better > default static weight numbers or for them to be calculated based on some > system dynamic variables. We would like to see the latter tried but the > only problem is that each driver would have to make its own > calculations, and it may not have access to all of the system-wide data > it would need to make a proper calculation. And for other workloads, and other systems (think about the Altix with long access latencies), your numbers will be wrong. Perhaps we need to quit trying for a perfect solution and just get a "good enough" one that works. Let's keep the intelligence out of the driver. Most of the existing smart drivers end up looking like crap and don't work that well. > Even with a more intelligent driver, we still would like to see some > mechanism for the weight to be changed at runtime, such as with > Stephen's sysfs patch. This would allow a sysadmin (or user-space app) > to tune the system based on statistical data that isn't available to the > individual driver. > It will be yet another knob that all except the benchmark tweakers can ignore (hopefully). 
From mmporter@cox.net Thu Jun 2 14:35:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 14:35:08 -0700 (PDT) Received: from fed1rmmtao06.cox.net (fed1rmmtao06.cox.net [68.230.241.33]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52LZ2Xq028054 for ; Thu, 2 Jun 2005 14:35:03 -0700 Received: from liberty.homelinux.org ([68.2.41.86]) by fed1rmmtao06.cox.net (InterMail vM.6.01.04.00 201-2131-118-20041027) with ESMTP id <20050602213405.WKPG19494.fed1rmmtao06.cox.net@liberty.homelinux.org>; Thu, 2 Jun 2005 17:34:05 -0400 Received: (from mmporter@localhost) by liberty.homelinux.org (8.9.3/8.9.3/Debian 8.9.3-21) id OAA26210; Thu, 2 Jun 2005 14:34:04 -0700 Date: Thu, 2 Jun 2005 14:34:04 -0700 From: Matt Porter To: torvalds@osdl.org, akpm@osdl.org, jgarzik@pobox.com Cc: linux-kernel@vger.kernel.org, linuxppc-embedded@ozlabs.org, netdev@oss.sgi.com Subject: [PATCH][5/5] RapidIO support: net driver over messaging Message-ID: <20050602143404.F24818@cox.net> References: <20050602140359.B24818@cox.net> <20050602141247.C24818@cox.net> <20050602141946.D24818@cox.net> <20050602142509.E24818@cox.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20050602142509.E24818@cox.net>; from mporter@kernel.crashing.org on Thu, Jun 02, 2005 at 02:25:10PM -0700 X-archive-position: 1990 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mporter@kernel.crashing.org Precedence: bulk X-list: netdev Content-Length: 17633 Lines: 670 Adds an "Ethernet" driver which sends Ethernet packets over the standard RapidIO messaging. This depends on the core RIO patch for mailbox/doorbell access. 
Signed-off-by: Matt Porter Index: drivers/net/Kconfig =================================================================== --- 711ec47634f5d5ded866eaa965a0f7dadcbc65f4/drivers/net/Kconfig (mode:100644) +++ 8bdd37ff79724c95795ed39c28588a94e1f13e60/drivers/net/Kconfig (mode:100644) @@ -2185,6 +2185,20 @@ tristate "iSeries Virtual Ethernet driver support" depends on NETDEVICES && PPC_ISERIES +config RIONET + tristate "RapidIO Ethernet over messaging driver support" + depends on NETDEVICES && RAPIDIO + +config RIONET_TX_SIZE + int "Number of outbound queue entries" + depends on RIONET + default "128" + +config RIONET_RX_SIZE + int "Number of inbound queue entries" + depends on RIONET + default "128" + config FDDI bool "FDDI driver support" depends on NETDEVICES && (PCI || EISA) Index: drivers/net/Makefile =================================================================== --- 711ec47634f5d5ded866eaa965a0f7dadcbc65f4/drivers/net/Makefile (mode:100644) +++ 8bdd37ff79724c95795ed39c28588a94e1f13e60/drivers/net/Makefile (mode:100644) @@ -58,6 +58,7 @@ obj-$(CONFIG_VIA_RHINE) += via-rhine.o obj-$(CONFIG_VIA_VELOCITY) += via-velocity.o obj-$(CONFIG_ADAPTEC_STARFIRE) += starfire.o +obj-$(CONFIG_RIONET) += rionet.o # # end link order section Index: drivers/net/rionet.c =================================================================== --- /dev/null (tree:711ec47634f5d5ded866eaa965a0f7dadcbc65f4) +++ 8bdd37ff79724c95795ed39c28588a94e1f13e60/drivers/net/rionet.c (mode:100644) @@ -0,0 +1,622 @@ +/* + * rionet - Ethernet driver over RapidIO messaging services + * + * Copyright 2005 MontaVista Software, Inc. + * Matt Porter + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#define DRV_NAME "rionet" +#define DRV_VERSION "0.1" +#define DRV_AUTHOR "Matt Porter " +#define DRV_DESC "Ethernet over RapidIO" + +MODULE_AUTHOR(DRV_AUTHOR); +MODULE_DESCRIPTION(DRV_DESC); +MODULE_LICENSE("GPL"); + +#define RIONET_DEFAULT_MSGLEVEL 0 +#define RIONET_DOORBELL_JOIN 0x1000 +#define RIONET_DOORBELL_LEAVE 0x1001 + +#define RIONET_MAILBOX 0 + +#define RIONET_TX_RING_SIZE CONFIG_RIONET_TX_SIZE +#define RIONET_RX_RING_SIZE CONFIG_RIONET_RX_SIZE + +LIST_HEAD(rionet_peers); + +struct rionet_private { + struct rio_mport *mport; + struct sk_buff *rx_skb[RIONET_RX_RING_SIZE]; + struct sk_buff *tx_skb[RIONET_TX_RING_SIZE]; + struct net_device_stats stats; + int rx_slot; + int tx_slot; + int tx_cnt; + int ack_slot; + spinlock_t lock; + u32 msg_enable; +}; + +struct rionet_peer { + struct list_head node; + struct rio_dev *rdev; + struct resource *res; +}; + +static int rionet_check = 0; +static int rionet_capable = 1; +static struct net_device *sndev = NULL; + +/* + * This is a fast lookup table for for translating TX + * Ethernet packets into a destination RIO device. It + * could be made into a hash table to save memory depending + * on system trade-offs. 
+ */ +static struct rio_dev *rionet_active[RIO_MAX_ROUTE_ENTRIES]; + +#define is_rionet_capable(pef, src_ops, dst_ops) \ + ((pef & RIO_PEF_INB_MBOX) && \ + (pef & RIO_PEF_INB_DOORBELL) && \ + (src_ops & RIO_SRC_OPS_DOORBELL) && \ + (dst_ops & RIO_DST_OPS_DOORBELL)) +#define dev_rionet_capable(dev) \ + is_rionet_capable(dev->pef, dev->src_ops, dev->dst_ops) + +#define RIONET_MAC_MATCH(x) (*(u32 *)x == 0x00010001) +#define RIONET_GET_DESTID(x) (*(u16 *)(x + 4)) + +static struct net_device_stats *rionet_stats(struct net_device *ndev) +{ + struct rionet_private *rnet = ndev->priv; + return &rnet->stats; +} + +static int rionet_rx_clean(struct net_device *ndev) +{ + int i; + int error = 0; + struct rionet_private *rnet = ndev->priv; + void *data; + + i = rnet->rx_slot; + + do { + if (!rnet->rx_skb[i]) { + rnet->stats.rx_dropped++; + continue; + } + + if (!(data = rio_get_inb_message(rnet->mport, RIONET_MAILBOX))) + break; + + rnet->rx_skb[i]->data = data; + skb_put(rnet->rx_skb[i], RIO_MAX_MSG_SIZE); + rnet->rx_skb[i]->dev = sndev; + rnet->rx_skb[i]->protocol = + eth_type_trans(rnet->rx_skb[i], sndev); + error = netif_rx(rnet->rx_skb[i]); + + if (error == NET_RX_DROP) { + rnet->stats.rx_dropped++; + } else if (error == NET_RX_BAD) { + if (netif_msg_rx_err(rnet)) + printk(KERN_WARNING "%s: bad rx packet\n", + DRV_NAME); + rnet->stats.rx_errors++; + } else { + rnet->stats.rx_packets++; + rnet->stats.rx_bytes += RIO_MAX_MSG_SIZE; + } + + } while ((i = (i + 1) % RIONET_RX_RING_SIZE) != rnet->rx_slot); + + return i; +} + +static void rionet_rx_fill(struct net_device *ndev, int end) +{ + int i; + struct rionet_private *rnet = ndev->priv; + + i = rnet->rx_slot; + do { + rnet->rx_skb[i] = dev_alloc_skb(RIO_MAX_MSG_SIZE); + + if (!rnet->rx_skb[i]) + break; + + rio_add_inb_buffer(rnet->mport, RIONET_MAILBOX, + rnet->rx_skb[i]->data); + } while ((i = (i + 1) % RIONET_RX_RING_SIZE) != end); + + rnet->rx_slot = i; +} + +static int rionet_queue_tx_msg(struct sk_buff *skb, struct 
net_device *ndev, + struct rio_dev *rdev) +{ + struct rionet_private *rnet = ndev->priv; + + rio_add_outb_message(rnet->mport, rdev, 0, skb->data, skb->len); + rnet->tx_skb[rnet->tx_slot] = skb; + + rnet->stats.tx_packets++; + rnet->stats.tx_bytes += skb->len; + + if (++rnet->tx_cnt == RIONET_TX_RING_SIZE) + netif_stop_queue(ndev); + + if (++rnet->tx_slot == RIONET_TX_RING_SIZE) + rnet->tx_slot = 0; + + if (netif_msg_tx_queued(rnet)) + printk(KERN_INFO "%s: queued skb %8.8x len %8.8x\n", DRV_NAME, + (u32) skb, skb->len); + + return 0; +} + +static int rionet_start_xmit(struct sk_buff *skb, struct net_device *ndev) +{ + int i; + struct rionet_private *rnet = ndev->priv; + struct ethhdr *eth = (struct ethhdr *)skb->data; + u16 destid; + + spin_lock_irq(&rnet->lock); + + if ((rnet->tx_cnt + 1) > RIONET_TX_RING_SIZE) { + netif_stop_queue(ndev); + spin_unlock_irq(&rnet->lock); + return -EBUSY; + } + + if (eth->h_dest[0] & 0x01) { + /* + * XXX Need to delay queuing if ring max is reached, + * flush additional packets in tx_event() before + * awakening the queue. We can easily exceed ring + * size with a large number of nodes or even a + * small number where the ring is relatively full + * on entrance to hard_start_xmit. 
+ */ + for (i = 0; i < RIO_MAX_ROUTE_ENTRIES; i++) + if (rionet_active[i]) + rionet_queue_tx_msg(skb, ndev, + rionet_active[i]); + } else if (RIONET_MAC_MATCH(eth->h_dest)) { + destid = RIONET_GET_DESTID(eth->h_dest); + if (rionet_active[destid]) + rionet_queue_tx_msg(skb, ndev, rionet_active[destid]); + } + + spin_unlock_irq(&rnet->lock); + + return 0; +} + +static int rionet_set_mac_address(struct net_device *ndev, void *p) +{ + struct sockaddr *addr = p; + + if (!is_valid_ether_addr(addr->sa_data)) + return -EADDRNOTAVAIL; + + memcpy(ndev->dev_addr, addr->sa_data, ndev->addr_len); + + return 0; +} + +static int rionet_change_mtu(struct net_device *ndev, int new_mtu) +{ + struct rionet_private *rnet = ndev->priv; + + if (netif_msg_drv(rnet)) + printk(KERN_WARNING + "%s: rionet_change_mtu(): not implemented\n", DRV_NAME); + + return 0; +} + +static void rionet_set_multicast_list(struct net_device *ndev) +{ + struct rionet_private *rnet = ndev->priv; + + if (netif_msg_drv(rnet)) + printk(KERN_WARNING + "%s: rionet_set_multicast_list(): not implemented\n", + DRV_NAME); +} + +static void rionet_dbell_event(struct rio_mport *mport, u16 sid, u16 tid, + u16 info) +{ + struct net_device *ndev = sndev; + struct rionet_private *rnet = ndev->priv; + struct rionet_peer *peer; + + if (netif_msg_intr(rnet)) + printk(KERN_INFO "%s: doorbell sid %4.4x tid %4.4x info %4.4x", + DRV_NAME, sid, tid, info); + if (info == RIONET_DOORBELL_JOIN) { + if (!rionet_active[sid]) { + list_for_each_entry(peer, &rionet_peers, node) { + if (peer->rdev->destid == sid) + rionet_active[sid] = peer->rdev; + } + rio_mport_send_doorbell(mport, sid, + RIONET_DOORBELL_JOIN); + } + } else if (info == RIONET_DOORBELL_LEAVE) { + rionet_active[sid] = NULL; + } else { + if (netif_msg_intr(rnet)) + printk(KERN_WARNING "%s: unhandled doorbell\n", + DRV_NAME); + } +} + +static void rionet_inb_msg_event(struct rio_mport *mport, int mbox, int slot) +{ + int n; + struct net_device *ndev = sndev; + struct 
rionet_private *rnet = (struct rionet_private *)ndev->priv; + + if (netif_msg_intr(rnet)) + printk(KERN_INFO "%s: inbound message event, mbox %d slot %d\n", + DRV_NAME, mbox, slot); + + spin_lock(&rnet->lock); + if ((n = rionet_rx_clean(ndev)) != rnet->rx_slot) + rionet_rx_fill(ndev, n); + spin_unlock(&rnet->lock); +} + +static void rionet_outb_msg_event(struct rio_mport *mport, int mbox, int slot) +{ + struct net_device *ndev = sndev; + struct rionet_private *rnet = ndev->priv; + + spin_lock(&rnet->lock); + + if (netif_msg_intr(rnet)) + printk(KERN_INFO + "%s: outbound message event, mbox %d slot %d\n", + DRV_NAME, mbox, slot); + + while (rnet->tx_cnt && (rnet->ack_slot != slot)) { + /* dma unmap single */ + dev_kfree_skb_irq(rnet->tx_skb[rnet->ack_slot]); + rnet->tx_skb[rnet->ack_slot] = NULL; + if (++rnet->ack_slot == RIONET_TX_RING_SIZE) + rnet->ack_slot = 0; + rnet->tx_cnt--; + } + + if (rnet->tx_cnt < RIONET_TX_RING_SIZE) + netif_wake_queue(ndev); + + spin_unlock(&rnet->lock); +} + +static int rionet_open(struct net_device *ndev) +{ + int i, rc = 0; + struct rionet_peer *peer, *tmp; + u32 pwdcsr; + struct rionet_private *rnet = ndev->priv; + + if (netif_msg_ifup(rnet)) + printk(KERN_INFO "%s: open\n", DRV_NAME); + + if ((rc = rio_request_inb_dbell(rnet->mport, + RIONET_DOORBELL_JOIN, + RIONET_DOORBELL_LEAVE, + rionet_dbell_event)) < 0) + goto out; + + if ((rc = rio_request_inb_mbox(rnet->mport, + RIONET_MAILBOX, + RIONET_RX_RING_SIZE, + rionet_inb_msg_event)) < 0) + goto out; + + if ((rc = rio_request_outb_mbox(rnet->mport, + RIONET_MAILBOX, + RIONET_TX_RING_SIZE, + rionet_outb_msg_event)) < 0) + goto out; + + /* Initialize inbound message ring */ + for (i = 0; i < RIONET_RX_RING_SIZE; i++) + rnet->rx_skb[i] = NULL; + rnet->rx_slot = 0; + rionet_rx_fill(ndev, 0); + + rnet->tx_slot = 0; + rnet->tx_cnt = 0; + rnet->ack_slot = 0; + + spin_lock_init(&rnet->lock); + + rnet->msg_enable = RIONET_DEFAULT_MSGLEVEL; + + netif_carrier_on(ndev); + 
netif_start_queue(ndev); + + list_for_each_entry_safe(peer, tmp, &rionet_peers, node) { + if (!(peer->res = rio_request_outb_dbell(peer->rdev, + RIONET_DOORBELL_JOIN, + RIONET_DOORBELL_LEAVE))) + { + printk(KERN_ERR "%s: error requesting doorbells\n", + DRV_NAME); + continue; + } + + /* + * If device has initialized inbound doorbells, + * send a join message + */ + rio_read_config_32(peer->rdev, RIO_WRITE_PORT_CSR, &pwdcsr); + if (pwdcsr & RIO_DOORBELL_AVAIL) + rio_send_doorbell(peer->rdev, RIONET_DOORBELL_JOIN); + } + + out: + return rc; +} + +static int rionet_close(struct net_device *ndev) +{ + struct rionet_private *rnet = (struct rionet_private *)ndev->priv; + struct rionet_peer *peer, *tmp; + int i; + + if (netif_msg_ifup(rnet)) + printk(KERN_INFO "%s: close\n", DRV_NAME); + + netif_stop_queue(ndev); + netif_carrier_off(ndev); + + for (i = 0; i < RIONET_RX_RING_SIZE; i++) + if (rnet->rx_skb[i]) + kfree_skb(rnet->rx_skb[i]); + + list_for_each_entry_safe(peer, tmp, &rionet_peers, node) { + if (rionet_active[peer->rdev->destid]) { + rio_send_doorbell(peer->rdev, RIONET_DOORBELL_LEAVE); + rionet_active[peer->rdev->destid] = NULL; + } + rio_release_outb_dbell(peer->rdev, peer->res); + } + + rio_release_inb_dbell(rnet->mport, RIONET_DOORBELL_JOIN, + RIONET_DOORBELL_LEAVE); + rio_release_inb_mbox(rnet->mport, RIONET_MAILBOX); + rio_release_outb_mbox(rnet->mport, RIONET_MAILBOX); + + return 0; +} + +static void rionet_remove(struct rio_dev *rdev) +{ + struct net_device *ndev = NULL; + struct rionet_peer *peer, *tmp; + + unregister_netdev(ndev); + kfree(ndev); + + list_for_each_entry_safe(peer, tmp, &rionet_peers, node) { + list_del(&peer->node); + kfree(peer); + } +} + +static int rionet_ioctl(struct net_device *ndev, struct ifreq *rq, int cmd) +{ + return -EOPNOTSUPP; +} + +static void rionet_get_drvinfo(struct net_device *ndev, + struct ethtool_drvinfo *info) +{ + struct rionet_private *rnet = ndev->priv; + + strcpy(info->driver, DRV_NAME); + strcpy(info->version, 
DRV_VERSION); + strcpy(info->fw_version, "n/a"); + sprintf(info->bus_info, "RIO master port %d", rnet->mport->id); +} + +static u32 rionet_get_msglevel(struct net_device *ndev) +{ + struct rionet_private *rnet = ndev->priv; + + return rnet->msg_enable; +} + +static void rionet_set_msglevel(struct net_device *ndev, u32 value) +{ + struct rionet_private *rnet = ndev->priv; + + rnet->msg_enable = value; +} + +static u32 rionet_get_link(struct net_device *ndev) +{ + return netif_carrier_ok(ndev); +} + +static struct ethtool_ops rionet_ethtool_ops = { + .get_drvinfo = rionet_get_drvinfo, + .get_msglevel = rionet_get_msglevel, + .set_msglevel = rionet_set_msglevel, + .get_link = rionet_get_link, +}; + +static int rionet_setup_netdev(struct rio_mport *mport) +{ + int rc = 0; + struct net_device *ndev = NULL; + struct rionet_private *rnet; + u16 device_id; + + /* Allocate our net_device structure */ + ndev = alloc_etherdev(sizeof(struct rionet_private)); + if (ndev == NULL) { + printk(KERN_INFO "%s: could not allocate ethernet device.\n", + DRV_NAME); + rc = -ENOMEM; + goto out; + } + + /* + * XXX hack, store point a static at ndev so we can get it... + * Perhaps need an array of these that the handler can + * index via the mbox number. 
+ */ + sndev = ndev; + + /* Set up private area */ + rnet = (struct rionet_private *)ndev->priv; + rnet->mport = mport; + + /* Set the default MAC address */ + device_id = rio_local_get_device_id(mport); + ndev->dev_addr[0] = 0x00; + ndev->dev_addr[1] = 0x01; + ndev->dev_addr[2] = 0x00; + ndev->dev_addr[3] = 0x01; + ndev->dev_addr[4] = device_id >> 8; + ndev->dev_addr[5] = device_id & 0xff; + + /* Fill in the driver function table */ + ndev->open = &rionet_open; + ndev->hard_start_xmit = &rionet_start_xmit; + ndev->stop = &rionet_close; + ndev->get_stats = &rionet_stats; + ndev->change_mtu = &rionet_change_mtu; + ndev->set_mac_address = &rionet_set_mac_address; + ndev->set_multicast_list = &rionet_set_multicast_list; + ndev->do_ioctl = &rionet_ioctl; + SET_ETHTOOL_OPS(ndev, &rionet_ethtool_ops); + + ndev->mtu = RIO_MAX_MSG_SIZE - 14; + + SET_MODULE_OWNER(ndev); + + rc = register_netdev(ndev); + if (rc != 0) + goto out; + + printk("%s: %s %s Version %s, MAC %02x:%02x:%02x:%02x:%02x:%02x\n", + ndev->name, + DRV_NAME, + DRV_DESC, + DRV_VERSION, + ndev->dev_addr[0], ndev->dev_addr[1], ndev->dev_addr[2], + ndev->dev_addr[3], ndev->dev_addr[4], ndev->dev_addr[5]); + + out: + return rc; +} + +/* + * XXX Make multi-net safe + */ +static int rionet_probe(struct rio_dev *rdev, const struct rio_device_id *id) +{ + int rc = -ENODEV; + u32 lpef, lsrc_ops, ldst_ops; + struct rionet_peer *peer; + + /* If local device is not rionet capable, give up quickly */ + if (!rionet_capable) + goto out; + + /* + * First time through, make sure local device is rionet + * capable, setup netdev, and set flags so this is skipped + * on later probes + */ + if (!rionet_check) { + rio_local_read_config_32(rdev->net->hport, RIO_PEF_CAR, &lpef); + rio_local_read_config_32(rdev->net->hport, RIO_SRC_OPS_CAR, + &lsrc_ops); + rio_local_read_config_32(rdev->net->hport, RIO_DST_OPS_CAR, + &ldst_ops); + if (!is_rionet_capable(lpef, lsrc_ops, ldst_ops)) { + printk(KERN_ERR + "%s: local device is not network 
capable\n", + DRV_NAME); + rionet_check = 1; + rionet_capable = 0; + goto out; + } + + rc = rionet_setup_netdev(rdev->net->hport); + rionet_check = 1; + } + + /* + * If the remote device has mailbox/doorbell capabilities, + * add it to the peer list. + */ + if (dev_rionet_capable(rdev)) { + if (!(peer = kmalloc(sizeof(struct rionet_peer), GFP_KERNEL))) { + rc = -ENOMEM; + goto out; + } + peer->rdev = rdev; + list_add_tail(&peer->node, &rionet_peers); + } + + out: + return rc; +} + +static struct rio_device_id rionet_id_table[] = { + {RIO_DEVICE(RIO_ANY_ID, RIO_ANY_ID)} +}; + +static struct rio_driver rionet_driver = { + .name = "rionet", + .id_table = rionet_id_table, + .probe = rionet_probe, + .remove = rionet_remove, +}; + +static int __init rionet_init(void) +{ + return rio_register_driver(&rionet_driver); +} + +static void __exit rionet_exit(void) +{ + rio_unregister_driver(&rionet_driver); +} + +module_init(rionet_init); +module_exit(rionet_exit); From davem@davemloft.net Thu Jun 2 14:41:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 14:41:19 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52LfFXq028972 for ; Thu, 2 Jun 2005 14:41:15 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DdxQ3-00058E-Ju; Thu, 02 Jun 2005 14:40:03 -0700 Date: Thu, 02 Jun 2005 14:40:03 -0700 (PDT) Message-Id: <20050602.144003.35660495.davem@davemloft.net> To: shemminger@osdl.org Cc: john.ronciak@intel.com, hadi@cyberus.ca, jdmason@us.ibm.com, mitch.a.williams@intel.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. 
Miller"
In-Reply-To: <20050602143126.7c302cfd@dxpl.pdx.osdl.net>
References: <468F3FDA28AA87429AD807992E22D07E0450BFD0@orsmsx408> <20050602143126.7c302cfd@dxpl.pdx.osdl.net>
X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 1991
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev
Content-Length: 1077
Lines: 25

From: Stephen Hemminger
Date: Thu, 2 Jun 2005 14:31:26 -0700

> For networking the problem is worse, the "right" choice depends on workload
> and relationship between components in the system. I can't see how you could
> ever expect a driver specific solution.

I totally agree, even the mere concept of driver-centric decisions
in this area is pretty bogus.

> And for other workloads, and other systems (think about the Altix with
> long access latencies), your numbers will be wrong. Perhaps we need
> to quit trying for a perfect solution and just get a "good enough" one
> that works.

I don't understand why nobody is investigating doing this stuff by
generic measurements that the core kernel can perform.

The generic ->poll() runner code can say, wow it took N-usec to
process M packets, perhaps I should adjust the weight.

I haven't seen one concrete suggestion along those lines, yet that is
where the answer to this kind of stuff is.

Those kinds of solutions are completely CPU, memory, I/O bus, network
device, and workload independent.
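One way the "it took N-usec to process M packets" idea could be sketched
is below. This is only an illustration in plain user-space C, not kernel
code and not anything posted in this thread: the function name
adjust_weight(), the 100 usec budget, the step of 8 and the 8..64 range
are all assumptions picked for the example.

```c
#include <stdint.h>

/* All of these constants are hypothetical tuning values. */
#define WEIGHT_MIN   8    /* never poll fewer packets than this */
#define WEIGHT_MAX   64   /* the static default mentioned in the thread */
#define WEIGHT_STEP  8    /* additive increase per good round */
#define BUDGET_USEC  100  /* assumed time budget for one poll round */

/*
 * Given the weight used for the last poll round, the number of packets
 * that round processed and how long it took, return the weight to use
 * for the next round.  Needs no knowledge of the device, bus or CPU.
 */
int adjust_weight(int weight, int packets, uint32_t elapsed_usec)
{
	/* Nothing was processed, so there is nothing to learn from. */
	if (packets <= 0)
		return weight;

	if (elapsed_usec > BUDGET_USEC) {
		/* The round ran over budget: back off quickly. */
		weight /= 2;
		if (weight < WEIGHT_MIN)
			weight = WEIGHT_MIN;
	} else if (packets >= weight) {
		/* Quota used up within budget: allow a bit more work. */
		weight += WEIGHT_STEP;
		if (weight > WEIGHT_MAX)
			weight = WEIGHT_MAX;
	}

	return weight;
}
```

The halve-on-overrun, add-on-success shape is borrowed from congestion
control purely as a familiar starting point; any policy driven by the
same two measurements would fit the description equally well.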
From jdmason@us.ibm.com Thu Jun 2 14:53:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 14:53:06 -0700 (PDT) Received: from e33.co.us.ibm.com (e33.co.us.ibm.com [32.97.110.131]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52Lr3Xq000575 for ; Thu, 2 Jun 2005 14:53:03 -0700 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e33.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j52Lq6mD244468 for ; Thu, 2 Jun 2005 17:52:06 -0400 Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by d03relay04.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j52Lq5Jj038004 for ; Thu, 2 Jun 2005 15:52:05 -0600 Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j52Lq4EK018604 for ; Thu, 2 Jun 2005 15:52:05 -0600 Received: from [192.168.0.29] (dreadnought.austin.ibm.com [9.53.90.32]) by d03av04.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j52Lq4KL018571; Thu, 2 Jun 2005 15:52:04 -0600 From: Jon Mason Organization: IBM To: Stephen Hemminger Subject: Re: RFC: NAPI packet weighting patch Date: Thu, 2 Jun 2005 16:51:48 -0500 User-Agent: KMail/1.7.2 Cc: "Ronciak, John" , hadi@cyberus.ca, "David S. 
Miller" , "Williams, Mitch A" , netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse"
References: <468F3FDA28AA87429AD807992E22D07E0450BFD0@orsmsx408> <20050602143126.7c302cfd@dxpl.pdx.osdl.net>
In-Reply-To: <20050602143126.7c302cfd@dxpl.pdx.osdl.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200506021651.49013.jdmason@us.ibm.com>
X-archive-position: 1992
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jdmason@us.ibm.com
Precedence: bulk
X-list: netdev
Content-Length: 1355
Lines: 26

On Thursday 02 June 2005 04:31 pm, Stephen Hemminger wrote:
<...>
> For networking the problem is worse, the "right" choice depends on workload
> and relationship between components in the system. I can't see how you
> could ever expect a driver specific solution.

I think there is a way to do a generic driver NAPI enhancement: modify the
weight depending on link speed.

Here is the problem as I see it: NAPI enablement for slow media speeds causes
unneeded strain on the system. This is because of the "weight" of NAPI.
Let's look at e1000 as an example. Currently the NAPI weight is 64,
regardless of link media speed. This weight is probably fine for a gigabit
link, but for 10/100 this is way too large. This causes interrupts to be
enabled/disabled after every poll/interrupt. Lots of overhead, and not very
smart. Why not have the driver set the weight to 16/32 respectively for
10/100 (or better yet, have someone run numbers to find weights that are
closer to what the adapter can actually use)? While these numbers may not be
optimal for every system, this is much better than the current system, and
would only require 5 or so extra lines of code per NAPI enabled driver.
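A rough sketch of what those few extra lines might amount to, written as
a standalone C helper rather than as a change to any real driver. The
name weight_for_link_speed() is made up for this example; the 16/32/64
values are the ones suggested above and are untuned guesses.

```c
/*
 * Hypothetical helper: map a negotiated link speed in Mbps to a NAPI
 * weight.  A driver would call something like this when the link comes
 * up and store the result in dev->weight.
 */
int weight_for_link_speed(int speed_mbps)
{
	switch (speed_mbps) {
	case 10:
		return 16;	/* suggested value for 10 Mbps */
	case 100:
		return 32;	/* suggested value for 100 Mbps */
	default:
		return 64;	/* current default, assumed fine for gigabit */
	}
}
```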
For those who want to have an optimal weight for their tuned system, let them use the /proc entry that is being proposed. Thanks, Jon From shemminger@osdl.org Thu Jun 2 15:06:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 15:06:48 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52M6jXq001487 for ; Thu, 2 Jun 2005 15:06:45 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j52M5hjA021673 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 2 Jun 2005 15:05:43 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j52M5hJ8021042; Thu, 2 Jun 2005 15:05:43 -0700 Date: Thu, 2 Jun 2005 15:05:43 -0700 From: Stephen Hemminger To: Matt Porter Cc: torvalds@osdl.org, akpm@osdl.org, jgarzik@pobox.com, linux-kernel@vger.kernel.org, linuxppc-embedded@ozlabs.org, netdev@oss.sgi.com Subject: Re: [PATCH][5/5] RapidIO support: net driver over messaging Message-ID: <20050602150543.7e4326b6@dxpl.pdx.osdl.net> In-Reply-To: <20050602143404.F24818@cox.net> References: <20050602140359.B24818@cox.net> <20050602141247.C24818@cox.net> <20050602141946.D24818@cox.net> <20050602142509.E24818@cox.net> <20050602143404.F24818@cox.net> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 1993 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk 
X-list: netdev Content-Length: 3600 Lines: 131

How much is this like Ethernet? Does it still do ARP? Can it do promiscuous receive?

> +LIST_HEAD(rionet_peers);

Does this have to be global? Not sure about the locking of this stuff, are you relying on the RTNL?

> +
> +static int rionet_change_mtu(struct net_device *ndev, int new_mtu)
> +{
> +	struct rionet_private *rnet = ndev->priv;
> +
> +	if (netif_msg_drv(rnet))
> +		printk(KERN_WARNING
> +		       "%s: rionet_change_mtu(): not implemented\n", DRV_NAME);
> +
> +	return 0;
> +}

If you can allow any MTU then you don't need this at all. Or if you are limited, then better to return an error for bad values.

> +static void rionet_set_multicast_list(struct net_device *ndev)
> +{
> +	struct rionet_private *rnet = ndev->priv;
> +
> +	if (netif_msg_drv(rnet))
> +		printk(KERN_WARNING
> +		       "%s: rionet_set_multicast_list(): not implemented\n",
> +		       DRV_NAME);
> +}

If you can't handle it then just leave dev->set_multicast_list as NULL and all attempts to add or delete will get -EINVAL.

> +
> +static int rionet_open(struct net_device *ndev)
> +{
> +	/* Initialize inbound message ring */
> +	for (i = 0; i < RIONET_RX_RING_SIZE; i++)
> +		rnet->rx_skb[i] = NULL;
> +	rnet->rx_slot = 0;
> +	rionet_rx_fill(ndev, 0);
> +
> +	rnet->tx_slot = 0;
> +	rnet->tx_cnt = 0;
> +	rnet->ack_slot = 0;
> +
> +	spin_lock_init(&rnet->lock);
> +
> +	rnet->msg_enable = RIONET_DEFAULT_MSGLEVEL;

Better to do all initialization of the per-device data in the place it is allocated (rionet_setup_netdev).

> +
> +static int rionet_ioctl(struct net_device *ndev, struct ifreq *rq, int cmd)
> +{
> +	return -EOPNOTSUPP;
> +}

Unneeded: if dev->do_ioctl is NULL, then all private ioctls will return -EINVAL, which is what you want.
> +
> +static u32 rionet_get_link(struct net_device *ndev)
> +{
> +	return netif_carrier_ok(ndev);
> +}

Use ethtool_op_get_link

> +
> +static int rionet_setup_netdev(struct rio_mport *mport)
> +{
> +	int rc = 0;
> +	struct net_device *ndev = NULL;
> +	struct rionet_private *rnet;
> +	u16 device_id;
> +
> +	/* Allocate our net_device structure */
> +	ndev = alloc_etherdev(sizeof(struct rionet_private));
> +	if (ndev == NULL) {
> +		printk(KERN_INFO "%s: could not allocate ethernet device.\n",
> +		       DRV_NAME);
> +		rc = -ENOMEM;
> +		goto out;
> +	}
> +
> +	/*
> +	 * XXX hack, store point a static at ndev so we can get it...
> +	 * Perhaps need an array of these that the handler can
> +	 * index via the mbox number.
> +	 */
> +	sndev = ndev;
> +
> +	/* Set up private area */
> +	rnet = (struct rionet_private *)ndev->priv;
> +	rnet->mport = mport;
> +
> +	/* Set the default MAC address */
> +	device_id = rio_local_get_device_id(mport);
> +	ndev->dev_addr[0] = 0x00;
> +	ndev->dev_addr[1] = 0x01;
> +	ndev->dev_addr[2] = 0x00;
> +	ndev->dev_addr[3] = 0x01;
> +	ndev->dev_addr[4] = device_id >> 8;
> +	ndev->dev_addr[5] = device_id & 0xff;
> +
> +	/* Fill in the driver function table */
> +	ndev->open = &rionet_open;
> +	ndev->hard_start_xmit = &rionet_start_xmit;
> +	ndev->stop = &rionet_close;
> +	ndev->get_stats = &rionet_stats;
> +	ndev->change_mtu = &rionet_change_mtu;
> +	ndev->set_mac_address = &rionet_set_mac_address;
> +	ndev->set_multicast_list = &rionet_set_multicast_list;
> +	ndev->do_ioctl = &rionet_ioctl;
> +	SET_ETHTOOL_OPS(ndev, &rionet_ethtool_ops);
> +
> +	ndev->mtu = RIO_MAX_MSG_SIZE - 14;
> +
> +	SET_MODULE_OWNER(ndev);

Can you set any ndev->features to get better performance?
Can you take >32bit data addresses? Then set HIGHDMA.
You are doing your own locking; can you use LLTX?
Does the hardware support scatter gather?
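
[Editorial sketch] The MTU remark in this review ("better return an error for bad values") could look roughly like the following. This is a userspace model, not the driver's code: struct net_device is a one-field stand-in, RIO_MAX_MSG_SIZE is given an assumed value of 4096, and the lower bound of 68 is an assumption as well. Only the upper limit of RIO_MAX_MSG_SIZE - 14 comes from the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed values for illustration; the quoted patch only shows
 * ndev->mtu = RIO_MAX_MSG_SIZE - 14. */
#define RIO_MAX_MSG_SIZE 4096
#define RIONET_HDR_LEN   14	/* Ethernet-style header, per the patch */
#define RIONET_MIN_MTU   68	/* assumption, not stated in the thread */

/* Minimal stand-in for the kernel's struct net_device. */
struct net_device {
	int mtu;
};

/* Reject values the message size cannot carry instead of silently
 * returning 0, as the review suggests. */
static int rionet_change_mtu(struct net_device *ndev, int new_mtu)
{
	assert(ndev != NULL);

	if (new_mtu < RIONET_MIN_MTU ||
	    new_mtu > RIO_MAX_MSG_SIZE - RIONET_HDR_LEN)
		return -EINVAL;

	ndev->mtu = new_mtu;
	return 0;
}
```

With these assumed limits, a request for 4082 is accepted and 4083 is refused with -EINVAL, leaving the old MTU in place.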
From davem@davemloft.net Thu Jun 2 15:13:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 15:13:27 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52MDOXq002149 for ; Thu, 2 Jun 2005 15:13:24 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DdxvA-0005GI-CT; Thu, 02 Jun 2005 15:12:12 -0700 Date: Thu, 02 Jun 2005 15:12:12 -0700 (PDT) Message-Id: <20050602.151212.35014607.davem@davemloft.net> To: jdmason@us.ibm.com Cc: shemminger@osdl.org, john.ronciak@intel.com, hadi@cyberus.ca, mitch.a.williams@intel.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: <200506021651.49013.jdmason@us.ibm.com> References: <468F3FDA28AA87429AD807992E22D07E0450BFD0@orsmsx408> <20050602143126.7c302cfd@dxpl.pdx.osdl.net> <200506021651.49013.jdmason@us.ibm.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1994 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 880 Lines: 20 From: Jon Mason Date: Thu, 2 Jun 2005 16:51:48 -0500 > Why not have the driver set the weight to 16/32 respectively for the > weight (or better yet, have someone run numbers to find weight that > are closer to what the adapter can actually use)? While these > numbers may not be optimal for every system, this is much better > that the current system, and would only require 5 or so extra lines > of code per NAPI enabled driver. 
Why do this when we can adjust the weight in one spot, namely the upper level NAPI ->poll() running loop? It can measure the overhead, how many packets processed, etc. and make intelligent decisions based upon that. This is a CPU speed, memory speed, I/O bus speed, and link speed agnostic solution. The driver need not take any part in this, and the scheme will dynamically adjust to resource usage changes in the system. From Robert.Olsson@data.slu.se Thu Jun 2 15:17:05 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 15:17:14 -0700 (PDT) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52MH4Xq002804 for ; Thu, 2 Jun 2005 15:17:05 -0700 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j52MFs9E022315; Fri, 3 Jun 2005 00:15:54 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id 45CA6EE3F0; Fri, 3 Jun 2005 00:15:51 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17055.34070.718986.664873@robur.slu.se> Date: Fri, 3 Jun 2005 00:15:50 +0200 To: Jon Mason Cc: Stephen Hemminger , "Ronciak, John" , hadi@cyberus.ca, "David S. 
Miller" , "Williams, Mitch A" , netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: <200506021651.49013.jdmason@us.ibm.com> References: <468F3FDA28AA87429AD807992E22D07E0450BFD0@orsmsx408> <20050602143126.7c302cfd@dxpl.pdx.osdl.net> <200506021651.49013.jdmason@us.ibm.com> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-archive-position: 1995 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Content-Length: 1819 Lines: 43 Differentiate the meaning of weight a bit. Let weight only limit the number of pkts we deliver per ->poll Have some other mechanism or threshold to control when interrupts are to be turned on. The first approximation for this could be to poll as long as we see any pkt on the RX ring. As interrupt seems expensive on all platforms. Cheers. --ro Jon Mason writes: > On Thursday 02 June 2005 04:31 pm, Stephen Hemminger wrote: > <...> > > For networking the problem is worse, the "right" choice depends on workload > > and relationship between components in the system. I can't see how you > > could ever expect a driver specific solution. > > I think there is a way for a generic driver NAPI enhancement. That is to > modify the weight dependent on link speed. > > Here is the problem as I see it, NAPI enablement for slow media speeds causes > unneeded strain on the system. This is because of the "weight" of NAPI. > Lets look at e1000 as an example. Currently the NAPI weight is 64, > regardless of link media speed. This weight is probably fine for a gigabit > link, but for 10/100 this is way to large. Thus causing interrupts to be > enabled/disabled after every poll/interrupt. Lots of overhead, and not very > smart. 
Why not have the driver set the weight to 16/32 respectively for the > weight (or better yet, have someone run numbers to find weight that are > closer to what the adapter can actually use)? While these numbers may not be > optimal for every system, this is much better that the current system, and > would only require 5 or so extra lines of code per NAPI enabled driver. > > For those who want to have an optimal weight for their tuned system, let them > use the /proc entry that is being proposed. > > Thanks, > Jon From jdmason@us.ibm.com Thu Jun 2 15:21:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 15:21:04 -0700 (PDT) Received: from e33.co.us.ibm.com (e33.co.us.ibm.com [32.97.110.131]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52ML0Xq003409 for ; Thu, 2 Jun 2005 15:21:00 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e33.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j52MK3mD022896 for ; Thu, 2 Jun 2005 18:20:03 -0400 Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j52MK2uC222176 for ; Thu, 2 Jun 2005 16:20:02 -0600 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j52MK1bg015726 for ; Thu, 2 Jun 2005 16:20:02 -0600 Received: from [192.168.0.29] (dreadnought.austin.ibm.com [9.53.90.32]) by d03av03.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j52MK1hV015713; Thu, 2 Jun 2005 16:20:01 -0600 From: Jon Mason Organization: IBM To: "David S. 
Miller" Subject: Re: RFC: NAPI packet weighting patch Date: Thu, 2 Jun 2005 17:19:46 -0500 User-Agent: KMail/1.7.2 Cc: shemminger@osdl.org, john.ronciak@intel.com, hadi@cyberus.ca, mitch.a.williams@intel.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com References: <468F3FDA28AA87429AD807992E22D07E0450BFD0@orsmsx408> <200506021651.49013.jdmason@us.ibm.com> <20050602.151212.35014607.davem@davemloft.net> In-Reply-To: <20050602.151212.35014607.davem@davemloft.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506021719.47459.jdmason@us.ibm.com> X-archive-position: 1996 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jdmason@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 1047 Lines: 23 On Thursday 02 June 2005 05:12 pm, David S. Miller wrote: > From: Jon Mason > Date: Thu, 2 Jun 2005 16:51:48 -0500 > > > Why not have the driver set the weight to 16/32 respectively for the > > weight (or better yet, have someone run numbers to find weight that > > are closer to what the adapter can actually use)? While these > > numbers may not be optimal for every system, this is much better > > that the current system, and would only require 5 or so extra lines > > of code per NAPI enabled driver. > > Why do this when we can adjust the weight in one spot, > namely the upper level NAPI ->poll() running loop? > > It can measure the overhead, how many packets processed, etc. > and make intelligent decisions based upon that. This is a CPU > speed, memory speed, I/O bus speed, and link speed agnostic > solution. > > The driver need not take any part in this, and the scheme will > dynamically adjust to resource usage changes in the system. Yes, a much better idea to do this generically. I 100% agree with you. 
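
[Editorial sketch] The thread does not specify how the upper-level ->poll() loop would pick a weight, so the following is only one hypothetical heuristic, written as a userspace model. The function name, the minimum of 4, and the doubling/halving rule are all assumptions; the maximum of 64 is the e1000 weight mentioned in this discussion.

```c
#include <assert.h>

#define NAPI_WEIGHT_MIN 4	/* hypothetical floor */
#define NAPI_WEIGHT_MAX 64	/* e1000's current weight, per the thread */

/*
 * Hypothetical policy: called after each ->poll() with the weight that
 * was offered and the number of packets the driver actually processed.
 * A device that used its whole quota gets more next time; one that used
 * less than half gets less, so slow links drift toward small weights.
 */
static int napi_adapt_weight(int weight, int processed)
{
	assert(weight > 0 && processed >= 0);

	if (processed >= weight)
		weight *= 2;
	else if (processed < weight / 2)
		weight /= 2;

	if (weight < NAPI_WEIGHT_MIN)
		weight = NAPI_WEIGHT_MIN;
	if (weight > NAPI_WEIGHT_MAX)
		weight = NAPI_WEIGHT_MAX;
	return weight;
}
```

Because the decision uses only what the poll loop already observes, the driver takes no part in it, which is the property argued for above. A real implementation would presumably also weigh the measured overhead, which this sketch ignores.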
From hadi@cyberus.ca Thu Jun 2 15:22:46 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 15:22:54 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52MMkXq003771 for ; Thu, 2 Jun 2005 15:22:46 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1Ddy4U-0008Hr-T6 for netdev@oss.sgi.com; Thu, 02 Jun 2005 18:21:50 -0400 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DdpmN-0002Jn-DX; Thu, 02 Jun 2005 09:30:35 -0400 Subject: Re: PATCH: explicit typing WAS(Re: PATCH: rtnetlink explicit flags setting From: jamal Reply-To: hadi@cyberus.ca To: "David S. Miller" Cc: tgraf@suug.ch, netdev@oss.sgi.com In-Reply-To: <1117717493.6050.29.camel@localhost.localdomain> References: <1117197157.6688.24.camel@localhost.localdomain> <20050531.144338.112623594.davem@davemloft.net> <20050531222646.GK15391@postel.suug.ch> <20050531.153125.95894437.davem@davemloft.net> <1117717493.6050.29.camel@localhost.localdomain> Content-Type: text/plain Organization: unknown Date: Thu, 02 Jun 2005 09:30:32 -0400 Message-Id: <1117719032.6050.50.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 1997 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 344 Lines: 17 I should say this patch is against net-2.6.13.git as of 6am this morning. cheers, jamal On Thu, 2005-02-06 at 09:04 -0400, jamal wrote: > ------------- > This patch converts "unsigned flags" to use more explict types like u16 > instead and incrementally introduces NLMSG_NEW(). 
> > Signed-off-by: Jamal Hadi Salim > From ravinandan.arakali@neterion.com Thu Jun 2 16:20:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 16:20:39 -0700 (PDT) Received: from ns1.s2io.com (ns1.s2io.com [142.46.200.198]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52NKZXq010483 for ; Thu, 2 Jun 2005 16:20:36 -0700 Received: from guinness.s2io.com (sentry.s2io.com [142.46.200.199]) by ns1.s2io.com (8.12.10/8.12.10) with ESMTP id j52NJ5OC005380; Thu, 2 Jun 2005 19:19:05 -0400 (EDT) Received: from rarakali ([10.16.16.57]) by guinness.s2io.com (8.12.6/8.12.6) with SMTP id j52NJ1VG016944; Thu, 2 Jun 2005 19:19:02 -0400 (EDT) From: "Ravinandan Arakali" To: "'David S. Miller'" Cc: , , , , , Subject: RE: [PATCH 2.6.12-rc4] IPv4/IPv6: UDP Large Send Offload feature Date: Thu, 2 Jun 2005 16:18:55 -0700 Message-ID: <003201c567c9$73322240$3910100a@pc.s2io.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook CWS, Build 9.0.2416 (9.0.2911.0) In-Reply-To: <20050527.120215.26278001.davem@davemloft.net> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 Importance: Normal X-Scanned-By: MIMEDefang 2.34 X-archive-position: 1998 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ravinandan.arakali@neterion.com Precedence: bulk X-list: netdev Content-Length: 1822 Lines: 52 David, Since there seems to be pros and cons for both the approaches, we are planning to submit two separate patches(one for each approach). These patches also include the ethtool changes. In terms of performance, we did not observe any diff between the two approaches although the first approach(using SG) minimizes coalescing in driver. Also, some changes will be required in the ethtool user-level utility. 
I'm not sure if this is the right forum to submit patches for the ethtool utility as well.. Thanks, Ravi -----Original Message----- From: David S. Miller [mailto:davem@davemloft.net] Sent: Friday, May 27, 2005 12:02 PM To: ravinandan.arakali@neterion.com Cc: jgarzik@pobox.com; netdev@oss.sgi.com; raghavendra.koushik@neterion.com; leonid.grossman@neterion.com; ananda.raju@neterion.com; rapuru.sriram@neterion.com Subject: Re: [PATCH 2.6.12-rc4] IPv4/IPv6: UDP Large Send Offload feature From: "Ravinandan Arakali" Date: Fri, 27 May 2005 09:32:00 -0700 > Thanks for the quick feedback. > At that time when we considered using skb_shinfo(skb)->fraglist, > it contained fragments of MTU size. So, for a 60k udp datagram > and 1500 MTU we will have 60k/1500 = 45 fragments which is > more than MAX_SKB_FRAGS(18). > > However we will relook at fraglist for the possibility of increasing > frag size to >MTU. MAX_SKB_FRAGS controls the limit of skb_shinfo(skb)->frags[] entries, not how many SKBs may be chained via skb_shinfo(skb)->fraglist, there is no limit on the latter. Note that there is much coalescing that can be performed on the SKB list data areas, particularly if UDP sendfile() is being used. But such coalescing is messy to be performing inside of the drivers. It may end up being the case that your approach ends up being a better one for these reasons. 
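
[Editorial sketch] The fragment count quoted above can be checked with a few lines of arithmetic. The helper below is hypothetical and assumes IPv4 with a 20-byte header (no options), an 8-byte UDP header, and fragment payloads rounded down to a multiple of 8 bytes. Under those assumptions a 60 KiB (61440-byte) UDP payload at MTU 1500 needs 42 MTU-sized fragments and a maximum-size 65507-byte payload needs 45; both exceed MAX_SKB_FRAGS (18).

```c
#include <assert.h>

#define IPV4_HDR_LEN 20u	/* assumes no IP options */
#define UDP_HDR_LEN   8u

/*
 * Number of IPv4 fragments needed for one UDP datagram.
 * Each fragment carries (mtu - IP header) bytes, rounded down to a
 * multiple of 8 because fragment offsets are in 8-byte units.
 */
static unsigned int ipv4_udp_frag_count(unsigned int udp_payload,
					unsigned int mtu)
{
	unsigned int ip_payload, per_frag;

	assert(mtu >= IPV4_HDR_LEN + 8u);

	ip_payload = udp_payload + UDP_HDR_LEN;
	per_frag = ((mtu - IPV4_HDR_LEN) / 8u) * 8u;

	/* Round up: a partial last fragment still counts. */
	return (ip_payload + per_frag - 1u) / per_frag;
}
```

As the reply points out, this count only matters for skb_shinfo(skb)->frags[]; chaining SKBs via skb_shinfo(skb)->fraglist has no such limit.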
From davem@davemloft.net Thu Jun 2 16:23:13 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 16:23:15 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52NNCXq010825 for ; Thu, 2 Jun 2005 16:23:13 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Ddz0m-0006ic-Pq; Thu, 02 Jun 2005 16:22:04 -0700 Date: Thu, 02 Jun 2005 16:22:04 -0700 (PDT) Message-Id: <20050602.162204.68041633.davem@davemloft.net> To: ravinandan.arakali@neterion.com Cc: jgarzik@pobox.com, netdev@oss.sgi.com, raghavendra.koushik@neterion.com, leonid.grossman@neterion.com, ananda.raju@neterion.com, rapuru.sriram@neterion.com Subject: Re: [PATCH 2.6.12-rc4] IPv4/IPv6: UDP Large Send Offload feature From: "David S. Miller" In-Reply-To: <003201c567c9$73322240$3910100a@pc.s2io.com> References: <20050527.120215.26278001.davem@davemloft.net> <003201c567c9$73322240$3910100a@pc.s2io.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 1999 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 831 Lines: 20 From: "Ravinandan Arakali" Date: Thu, 2 Jun 2005 16:18:55 -0700 > Since there seems to be pros and cons for both the approaches, we are > planning > to submit two separate patches(one for each approach). These patches also > include the ethtool changes. In terms of performance, we did not observe any > diff between the two approaches although the first approach(using SG) > minimizes > coalescing in driver. Ok. I think minimizing driver specific work is probably going to make the SG approach more desirable, but we'll see. 
> Also, some changes will be required in the ethtool user-level utility. > I'm not sure if this is the right forum to submit patches for the ethtool > utility as well.. Making sure jgarzik@pobox.com gets the patch is usually the way to go wrt. ethtool submissions. From davem@davemloft.net Thu Jun 2 16:37:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 16:37:41 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52NbaXq012155 for ; Thu, 2 Jun 2005 16:37:36 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DdzEi-0007AN-Q4; Thu, 02 Jun 2005 16:36:28 -0700 Date: Thu, 02 Jun 2005 16:36:28 -0700 (PDT) Message-Id: <20050602.163628.01205145.davem@davemloft.net> To: hch@lst.de Cc: netdev@oss.sgi.com Subject: Re: [PATCH] shaper.c: fix locking From: "David S. Miller" In-Reply-To: <20050601052149.GA11935@lst.de> References: <20050527115450.GA19469@lst.de> <20050531.144114.78710204.davem@davemloft.net> <20050601052149.GA11935@lst.de> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2001 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 927 Lines: 22 From: Christoph Hellwig Date: Wed, 1 Jun 2005 07:21:50 +0200 > On Tue, May 31, 2005 at 02:41:14PM -0700, David S. Miller wrote: > > From: Christoph Hellwig > > Subject: [PATCH] shaper.c: fix locking > > Date: Fri, 27 May 2005 13:54:50 +0200 > > > > > o use a semaphore instead of an opencoded and racy lock > > > o move locking out of shaper_kick and into the callers - most just > > > released the lock before calling shaper_kick > > > o remove in_interrupt() tests. 
from ->close we can always block, from > > > ->hard_start_xmit and timer context never > > > > Do you really want to use a semaphore for a lock taken > > %99 of the time in software IRQ context, which obviously > > cannot sleep? > > I want to change as little as possible from the previous variant ;-) Fair enough, patch applied. If this driver breaks as a result of these changes, you get to keep the pieces ok? :-) From davem@davemloft.net Thu Jun 2 16:36:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 16:36:25 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52NaJXq011978 for ; Thu, 2 Jun 2005 16:36:19 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DdzDU-00073m-CE; Thu, 02 Jun 2005 16:35:12 -0700 Date: Thu, 02 Jun 2005 16:35:12 -0700 (PDT) Message-Id: <20050602.163512.10298458.davem@davemloft.net> To: baruch@ev-en.org Cc: netdev@oss.sgi.com, shemminger@osdl.org, doug.leith@nuim.ie Subject: Re: Comparison of several congestion control algorithms From: "David S. Miller" In-Reply-To: <4298E045.9050009@ev-en.org> References: <4298E045.9050009@ev-en.org> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2000 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1163 Lines: 24 From: Baruch Even Date: Sat, 28 May 2005 22:19:01 +0100 > I wanted to point you to a comparison of congestion control algorithm > done at the Hamilton Institute. These experiments compare Scalable-TCP, > High-Speed TCP, FAST-TCP, BIC-TCP, H-TCP and Standard TCP. They compared > fairness, compatibility with TCP and link utilisation. 
> > You can find the results and a report at http://hamilton.ie/net/eval/ Nice work, I enjoyed this paper very much. There is something that none of these papers mention, but is essential for interpreting results. Did you use interfaces with TSO enabled? There is a very serious congestion window growth bug with TSO enabled in the current 2.6.x tree. The problem is due to congestion window validation. When we build TSO frames, even if we have packets to send, we may defer a few frames until the full TSO packet can go out. But this causes the congestion window validation checks in tcp_ack() to not pass, and thus the congestion window does not grow. I am going to have this fixed, but for now people should do congestion window algorithm tests with TSO explicitly disabled on their interfaces. From baruch@ev-en.org Thu Jun 2 16:51:10 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 16:51:14 -0700 (PDT) Received: from galon.ev-en.org (rrcs-24-123-59-149.central.biz.rr.com [24.123.59.149]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52NpAXq013555 for ; Thu, 2 Jun 2005 16:51:10 -0700 Received: by galon.ev-en.org (Postfix, from userid 105) id 1AFEF11A953; Fri, 3 Jun 2005 02:50:12 +0300 (IDT) Received: from [10.220.3.66] (hamilton.nuim.ie [149.157.192.252]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by galon.ev-en.org (Postfix) with ESMTP id 3141E11A951; Fri, 3 Jun 2005 02:50:08 +0300 (IDT) Message-ID: <429F9B2F.8030507@ev-en.org> Date: Fri, 03 Jun 2005 00:50:07 +0100 From: Baruch Even User-Agent: Debian Thunderbird 1.0.2 (X11/20050331) X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. 
Miller" Cc: netdev@oss.sgi.com, shemminger@osdl.org, doug.leith@nuim.ie Subject: Re: Comparison of several congestion control algorithms References: <4298E045.9050009@ev-en.org> <20050602.163512.10298458.davem@davemloft.net> In-Reply-To: <20050602.163512.10298458.davem@davemloft.net> X-Enigmail-Version: 0.91.0.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-archive-position: 2002 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: baruch@ev-en.org Precedence: bulk X-list: netdev Content-Length: 1075 Lines: 28 David S. Miller wrote: > From: Baruch Even > Date: Sat, 28 May 2005 22:19:01 +0100 > > >>I wanted to point you to a comparison of congestion control algorithm >>done at the Hamilton Institute. These experiments compare Scalable-TCP, >>High-Speed TCP, FAST-TCP, BIC-TCP, H-TCP and Standard TCP. They compared >> fairness, compatibility with TCP and link utilisation. >> >>You can find the results and a report at http://hamilton.ie/net/eval/ > > > Nice work, I enjoyed this paper very much. > > There is something that none of these papers mention, but is essential > for interpreting results. Did you use interfaces with TSO enabled? I did not do these experiments myself, but to the best of my knowledge, none of the experiments done so far in Hamilton have used the TSO feature. This is in part because of the start of the work that was based on 2.4 kernels and even as far as the 2.6.6 kernel which had disabled TSO once it saw SACKs. This made TSO unusable for our needs. AFAIK, the tests reported in that document used kernel 2.6.6. 
Baruch From davem@davemloft.net Thu Jun 2 16:54:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 16:54:46 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j52NsfXq014133 for ; Thu, 2 Jun 2005 16:54:41 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DdzVO-0000j6-1S; Thu, 02 Jun 2005 16:53:42 -0700 Date: Thu, 02 Jun 2005 16:53:41 -0700 (PDT) Message-Id: <20050602.165341.63126720.davem@davemloft.net> To: baruch@ev-en.org Cc: netdev@oss.sgi.com, shemminger@osdl.org, doug.leith@nuim.ie Subject: Re: Comparison of several congestion control algorithms From: "David S. Miller" In-Reply-To: <429F9B2F.8030507@ev-en.org> References: <4298E045.9050009@ev-en.org> <20050602.163512.10298458.davem@davemloft.net> <429F9B2F.8030507@ev-en.org> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2003 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 689 Lines: 18 From: Baruch Even Date: Fri, 03 Jun 2005 00:50:07 +0100 > This is in part because of the start of the work that was based on 2.4 > kernels and even as far as the 2.6.6 kernel which had disabled TSO once > it saw SACKs. This made TSO unusable for our needs. > > AFAIK, the tests reported in that document used kernel 2.6.6. Sure SACKs turn off TSO currently, but you'll have them enabled at the beginning until the first loss and this affects how fast the cwnd will grow. If you have e1000 cards, for example, you're getting TSO enabled by default. You really need to look into this, as it has a real and very non-trivial effect on all of the results you obtained. 
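
[Editorial sketch] The TSO interaction described in this thread can be illustrated with a toy model. This is not the kernel's tcp_ack() code; it assumes a simplified rule in which the congestion window grows by one per ACK only when the packets in flight had filled the window, and it models TSO deferral as a fixed number of packets held back from the wire. All names are hypothetical.

```c
#include <assert.h>

/*
 * Toy model of congestion window validation during slow start.
 * "deferred" is the number of packets the sender holds back while
 * waiting to build a full TSO frame, so the flight size seen at ACK
 * time is cwnd - deferred.
 */
static unsigned int cwnd_after_acks(unsigned int cwnd, unsigned int acks,
				    unsigned int deferred)
{
	unsigned int i;

	assert(deferred <= cwnd);

	for (i = 0; i < acks; i++) {
		unsigned int in_flight = cwnd - deferred;

		/* Validation: only grow if the window was fully used. */
		if (in_flight >= cwnd)
			cwnd++;
	}
	return cwnd;
}
```

In this model, five ACKs take a window of 10 to 15 when nothing is deferred, but leave it at 10 when three packets are always held back, which is the stall the thread attributes to TSO.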
From john.ronciak@intel.com Thu Jun 2 17:14:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 17:14:47 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j530EYXq015670 for ; Thu, 2 Jun 2005 17:14:35 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j530CDM3022374; Fri, 3 Jun 2005 00:12:13 GMT Received: from orsmsxvs041.jf.intel.com (orsmsxvs041.jf.intel.com [192.168.65.54]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j530CCh0023812; Fri, 3 Jun 2005 00:12:13 GMT Received: from orsmsx332.amr.corp.intel.com ([192.168.65.60]) by orsmsxvs041.jf.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005060217121301969 ; Thu, 02 Jun 2005 17:12:13 -0700 Received: from orsmsx408.amr.corp.intel.com ([192.168.65.52]) by orsmsx332.amr.corp.intel.com with Microsoft SMTPSVC(6.0.3790.211); Thu, 2 Jun 2005 17:11:21 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: RE: RFC: NAPI packet weighting patch Date: Thu, 2 Jun 2005 17:11:20 -0700 Message-ID: <468F3FDA28AA87429AD807992E22D07E0450BFDB@orsmsx408> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: RFC: NAPI packet weighting patch Thread-Index: AcVnwTswxSKQFuwsSBqFR1SjFJna0QADuSdA From: "Ronciak, John" To: "Jon Mason" , "David S. 
Miller" Cc: , , "Williams, Mitch A" , , , "Venkatesan, Ganesh" , "Brandeburg, Jesse" X-OriginalArrivalTime: 03 Jun 2005 00:11:21.0645 (UTC) FILETIME=[C5A105D0:01C567D0] X-Scanned-By: MIMEDefang 2.44 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j530EYXq015670 X-archive-position: 2005 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: john.ronciak@intel.com Precedence: bulk X-list: netdev Content-Length: 1961 Lines: 53 I like this idea as well but I do see an issue with it. How would this stack code find out that the weight is too high and packets are being dropped (not being polled fast enough)? It would have to check the controller stats to see the error count increasing for some period. I'm not sure this is workable unless we have some sort of feedback which the driver could send up (or set) saying that this is happening and which the dynamic weight code could take into account. Comments? Cheers, John > -----Original Message----- > From: Jon Mason [mailto:jdmason@us.ibm.com] > Sent: Thursday, June 02, 2005 3:20 PM > To: David S. Miller > Cc: shemminger@osdl.org; Ronciak, John; hadi@cyberus.ca; > Williams, Mitch A; netdev@oss.sgi.com; > Robert.Olsson@data.slu.se; Venkatesan, Ganesh; Brandeburg, Jesse > Subject: Re: RFC: NAPI packet weighting patch > > > On Thursday 02 June 2005 05:12 pm, David S. Miller wrote: > > From: Jon Mason > > Date: Thu, 2 Jun 2005 16:51:48 -0500 > > > > > Why not have the driver set the weight to 16/32 > respectively for the > > > weight (or better yet, have someone run numbers to find > weight that > > > are closer to what the adapter can actually use)? While these > > > numbers may not be optimal for every system, this is much better > > > that the current system, and would only require 5 or so > extra lines > > > of code per NAPI enabled driver.
> > > > Why do this when we can adjust the weight in one spot, > > namely the upper level NAPI ->poll() running loop? > > > > It can measure the overhead, how many packets processed, etc. > > and make intelligent decisions based upon that. This is a CPU > > speed, memory speed, I/O bus speed, and link speed agnostic > > solution. > > > > The driver need not take any part in this, and the scheme will > > dynamically adjust to resource usage changes in the system. > > Yes, a much better idea to do this generically. I 100% agree > with you. > From hadi@cyberus.ca Thu Jun 2 17:14:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 17:14:39 -0700 (PDT) Received: from mx02.cybersurf.com (mx02.cybersurf.com [209.197.145.105]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j530EYXq015668 for ; Thu, 2 Jun 2005 17:14:35 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx02.cybersurf.com with esmtp (Exim 4.30) id 1Ddzoj-0001X1-Ur for netdev@oss.sgi.com; Thu, 02 Jun 2005 20:13:41 -0400 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1Ddq7c-0005rn-D5; Thu, 02 Jun 2005 09:52:32 -0400 Subject: PATCH: ioctl send PID in netlink events From: jamal Reply-To: hadi@cyberus.ca To: "David S. 
Miller" Cc: netdev Content-Type: multipart/mixed; boundary="=-Vf8rMgMoYxExZv+wVDZr" Organization: unknown Date: Thu, 02 Jun 2005 09:52:29 -0400 Message-Id: <1117720349.6050.59.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 X-archive-position: 2004 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 4221 Lines: 131 --=-Vf8rMgMoYxExZv+wVDZr Content-Type: text/plain Content-Transfer-Encoding: 7bit This is where i was trying to get to ;-> This patch is on top of the earlier one i sent for explicit types. I still have to think about how to best do IPV6 routes as well as ARP and NDISC. If anyone has suggestions or wants to tackle them let me know, the v6 route is not going to be a pretty one i think. cheers, jamal This patch ensures that netlink events created as a result of programs using ioctls (such as ifconfig, route etc) contain the correct PID of the program that triggered them.
Signed-off-by: Jamal Hadi Salim --=-Vf8rMgMoYxExZv+wVDZr Content-Disposition: attachment; filename=ifconf_pid_p Content-Type: text/plain; name=ifconf_pid_p; charset=UTF-8 Content-Transfer-Encoding: 7bit net/core/rtnetlink.c: needs update net/ipv4/devinet.c: needs update net/ipv4/fib_semantics.c: needs update net/ipv6/addrconf.c: needs update Index: net/core/rtnetlink.c =================================================================== --- e4f7366a04d973a42a948d3b4175d66e9adf143e/net/core/rtnetlink.c (mode:100644) +++ uncommitted/net/core/rtnetlink.c (mode:100644) @@ -452,7 +452,7 @@ if (!skb) return; - if (rtnetlink_fill_ifinfo(skb, dev, type, 0, 0, change, 0) < 0) { + if (rtnetlink_fill_ifinfo(skb, dev, type, current->pid, 0, change, 0) < 0) { kfree_skb(skb); return; } Index: net/ipv4/devinet.c =================================================================== --- e4f7366a04d973a42a948d3b4175d66e9adf143e/net/ipv4/devinet.c (mode:100644) +++ uncommitted/net/ipv4/devinet.c (mode:100644) @@ -236,6 +236,7 @@ struct in_ifaddr *promote = NULL; struct in_ifaddr *ifa1 = *ifap; + printk("inet_del_ifa: pid %d\n",current->pid); ASSERT_RTNL(); /* 1. 
Deleting primary ifaddr forces deletion all secondaries @@ -305,6 +306,7 @@ ASSERT_RTNL(); + printk("inet_insert_ifa: pid %d\n",current->pid); if (!ifa->ifa_local) { inet_free_ifa(ifa); return 0; @@ -1112,7 +1114,7 @@ if (!skb) netlink_set_err(rtnl, 0, RTMGRP_IPV4_IFADDR, ENOBUFS); - else if (inet_fill_ifaddr(skb, ifa, 0, 0, event, 0) < 0) { + else if (inet_fill_ifaddr(skb, ifa, current->pid, 0, event, 0) < 0) { kfree_skb(skb); netlink_set_err(rtnl, 0, RTMGRP_IPV4_IFADDR, EINVAL); } else { Index: net/ipv4/fib_semantics.c =================================================================== --- e4f7366a04d973a42a948d3b4175d66e9adf143e/net/ipv4/fib_semantics.c (mode:100644) +++ uncommitted/net/ipv4/fib_semantics.c (mode:100644) @@ -276,7 +276,7 @@ struct nlmsghdr *n, struct netlink_skb_parms *req) { struct sk_buff *skb; - u32 pid = req ? req->pid : 0; + u32 pid = req ? req->pid : n->nlmsg_pid; int size = NLMSG_SPACE(sizeof(struct rtmsg)+256); skb = alloc_skb(size, GFP_KERNEL); @@ -1035,7 +1035,7 @@ } nl->nlmsg_flags = NLM_F_REQUEST; - nl->nlmsg_pid = 0; + nl->nlmsg_pid = current->pid; nl->nlmsg_seq = 0; nl->nlmsg_len = NLMSG_LENGTH(sizeof(*rtm)); if (cmd == SIOCDELRT) { Index: net/ipv6/addrconf.c =================================================================== --- e4f7366a04d973a42a948d3b4175d66e9adf143e/net/ipv6/addrconf.c (mode:100644) +++ uncommitted/net/ipv6/addrconf.c (mode:100644) @@ -2872,7 +2872,7 @@ netlink_set_err(rtnl, 0, RTMGRP_IPV6_IFADDR, ENOBUFS); return; } - if (inet6_fill_ifaddr(skb, ifa, 0, 0, event, 0) < 0) { + if (inet6_fill_ifaddr(skb, ifa, current->pid, 0, event, 0) < 0) { kfree_skb(skb); netlink_set_err(rtnl, 0, RTMGRP_IPV6_IFADDR, EINVAL); return; @@ -3007,7 +3007,7 @@ netlink_set_err(rtnl, 0, RTMGRP_IPV6_IFINFO, ENOBUFS); return; } - if (inet6_fill_ifinfo(skb, idev, 0, 0, event, 0) < 0) { + if (inet6_fill_ifinfo(skb, idev, current->pid, 0, event, 0) < 0) { kfree_skb(skb); netlink_set_err(rtnl, 0, RTMGRP_IPV6_IFINFO, EINVAL); return; @@ 
-3064,7 +3064,7 @@ netlink_set_err(rtnl, 0, RTMGRP_IPV6_PREFIX, ENOBUFS); return; } - if (inet6_fill_prefix(skb, idev, pinfo, 0, 0, event, 0) < 0) { + if (inet6_fill_prefix(skb, idev, pinfo, current->pid, 0, event, 0) < 0) { kfree_skb(skb); netlink_set_err(rtnl, 0, RTMGRP_IPV6_PREFIX, EINVAL); return; --=-Vf8rMgMoYxExZv+wVDZr-- From davem@davemloft.net Thu Jun 2 17:19:21 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 17:19:24 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j530JKXq016790 for ; Thu, 2 Jun 2005 17:19:21 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Ddzt6-0001j3-Rq; Thu, 02 Jun 2005 17:18:12 -0700 Date: Thu, 02 Jun 2005 17:18:12 -0700 (PDT) Message-Id: <20050602.171812.48807872.davem@davemloft.net> To: john.ronciak@intel.com Cc: jdmason@us.ibm.com, shemminger@osdl.org, hadi@cyberus.ca, mitch.a.williams@intel.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: <468F3FDA28AA87429AD807992E22D07E0450BFDB@orsmsx408> References: <468F3FDA28AA87429AD807992E22D07E0450BFDB@orsmsx408> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2006 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 827 Lines: 15 From: "Ronciak, John" Date: Thu, 2 Jun 2005 17:11:20 -0700 > I like this idea as well but I do have an issue with it. How would this > stack code find out that the weight is too high and packets are being > dropped (not being polled fast enough)?
It would have to check the > controller stats to see the error count increasing for some period. I'm > not sure this is workable unless we have some sort of feedback which the > driver could send up (or set) saying that this is happening and the > dynamic weight code could take into account. What more do you need other than checking the statistics counter? The drop statistics (the ones we care about) are incremented in real time by the ->poll() code, so it's not like we have to trigger some asynchronous event to get a current version of the number. From ravinandan.arakali@neterion.com Thu Jun 2 17:51:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 17:51:40 -0700 (PDT) Received: from linux.site (adsl-67-120-213-161.dsl.sntc01.pacbell.net [67.120.213.161]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j530paXq018846 for ; Thu, 2 Jun 2005 17:51:36 -0700 Received: by linux.site (Postfix, from userid 0) id 28C4B7B99F; Thu, 2 Jun 2005 17:43:58 -0700 (PDT) To: davem@davemloft.net, jgarzik@pobox.com, netdev@oss.sgi.com Cc: raghavendra.koushik@neterion.com, ravinandan.arakali@neterion.com, leonid.grossman@neterion.com, ananda.raju@neterion.com, rapuru.sriram@neterion.com From: ravinandan.arakali@neterion.com Subject: [PATCH 2.6.12-rc4] ethtool: Support for UDP Large Send Offload Message-Id: <20050603004358.28C4B7B99F@linux.site> Date: Thu, 2 Jun 2005 17:43:58 -0700 (PDT) X-archive-position: 2009 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ravinandan.arakali@neterion.com Precedence: bulk X-list: netdev Content-Length: 3847 Lines: 136 Hi, Attached below is a patch to the ethtool utility to support USO (UDP Large Send Offload). Please review the patch. Usage: 1. To view USO setting # ethtool -k 2.
To set/unset USO # ethtool -K uso on|off Signed-off-by: Ananda Raju Signed-off-by: Ravinandan Arakali --- diff -uNr ethtool-3/ethtool-copy.h ethtool-3_uso/ethtool-copy.h --- ethtool-3/ethtool-copy.h 2005-01-28 01:50:26.000000000 +0545 +++ ethtool-3_uso/ethtool-copy.h 2005-06-02 23:06:48.000000000 +0545 @@ -283,6 +283,8 @@ #define ETHTOOL_GSTATS 0x0000001d /* get NIC-specific statistics */ #define ETHTOOL_GTSO 0x0000001e /* Get TSO enable (ethtool_value) */ #define ETHTOOL_STSO 0x0000001f /* Set TSO enable (ethtool_value) */ +#define ETHTOOL_GUSO 0x00000020 /* Get USO enable (ethtool_value) */ +#define ETHTOOL_SUSO 0x00000021 /* Set USO enable (ethtool_value) */ /* compatibility with older code */ #define SPARC_ETH_GSET ETHTOOL_GSET diff -uNr ethtool-3/ethtool.c ethtool-3_uso/ethtool.c --- ethtool-3/ethtool.c 2005-01-28 04:19:29.000000000 +0545 +++ ethtool-3_uso/ethtool.c 2005-06-02 23:06:52.000000000 +0545 @@ -119,6 +119,7 @@ * [ tx on|off ] \ * [ sg on|off ] \ * [ tso on|off ] + * [ uso on|off ] * ethtool -r DEVNAME * ethtool -p DEVNAME [ %d ] * ethtool -t DEVNAME [ online|offline ] @@ -191,6 +192,7 @@ " [ tx on|off ] \\\n" " [ sg on|off ] \\\n" " [ tso on|off ]\n" + " [ uso on|off ]\n" " ethtool -r DEVNAME\n" " ethtool -p DEVNAME [ %%d ]\n" " ethtool -t DEVNAME [online|(offline)]\n" @@ -236,6 +238,7 @@ static int off_csum_tx_wanted = -1; static int off_sg_wanted = -1; static int off_tso_wanted = -1; +static int off_uso_wanted = -1; static struct ethtool_pauseparam epause; static int gpause_changed = 0; @@ -339,6 +342,7 @@ { "tx", CMDL_BOOL, &off_csum_tx_wanted, NULL }, { "sg", CMDL_BOOL, &off_sg_wanted, NULL }, { "tso", CMDL_BOOL, &off_tso_wanted, NULL }, + { "uso", CMDL_BOOL, &off_uso_wanted, NULL }, }; static struct cmdline_info cmdline_pause[] = { @@ -1184,17 +1188,19 @@ return 0; } -static int dump_offload (int rx, int tx, int sg, int tso) +static int dump_offload (int rx, int tx, int sg, int tso, int uso) { fprintf(stdout, "rx-checksumming: %s\n" 
"tx-checksumming: %s\n" "scatter-gather: %s\n" - "tcp segmentation offload: %s\n", + "tcp segmentation offload: %s\n" + "udp large send offload: %s\n", rx ? "on" : "off", tx ? "on" : "off", sg ? "on" : "off", - tso ? "on" : "off"); + tso ? "on" : "off", + uso ? "on" : "off"); return 0; } @@ -1458,7 +1464,7 @@ static int do_goffload(int fd, struct ifreq *ifr) { struct ethtool_value eval; - int err, allfail = 1, rx = 0, tx = 0, sg = 0, tso = 0; + int err, allfail = 1, rx = 0, tx = 0, sg = 0, tso = 0, uso = 0; fprintf(stdout, "Offload parameters for %s:\n", devname); @@ -1502,12 +1508,22 @@ allfail = 0; } + eval.cmd = ETHTOOL_GUSO; + ifr->ifr_data = (caddr_t)&eval; + err = ioctl(fd, SIOCETHTOOL, ifr); + if (err) + perror("Cannot get device udp large send offload settings"); + else { + uso = eval.data; + allfail = 0; + } + if (allfail) { fprintf(stdout, "no offload info available\n"); return 83; } - return dump_offload(rx, tx, sg, tso); + return dump_offload(rx, tx, sg, tso, uso); } static int do_soffload(int fd, struct ifreq *ifr) @@ -1562,6 +1578,17 @@ return 88; } } + if (off_uso_wanted >= 0) { + changed = 1; + eval.cmd = ETHTOOL_SUSO; + eval.data = (off_uso_wanted == 1); + ifr->ifr_data = (caddr_t)&eval; + err = ioctl(fd, SIOCETHTOOL, ifr); + if (err) { + perror("Cannot set device udp large send offload settings"); + return 89; + } + } if (!changed) { fprintf(stdout, "no offload settings changed\n"); } From ravinandan.arakali@neterion.com Thu Jun 2 17:48:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 17:48:52 -0700 (PDT) Received: from linux.site (adsl-67-120-213-161.dsl.sntc01.pacbell.net [67.120.213.161]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j530mlXq018431 for ; Thu, 2 Jun 2005 17:48:48 -0700 Received: by linux.site (Postfix, from userid 0) id BAB6A7B990; Thu, 2 Jun 2005 17:41:06 -0700 (PDT) To: davem@davemloft.net, jgarzik@pobox.com, netdev@oss.sgi.com Cc: raghavendra.koushik@neterion.com, 
ravinandan.arakali@neterion.com, leonid.grossman@neterion.com, ananda.raju@neterion.com, rapuru.sriram@neterion.com From: ravinandan.arakali@neterion.com Subject: [PATCH 2.6.12-rc4] IPv4/IPv6: USO v2, Scatter-gather approach Message-Id: <20050603004106.BAB6A7B990@linux.site> Date: Thu, 2 Jun 2005 17:41:06 -0700 (PDT) X-archive-position: 2007 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ravinandan.arakali@neterion.com Precedence: bulk X-list: netdev Content-Length: 14893 Lines: 444 Hi, Attached below is version 2 of kernel patch for UDP Large send offload feature. This patch uses the "Scatter-Gather" approach. It also incorporates David Miller's comments on the first version. Also, below is a "how-to" on changes required in network drivers to use the USO interface. UDP Large Send Offload (USO) Interface: -------------------------------------- USO is a feature wherein the Linux kernel network stack will offload the IP fragmentation functionality of large UDP datagram to hardware. This will reduce the overhead of stack in fragmenting the large UDP datagram to MTU sized packets. 1) Drivers indicate their capability of USO using dev->features |= NETIF_F_USO | NETIF_F_HW_CSUM | NETIF_F_SG NETIF_F_HW_CSUM is required for USO over ipv6. 2) USO packet will be submitted for transmission using driver xmit routine. USO packet will have a non-zero value for "skb_shinfo(skb)->uso_size" skb_shinfo(skb)->uso_size will indicate the length of data part in each IP fragment going out of the adapter after IP fragmentation by hardware. skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[] contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW indicating that hardware has to do checksum calculation. Hardware should compute the UDP checksum of complete datagram and also ip header checksum of each fragmented IP packet. 
For IPV6 the USO provides the fragment identification-id in skb_shinfo(skb)->ip6_frag_id. The adapter should use this ID for generating IPv6 fragments. Signed-off-by: Ananda Raju Signed-off-by: Ravinandan Arakali --- diff -uNr linux-2.6.12-rc4.org/include/linux/ethtool.h linux-2.6.12-rc4/include/linux/ethtool.h --- linux-2.6.12-rc4.org/include/linux/ethtool.h 2005-06-01 19:56:58.000000000 +0545 +++ linux-2.6.12-rc4/include/linux/ethtool.h 2005-06-01 19:51:47.000000000 +0545 @@ -260,6 +260,8 @@ int ethtool_op_set_sg(struct net_device *dev, u32 data); u32 ethtool_op_get_tso(struct net_device *dev); int ethtool_op_set_tso(struct net_device *dev, u32 data); +u32 ethtool_op_get_uso(struct net_device *dev); +int ethtool_op_set_uso(struct net_device *dev, u32 data); /** * &ethtool_ops - Alter and report network device settings @@ -289,6 +291,8 @@ * set_sg: Turn scatter-gather on or off * get_tso: Report whether TCP segmentation offload is enabled * set_tso: Turn TCP segmentation offload on or off + * get_uso: Report whether UDP large send offload is enabled + * set_uso: Turn UDP large send offload on or off * self_test: Run specified self-tests * get_strings: Return a set of strings that describe the requested objects * phys_id: Identify the device @@ -353,6 +357,8 @@ void (*get_ethtool_stats)(struct net_device *, struct ethtool_stats *, u64 *); int (*begin)(struct net_device *); void (*complete)(struct net_device *); + u32 (*get_uso)(struct net_device *); + int (*set_uso)(struct net_device *, u32); }; /* CMDs currently supported */ @@ -388,6 +394,8 @@ #define ETHTOOL_GSTATS 0x0000001d /* get NIC-specific statistics */ #define ETHTOOL_GTSO 0x0000001e /* Get TSO enable (ethtool_value) */ #define ETHTOOL_STSO 0x0000001f /* Set TSO enable (ethtool_value) */ +#define ETHTOOL_GUSO 0x00000020 /* Get USO enable (ethtool_value) */ +#define ETHTOOL_SUSO 0x00000021 /* Set USO enable (ethtool_value) */ /* compatibility with older code */ #define SPARC_ETH_GSET ETHTOOL_GSET diff -uNr
linux-2.6.12-rc4.org/include/linux/netdevice.h linux-2.6.12-rc4/include/linux/netdevice.h --- linux-2.6.12-rc4.org/include/linux/netdevice.h 2005-05-25 17:18:11.000000000 +0545 +++ linux-2.6.12-rc4/include/linux/netdevice.h 2005-06-01 14:33:12.000000000 +0545 @@ -414,6 +414,7 @@ #define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */ #define NETIF_F_TSO 2048 /* Can offload TCP/IP segmentation */ #define NETIF_F_LLTX 4096 /* LockLess TX */ +#define NETIF_F_USO 8192 /* Can offload UDP Large Send*/ /* Called after device is detached from network. */ void (*uninit)(struct net_device *dev); diff -uNr linux-2.6.12-rc4.org/include/linux/skbuff.h linux-2.6.12-rc4/include/linux/skbuff.h --- linux-2.6.12-rc4.org/include/linux/skbuff.h 2005-05-25 17:18:20.000000000 +0545 +++ linux-2.6.12-rc4/include/linux/skbuff.h 2005-06-01 15:18:44.000000000 +0545 @@ -135,6 +135,8 @@ atomic_t dataref; unsigned int nr_frags; unsigned short tso_size; + unsigned short uso_size; + unsigned int ip6_frag_id; unsigned short tso_segs; struct sk_buff *frag_list; skb_frag_t frags[MAX_SKB_FRAGS]; diff -uNr linux-2.6.12-rc4.org/include/net/sock.h linux-2.6.12-rc4/include/net/sock.h --- linux-2.6.12-rc4.org/include/net/sock.h 2005-05-25 17:18:44.000000000 +0545 +++ linux-2.6.12-rc4/include/net/sock.h 2005-05-25 20:28:14.000000000 +0545 @@ -1296,5 +1296,11 @@ return -ENODEV; } #endif +struct sk_buff *sock_append_data(struct sock *sk, + int getfrag(void *from, char *to, int offset, int len, + int odd, struct sk_buff *skb), + void *from, int length, int transhdrlen, + int hh_len, int fragheaderlen, + unsigned int flags,int *err); #endif /* _SOCK_H */ diff -uNr linux-2.6.12-rc4.org/net/core/dev.c linux-2.6.12-rc4/net/core/dev.c --- linux-2.6.12-rc4.org/net/core/dev.c 2005-06-01 14:35:01.000000000 +0545 +++ linux-2.6.12-rc4/net/core/dev.c 2005-06-01 19:46:03.000000000 +0545 @@ -2793,6 +2793,18 @@ dev->name); dev->features &= ~NETIF_F_TSO; } + if (dev->features & NETIF_F_USO) { + if 
(!(dev->features & NETIF_F_HW_CSUM)) { + printk("%s: Dropping NETIF_F_USO since no ", dev->name); + printk("NETIF_F_HW_CSUM feature.\n"); + dev->features &= ~NETIF_F_USO; + } + if (!(dev->features & NETIF_F_SG)) { + printk("%s: Dropping NETIF_F_USO since no ", dev->name); + printk("NETIF_F_SG feature.\n"); + dev->features &= ~NETIF_F_USO; + } + } /* * nil rebuild_header routine, diff -uNr linux-2.6.12-rc4.org/net/core/ethtool.c linux-2.6.12-rc4/net/core/ethtool.c --- linux-2.6.12-rc4.org/net/core/ethtool.c 2005-06-01 19:48:31.000000000 +0545 +++ linux-2.6.12-rc4/net/core/ethtool.c 2005-06-01 23:02:39.000000000 +0545 @@ -72,6 +72,21 @@ return 0; } +u32 ethtool_op_get_uso(struct net_device *dev) +{ + return (dev->features & NETIF_F_USO) != 0; +} + +int ethtool_op_set_uso(struct net_device *dev, u32 data) +{ + if (data) + dev->features |= NETIF_F_USO; + else + dev->features &= ~NETIF_F_USO; + + return 0; +} + /* Handlers for each ethtool command */ static int ethtool_get_settings(struct net_device *dev, void __user *useraddr) @@ -460,6 +475,9 @@ err = dev->ethtool_ops->set_tso(dev, 0); if (err) return err; + err = dev->ethtool_ops->set_uso(dev, 0); + if (err) + return err; } return dev->ethtool_ops->set_sg(dev, data); @@ -548,6 +566,39 @@ return dev->ethtool_ops->set_tso(dev, edata.data); } +static int ethtool_get_uso(struct net_device *dev, char __user *useraddr) +{ + struct ethtool_value edata = { ETHTOOL_GUSO }; + + if (!dev->ethtool_ops->get_uso) + return -EOPNOTSUPP; + + edata.data = dev->ethtool_ops->get_uso(dev); + + if (copy_to_user(useraddr, &edata, sizeof(edata))) + return -EFAULT; + return 0; +} + +static int ethtool_set_uso(struct net_device *dev, char __user *useraddr) +{ + struct ethtool_value edata; + + if (!dev->ethtool_ops->set_uso) + return -EOPNOTSUPP; + + if (copy_from_user(&edata, useraddr, sizeof(edata))) + return -EFAULT; + + if (edata.data && !(dev->features & NETIF_F_SG)) + return -EINVAL; + + if (edata.data && !(dev->features &
NETIF_F_HW_CSUM)) + return -EINVAL; + + return dev->ethtool_ops->set_uso(dev, edata.data); +} + static int ethtool_self_test(struct net_device *dev, char __user *useraddr) { struct ethtool_test test; @@ -795,6 +846,12 @@ case ETHTOOL_GSTATS: rc = ethtool_get_stats(dev, useraddr); break; + case ETHTOOL_GUSO: + rc = ethtool_get_uso(dev, useraddr); + break; + case ETHTOOL_SUSO: + rc = ethtool_set_uso(dev, useraddr); + break; default: rc = -EOPNOTSUPP; } @@ -817,3 +874,6 @@ EXPORT_SYMBOL(ethtool_op_set_sg); EXPORT_SYMBOL(ethtool_op_set_tso); EXPORT_SYMBOL(ethtool_op_set_tx_csum); +EXPORT_SYMBOL(ethtool_op_set_uso); +EXPORT_SYMBOL(ethtool_op_get_uso); + diff -uNr linux-2.6.12-rc4.org/net/core/skbuff.c linux-2.6.12-rc4/net/core/skbuff.c --- linux-2.6.12-rc4.org/net/core/skbuff.c 2005-05-25 20:25:35.000000000 +0545 +++ linux-2.6.12-rc4/net/core/skbuff.c 2005-06-01 14:34:27.000000000 +0545 @@ -159,6 +159,8 @@ skb_shinfo(skb)->tso_size = 0; skb_shinfo(skb)->tso_segs = 0; skb_shinfo(skb)->frag_list = NULL; + skb_shinfo(skb)->uso_size = 0; + skb_shinfo(skb)->ip6_frag_id = 0; out: return skb; nodata: diff -uNr linux-2.6.12-rc4.org/net/core/sock.c linux-2.6.12-rc4/net/core/sock.c --- linux-2.6.12-rc4.org/net/core/sock.c 2005-05-25 20:25:47.000000000 +0545 +++ linux-2.6.12-rc4/net/core/sock.c 2005-06-01 19:40:03.000000000 +0545 @@ -1401,6 +1401,102 @@ EXPORT_SYMBOL(proto_unregister); +/* + * sock_append_data - append the user data to a skb, + * sk - sock structure which contains skbs for transmission + * getfrag - The function to be called to get the data from the user. 
+ * from - pointer to user message iov + * length - length of the iov message + * transhdrlen - transport header length + * hh_len - hardware header length + * fragheaderlen - length of the IP header + * flags - iov message flags + * err - Error code returned + * + * This procedure will allocate a skb enough to hold protocol headers and + * append the user data in the fragment part of the skb and add the skb to + * socket write queue + */ +struct sk_buff *sock_append_data(struct sock *sk, + int getfrag(void *from, char *to, int offset, int len, + int odd, struct sk_buff *skb), + void *from, int length, int transhdrlen, + int hh_len, int fragheaderlen, + unsigned int flags,int *err) +{ + struct sk_buff *skb; + int frg_cnt = 0; + skb_frag_t *frag = NULL; + struct page *page = NULL; + int copy, left; + int offset = 0; + + if (skb_queue_len(&sk->sk_write_queue)) { + *err = -EOPNOTSUPP; + return NULL; + } + + skb = sock_alloc_send_skb(sk, + hh_len + fragheaderlen + transhdrlen + 20, + (flags & MSG_DONTWAIT), err); + if (skb == NULL) { + *err = -ENOMEM; + return NULL; + } + /* reserve space for Hardware header */ + skb_reserve(skb, hh_len); + /* create space for UDP/IP header */ + skb_put(skb,fragheaderlen + transhdrlen); + /* initialize network header pointer */ + skb->nh.raw = skb->data; + /* initialize protocol header pointer */ + skb->h.raw = skb->data + fragheaderlen; + skb->ip_summed = CHECKSUM_HW; + skb->csum = 0; + do { + copy = length; + if (frg_cnt >= MAX_SKB_FRAGS) { + *err = -EFAULT; + kfree_skb(skb); + return NULL; + } + page = alloc_pages(sk->sk_allocation, 0); + if (page == NULL) { + *err = -ENOMEM; + kfree_skb(skb); + return NULL; + } + sk->sk_sndmsg_page = page; + sk->sk_sndmsg_off = 0; + skb_fill_page_desc(skb, frg_cnt, page, 0, 0); + skb->truesize += PAGE_SIZE; + atomic_add(PAGE_SIZE, &sk->sk_wmem_alloc); + frg_cnt = skb_shinfo(skb)->nr_frags; + frag = &skb_shinfo(skb)->frags[frg_cnt - 1]; + left = PAGE_SIZE - frag->page_offset; + if (copy > left) + 
copy = left; + if (getfrag(from, page_address(frag->page)+ + frag->page_offset+frag->size, + offset, copy, 0, skb) < 0) { + *err = -EFAULT; + kfree_skb(skb); + return NULL; + } + sk->sk_sndmsg_off += copy; + frag->size += copy; + skb->len += copy; + skb->data_len += copy; + offset += copy; + length -= copy; + page = NULL; + } while (length > 0); + __skb_queue_tail(&sk->sk_write_queue, skb); + *err = 0; + return skb; +} +EXPORT_SYMBOL(sock_append_data); + #ifdef CONFIG_PROC_FS static inline struct proto *__proto_head(void) { diff -uNr linux-2.6.12-rc4.org/net/ipv4/ip_output.c linux-2.6.12-rc4/net/ipv4/ip_output.c --- linux-2.6.12-rc4.org/net/ipv4/ip_output.c 2005-05-25 20:26:07.000000000 +0545 +++ linux-2.6.12-rc4/net/ipv4/ip_output.c 2005-06-02 22:04:59.000000000 +0545 @@ -291,7 +291,8 @@ { IP_INC_STATS(IPSTATS_MIB_OUTREQUESTS); - if (skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->tso_size) + if (skb->len > dst_mtu(skb->dst) && + !(skb_shinfo(skb)->uso_size || skb_shinfo(skb)->tso_size)) return ip_fragment(skb, ip_finish_output); else return ip_finish_output(skb); @@ -789,6 +790,28 @@ inet->cork.length += length; + if (((length > mtu) && (sk->sk_protocol == IPPROTO_UDP)) && + (rt->u.dst.dev->features & NETIF_F_USO)) { + /* There is support for UDP large send offload by network + * device, so create one single skb packet containing complete + * udp datagram + */ + skb = sock_append_data(sk, getfrag, from, + (length - transhdrlen), transhdrlen, + hh_len, fragheaderlen, flags, &err); + if (skb != NULL) { + /* specify the length of each IP datagram fragment*/ + skb_shinfo(skb)->uso_size = (mtu - fragheaderlen); + return 0; + } else if (err == -EOPNOTSUPP) { + /* There is not enough support do UPD LSO, + * so follow normal path + */ + err = 0; + } else + goto error; + } + /* So, what's going on in the loop below? 
* * We use calculated fragment length to generate chained skb, diff -uNr linux-2.6.12-rc4.org/net/ipv6/ip6_output.c linux-2.6.12-rc4/net/ipv6/ip6_output.c --- linux-2.6.12-rc4.org/net/ipv6/ip6_output.c 2005-05-25 20:26:17.000000000 +0545 +++ linux-2.6.12-rc4/net/ipv6/ip6_output.c 2005-06-02 22:05:24.000000000 +0545 @@ -147,7 +147,8 @@ int ip6_output(struct sk_buff *skb) { - if (skb->len > dst_mtu(skb->dst) || dst_allfrag(skb->dst)) + if ((skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->uso_size) || + dst_allfrag(skb->dst)) return ip6_fragment(skb, ip6_output2); else return ip6_output2(skb); @@ -898,6 +899,33 @@ */ inet->cork.length += length; + if (((length > mtu) && (sk->sk_protocol == IPPROTO_UDP)) && + (rt->u.dst.dev->features & NETIF_F_USO)) { + + /* There is support for UDP large send offload by network + * device, so create one single skb packet containing complete + * udp datagram + */ + skb = sock_append_data(sk, getfrag, from, + (length - transhdrlen), transhdrlen, + hh_len, fragheaderlen, flags, &err); + if (skb != NULL) { + struct frag_hdr fhdr; + + /* specify the length of each IP datagram fragment*/ + skb_shinfo(skb)->uso_size = (mtu - fragheaderlen - + sizeof(struct frag_hdr)); + ipv6_select_ident(skb, &fhdr); + skb_shinfo(skb)->ip6_frag_id = fhdr.identification; + return 0; + } else if (err == -EOPNOTSUPP){ + /* There is not enough support for UDP LSO, + * so follow normal path + */ + err = 0; + } else + goto error; + } if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) goto alloc_new_skb; From ravinandan.arakali@neterion.com Thu Jun 2 17:51:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 17:51:32 -0700 (PDT) Received: from linux.site (adsl-67-120-213-161.dsl.sntc01.pacbell.net [67.120.213.161]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j530pTXq018773 for ; Thu, 2 Jun 2005 17:51:29 -0700 Received: by linux.site (Postfix, from userid 0) id 41BFB7B990; Thu, 2 Jun 2005 17:43:51 -0700 (PDT) To: 
davem@davemloft.net, jgarzik@pobox.com, netdev@oss.sgi.com Cc: raghavendra.koushik@neterion.com, ravinandan.arakali@neterion.com, leonid.grossman@neterion.com, ananda.raju@neterion.com, rapuru.sriram@neterion.com From: ravinandan.arakali@neterion.com Subject: [PATCH 2.6.12-rc4] IPv4/IPv6: USO v2, fragment list approach Message-Id: <20050603004351.41BFB7B990@linux.site> Date: Thu, 2 Jun 2005 17:43:51 -0700 (PDT) X-archive-position: 2008 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ravinandan.arakali@neterion.com Precedence: bulk X-list: netdev Content-Length: 11495 Lines: 322 Hi, Attached below is version 2 of kernel patch for UDP Large send offload feature. This patch uses the "fragment list" approach. It also incorporates David Miller's comments on the first version. Also, below is a "how-to" on changes required in network drivers to use the USO interface. UDP Large Send Offload (USO) Interface: --------------------------------------- USO is a feature wherein the Linux kernel network stack will offload the IP fragmentation functionality of large UDP datagram to hardware. This will reduce the overhead of stack in fragmenting the large UDP datagram to MTU sized packets. 1) Drivers indicate their capability of USO using dev->features |= NETIF_F_USO | NETIF_F_HW_CSUM | NETIF_F_FRAGLIST NETIF_F_HW_CSUM is required for USO over IPv6. 2) USO packet will be submitted for transmission using driver xmit routine. USO packet will have a non zero value for "skb_shinfo(skb)->uso_size" skb_shinfo(skb)->uso_size indicates the length of data part in each IP fragment going out of the adapter after IP fragmentation by hardware. skb->data and skb_shinfo(skb)->frag_list will contain complete large UDP datagram. The driver is required to traverse each skb in skb_shinfo(skb)->frag_list to get complete UDP packet. 
The skb->ip_summed will be set to CHECKSUM_HW indicating that hardware has to perform checksum calculation. Hardware should compute the UDP checksum of complete UDP datagram and also ip header checksum of each fragmented IP packet. For IPV6 the USO provides the fragment identification id in skb_shinfo(skb)->ip6_frag_id. The adapter should use this ID for generating IPv6 fragments. Signed-off-by: Ananda Raju Signed-off-by: Ravinandan Arakali --- diff -uNr linux-2.6.12-rc4.org/include/linux/ethtool.h linux-2.6.12-rc4/include/linux/ethtool.h --- linux-2.6.12-rc4.org/include/linux/ethtool.h 2005-06-02 16:55:51.000000000 +0545 +++ linux-2.6.12-rc4/include/linux/ethtool.h 2005-06-02 16:56:46.000000000 +0545 @@ -260,6 +260,8 @@ int ethtool_op_set_sg(struct net_device *dev, u32 data); u32 ethtool_op_get_tso(struct net_device *dev); int ethtool_op_set_tso(struct net_device *dev, u32 data); +u32 ethtool_op_get_uso(struct net_device *dev); +int ethtool_op_set_uso(struct net_device *dev, u32 data); /** * &ethtool_ops - Alter and report network device settings @@ -289,6 +291,8 @@ * set_sg: Turn scatter-gather on or off * get_tso: Report whether TCP segmentation offload is enabled * set_tso: Turn TCP segmentation offload on or off + * get_uso: Report whether UDP large send offload is enabled + * set_uso: Turn UDP large send offload on or off * self_test: Run specified self-tests * get_strings: Return a set of strings that describe the requested objects * phys_id: Identify the device @@ -353,6 +357,8 @@ void (*get_ethtool_stats)(struct net_device *, struct ethtool_stats *, u64 *); int (*begin)(struct net_device *); void (*complete)(struct net_device *); + u32 (*get_uso)(struct net_device *); + int (*set_uso)(struct net_device *, u32); }; /* CMDs currently supported */ @@ -388,6 +394,8 @@ #define ETHTOOL_GSTATS 0x0000001d /* get NIC-specific statistics */ #define ETHTOOL_GTSO 0x0000001e /* Get TSO enable (ethtool_value) */ #define ETHTOOL_STSO 0x0000001f /* Set TSO enable
(ethtool_value) */ +#define ETHTOOL_GUSO 0x00000020 /* Get USO enable (ethtool_value) */ +#define ETHTOOL_SUSO 0x00000021 /* Set USO enable (ethtool_value) */ /* compatibility with older code */ #define SPARC_ETH_GSET ETHTOOL_GSET diff -uNr linux-2.6.12-rc4.org/include/linux/netdevice.h linux-2.6.12-rc4/include/linux/netdevice.h --- linux-2.6.12-rc4.org/include/linux/netdevice.h 2005-05-27 23:22:46.000000000 +0545 +++ linux-2.6.12-rc4/include/linux/netdevice.h 2005-05-31 10:02:02.000000000 +0545 @@ -414,6 +414,7 @@ #define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */ #define NETIF_F_TSO 2048 /* Can offload TCP/IP segmentation */ #define NETIF_F_LLTX 4096 /* LockLess TX */ +#define NETIF_F_USO 8192 /* Can offload UDP Large Send*/ /* Called after device is detached from network. */ void (*uninit)(struct net_device *dev); diff -uNr linux-2.6.12-rc4.org/include/linux/skbuff.h linux-2.6.12-rc4/include/linux/skbuff.h --- linux-2.6.12-rc4.org/include/linux/skbuff.h 2005-05-27 23:22:46.000000000 +0545 +++ linux-2.6.12-rc4/include/linux/skbuff.h 2005-06-02 20:27:43.000000000 +0545 @@ -136,6 +136,8 @@ unsigned int nr_frags; unsigned short tso_size; unsigned short tso_segs; + unsigned short uso_size; + unsigned int ip6_frag_id; struct sk_buff *frag_list; skb_frag_t frags[MAX_SKB_FRAGS]; }; diff -uNr linux-2.6.12-rc4.org/net/core/dev.c linux-2.6.12-rc4/net/core/dev.c --- linux-2.6.12-rc4.org/net/core/dev.c 2005-05-28 01:49:18.000000000 +0545 +++ linux-2.6.12-rc4/net/core/dev.c 2005-05-31 22:57:22.000000000 +0545 @@ -2793,6 +2793,18 @@ dev->name); dev->features &= ~NETIF_F_TSO; } + if (dev->features & NETIF_F_USO) { + if(!(dev->features & NETIF_F_FRAGLIST)) { + printk("%s: Dropping NETIF_F_USO since no ", dev->name); + printk("NETIF_F_FRAGLIST feature.\n"); + dev->features &= ~NETIF_F_USO; + } + if(!(dev->features & NETIF_F_HW_CSUM)) { + printk("%s: Dropping NETIF_F_USO since no ", dev->name); + printk("NETIF_F_HW_CSUM feature.\n"); + dev->features &= 
~NETIF_F_USO; + } + } /* * nil rebuild_header routine, diff -uNr linux-2.6.12-rc4.org/net/core/ethtool.c linux-2.6.12-rc4/net/core/ethtool.c --- linux-2.6.12-rc4.org/net/core/ethtool.c 2005-06-02 16:55:32.000000000 +0545 +++ linux-2.6.12-rc4/net/core/ethtool.c 2005-06-02 21:53:16.000000000 +0545 @@ -72,6 +72,21 @@ return 0; } +u32 ethtool_op_get_uso(struct net_device *dev) +{ + return (dev->features & NETIF_F_USO) != 0; +} + +int ethtool_op_set_uso(struct net_device *dev, u32 data) +{ + if (data) + dev->features |= NETIF_F_USO; + else + dev->features &= ~NETIF_F_USO; + + return 0; +} + /* Handlers for each ethtool command */ static int ethtool_get_settings(struct net_device *dev, void __user *useraddr) @@ -548,6 +563,39 @@ return dev->ethtool_ops->set_tso(dev, edata.data); } +static int ethtool_get_uso(struct net_device *dev, char __user *useraddr) +{ + struct ethtool_value edata = { ETHTOOL_GUSO }; + + if (!dev->ethtool_ops->get_uso) + return -EOPNOTSUPP; + + edata.data = dev->ethtool_ops->get_uso(dev); + + if (copy_to_user(useraddr, &edata, sizeof(edata))) + return -EFAULT; + return 0; +} + +static int ethtool_set_uso(struct net_device *dev, char __user *useraddr) +{ + struct ethtool_value edata; + + if (!dev->ethtool_ops->set_uso) + return -EOPNOTSUPP; + + if (copy_from_user(&edata, useraddr, sizeof(edata))) + return -EFAULT; + + if (edata.data && !(dev->features & NETIF_F_FRAGLIST)) + return -EINVAL; + + if (edata.data && !(dev->features & NETIF_F_HW_CSUM)) + return -EINVAL; + + return dev->ethtool_ops->set_uso(dev, edata.data); +} + static int ethtool_self_test(struct net_device *dev, char __user *useraddr) { struct ethtool_test test; @@ -795,6 +843,12 @@ case ETHTOOL_GSTATS: rc = ethtool_get_stats(dev, useraddr); break; + case ETHTOOL_GUSO: + rc = ethtool_get_uso(dev, useraddr); + break; + case ETHTOOL_SUSO: + rc = ethtool_set_uso(dev, useraddr); + break; default: rc = -EOPNOTSUPP; } @@ -817,3 +871,6 @@ EXPORT_SYMBOL(ethtool_op_set_sg); 
EXPORT_SYMBOL(ethtool_op_set_tso); EXPORT_SYMBOL(ethtool_op_set_tx_csum); +EXPORT_SYMBOL(ethtool_op_set_uso); +EXPORT_SYMBOL(ethtool_op_get_uso); + diff -uNr linux-2.6.12-rc4.org/net/core/skbuff.c linux-2.6.12-rc4/net/core/skbuff.c --- linux-2.6.12-rc4.org/net/core/skbuff.c 2005-05-27 23:22:46.000000000 +0545 +++ linux-2.6.12-rc4/net/core/skbuff.c 2005-06-02 20:27:27.000000000 +0545 @@ -159,6 +159,8 @@ skb_shinfo(skb)->tso_size = 0; skb_shinfo(skb)->tso_segs = 0; skb_shinfo(skb)->frag_list = NULL; + skb_shinfo(skb)->ip6_frag_id = 0; + skb_shinfo(skb)->uso_size = 0; out: return skb; nodata: diff -uNr linux-2.6.12-rc4.org/net/ipv4/ip_output.c linux-2.6.12-rc4/net/ipv4/ip_output.c --- linux-2.6.12-rc4.org/net/ipv4/ip_output.c 2005-05-27 23:22:46.000000000 +0545 +++ linux-2.6.12-rc4/net/ipv4/ip_output.c 2005-05-31 15:55:39.000000000 +0545 @@ -291,7 +291,8 @@ { IP_INC_STATS(IPSTATS_MIB_OUTREQUESTS); - if (skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->tso_size) + if (skb->len > dst_mtu(skb->dst) && + !(skb_shinfo(skb)->tso_size || skb_shinfo(skb)->uso_size)) return ip_fragment(skb, ip_finish_output); else return ip_finish_output(skb); @@ -768,7 +769,6 @@ mtu = inet->cork.fragsize; } hh_len = LL_RESERVED_SPACE(rt->u.dst.dev); - fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0); maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen; @@ -864,6 +864,12 @@ skb->ip_summed = csummode; skb->csum = 0; skb_reserve(skb, hh_len); + if ((!offset) && (length > mtu) && + (sk->sk_protocol == IPPROTO_UDP) && + (rt->u.dst.dev->features & NETIF_F_USO)) { + skb_shinfo(skb)->uso_size = mtu - fragheaderlen; + skb->ip_summed = CHECKSUM_HW; + } /* * Find where to start putting bytes. 
diff -uNr linux-2.6.12-rc4.org/net/ipv4/udp.c linux-2.6.12-rc4/net/ipv4/udp.c --- linux-2.6.12-rc4.org/net/ipv4/udp.c 2005-05-27 23:23:55.000000000 +0545 +++ linux-2.6.12-rc4/net/ipv4/udp.c 2005-05-31 21:14:44.000000000 +0545 @@ -424,9 +424,10 @@ goto send; } - if (skb_queue_len(&sk->sk_write_queue) == 1) { + if ((skb_queue_len(&sk->sk_write_queue) == 1) || + (skb_shinfo(skb)->uso_size)) { /* - * Only one fragment on the socket. + * Only one fragment on the socket or it is udp lso skb. */ if (skb->ip_summed == CHECKSUM_HW) { skb->csum = offsetof(struct udphdr, check); diff -uNr linux-2.6.12-rc4.org/net/ipv6/ip6_output.c linux-2.6.12-rc4/net/ipv6/ip6_output.c --- linux-2.6.12-rc4.org/net/ipv6/ip6_output.c 2005-05-27 23:22:46.000000000 +0545 +++ linux-2.6.12-rc4/net/ipv6/ip6_output.c 2005-06-02 20:27:55.000000000 +0545 @@ -147,7 +147,8 @@ int ip6_output(struct sk_buff *skb) { - if (skb->len > dst_mtu(skb->dst) || dst_allfrag(skb->dst)) + if ((skb->len > dst_mtu(skb->dst) || dst_allfrag(skb->dst)) && + !skb_shinfo(skb)->uso_size) return ip6_fragment(skb, ip6_output2); else return ip6_output2(skb); @@ -977,6 +978,19 @@ skb->csum = 0; /* reserve for fragmentation */ skb_reserve(skb, hh_len+sizeof(struct frag_hdr)); + if ((!offset) && (length > mtu) && + (sk->sk_protocol == IPPROTO_UDP) && + (rt->u.dst.dev->features & NETIF_F_USO)) { + struct frag_hdr fhdr; + + skb_shinfo(skb)->uso_size = + (mtu - fragheaderlen - + sizeof(struct frag_hdr)); + skb->ip_summed = CHECKSUM_HW; + ipv6_select_ident(skb, &fhdr); + skb_shinfo(skb)->ip6_frag_id = + fhdr.identification; + } /* * Find where to start putting bytes diff -uNr linux-2.6.12-rc4.org/net/ipv6/udp.c linux-2.6.12-rc4/net/ipv6/udp.c --- linux-2.6.12-rc4.org/net/ipv6/udp.c 2005-05-27 23:24:12.000000000 +0545 +++ linux-2.6.12-rc4/net/ipv6/udp.c 2005-05-31 17:32:31.000000000 +0545 @@ -590,7 +590,8 @@ goto send; } - if (skb_queue_len(&sk->sk_write_queue) == 1) { + if ((skb_queue_len(&sk->sk_write_queue) == 1) || + 
(skb_shinfo(skb)->uso_size)) { skb->csum = csum_partial((char *)uh, sizeof(struct udphdr), skb->csum); uh->check = csum_ipv6_magic(&fl->fl6_src, From tgraf@suug.ch Thu Jun 2 18:01:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 18:01:42 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5311dXq021235 for ; Thu, 2 Jun 2005 18:01:39 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 3FBB51C0EF; Fri, 3 Jun 2005 03:00:59 +0200 (CEST) Date: Fri, 3 Jun 2005 03:00:59 +0200 From: Thomas Graf To: jamal Cc: "David S. Miller" , netdev Subject: Re: PATCH: ioctl send PID in netlink events Message-ID: <20050603010059.GU15391@postel.suug.ch> References: <1117720349.6050.59.camel@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1117720349.6050.59.camel@localhost.localdomain> X-archive-position: 2010 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 1453 Lines: 48 > Index: net/ipv4/devinet.c > =================================================================== > --- e4f7366a04d973a42a948d3b4175d66e9adf143e/net/ipv4/devinet.c (mode:100644) > +++ uncommitted/net/ipv4/devinet.c (mode:100644) > @@ -236,6 +236,7 @@ > struct in_ifaddr *promote = NULL; > struct in_ifaddr *ifa1 = *ifap; > > + printk("inet_del_ifa: pid %d\n",current->pid); > ASSERT_RTNL(); > > /* 1. Deleting primary ifaddr forces deletion all secondaries > @@ -305,6 +306,7 @@ > > ASSERT_RTNL(); > > + printk("inet_insert_ifa: pid %d\n",current->pid); > if (!ifa->ifa_local) { > inet_free_ifa(ifa); > return 0; Don't you want to remove these? 
> Index: net/ipv4/fib_semantics.c > =================================================================== > --- e4f7366a04d973a42a948d3b4175d66e9adf143e/net/ipv4/fib_semantics.c (mode:100644) > +++ uncommitted/net/ipv4/fib_semantics.c (mode:100644) > @@ -276,7 +276,7 @@ > struct nlmsghdr *n, struct netlink_skb_parms *req) > { > struct sk_buff *skb; > - u32 pid = req ? req->pid : 0; > + u32 pid = req ? req->pid : n->nlmsg_pid; > int size = NLMSG_SPACE(sizeof(struct rtmsg)+256); > > skb = alloc_skb(size, GFP_KERNEL); > @@ -1035,7 +1035,7 @@ > } > > nl->nlmsg_flags = NLM_F_REQUEST; > - nl->nlmsg_pid = 0; > + nl->nlmsg_pid = current->pid; > nl->nlmsg_seq = 0; > nl->nlmsg_len = NLMSG_LENGTH(sizeof(*rtm)); > if (cmd == SIOCDELRT) { Neat ;-> From hadi@cyberus.ca Thu Jun 2 18:38:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 18:38:44 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j531cfXq023346 for ; Thu, 2 Jun 2005 18:38:41 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1De18A-0000Ng-QE for netdev@oss.sgi.com; Thu, 02 Jun 2005 21:37:50 -0400 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1De181-0000zG-1e; Thu, 02 Jun 2005 21:37:41 -0400 Subject: Re: PATCH: ioctl send PID in netlink events From: jamal Reply-To: hadi@cyberus.ca To: Thomas Graf Cc: "David S. 
Miller" , netdev In-Reply-To: <20050603010059.GU15391@postel.suug.ch> References: <1117720349.6050.59.camel@localhost.localdomain> <20050603010059.GU15391@postel.suug.ch> Content-Type: text/plain Organization: unknown Date: Thu, 02 Jun 2005 21:37:35 -0400 Message-Id: <1117762655.6095.3.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2011 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 1809 Lines: 62 On Fri, 2005-03-06 at 03:00 +0200, Thomas Graf wrote: > > Index: net/ipv4/devinet.c > > =================================================================== > > --- e4f7366a04d973a42a948d3b4175d66e9adf143e/net/ipv4/devinet.c (mode:100644) > > +++ uncommitted/net/ipv4/devinet.c (mode:100644) > > @@ -236,6 +236,7 @@ > > struct in_ifaddr *promote = NULL; > > struct in_ifaddr *ifa1 = *ifap; > > > > + printk("inet_del_ifa: pid %d\n",current->pid); > > ASSERT_RTNL(); > > > > /* 1. Deleting primary ifaddr forces deletion all secondaries > > @@ -305,6 +306,7 @@ > > > > ASSERT_RTNL(); > > > > + printk("inet_insert_ifa: pid %d\n",current->pid); > > if (!ifa->ifa_local) { > > inet_free_ifa(ifa); > > return 0; > > Don't you want to remove these? > > Yes, how did those get there? ;-> > > Index: net/ipv4/fib_semantics.c > > =================================================================== > > --- e4f7366a04d973a42a948d3b4175d66e9adf143e/net/ipv4/fib_semantics.c (mode:100644) > > +++ uncommitted/net/ipv4/fib_semantics.c (mode:100644) > > @@ -276,7 +276,7 @@ > > struct nlmsghdr *n, struct netlink_skb_parms *req) > > { > > struct sk_buff *skb; > > - u32 pid = req ? req->pid : 0; > > + u32 pid = req ? 
req->pid : n->nlmsg_pid; > > int size = NLMSG_SPACE(sizeof(struct rtmsg)+256); > > > > skb = alloc_skb(size, GFP_KERNEL); > > @@ -1035,7 +1035,7 @@ > > } > > > > nl->nlmsg_flags = NLM_F_REQUEST; > > - nl->nlmsg_pid = 0; > > + nl->nlmsg_pid = current->pid; > > nl->nlmsg_seq = 0; > > nl->nlmsg_len = NLMSG_LENGTH(sizeof(*rtm)); > > if (cmd == SIOCDELRT) { > > Neat ;-> The second one could probably use the new macros. Maybe i will wait until Dave puts this in his tree and send a small change; else you could send it. cheers, jamal From hadi@cyberus.ca Thu Jun 2 19:37:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 19:37:29 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j532bMXq027358 for ; Thu, 2 Jun 2005 19:37:27 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1De22u-00056A-E8 for netdev@oss.sgi.com; Thu, 02 Jun 2005 22:36:28 -0400 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1De22p-0002eo-L3; Thu, 02 Jun 2005 22:36:23 -0400 Subject: Re: [PATCH] shaper.c: fix locking From: jamal Reply-To: hadi@cyberus.ca To: "David S. 
Miller" Cc: hch@lst.de, netdev@oss.sgi.com In-Reply-To: <20050602.163628.01205145.davem@davemloft.net> References: <20050527115450.GA19469@lst.de> <20050531.144114.78710204.davem@davemloft.net> <20050601052149.GA11935@lst.de> <20050602.163628.01205145.davem@davemloft.net> Content-Type: text/plain Organization: unknown Date: Thu, 02 Jun 2005 22:36:17 -0400 Message-Id: <1117766177.6095.51.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2013 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 249 Lines: 10 On Thu, 2005-02-06 at 16:36 -0700, David S. Miller wrote: > Fair enough, patch applied. If this driver breaks as a result of > these changes, you get to keep the pieces ok? :-) The question is anyone really using this driver? ;-> cheers, jamal From hadi@cyberus.ca Thu Jun 2 19:33:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 19:33:45 -0700 (PDT) Received: from mx02.cybersurf.com (mx02.cybersurf.com [209.197.145.105]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j532XfXq027002 for ; Thu, 2 Jun 2005 19:33:41 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx02.cybersurf.com with esmtp (Exim 4.30) id 1De1zI-0000o5-El for netdev@oss.sgi.com; Thu, 02 Jun 2005 22:32:44 -0400 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1De1zG-00029R-2M; Thu, 02 Jun 2005 22:32:42 -0400 Subject: Re: RFC: NAPI packet weighting patch From: jamal Reply-To: hadi@cyberus.ca To: "David S. 
Miller" Cc: john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, mitch.a.williams@intel.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com In-Reply-To: <20050602.171812.48807872.davem@davemloft.net> References: <468F3FDA28AA87429AD807992E22D07E0450BFDB@orsmsx408> <20050602.171812.48807872.davem@davemloft.net> Content-Type: text/plain Organization: unknown Date: Thu, 02 Jun 2005 22:32:33 -0400 Message-Id: <1117765954.6095.49.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2012 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 2544 Lines: 54 On Thu, 2005-02-06 at 17:18 -0700, David S. Miller wrote: > From: "Ronciak, John" > Date: Thu, 2 Jun 2005 17:11:20 -0700 > > > I like this idea as well but I do an issue with it. How would this > > stack code find out that the weight is too high and pacekts are being > > dropped (not being polled fast enough)? It would have to check the > > controller stats to see the error count increasing for some period. I'm > > not sure this is workable unless we have some sort of feedback which the > > driver could send up (or set) saying that this is happening and the > > dynamic weight code could take into acount. > > What more do you need other than checking the statistics counter? The > drop statistics (the ones we care about) are incremented in real time > by the ->poll() code, so it's not like we have to trigger some > asynchronous event to get a current version of the number. I am reading through all the emails and I think either the problem is not being clearly stated or not understood. 
I was going to say "or I am on crack", but I know I am clean ;-> Here's what I think I saw as the flow of events: Someone posted a theory that if you happen to reduce the weight (iirc the reduction was via a shift) then the DRR would give fewer CPU cycles to the driver. What's the big surprise there? That's the DRR design intent. Stephen has a patch which allows people to reduce the weight. DRR provides fairness. If you have 10 NICs coming in at different wire rates, the weights provide a fairness quota without caring about what those speeds are. So it doesn't make any sense IMO to base the weight on what the NIC speed is. In fact I claim it is _nonsense_. You don't need to factor in speed. And the claim that DRR is not real world is blasphemous. Having said that: I have a feeling that the issue which is being waded around is the amount of CPU that the softirq chews (unfortunately a well-known issue) and, to some extent, the CPU that a specific driver's packet flow chews depending on the path it takes. In other words, for the DRR algorithm to enhance fairness it should consider not only fairness in the number of packets the driver injects into the system but also the amount of CPU that driver chews. At the moment we lump all drivers together as far as CPU cycles are concerned. If we could narrow it down to this, then I think there is something that could lead to meaningful discussion. This, however, does not eradicate the need for DRR and is absolutely not driver specific. 
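To make the weight/quota point concrete, here is a minimal user-space sketch of deficit round robin over polled devices. It is only an illustration under stated assumptions: every packet costs the same, and `struct rx_dev`, `drr_poll()` and `drr_run()` are hypothetical names made up for this sketch, not the kernel's actual softirq code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device state for the sketch (not a kernel structure). */
struct rx_dev {
    int weight;   /* credit added each time the device is visited */
    int quota;    /* credit currently available (the "deficit")    */
    int backlog;  /* packets waiting in the device's RX ring       */
    int served;   /* packets processed so far                      */
};

/* Process as many packets as backlog, quota and the global limit allow. */
static int drr_poll(struct rx_dev *d, int limit)
{
    int n = d->backlog;

    if (n > d->quota)
        n = d->quota;
    if (n > limit)
        n = limit;

    d->backlog -= n;
    d->quota -= n;
    d->served += n;
    return n;
}

/*
 * Visit the devices round-robin until the global budget is spent or
 * nobody has work left. Returns the number of packets processed.
 */
static int drr_run(struct rx_dev *devs, size_t ndev, int budget)
{
    int total = 0;
    int progress = 1;

    assert(devs != NULL && ndev > 0);

    while (budget > 0 && progress) {
        progress = 0;
        for (size_t i = 0; i < ndev && budget > 0; i++) {
            struct rx_dev *d = &devs[i];
            int n;

            if (d->backlog == 0) {
                d->quota = 0;          /* idle devices do not bank credit */
                continue;
            }
            d->quota += d->weight;     /* refill by the device's weight */
            n = drr_poll(d, budget);
            budget -= n;
            total += n;
            if (n > 0)
                progress = 1;
        }
    }
    return total;
}
```

In this sketch, two devices with equal weight 64 and backlogs of 1000 and 100 packets each get 64 packets out of a budget of 128: the share follows the weight, not how fast packets arrive. What the sketch does not account for is how much CPU each of those packets costs, which is the gap described above.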
cheers, jamal From raghunathan.venkatesan@wipro.com Thu Jun 2 20:03:02 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 20:03:05 -0700 (PDT) Received: from wip-ec-wd.wipro.com (wip-ec-wd.wipro.com [203.101.113.39]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53331Xq029634 for ; Thu, 2 Jun 2005 20:03:01 -0700 Received: from wip-ec-wd.wipro.com (localhost.wipro.com [127.0.0.1]) by localhost (Postfix) with ESMTP id B83EC205E8; Fri, 3 Jun 2005 08:23:02 +0530 (IST) Received: from blr-ec-bh01.wipro.com (unknown [10.201.50.91]) by wip-ec-wd.wipro.com (Postfix) with ESMTP id 9C493205E5; Fri, 3 Jun 2005 08:23:02 +0530 (IST) Received: from chn-snr-bh2.wipro.com ([10.145.50.92]) by blr-ec-bh01.wipro.com with Microsoft SMTPSVC(6.0.3790.211); Fri, 3 Jun 2005 08:31:47 +0530 Received: from CHN-SNR-MBX01.wipro.com ([10.145.50.181]) by chn-snr-bh2.wipro.com with Microsoft SMTPSVC(6.0.3790.0); Fri, 3 Jun 2005 08:32:04 +0530 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: RE: Unable to handle kernel paging request at virtual address 04000460 Date: Fri, 3 Jun 2005 08:28:34 +0530 Message-ID: <438662DA48DCAA41B1DF648BD4BD76C0E98682@CHN-SNR-MBX01.wipro.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Unable to handle kernel paging request at virtual address 04000460 Thread-Index: AcVnmr7o+HCu/Cf9S06Mjf+yNyDZmwATUH0g From: To: Cc: , , , X-OriginalArrivalTime: 03 Jun 2005 03:02:04.0645 (UTC) FILETIME=[9EEEC950:01C567E8] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j53331Xq029634 X-archive-position: 2014 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: raghunathan.venkatesan@wipro.com Precedence: bulk X-list: netdev Content-Length: 1431 Lines: 40 Hi Stephen, I appreciate you response. 
We'll get deeper into the problem after turning on these debugs. Thanks, Raghu -----Original Message----- From: Stephen Hemminger [mailto:shemminger@osdl.org] Sent: Thursday, June 02, 2005 11:14 PM To: Raghunathan Venkatesan (WT01 - EMBEDDED & PRODUCT ENGINEERING SOLUTIONS) Cc: davem@davemloft.net; linux-net@vger.kernel.org; netdev@oss.sgi.com; linux@der-keiler.de Subject: Re: Unable to handle kernel paging request at virtual address 04000460 On Thu, 2 Jun 2005 09:20:21 +0530 wrote: > Hi David, > I understand that the linux community may not be able to debug it for > me. All I require is if people have seen similar problems (the > problems we face are w.r.t to kfree_skb and skb_drop_fraglist crashing > due to some reason, which could be a Memory Management issue or some > thing we are not aware of), then let us know the patches, so that we > can try them out here. Turn on Debug memory allocations, spinlock debugging, sleep-inside-spinlock checking, and preempt, it will help your debugging. If you are not building your own kernel from source learn how. You are probably freeing memory twice, or not doing ref counting properly or other locking issues. Since it is your code, good luck debugging it, if you want the community help it needs to be open source code that is available for download or be in the kernel.org kernel. 
From hadi@cyberus.ca Thu Jun 2 20:34:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 20:34:02 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j533XxXq002309 for ; Thu, 2 Jun 2005 20:33:59 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1De2vl-0007ca-IK for netdev@oss.sgi.com; Thu, 02 Jun 2005 23:33:09 -0400 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1De2vg-0004O7-AG; Thu, 02 Jun 2005 23:33:04 -0400 Subject: Re: PATCH: ioctl send PID in netlink events From: jamal Reply-To: hadi@cyberus.ca To: Thomas Graf Cc: "David S. Miller" , netdev In-Reply-To: <1117762655.6095.3.camel@localhost.localdomain> References: <1117720349.6050.59.camel@localhost.localdomain> <20050603010059.GU15391@postel.suug.ch> <1117762655.6095.3.camel@localhost.localdomain> Content-Type: multipart/mixed; boundary="=-Ufypw+g9dyMzRlXKv19C" Organization: unknown Date: Thu, 02 Jun 2005 23:32:55 -0400 Message-Id: <1117769575.6095.91.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 X-archive-position: 2015 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 3665 Lines: 113 --=-Ufypw+g9dyMzRlXKv19C Content-Type: text/plain Content-Transfer-Encoding: 7bit Dave, If you havent applied that patch to net-2.6.13 heres one that removes those extrenous printks. On Thu, 2005-02-06 at 21:37 -0400, jamal wrote: > The second one could probably use the new macros. > Maybe i will wait until Dave puts this in his tree and send a small > change; else you could send it. 
> Actually cant be done, sorry i lied ;-> cheers, jamal --=-Ufypw+g9dyMzRlXKv19C Content-Disposition: attachment; filename=ifconf-2 Content-Type: text/plain; name=ifconf-2; charset=utf-8 Content-Transfer-Encoding: 7bit net/core/rtnetlink.c: needs update net/ipv4/devinet.c: needs update net/ipv4/fib_semantics.c: needs update net/ipv6/addrconf.c: needs update Index: net/core/rtnetlink.c =================================================================== --- e4f7366a04d973a42a948d3b4175d66e9adf143e/net/core/rtnetlink.c (mode:100644) +++ uncommitted/net/core/rtnetlink.c (mode:100644) @@ -452,7 +452,7 @@ if (!skb) return; - if (rtnetlink_fill_ifinfo(skb, dev, type, 0, 0, change, 0) < 0) { + if (rtnetlink_fill_ifinfo(skb, dev, type, current->pid, 0, change, 0) < 0) { kfree_skb(skb); return; } Index: net/ipv4/devinet.c =================================================================== --- e4f7366a04d973a42a948d3b4175d66e9adf143e/net/ipv4/devinet.c (mode:100644) +++ uncommitted/net/ipv4/devinet.c (mode:100644) @@ -1112,7 +1112,7 @@ if (!skb) netlink_set_err(rtnl, 0, RTMGRP_IPV4_IFADDR, ENOBUFS); - else if (inet_fill_ifaddr(skb, ifa, 0, 0, event, 0) < 0) { + else if (inet_fill_ifaddr(skb, ifa, current->pid, 0, event, 0) < 0) { kfree_skb(skb); netlink_set_err(rtnl, 0, RTMGRP_IPV4_IFADDR, EINVAL); } else { Index: net/ipv4/fib_semantics.c =================================================================== --- e4f7366a04d973a42a948d3b4175d66e9adf143e/net/ipv4/fib_semantics.c (mode:100644) +++ uncommitted/net/ipv4/fib_semantics.c (mode:100644) @@ -276,7 +276,7 @@ struct nlmsghdr *n, struct netlink_skb_parms *req) { struct sk_buff *skb; - u32 pid = req ? req->pid : 0; + u32 pid = req ? 
req->pid : n->nlmsg_pid; int size = NLMSG_SPACE(sizeof(struct rtmsg)+256); skb = alloc_skb(size, GFP_KERNEL); @@ -1035,7 +1035,7 @@ } nl->nlmsg_flags = NLM_F_REQUEST; - nl->nlmsg_pid = 0; + nl->nlmsg_pid = current->pid; nl->nlmsg_seq = 0; nl->nlmsg_len = NLMSG_LENGTH(sizeof(*rtm)); if (cmd == SIOCDELRT) { Index: net/ipv6/addrconf.c =================================================================== --- e4f7366a04d973a42a948d3b4175d66e9adf143e/net/ipv6/addrconf.c (mode:100644) +++ uncommitted/net/ipv6/addrconf.c (mode:100644) @@ -2872,7 +2872,7 @@ netlink_set_err(rtnl, 0, RTMGRP_IPV6_IFADDR, ENOBUFS); return; } - if (inet6_fill_ifaddr(skb, ifa, 0, 0, event, 0) < 0) { + if (inet6_fill_ifaddr(skb, ifa, current->pid, 0, event, 0) < 0) { kfree_skb(skb); netlink_set_err(rtnl, 0, RTMGRP_IPV6_IFADDR, EINVAL); return; @@ -3007,7 +3007,7 @@ netlink_set_err(rtnl, 0, RTMGRP_IPV6_IFINFO, ENOBUFS); return; } - if (inet6_fill_ifinfo(skb, idev, 0, 0, event, 0) < 0) { + if (inet6_fill_ifinfo(skb, idev, current->pid, 0, event, 0) < 0) { kfree_skb(skb); netlink_set_err(rtnl, 0, RTMGRP_IPV6_IFINFO, EINVAL); return; @@ -3064,7 +3064,7 @@ netlink_set_err(rtnl, 0, RTMGRP_IPV6_PREFIX, ENOBUFS); return; } - if (inet6_fill_prefix(skb, idev, pinfo, 0, 0, event, 0) < 0) { + if (inet6_fill_prefix(skb, idev, pinfo, current->pid, 0, event, 0) < 0) { kfree_skb(skb); netlink_set_err(rtnl, 0, RTMGRP_IPV6_PREFIX, EINVAL); return; --=-Ufypw+g9dyMzRlXKv19C-- From davem@davemloft.net Thu Jun 2 22:10:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 22:10:03 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j535A1Xq007191 for ; Thu, 2 Jun 2005 22:10:01 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1De4QW-0002C8-6G; Thu, 02 Jun 2005 22:09:00 -0700 Date: Thu, 02 Jun 2005 22:09:00 -0700 (PDT) Message-Id: 
<20050602.220900.92343575.davem@davemloft.net> To: hadi@cyberus.ca Cc: tgraf@suug.ch, netdev@oss.sgi.com Subject: Re: PATCH: ioctl send PID in netlink events From: "David S. Miller" In-Reply-To: <1117769575.6095.91.camel@localhost.localdomain> References: <20050603010059.GU15391@postel.suug.ch> <1117762655.6095.3.camel@localhost.localdomain> <1117769575.6095.91.camel@localhost.localdomain> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2017 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 192 Lines: 7 From: jamal Date: Thu, 02 Jun 2005 23:32:55 -0400 > If you havent applied that patch to net-2.6.13 heres one that removes > those extrenous printks. Applied, thanks Jamal. From davem@davemloft.net Thu Jun 2 22:12:18 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 22:12:21 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j535CIXq007864 for ; Thu, 2 Jun 2005 22:12:18 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1De4Sm-0002Dl-1U; Thu, 02 Jun 2005 22:11:20 -0700 Date: Thu, 02 Jun 2005 22:11:19 -0700 (PDT) Message-Id: <20050602.221119.105431518.davem@davemloft.net> To: herbert@gondor.apana.org.au Cc: netdev@oss.sgi.com Subject: Re: [SCTP] Replace spin_lock_irqsave with spin_lock_bh From: "David S. 
Miller" In-Reply-To: <20050602095459.GA26638@gondor.apana.org.au> References: <20050602094404.GA10316@gondor.apana.org.au> <20050602095459.GA26638@gondor.apana.org.au> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2019 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 334 Lines: 10 From: Herbert Xu Date: Thu, 2 Jun 2005 19:54:59 +1000 > The call in question is only called from recvmsg which means that > IRQs aren't disabled. Therefore it is safe to replace it with > spin_lock_bh. > > Signed-off-by: Herbert Xu Also applied to net-2.6.13, thanks. From davem@davemloft.net Thu Jun 2 22:11:29 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 22:11:33 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j535BSXq007449 for ; Thu, 2 Jun 2005 22:11:29 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1De4Rv-0002Cn-3Z; Thu, 02 Jun 2005 22:10:27 -0700 Date: Thu, 02 Jun 2005 22:10:26 -0700 (PDT) Message-Id: <20050602.221026.112287995.davem@davemloft.net> To: herbert@gondor.apana.org.au Cc: netdev@oss.sgi.com Subject: Re: [IPV4/IPV6] Replace spin_lock_irq with spin_lock_bh From: "David S. 
Miller" In-Reply-To: <20050602094404.GA10316@gondor.apana.org.au> References: <20050602094404.GA10316@gondor.apana.org.au> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2018 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 570 Lines: 14 From: Herbert Xu Date: Thu, 2 Jun 2005 19:44:04 +1000 > In light of my recent patch to net/ipv4/udp.c that replaced the > spin_lock_irq calls on the receive queue lock with spin_lock_bh, > here is a similar patch for all other occurences of spin_lock_irq > on receive/error queue locks in IPv4 and IPv6. > > In these stacks, we know that they can only be entered from user > or softirq context. Therefore it's safe to disable BH only. > > Signed-off-by: Herbert Xu Applied to net-2.6.13, thanks Herbert. From davem@davemloft.net Thu Jun 2 22:09:52 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 22:10:00 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5359pXq007173 for ; Thu, 2 Jun 2005 22:09:52 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1De4QD-0002Bu-Br; Thu, 02 Jun 2005 22:08:41 -0700 Date: Thu, 02 Jun 2005 22:08:41 -0700 (PDT) Message-Id: <20050602.220841.48530513.davem@davemloft.net> To: hadi@cyberus.ca Cc: tgraf@suug.ch, netdev@oss.sgi.com Subject: Re: PATCH: explicit typing WAS(Re: PATCH: rtnetlink explicit flags setting From: "David S. 
Miller" In-Reply-To: <1117717493.6050.29.camel@localhost.localdomain> References: <20050531222646.GK15391@postel.suug.ch> <20050531.153125.95894437.davem@davemloft.net> <1117717493.6050.29.camel@localhost.localdomain> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2016 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 276 Lines: 9 From: jamal Date: Thu, 02 Jun 2005 09:04:52 -0400 > This patch converts "unsigned flags" to use more explicit types like u16 > instead and incrementally introduces NLMSG_NEW(). > > Signed-off-by: Jamal Hadi Salim Applied, thanks Jamal. From kostodo@gmail.com Thu Jun 2 22:46:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 22:46:35 -0700 (PDT) Received: from rproxy.gmail.com (rproxy.gmail.com [64.233.170.195]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j535kSXq011247 for ; Thu, 2 Jun 2005 22:46:28 -0700 Received: by rproxy.gmail.com with SMTP id z35so260962rne for ; Thu, 02 Jun 2005 22:45:30 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=KQ5UPSnKzwwBGN3AFNeRIlX1sOp1cCO/MNatNpmKTAhYoLC1XeEAx3av3TlIqQZvlkMGDolXZe03jUYIGpqA2y7A+mmAtgXvqbB4ZKDzv8Mnnd86y8t+3XSeSlrwz8nxwsfXDtAkC3KJaI1bQsz25lDItg+8J98GonoMfPcqz1c= Received: by 10.38.88.3 with SMTP id l3mr730055rnb; Thu, 02 Jun 2005 22:45:30 -0700 (PDT) Received: by 10.38.208.46 with HTTP; Thu, 2 Jun 2005 22:45:30 -0700 (PDT) Message-ID: Date: Fri, 3 Jun 2005 09:45:30 +0400 From: Kosta Todorovic Reply-To: Kosta Todorovic To: Ben Greear Subject: Re: Network card driver problem (znb.o/tulip) Cc: 
jgarzik@pobox.com, tulip-users@lists.sourceforge.net, netdev@oss.sgi.com In-Reply-To: <428E0B3B.1090507@candelatech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline References: <428CC958.1080909@candelatech.com> <428E0B3B.1090507@candelatech.com> Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j535kSXq011247 X-archive-position: 2020 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kostodo@gmail.com Precedence: bulk X-list: netdev Content-Length: 5281 Lines: 152 I'm not too concerned with backward compatibility. I see silicom-usa provide both a Broadcom and Intel based chipsets. Is there any reason in particular that you recommended Broadcom? And can standard kernel drivers be used for these cards? I've had bad experience with custom manufacturer drivers once they discontinue development and support for their card. How reliable is Silicom-usa? As a management decision, who would you purchase 10 quad port cards from and which kinds of cards would you get? Thanks, K On 5/20/05, Ben Greear wrote: > Kosta Todorovic wrote: > > 2 more questions: > > > > 1) Is there anything special I will need to compile in terms of the > > linux kernel for 64-bit PCI bus mode (PCI-X) ? (Currently I'm using > > kernel 2.4.x but that is because my current card drivers do not > > support 2.6.x) > > Nothing special...2.4 and 2.6 kernels since way back will work just fine. > > > 2) The machine actually has a PCI extension with 9 other PCI-X slots. > > The current cards are 64-bit (pci-x) but as a test i'm planning on > > replacing them with DLinks DFE-580tx's. Unfortunately these are 32-bit > > cards (legacy pci). How will these 4 ports work in 32-bit mode? What > > will the effect be on the speed? > > If you put a 33Mhz NIC in a PCI-X bus it makes the entire bus run at > 33Mhz speed. 
> > If you do want full backwards compatibility, get the 'universal' 4-port > broadcom NIC from silicom-usa. It works fine in 32-bit PCI busses, and > though I haven't personally tested it, it should work fine in PCI-X > busses at high speed as well. > > Ben > > > > > > > > > On 5/19/05, Ben Greear wrote: > > > >>Kosta Todorovic wrote: > >> > >>>Whats the best 4-port NIC currently available? I'm interested in > >>>purchasing 10 4-port NICs as a replacement for my current cards. > >>> > >>>I am looking for 10/100Mbps and a good driver for linux (2.4.x and > >>>2.6.x). Preferably a mainstream company but thats not priority. > >>> > >>>Could the community please recommend the best card available? Money is > >>>not an issue since im really interested in the best of the best. > >> > >>Get an Intel 4-port GigE NIC. It will do 10/100/1000, and if you really > >>want to use all 4 ports at even 100Mbps, you need the 64-bit PCI bus... > >> > >>I have been getting mine from silicom-usa.com lately. They also have > >>6-port NICs, and 4-port broadcom GigE nics that can be used in 32-bit > >>PCI slots. (The Intel 4-port NICs will only work in 64-bit PCI slots.) > >> > >>If you really want 10/100 nics, try the p430tx from aei: > >>http://www.aei-it.com/hardware/fastenet/p430tx.htm > >> > >>These are like the old DFE570tx NICs, and use the tulip driver. They > >>are almost as expensive as the GigE NICs though... > >> > >>Thanks, > >>Ben > >> > >> > >>>Any suggestions? > >>> > >>>Regards, > >>>Kosta > >>> > >>> > >>> > >>>On 3/11/05, Kosta Todorovic wrote: > >>> > >>> > >>>>My company has recently purchased several ZNYX ZX274 network cards. > >>>>These cards are Four Channel, 10/100 PCI Adapters. They use Intel chipsets. > >>>> > >>>>Unfortunately there exists no drivers for linux amd64 architecture. > >>>>There are 32bit drivers found at: > >>>>http://www.znyx.com/support/drivers/ZX374_drivers.htm but naturally > >>>>they wont compile under my amd64 system. 
> >>>> > >>>>The driver itself is called znb.o and can be downloaded from ZNYX's > >>>>website. I spoke to support staff there but they told me they have > >>>>discontinued support and development for this series of cards. > >>>> > >>>>The system I am running gentoo and have tried both 2.4.x and 2.6.x > >>>>kernels but no luck. > >>>> > >>>>Unfortunately there is NO 64bit drivers available for ANY platform. not even MS. > >>>> > >>>>Does anyone know of a customised znb.o driver built for amd64? > >>>>Is there any chance of anyone modifying the source code of the driver > >>>>to compile under a amd64 system? > >>>> > >>>>I've noticed that "tulip" drivers get loaded as a module at boot time. > >>>>but they dont function correctly. (lets you start the device and > >>>>attach ips but cant talk through it) > >>>> > >>>>Is there any variants of the tulip driver that will work for this? > >>>> > >>>>Help much appreciated. > >>>> > >>>> > >>>>/proc/pci extract for network cards: > >>>> > >>>> Bus 5, device 5, function 0: > >>>> Ethernet controller: Digital Equipment Corporation DECchip > >>>>21142/43 (#30) (rev 65). > >>>> IRQ 30. > >>>> Master Capable. Latency=128. Min Gnt=20.Max Lat=40. > >>>> I/O at 0x0 [0x7f]. > >>>> Non-prefetchable 32 bit memory at 0xfa1ff400 [0xfa1ff7ff]. > >>>> Bus 5, device 4, function 0: > >>>> Ethernet controller: Digital Equipment Corporation DECchip > >>>>21142/43 (#29) (rev 65). > >>>> IRQ 29. > >>>> Master Capable. No bursts. Min Gnt=20.Max Lat=40. > >>>> I/O at 0x0 [0x7f]. > >>>> Non-prefetchable 32 bit memory at 0xf9f00000 [0xf9f003ff]. 
> >>>> > >>> > >>> > >> > >>-- > >>Ben Greear > >>Candela Technologies Inc http://www.candelatech.com > >> > >> > > > > > > > -- > Ben Greear > Candela Technologies Inc http://www.candelatech.com > > From greearb@candelatech.com Thu Jun 2 22:54:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 22:55:00 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j535suXq012007 for ; Thu, 2 Jun 2005 22:54:56 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j536RJ5I026433; Thu, 2 Jun 2005 23:27:20 -0700 Message-ID: <429FF071.8040707@candelatech.com> Date: Thu, 02 Jun 2005 22:53:53 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Kosta Todorovic CC: jgarzik@pobox.com, tulip-users@lists.sourceforge.net, netdev@oss.sgi.com Subject: Re: Network card driver problem (znb.o/tulip) References: <428CC958.1080909@candelatech.com> <428E0B3B.1090507@candelatech.com> In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2021 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 1133 Lines: 30 Kosta Todorovic wrote: > I'm not too concerned with backward compatibility. I see silicom-usa > provide both a Broadcom and Intel based chipsets. > > Is there any reason in particular that you recommended Broadcom? And > can standard kernel drivers be used for these cards? I've had bad > experience with custom manufacturer drivers once they discontinue > development and support for their card. 
The BCM NICs will work in a normal 32-bit PCI bus..the 4-port Intels will not. If you have 64-bit PCI-X, then I'd get Intel..but that's just because I've used them longer...I have no reason to believe the BCM is inferior at this time. > How reliable is Silicom-usa? > > As a management decision, who would you purchase 10 quad port cards > from and which kinds of cards would u get? Heh, I've already purchased more than 10 from silicom, and have shipped them all over the world. So far...no complaints! But, if you don't need the BCM, you can get good ole Intel quad GigE NICs from www.newegg.com and a million other places. Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From kostodo@gmail.com Thu Jun 2 23:00:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 23:00:39 -0700 (PDT) Received: from rproxy.gmail.com (rproxy.gmail.com [64.233.170.205]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5360aXq014080 for ; Thu, 2 Jun 2005 23:00:36 -0700 Received: by rproxy.gmail.com with SMTP id z35so262012rne for ; Thu, 02 Jun 2005 22:59:38 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=MqhClYLMKtWNDtSO9HOZE3iKsNC79hPqIupJwBH2jx6xreuo5TRlp9QQFHhzyGLH5SLAhrmumGFo4fiZ1vFXdn/aagDSzJDm86MD91MT/9f8Xb6GAGMIa+w9ejQB8Rhwa4zQcP1uiiE7mPh7ik91ChBO2Huj4Cq1uNzTH4RBzvI= Received: by 10.38.88.1 with SMTP id l1mr727356rnb; Thu, 02 Jun 2005 22:58:44 -0700 (PDT) Received: by 10.38.208.46 with HTTP; Thu, 2 Jun 2005 22:58:44 -0700 (PDT) Message-ID: Date: Fri, 3 Jun 2005 09:58:44 +0400 From: Kosta Todorovic Reply-To: Kosta Todorovic To: Ben Greear Subject: Re: Network card driver problem (znb.o/tulip) Cc: jgarzik@pobox.com, tulip-users@lists.sourceforge.net, netdev@oss.sgi.com In-Reply-To: <429FF071.8040707@candelatech.com> Mime-Version: 1.0 Content-Type: 
text/plain; charset=ISO-8859-1 Content-Disposition: inline References: <428CC958.1080909@candelatech.com> <428E0B3B.1090507@candelatech.com> <429FF071.8040707@candelatech.com> Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j5360aXq014080 X-archive-position: 2022 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kostodo@gmail.com Precedence: bulk X-list: netdev Content-Length: 1341 Lines: 36 So intel gigE nics use standard tulip linux drivers that come shipped with a vanilla kernel? On 6/3/05, Ben Greear wrote: > Kosta Todorovic wrote: > > I'm not too concerned with backward compatibility. I see silicom-usa > > provide both a Broadcom and Intel based chipsets. > > > > Is there any reason in particular that you recommended Broadcom? And > > can standard kernel drivers be used for these cards? I've had bad > > experience with custom manufacturer drivers once they discontinue > > development and support for their card. > > The BCM NICs will work in a normal 32-bit PCI bus..the 4-port Intels will > not. If you have 64-bit PCI-X, then I'd get Intel..but that's just because > I've used them longer...I have no reason to believe the BCM is inferior at > this time. > > > How reliable is Silicom-usa? > > > > As a management decision, who would you purchase 10 quad port cards > > from and which kinds of cards would you get? > > Heh, I've already purchased more than 10 from silicom, and have shipped > them all over the world. So far...no complaints! But, if you don't > need the BCM, you can get good ole Intel quad GigE NICs from www.newegg.com > and a million other places. 
> > Ben > > -- > Ben Greear > Candela Technologies Inc http://www.candelatech.com > > From greearb@candelatech.com Thu Jun 2 23:26:56 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 02 Jun 2005 23:27:00 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j536QuXq015870 for ; Thu, 2 Jun 2005 23:26:56 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j536xJ5I026786; Thu, 2 Jun 2005 23:59:19 -0700 Message-ID: <429FF7F0.7050505@candelatech.com> Date: Thu, 02 Jun 2005 23:25:52 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Kosta Todorovic CC: jgarzik@pobox.com, tulip-users@lists.sourceforge.net, netdev@oss.sgi.com Subject: Re: Network card driver problem (znb.o/tulip) References: <428CC958.1080909@candelatech.com> <428E0B3B.1090507@candelatech.com> <429FF071.8040707@candelatech.com> In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2023 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 369 Lines: 15 Kosta Todorovic wrote: > So intel gigE nics use standard tulip linux drivers that come shipped > with a vanilla kernel? No..forget about tulip. It uses standard e1000 driver shipped with vanilla kernel. The BCM chipsets use standard drivers in the kernel as well. 
Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From jbenc@suse.cz Fri Jun 3 02:34:43 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 02:34:46 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j539YgXq029261 for ; Fri, 3 Jun 2005 02:34:43 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id 1EC99628312; Fri, 3 Jun 2005 11:33:44 +0200 (CEST) Date: Fri, 3 Jun 2005 11:33:43 +0200 From: Jiri Benc To: Cc: , Subject: Re: [PATCH] ieee80211: Update generic definitions to latest specs. Message-ID: <20050603113343.55d19cfc@griffin.suse.cz> In-Reply-To: <20050602190232.340996282D7@mail.suse.cz> References: <20050602190232.340996282D7@mail.suse.cz> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2024 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 753 Lines: 24 On Thu, 2 Jun 2005 21:02:24 +0200, gwingerde@home.nl wrote: > I was thinking about that too, but couldn't find a proper shorter > version without losing the descriptive meaning. > > Do you have any suggestions to shorten them? Maybe we can lose a bit of descriptiveness and put comments above definitions instead? I can imagine names such as WLAN_STATUS_ASSOC_DENIED_NOSPECTRUM, WLAN_STATUS_ASSOC_DENIED_BAD_POWER, WLAN_STATUS_ASSOC_DENIED_BAD_SUPPCHANNS, WLAN_REASON_DISASSOC_BAD_POWER, and so on. Also WLAN_STATUS_ASSOC_DENIED_NOSHORT seems to be acceptable for me. More often used identifiers probably could have even shorter name - what about renaming IEEE80211_FCTL_PROTECTEDFRAME to IEEE80211_FCTL_PROTECTED? 
Thanks, -- Jiri Benc SUSE Labs From baruch@ev-en.org Fri Jun 3 06:43:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 06:44:06 -0700 (PDT) Received: from galon.ev-en.org (rrcs-24-123-59-149.central.biz.rr.com [24.123.59.149]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53DhvXq018401 for ; Fri, 3 Jun 2005 06:43:58 -0700 Received: by galon.ev-en.org (Postfix, from userid 105) id 3DBED11A953; Fri, 3 Jun 2005 16:42:59 +0300 (IDT) Received: from [10.220.3.66] (hamilton.nuim.ie [149.157.192.252]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by galon.ev-en.org (Postfix) with ESMTP id 5196D11A951; Fri, 3 Jun 2005 16:42:53 +0300 (IDT) Message-ID: <42A05E5C.9050408@ev-en.org> Date: Fri, 03 Jun 2005 14:42:52 +0100 From: Baruch Even User-Agent: Debian Thunderbird 1.0.2 (X11/20050331) X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" Cc: netdev@oss.sgi.com, shemminger@osdl.org, doug.leith@nuim.ie Subject: Re: Comparison of several congestion control algorithms References: <4298E045.9050009@ev-en.org> <20050602.163512.10298458.davem@davemloft.net> <429F9B2F.8030507@ev-en.org> <20050602.165341.63126720.davem@davemloft.net> In-Reply-To: <20050602.165341.63126720.davem@davemloft.net> X-Enigmail-Version: 0.91.0.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-archive-position: 2025 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: baruch@ev-en.org Precedence: bulk X-list: netdev Content-Length: 936 Lines: 27 David S. Miller wrote: > From: Baruch Even > Date: Fri, 03 Jun 2005 00:50:07 +0100 > > >>This is in part because of the start of the work that was based on 2.4 >>kernels and even as far as the 2.6.6 kernel which had disabled TSO once >>it saw SACKs. This made TSO unusable for our needs. >> >>AFAIK, the tests reported in that document used kernel 2.6.6. 
> > > Sure SACKs turn off TSO currently, but you'll have them enabled > at the beginning until the first loss and this affects how fast > the cwnd will grow. > > If you have e1000 cards, for example, you're getting TSO enabled > by default. > > You really need to look into this, as it has a real and very > non-trivial effect on all of the results you obtained. I checked that now and ethtool -k shows TSO to be disabled after boot. Since all the test scripts are not playing with ethtool I can be sure that TSO was off during all of our tests. Baruch From jbenc@suse.cz Fri Jun 3 09:27:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 09:27:31 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53GROXq031421 for ; Fri, 3 Jun 2005 09:27:24 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id 8B6CE6282FC; Fri, 3 Jun 2005 18:26:25 +0200 (CEST) Date: Fri, 3 Jun 2005 18:26:25 +0200 From: Jiri Benc To: NetDev Cc: Jeff Garzik , Jirka Bohac Subject: [0/9] ieee80211: Improvements to the layer Message-ID: <20050603182625.64d33be3@griffin.suse.cz> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2026 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 456 Lines: 14 Following patches are nearly the same as were sent couple of days ago. However, they are against current netdev-2.6 tree and they contain some more fixes (TKIP compilation, new file for protocol layer functions). The HH_DATA_OFF bugfix is needed too (http://oss.sgi.com/projects/netdev/archive/2005-05/msg00962.html), it's not included here as it is in Linus' tree already. 
Also there are two patches from Adrian Bunk included. -- Jiri Benc SUSE Labs From jbenc@suse.cz Fri Jun 3 09:29:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 09:29:23 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53GTIXq031749 for ; Fri, 3 Jun 2005 09:29:18 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id EBDDB628305; Fri, 3 Jun 2005 18:28:19 +0200 (CEST) Date: Fri, 3 Jun 2005 18:28:19 +0200 From: Jiri Benc To: NetDev Cc: Jeff Garzik , Jirka Bohac Subject: [1/9] ieee80211: remove pci.h #include's Message-ID: <20050603182819.44500c27@griffin.suse.cz> In-Reply-To: <20050603182625.64d33be3@griffin.suse.cz> References: <20050603182625.64d33be3@griffin.suse.cz> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2027 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 1527 Lines: 44 From: Adrian Bunk I was wondering why editing pci.h triggered the rebuild of three files under net/, and as far as I can see, there's no reason for these three files to #include pci.h . 
Signed-off-by: Adrian Bunk
Signed-off-by: Jiri Benc

--- linux-2.6.12-rc3-mm1-full/net/ieee80211/ieee80211_module.c.old 2005-04-30 23:23:14.000000000 +0200
+++ linux-2.6.12-rc3-mm1-full/net/ieee80211/ieee80211_module.c 2005-04-30 23:23:18.000000000 +0200
@@ -40,7 +40,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
--- linux-2.6.12-rc3-mm1-full/net/ieee80211/ieee80211_tx.c.old 2005-04-30 23:23:25.000000000 +0200
+++ linux-2.6.12-rc3-mm1-full/net/ieee80211/ieee80211_tx.c 2005-04-30 23:23:32.000000000 +0200
@@ -33,7 +33,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
--- linux-2.6.12-rc3-mm1-full/net/ieee80211/ieee80211_rx.c.old 2005-04-30 23:23:42.000000000 +0200
+++ linux-2.6.12-rc3-mm1-full/net/ieee80211/ieee80211_rx.c 2005-04-30 23:23:46.000000000 +0200
@@ -23,7 +23,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include

-- 
Jiri Benc
SUSE Labs

From jbenc@suse.cz Fri Jun 3 09:30:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 09:30:22 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53GUIXq032183 for ; Fri, 3 Jun 2005 09:30:18 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id 5C4466282FC; Fri, 3 Jun 2005 18:29:20 +0200 (CEST) Date: Fri, 3 Jun 2005 18:29:20 +0200 From: Jiri Benc To: NetDev Cc: Jeff Garzik , Jirka Bohac Subject: [2/9] ieee80211: fix recursive ipw2200 dependencies Message-ID: <20050603182920.689a269f@griffin.suse.cz> In-Reply-To: <20050603182625.64d33be3@griffin.suse.cz> References: <20050603182625.64d33be3@griffin.suse.cz> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2028 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 896 Lines: 32

From: Adrian Bunk

This results in recursive dependencies:
- IPW2200 depends on NET_RADIO
- IPW2200 selects IEEE80211
- IEEE80211 selects NET_RADIO

This patch fixes the IPW2200 dependencies in a way that they are similar
to the IPW2100 dependencies.

Signed-off-by: Adrian Bunk
Signed-off-by: Jiri Benc

--- linux-2.6.12-rc5-mm2-full/drivers/net/wireless/Kconfig.old 2005-06-02 22:04:02.000000000 +0200
+++ linux-2.6.12-rc5-mm2-full/drivers/net/wireless/Kconfig 2005-06-02 22:04:40.000000000 +0200
@@ -192,9 +192,8 @@
 
 config IPW2200
 	tristate "Intel PRO/Wireless 2200BG and 2915ABG Network Connection"
-	depends on NET_RADIO && PCI
+	depends on IEEE80211 && PCI
 	select FW_LOADER
-	select IEEE80211
 	---help---
 	  A driver for the Intel PRO/Wireless 2200BG and 2915ABG Network
 	  Connection adapters.

-- 
Jiri Benc
SUSE Labs

From jbenc@suse.cz Fri Jun 3 09:31:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 09:31:51 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53GVkXq000509 for ; Fri, 3 Jun 2005 09:31:47 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id 5E157628305; Fri, 3 Jun 2005 18:30:48 +0200 (CEST) Date: Fri, 3 Jun 2005 18:30:48 +0200 From: Jiri Benc To: NetDev Cc: Jeff Garzik , Jirka Bohac Subject: [3/9] ieee80211: fix ipw 64bit compilation warnings Message-ID: <20050603183048.7786f98b@griffin.suse.cz> In-Reply-To: <20050603182625.64d33be3@griffin.suse.cz> References: <20050603182625.64d33be3@griffin.suse.cz> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2029 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 7028 Lines: 237 This patch fixes warnings when compiling ipw2100 and ipw2200 on x86_64. Signed-off-by: Jiri Benc Signed-off-by: Jirka Bohac Index: netdev/drivers/net/wireless/ipw2200.c =================================================================== --- netdev.orig/drivers/net/wireless/ipw2200.c 2005-06-01 11:03:37.000000000 +0200 +++ netdev/drivers/net/wireless/ipw2200.c 2005-06-03 15:46:31.000000000 +0200 @@ -241,8 +241,8 @@ IPW_DEBUG_IO(" reg = 0x%8X : value = 0x%8X\n", reg, value); _ipw_write32(priv, CX2_INDIRECT_ADDR, reg & CX2_INDIRECT_ADDR_MASK); _ipw_write8(priv, CX2_INDIRECT_DATA, value); - IPW_DEBUG_IO(" reg = 0x%8X : value = 0x%8X\n", - (unsigned)(priv->hw_base + CX2_INDIRECT_DATA), + IPW_DEBUG_IO(" reg = 0x%8lX : value = 0x%8X\n", + (unsigned long)(priv->hw_base + CX2_INDIRECT_DATA), value); } @@ -508,7 +508,7 @@ /* verify we have enough room to store the value */ if (*len < sizeof(u32)) { IPW_DEBUG_ORD("ordinal buffer length too small, " - "need %d\n", sizeof(u32)); + "need %d\n", (int)sizeof(u32)); return -EINVAL; } @@ -541,7 +541,7 @@ /* verify we have enough room to store the value */ if (*len < sizeof(u32)) { IPW_DEBUG_ORD("ordinal buffer length too small, " - "need %d\n", sizeof(u32)); + "need %d\n", (int)sizeof(u32)); return -EINVAL; } @@ -1740,7 +1740,7 @@ u32 address = CX2_SHARED_SRAM_DMA_CONTROL + (sizeof(struct command_block) * index); IPW_DEBUG_FW(">> :\n"); - ipw_write_indirect(priv, address, (u8*)cb, sizeof(struct command_block)); + ipw_write_indirect(priv, address, (u8*)cb, (int)sizeof(struct command_block)); IPW_DEBUG_FW("<< :\n"); return 0; @@ -2342,11 +2342,11 @@ return -EINVAL; } - IPW_DEBUG_INFO("Loading firmware '%s' file v%d.%d (%d bytes)\n", + IPW_DEBUG_INFO("Loading firmware '%s' file v%d.%d (%ld bytes)\n", name, IPW_FW_MAJOR(header->version), IPW_FW_MINOR(header->version), - (*fw)->size - sizeof(struct fw_header)); + (long)(*fw)->size - 
sizeof(struct fw_header)); return 0; } @@ -2698,7 +2698,7 @@ q->bd = pci_alloc_consistent(dev,sizeof(q->bd[0])*count, &q->q.dma_addr); if (!q->bd) { IPW_ERROR("pci_alloc_consistent(%d) failed\n", - sizeof(q->bd[0]) * count); + (int)sizeof(q->bd[0]) * count); kfree(q->txb); q->txb = NULL; return -ENOMEM; @@ -3467,7 +3467,7 @@ } else { IPW_DEBUG_SCAN("Scan result of wrong size %d " "(should be %d)\n", - notif->size,sizeof(*x)); + notif->size, (int)sizeof(*x)); } break; } @@ -3483,7 +3483,7 @@ } else { IPW_ERROR("Scan completed of wrong size %d " "(should be %d)\n", - notif->size,sizeof(*x)); + notif->size, (int)sizeof(*x)); } priv->status &= ~(STATUS_SCANNING | STATUS_SCAN_ABORTING); @@ -3516,7 +3516,7 @@ } else { IPW_ERROR("Frag length of wrong size %d " "(should be %d)\n", - notif->size, sizeof(*x)); + notif->size, (int)sizeof(*x)); } break; } @@ -3533,7 +3533,7 @@ } else { IPW_ERROR("Link Deterioration of wrong size %d " "(should be %d)\n", - notif->size,sizeof(*x)); + notif->size, (int)sizeof(*x)); } break; } @@ -3552,7 +3552,7 @@ struct notif_beacon_state *x = &notif->u.beacon_state; if (notif->size != sizeof(*x)) { IPW_ERROR("Beacon state of wrong size %d (should " - "be %d)\n", notif->size, sizeof(*x)); + "be %d)\n", notif->size, (int)sizeof(*x)); break; } @@ -3603,7 +3603,7 @@ } IPW_ERROR("TGi Tx Key of wrong size %d (should be %d)\n", - notif->size,sizeof(*x)); + notif->size, (int)sizeof(*x)); break; } @@ -3617,7 +3617,7 @@ } IPW_ERROR("Calibration of wrong size %d (should be %d)\n", - notif->size,sizeof(*x)); + notif->size, (int)sizeof(*x)); break; } @@ -3629,7 +3629,7 @@ } IPW_ERROR("Noise stat is wrong size %d (should be %d)\n", - notif->size, sizeof(u32)); + notif->size, (int)sizeof(u32)); break; } @@ -4823,7 +4823,7 @@ } /* Advance skb->data to the start of the actual payload */ - skb_reserve(rxb->skb, (u32)&pkt->u.frame.data[0] - (u32)pkt); + skb_reserve(rxb->skb, offsetof(struct ipw_rx_packet, u.frame.data)); /* Set the size of the skb to the size of the 
frame */ skb_put(rxb->skb, pkt->u.frame.length); Index: netdev/drivers/net/wireless/ipw2100.c =================================================================== --- netdev.orig/drivers/net/wireless/ipw2100.c 2005-06-01 11:03:37.000000000 +0200 +++ netdev/drivers/net/wireless/ipw2100.c 2005-06-03 15:43:53.000000000 +0200 @@ -494,7 +494,7 @@ IPW_DEBUG_WARNING(DRV_NAME ": ordinal buffer length too small, need %d\n", - IPW_ORD_TAB_1_ENTRY_SIZE); + (int)IPW_ORD_TAB_1_ENTRY_SIZE); return -EINVAL; } @@ -2302,7 +2302,7 @@ #endif IPW_DEBUG_INFO(DRV_NAME ": PCI latency error detected at " - "0x%04X.\n", i * sizeof(struct ipw2100_status)); + "0x%04X.\n", i * (int)sizeof(struct ipw2100_status)); #ifdef ACPI_CSTATE_LIMIT_DEFINED IPW_DEBUG_INFO(DRV_NAME ": Disabling C3 transitions.\n"); @@ -2398,7 +2398,7 @@ /* Make a copy of the frame so we can dump it to the logs if * ieee80211_rx fails */ memcpy(packet_data, packet->skb->data, - min(status->frame_size, IPW_RX_NIC_BUFFER_LENGTH)); + min_t(u32, status->frame_size, IPW_RX_NIC_BUFFER_LENGTH)); #endif if (!ieee80211_rx(priv->ieee, packet->skb, stats)) { @@ -2730,21 +2730,21 @@ { int i = txq->oldest; IPW_DEBUG_TX( - "TX%d V=%p P=%p T=%p L=%d\n", i, + "TX%d V=%p P=%04X T=%04X L=%d\n", i, &txq->drv[i], - (void*)txq->nic + i * sizeof(struct ipw2100_bd), - (void*)txq->drv[i].host_addr, + (u32)(txq->nic + i * sizeof(struct ipw2100_bd)), + txq->drv[i].host_addr, txq->drv[i].buf_length); if (packet->type == DATA) { i = (i + 1) % txq->entries; IPW_DEBUG_TX( - "TX%d V=%p P=%p T=%p L=%d\n", i, + "TX%d V=%p P=%04X T=%04X L=%d\n", i, &txq->drv[i], - (void*)txq->nic + i * - sizeof(struct ipw2100_bd), - (void*)txq->drv[i].host_addr, + (u32)(txq->nic + i * + sizeof(struct ipw2100_bd)), + (u32)txq->drv[i].host_addr, txq->drv[i].buf_length); } } @@ -4212,7 +4212,7 @@ { IPW_DEBUG_INFO("enter\n"); - IPW_DEBUG_INFO("initializing bd queue at virt=%p, phys=%08x\n", q->drv, q->nic); + IPW_DEBUG_INFO("initializing bd queue at virt=%p, phys=%08x\n", 
q->drv, (u32)q->nic); write_register(priv->net_dev, base, q->nic); write_register(priv->net_dev, size, q->entries); @@ -8431,8 +8431,8 @@ priv->net_dev->name, fw_name); return rc; } - IPW_DEBUG_INFO("firmware data %p size %d\n", fw->fw_entry->data, - fw->fw_entry->size); + IPW_DEBUG_INFO("firmware data %p size %ld\n", fw->fw_entry->data, + (long)fw->fw_entry->size); ipw2100_mod_firmware_load(fw); -- Jiri Benc SUSE Labs From jbenc@suse.cz Fri Jun 3 09:32:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 09:32:51 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53GWlXq000977 for ; Fri, 3 Jun 2005 09:32:48 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id 6E2FD6282FC; Fri, 3 Jun 2005 18:31:49 +0200 (CEST) Date: Fri, 3 Jun 2005 18:31:49 +0200 From: Jiri Benc To: NetDev Cc: Jeff Garzik , Jirka Bohac Subject: [4/9] ieee80211: ieee80211_device alignment fix and cleanup Message-ID: <20050603183149.228ab747@griffin.suse.cz> In-Reply-To: <20050603182625.64d33be3@griffin.suse.cz> References: <20050603182625.64d33be3@griffin.suse.cz> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2030 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 9617 Lines: 310 Changes to the ieee80211 layer: - fixes a serious alignment problem of the driver's private data - makes the drivers use the ieee80211_device instead of the net_device where appropriate (will ease further development of ieee80211 as a self-contained layer) Signed-off-by: Jiri Benc Signed-off-by: Jirka Bohac Index: netdev/include/net/ieee80211.h 
=================================================================== --- netdev.orig/include/net/ieee80211.h 2005-06-01 11:05:06.000000000 +0200 +++ netdev/include/net/ieee80211.h 2005-06-03 13:20:46.000000000 +0200 @@ -704,15 +704,13 @@ int abg_ture; /* ABG flag */ /* Callback functions */ - void (*set_security)(struct net_device *dev, + void (*set_security)(struct ieee80211_device *ieee, struct ieee80211_security *sec); int (*hard_start_xmit)(struct ieee80211_txb *txb, - struct net_device *dev); - int (*reset_port)(struct net_device *dev); + struct ieee80211_device *ieee); + int (*reset_port)(struct ieee80211_device *ieee); - /* This must be the last item so that it points to the data - * allocated beyond this structure by alloc_ieee80211 */ - u8 priv[0]; + void *priv; }; #define IEEE_A (1<<0) @@ -720,9 +718,27 @@ #define IEEE_G (1<<2) #define IEEE_MODE_MASK (IEEE_A|IEEE_B|IEEE_G) -extern inline void *ieee80211_priv(struct net_device *dev) +static inline void *ieee80211_priv(struct ieee80211_device *ieee) { - return ((struct ieee80211_device *)netdev_priv(dev))->priv; + return (char *)ieee + + ((sizeof(struct ieee80211_device) + NETDEV_ALIGN_CONST) + & ~NETDEV_ALIGN_CONST); +} + +static inline void *ieee80211_dev_to_priv(struct net_device *dev) +{ + return (char *)dev + + ((sizeof(struct net_device) + NETDEV_ALIGN_CONST) + & ~NETDEV_ALIGN_CONST) + + ((sizeof(struct ieee80211_device) + NETDEV_ALIGN_CONST) + & ~NETDEV_ALIGN_CONST); +} + +static inline struct net_device *ieee80211_dev(struct ieee80211_device *ieee) +{ + return (struct net_device *)((char *)ieee - + ((sizeof(struct net_device) + NETDEV_ALIGN_CONST) + & ~NETDEV_ALIGN_CONST)); } extern inline int ieee80211_is_empty_essid(const char *essid, int essid_len) @@ -795,8 +811,8 @@ /* ieee80211.c */ -extern void free_ieee80211(struct net_device *dev); -extern struct net_device *alloc_ieee80211(int sizeof_priv); +extern void free_ieee80211(struct ieee80211_device *ieee); +extern struct ieee80211_device 
*alloc_ieee80211(int sizeof_priv); extern int ieee80211_set_encryption(struct ieee80211_device *ieee); Index: netdev/net/ieee80211/ieee80211_module.c =================================================================== --- netdev.orig/net/ieee80211/ieee80211_module.c 2005-06-03 13:20:40.000000000 +0200 +++ netdev/net/ieee80211/ieee80211_module.c 2005-06-03 13:20:46.000000000 +0200 @@ -69,7 +69,7 @@ GFP_KERNEL); if (!ieee->networks) { printk(KERN_WARNING "%s: Out of memory allocating beacons\n", - ieee->dev->name); + ieee80211_dev(ieee)->name); return -ENOMEM; } @@ -98,23 +98,28 @@ } -struct net_device *alloc_ieee80211(int sizeof_priv) +struct ieee80211_device *alloc_ieee80211(int sizeof_priv) { struct ieee80211_device *ieee; struct net_device *dev; + int alloc_size; int err; IEEE80211_DEBUG_INFO("Initializing...\n"); - dev = alloc_etherdev(sizeof(struct ieee80211_device) + sizeof_priv); + alloc_size = ((sizeof(struct ieee80211_device) + NETDEV_ALIGN_CONST) + & ~NETDEV_ALIGN_CONST) + + sizeof_priv; + dev = alloc_etherdev(alloc_size); if (!dev) { IEEE80211_ERROR("Unable to network device.\n"); goto failed; } ieee = netdev_priv(dev); - dev->hard_start_xmit = ieee80211_xmit; - ieee->dev = dev; + ieee->priv = ieee80211_priv(ieee); + + dev->hard_start_xmit = ieee80211_xmit; err = ieee80211_networks_allocate(ieee); if (err) { @@ -147,7 +152,7 @@ ieee->privacy_invoked = 0; ieee->ieee802_1x = 1; - return dev; + return ieee; failed: if (dev) @@ -156,10 +161,8 @@ } -void free_ieee80211(struct net_device *dev) +void free_ieee80211(struct ieee80211_device *ieee) { - struct ieee80211_device *ieee = netdev_priv(dev); - int i; del_timer_sync(&ieee->crypt_deinit_timer); @@ -178,7 +181,7 @@ } ieee80211_networks_free(ieee); - free_netdev(dev); + free_netdev(ieee80211_dev(ieee)); } #ifdef CONFIG_IEEE80211_DEBUG Index: netdev/net/ieee80211/ieee80211_rx.c =================================================================== --- netdev.orig/net/ieee80211/ieee80211_rx.c 2005-06-03 
13:20:40.000000000 +0200 +++ netdev/net/ieee80211/ieee80211_rx.c 2005-06-03 13:20:46.000000000 +0200 @@ -99,7 +99,7 @@ if (frag == 0) { /* Reserve enough space to fit maximum frame length */ - skb = dev_alloc_skb(ieee->dev->mtu + + skb = dev_alloc_skb(ieee80211_dev(ieee)->mtu + sizeof(struct ieee80211_hdr) + 8 /* LLC */ + 2 /* alignment */ + @@ -175,7 +175,7 @@ { if (ieee->iw_mode == IW_MODE_MASTER) { printk(KERN_DEBUG "%s: Master mode not yet suppported.\n", - ieee->dev->name); + ieee80211_dev(ieee)->name); return 0; /* hostap_update_sta_ps(ieee, (struct hostap_ieee80211_hdr *) @@ -233,7 +233,7 @@ static int ieee80211_is_eapol_frame(struct ieee80211_device *ieee, struct sk_buff *skb) { - struct net_device *dev = ieee->dev; + struct net_device *dev = ieee80211_dev(ieee); u16 fc, ethertype; struct ieee80211_hdr *hdr; u8 *pos; @@ -289,7 +289,7 @@ if (net_ratelimit()) { printk(KERN_DEBUG "%s: TKIP countermeasures: dropped " "received packet from " MAC_FMT "\n", - ieee->dev->name, MAC_ARG(hdr->addr2)); + ieee80211_dev(ieee)->name, MAC_ARG(hdr->addr2)); } return -1; } @@ -334,7 +334,7 @@ if (res < 0) { printk(KERN_DEBUG "%s: MSDU decryption/MIC verification failed" " (SA=" MAC_FMT " keyidx=%d)\n", - ieee->dev->name, MAC_ARG(hdr->addr2), keyidx); + ieee80211_dev(ieee)->name, MAC_ARG(hdr->addr2), keyidx); return -1; } @@ -348,7 +348,7 @@ int ieee80211_rx(struct ieee80211_device *ieee, struct sk_buff *skb, struct ieee80211_rx_stats *rx_stats) { - struct net_device *dev = ieee->dev; + struct net_device *dev = ieee80211_dev(ieee); struct ieee80211_hdr *hdr; size_t hdrlen; u16 fc, type, stype, sc; @@ -1194,7 +1194,7 @@ IEEE80211_DEBUG_MGMT("received UNKNOWN (%d)\n", WLAN_FC_GET_STYPE(header->frame_ctl)); IEEE80211_WARNING("%s: Unknown management packet: %d\n", - ieee->dev->name, + ieee80211_dev(ieee)->name, WLAN_FC_GET_STYPE(header->frame_ctl)); break; } Index: netdev/net/ieee80211/ieee80211_tx.c =================================================================== --- 
netdev.orig/net/ieee80211/ieee80211_tx.c 2005-06-03 13:20:40.000000000 +0200 +++ netdev/net/ieee80211/ieee80211_tx.c 2005-06-03 13:20:46.000000000 +0200 @@ -171,7 +171,7 @@ if (net_ratelimit()) { printk(KERN_DEBUG "%s: TKIP countermeasures: dropped " "TX packet to " MAC_FMT "\n", - ieee->dev->name, MAC_ARG(header->addr1)); + ieee80211_dev(ieee)->name, MAC_ARG(header->addr1)); } return -1; } @@ -192,7 +192,7 @@ atomic_dec(&crypt->refcnt); if (res < 0) { printk(KERN_INFO "%s: Encryption failed: len=%d.\n", - ieee->dev->name, frag->len); + ieee80211_dev(ieee)->name, frag->len); ieee->ieee_stats.tx_discards++; return -1; } @@ -269,13 +269,13 @@ * creating it... */ if (!ieee->hard_start_xmit) { printk(KERN_WARNING "%s: No xmit handler.\n", - ieee->dev->name); + dev->name); goto success; } if (unlikely(skb->len < SNAP_SIZE + sizeof(u16))) { printk(KERN_WARNING "%s: skb too small (%d).\n", - ieee->dev->name, skb->len); + dev->name, skb->len); goto success; } @@ -371,7 +371,7 @@ txb = ieee80211_alloc_txb(nr_frags, frag_size, GFP_ATOMIC); if (unlikely(!txb)) { printk(KERN_WARNING "%s: Could not allocate TXB\n", - ieee->dev->name); + dev->name); goto failed; } txb->encrypted = encrypt; @@ -426,7 +426,7 @@ dev_kfree_skb_any(skb); if (txb) { - if ((*ieee->hard_start_xmit)(txb, dev) == 0) { + if ((*ieee->hard_start_xmit)(txb, ieee) == 0) { stats->tx_packets++; stats->tx_bytes += txb->payload_size; return 0; Index: netdev/net/ieee80211/ieee80211_wx.c =================================================================== --- netdev.orig/net/ieee80211/ieee80211_wx.c 2005-06-01 11:05:14.000000000 +0200 +++ netdev/net/ieee80211/ieee80211_wx.c 2005-06-03 13:20:46.000000000 +0200 @@ -252,7 +252,7 @@ union iwreq_data *wrqu, char *keybuf) { struct iw_point *erq = &(wrqu->encoding); - struct net_device *dev = ieee->dev; + struct net_device *dev = ieee80211_dev(ieee); struct ieee80211_security sec = { .flags = 0 }; @@ -402,7 +402,7 @@ sec.level = SEC_LEVEL_1; /* 40 and 104 bit WEP */ if 
(ieee->set_security) - ieee->set_security(dev, &sec); + ieee->set_security(ieee, &sec); /* Do not reset port if card is in Managed mode since resetting will * generate new IEEE 802.11 authentication which may end up in looping @@ -411,7 +411,7 @@ * the callbacks structures used to initialize the 802.11 stack. */ if (ieee->reset_on_keychange && ieee->iw_mode != IW_MODE_INFRA && - ieee->reset_port && ieee->reset_port(dev)) { + ieee->reset_port && ieee->reset_port(ieee)) { printk(KERN_DEBUG "%s: reset_port failed\n", dev->name); return -EINVAL; } -- Jiri Benc SUSE Labs From jbenc@suse.cz Fri Jun 3 09:33:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 09:33:57 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53GXqXq001489 for ; Fri, 3 Jun 2005 09:33:53 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id 86D3B6282FC; Fri, 3 Jun 2005 18:32:54 +0200 (CEST) Date: Fri, 3 Jun 2005 18:32:54 +0200 From: Jiri Benc To: NetDev Cc: Jeff Garzik , Jirka Bohac Subject: [5/9] ipw: fix after "ieee80211_device alignment fix" Message-ID: <20050603183254.03afaa81@griffin.suse.cz> In-Reply-To: <20050603182625.64d33be3@griffin.suse.cz> References: <20050603182625.64d33be3@griffin.suse.cz> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2031 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 30535 Lines: 1000 Fixes ipw2100 and ipw2200 after the API change (alignment, struct ieee80211_device). 
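For readers following the API change, the pointer arithmetic behind ieee80211_priv(), ieee80211_dev_to_priv() and ieee80211_dev() from patch 4/9 can be sketched in userspace roughly as follows. This is only an illustration, not the kernel code: the fake_* structures and the align_up() helper are hypothetical stand-ins, and NETDEV_ALIGN_CONST is assumed to be 31 (NETDEV_ALIGN of 32), as I believe it is in netdevice.h of this era.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: NETDEV_ALIGN is 32 in 2.6-era kernels, so the mask is 31. */
#define NETDEV_ALIGN_CONST 31

/* Hypothetical stand-ins; the real structures are far larger. */
struct fake_net_device {
	char name[16];
	unsigned int mtu;
};

struct fake_ieee80211_device {
	int iw_mode;
	void *priv;
};

/* Round a size up to the next multiple of NETDEV_ALIGN_CONST + 1. */
static size_t align_up(size_t size)
{
	return (size + NETDEV_ALIGN_CONST) & ~(size_t)NETDEV_ALIGN_CONST;
}

/* Driver private data starts at the aligned end of the ieee80211 struct. */
static void *fake_ieee80211_priv(struct fake_ieee80211_device *ieee)
{
	return (char *)ieee + align_up(sizeof(*ieee));
}

/* Same private area, reached from the net_device by skipping both structs. */
static void *fake_ieee80211_dev_to_priv(struct fake_net_device *dev)
{
	return (char *)dev
		+ align_up(sizeof(*dev))
		+ align_up(sizeof(struct fake_ieee80211_device));
}

/* Step back from the ieee80211 struct to the net_device in front of it. */
static struct fake_net_device *
fake_ieee80211_dev(struct fake_ieee80211_device *ieee)
{
	return (struct fake_net_device *)
		((char *)ieee - align_up(sizeof(struct fake_net_device)));
}
```

Because every offset is rounded with the same mask, the private area reached through either helper is the same address, and it sits on a 32-byte boundary whenever the net_device itself does. That is why the driver entry points that receive a net_device switch to ieee80211_dev_to_priv() while the callbacks that now receive an ieee80211_device keep using ieee80211_priv().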
Signed-off-by: Jiri Benc Signed-off-by: Jirka Bohac Index: netdev/drivers/net/wireless/ipw2100.c =================================================================== --- netdev.orig/drivers/net/wireless/ipw2100.c 2005-06-01 11:03:37.000000000 +0200 +++ netdev/drivers/net/wireless/ipw2100.c 2005-06-03 11:57:33.000000000 +0200 @@ -1772,7 +1772,7 @@ /* Called by register_netdev() */ static int ipw2100_net_init(struct net_device *dev) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); return ipw2100_up(priv, 1); } @@ -3248,9 +3248,9 @@ return IRQ_NONE; } -static int ipw2100_tx(struct ieee80211_txb *txb, struct net_device *dev) +static int ipw2100_tx(struct ieee80211_txb *txb, struct ieee80211_device *ieee) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_priv(ieee); struct list_head *element; struct ipw2100_tx_packet *packet; unsigned long flags; @@ -3260,7 +3260,7 @@ if (!(priv->status & STATUS_ASSOCIATED)) { IPW_DEBUG_INFO("Can not transmit when not connected.\n"); priv->ieee->stats.tx_carrier_errors++; - netif_stop_queue(dev); + netif_stop_queue(ieee80211_dev(ieee)); goto fail_unlock; } @@ -3291,7 +3291,7 @@ return 0; fail_unlock: - netif_stop_queue(dev); + netif_stop_queue(ieee80211_dev(ieee)); spin_unlock_irqrestore(&priv->low_lock, flags); return 1; } @@ -5418,10 +5418,10 @@ ipw2100_configure_security(priv, 0); } -static void shim__set_security(struct net_device *dev, +static void shim__set_security(struct ieee80211_device *ieee, struct ieee80211_security *sec) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_priv(ieee); int i, force_update = 0; down(&priv->action_sem); @@ -5609,7 +5609,7 @@ * method as well) to talk to the firmware */ static int ipw2100_set_address(struct net_device *dev, void *p) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); struct 
sockaddr *addr = p; int err = 0; @@ -5637,7 +5637,7 @@ static int ipw2100_open(struct net_device *dev) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); unsigned long flags; IPW_DEBUG_INFO("dev->open\n"); @@ -5651,7 +5651,7 @@ static int ipw2100_close(struct net_device *dev) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); unsigned long flags; struct list_head *element; struct ipw2100_tx_packet *packet; @@ -5692,7 +5692,7 @@ */ static void ipw2100_tx_timeout(struct net_device *dev) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); priv->ieee->stats.tx_errors++; @@ -5715,7 +5715,7 @@ */ static struct net_device_stats *ipw2100_stats(struct net_device *dev) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); return &priv->ieee->stats; } @@ -5802,7 +5802,7 @@ } if (ieee->set_security) - ieee->set_security(ieee->dev, &sec); + ieee->set_security(ieee, &sec); else ret = -EOPNOTSUPP; @@ -5829,7 +5829,7 @@ } if (ieee->set_security) - ieee->set_security(ieee->dev, &sec); + ieee->set_security(ieee, &sec); else ret = -EOPNOTSUPP; @@ -5839,7 +5839,7 @@ static int ipw2100_wpa_set_param(struct net_device *dev, u8 name, u32 value){ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int ret=0; switch(name){ @@ -5878,7 +5878,7 @@ static int ipw2100_wpa_mlme(struct net_device *dev, int command, int reason){ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int ret=0; switch(command){ @@ -5920,8 +5920,8 @@ static int ipw2100_wpa_set_wpa_ie(struct net_device *dev, struct ipw2100_param *param, int plen){ - struct ipw2100_priv *priv = ieee80211_priv(dev); - struct ieee80211_device *ieee = priv->ieee; + struct ieee80211_device *ieee = 
netdev_priv(dev); + struct ipw2100_priv *priv = ieee80211_priv(ieee); u8 *buf; if (! ieee->wpa_enabled) @@ -5960,8 +5960,8 @@ struct ipw2100_param *param, int param_len){ int ret = 0; - struct ipw2100_priv *priv = ieee80211_priv(dev); - struct ieee80211_device *ieee = priv->ieee; + struct ieee80211_device *ieee = netdev_priv(dev); + struct ipw2100_priv *priv = ieee80211_priv(ieee); struct ieee80211_crypto_ops *ops; struct ieee80211_crypt_data **crypt; @@ -6081,7 +6081,7 @@ } done: if (ieee->set_security) - ieee->set_security(ieee->dev, &sec); + ieee->set_security(ieee, &sec); /* Do not reset port if card is in Managed mode since resetting will * generate new IEEE 802.11 authentication which may end up in looping @@ -6178,7 +6178,7 @@ static void ipw_ethtool_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); char fw_ver[64], ucode_ver[64]; strcpy(info->driver, DRV_NAME); @@ -6195,7 +6195,7 @@ static u32 ipw2100_ethtool_get_link(struct net_device *dev) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); return (priv->status & STATUS_ASSOCIATED) ? 
1 : 0; } @@ -6288,12 +6288,14 @@ { struct ipw2100_priv *priv; struct net_device *dev; + struct ieee80211_device *ieee; - dev = alloc_ieee80211(sizeof(struct ipw2100_priv)); - if (!dev) + ieee = alloc_ieee80211(sizeof(struct ipw2100_priv)); + if (!ieee) return NULL; - priv = ieee80211_priv(dev); - priv->ieee = netdev_priv(dev); + dev = ieee80211_dev(ieee); + priv = ieee80211_priv(ieee); + priv->ieee = ieee; priv->pci_dev = pci_dev; priv->net_dev = dev; @@ -6477,7 +6479,7 @@ return err; } - priv = ieee80211_priv(dev); + priv = ieee80211_dev_to_priv(dev); pci_set_master(pci_dev); pci_set_drvdata(pci_dev, priv); @@ -6618,7 +6620,7 @@ ipw2100_queues_free(priv); sysfs_remove_group(&pci_dev->dev.kobj, &ipw2100_attribute_group); - free_ieee80211(dev); + free_ieee80211(netdev_priv(dev)); pci_set_drvdata(pci_dev, NULL); } @@ -6675,7 +6677,7 @@ if (dev->base_addr) iounmap((unsigned char *)dev->base_addr); - free_ieee80211(dev); + free_ieee80211(netdev_priv(dev)); } pci_release_regions(pci_dev); @@ -6918,7 +6920,7 @@ * This can be called at any time. No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); if (!(priv->status & STATUS_ASSOCIATED)) strcpy(wrqu->name, "unassociated"); else @@ -6933,7 +6935,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); struct iw_freq *fwrq = &wrqu->freq; int err = 0; @@ -6984,7 +6986,7 @@ * This can be called at any time. 
No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); wrqu->freq.e = 0; @@ -7005,7 +7007,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int err = 0; IPW_DEBUG_WX("SET Mode -> %d \n", wrqu->mode); @@ -7048,7 +7050,7 @@ * This can be called at any time. No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); wrqu->mode = priv->ieee->iw_mode; IPW_DEBUG_WX("GET Mode -> %d\n", wrqu->mode); @@ -7084,7 +7086,7 @@ * This can be called at any time. No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); struct iw_range *range = (struct iw_range *)extra; u16 val; int i, level; @@ -7196,7 +7198,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int err = 0; static const unsigned char any[] = { @@ -7251,7 +7253,7 @@ * This can be called at any time. No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); /* If we are associated, trying to associate, or have a statically * configured BSSID then return that; otherwise return ANY */ @@ -7271,7 +7273,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); char *essid = ""; /* ANY */ int length = 0; int err = 0; @@ -7325,7 +7327,7 @@ * This can be called at any time. 
No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); /* If we are associated, trying to associate, or have a statically * configured ESSID then return that; otherwise return ANY */ @@ -7353,7 +7355,7 @@ * This can be called at any time. No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); if (wrqu->data.length > IW_ESSID_MAX_SIZE) return -E2BIG; @@ -7375,7 +7377,7 @@ * This can be called at any time. No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); wrqu->data.length = strlen(priv->nick) + 1; memcpy(extra, priv->nick, wrqu->data.length); @@ -7390,7 +7392,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); u32 target_rate = wrqu->bitrate.value; u32 rate; int err = 0; @@ -7431,7 +7433,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int val; int len = sizeof(val); int err = 0; @@ -7483,7 +7485,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int value, err; /* Auto RTS not yet supported */ @@ -7523,7 +7525,7 @@ * This can be called at any time. 
No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); wrqu->rts.value = priv->rts_threshold & ~RTS_DISABLED; wrqu->rts.fixed = 1; /* no auto select */ @@ -7540,7 +7542,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int err = 0, value; if (priv->ieee->iw_mode != IW_MODE_ADHOC) @@ -7580,7 +7582,7 @@ * This can be called at any time. No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); if (priv->ieee->iw_mode != IW_MODE_ADHOC) { wrqu->power.disabled = 1; @@ -7616,7 +7618,7 @@ * This can be called at any time. No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); if (!wrqu->frag.fixed) return -EINVAL; @@ -7646,7 +7648,7 @@ * This can be called at any time. No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); wrqu->frag.value = priv->frag_threshold & ~FRAG_DISABLED; wrqu->frag.fixed = 0; /* no auto select */ wrqu->frag.disabled = (priv->frag_threshold & FRAG_DISABLED) ? 1 : 0; @@ -7660,7 +7662,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int err = 0; if (wrqu->retry.flags & IW_RETRY_LIFETIME || @@ -7709,7 +7711,7 @@ * This can be called at any time. 
No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); wrqu->retry.disabled = 0; /* can't be disabled */ @@ -7738,7 +7740,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int err = 0; down(&priv->action_sem); @@ -7769,7 +7771,7 @@ * This can be called at any time. No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); return ieee80211_wx_get_scan(priv->ieee, info, wrqu, extra); } @@ -7785,7 +7787,7 @@ * No check of STATUS_INITIALIZED required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); return ieee80211_wx_set_encode(priv->ieee, info, wrqu, key); } @@ -7797,7 +7799,7 @@ * This can be called at any time. No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); return ieee80211_wx_get_encode(priv->ieee, info, wrqu, key); } @@ -7805,7 +7807,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int err = 0; down(&priv->action_sem); @@ -7855,7 +7857,7 @@ * This can be called at any time. 
No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); if (!(priv->power_mode & IPW_POWER_ENABLED)) { wrqu->power.disabled = 1; @@ -7880,7 +7882,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int *parms = (int *)extra; int enable = (parms[0] > 0); int err = 0; @@ -7911,7 +7913,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); if (priv->status & STATUS_INITIALIZED) schedule_reset(priv); return 0; @@ -7923,7 +7925,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int err = 0, mode = *(int *)extra; down(&priv->action_sem); @@ -7951,7 +7953,7 @@ * This can be called at any time. No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int level = IPW_POWER_LEVEL(priv->power_mode); s32 timeout, period; @@ -7988,7 +7990,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); int err, mode = *(int *)extra; down(&priv->action_sem); @@ -8021,7 +8023,7 @@ * This can be called at any time. 
No action lock required */ - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); if (priv->config & CFG_LONG_PREAMBLE) snprintf(wrqu->name, IFNAMSIZ, "long (1)"); @@ -8163,7 +8165,7 @@ int tx_qual; int beacon_qual; - struct ipw2100_priv *priv = ieee80211_priv(dev); + struct ipw2100_priv *priv = ieee80211_dev_to_priv(dev); struct iw_statistics *wstats; u32 rssi, quality, tx_retries, missed_beacons, tx_failures; u32 ord_len = sizeof(u32); Index: netdev/drivers/net/wireless/ipw2200.c =================================================================== --- netdev.orig/drivers/net/wireless/ipw2200.c 2005-06-01 11:03:37.000000000 +0200 +++ netdev/drivers/net/wireless/ipw2200.c 2005-06-03 11:57:33.000000000 +0200 @@ -5157,7 +5157,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); if (!(priv->status & STATUS_ASSOCIATED)) strcpy(wrqu->name, "unassociated"); else @@ -5210,7 +5210,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); struct iw_freq *fwrq = &wrqu->freq; /* if setting by freq convert to channel */ @@ -5244,7 +5244,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); wrqu->freq.e = 0; @@ -5264,7 +5264,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); int err = 0; IPW_DEBUG_WX("Set MODE: %d\n", wrqu->mode); @@ -5317,7 +5317,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); wrqu->mode = 
priv->ieee->iw_mode; IPW_DEBUG_WX("Get MODE -> %d\n", wrqu->mode); @@ -5354,7 +5354,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); struct iw_range *range = (struct iw_range *)extra; u16 val; int i; @@ -5418,7 +5418,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); static const unsigned char any[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff @@ -5472,7 +5472,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); /* If we are associated, trying to associate, or have a statically * configured BSSID then return that; otherwise return ANY */ if (priv->config & CFG_STATIC_BSSID || @@ -5491,7 +5491,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); char *essid = ""; /* ANY */ int length = 0; @@ -5543,7 +5543,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); /* If we are associated, trying to associate, or have a statically * configured ESSID then return that; otherwise return ANY */ @@ -5567,7 +5567,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); IPW_DEBUG_WX("Setting nick to '%s'\n", extra); if (wrqu->data.length > IW_ESSID_MAX_SIZE) @@ -5586,7 +5586,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); 
IPW_DEBUG_WX("Getting nick\n"); wrqu->data.length = strlen(priv->nick) + 1; memcpy(extra, priv->nick, wrqu->data.length); @@ -5607,7 +5607,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv * priv = ieee80211_priv(dev); + struct ipw_priv * priv = ieee80211_dev_to_priv(dev); wrqu->bitrate.value = priv->last_rate; IPW_DEBUG_WX("GET Rate -> %d \n", wrqu->bitrate.value); @@ -5619,7 +5619,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); if (wrqu->rts.disabled) priv->rts_threshold = DEFAULT_RTS_THRESHOLD; @@ -5640,7 +5640,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); wrqu->rts.value = priv->rts_threshold; wrqu->rts.fixed = 0; /* no auto select */ wrqu->rts.disabled = @@ -5655,7 +5655,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); struct ipw_tx_power tx_power; int i; @@ -5699,7 +5699,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); wrqu->power.value = priv->tx_power; wrqu->power.fixed = 1; @@ -5717,7 +5717,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); if (wrqu->frag.disabled) priv->ieee->fts = DEFAULT_FTS; @@ -5738,7 +5738,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); wrqu->frag.value = priv->ieee->fts; wrqu->frag.fixed = 0; /* no auto select */ wrqu->frag.disabled = @@ -5771,7 
+5771,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); IPW_DEBUG_WX("Start scan\n"); if (ipw_request_scan(priv)) return -EIO; @@ -5782,7 +5782,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); return ieee80211_wx_get_scan(priv->ieee, info, wrqu, extra); } @@ -5790,7 +5790,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *key) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); return ieee80211_wx_set_encode(priv->ieee, info, wrqu, key); } @@ -5798,7 +5798,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *key) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); return ieee80211_wx_get_encode(priv->ieee, info, wrqu, key); } @@ -5806,7 +5806,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); int err; if (wrqu->power.disabled) { @@ -5855,7 +5855,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); if (!(priv->power_mode & IPW_POWER_ENABLED)) { wrqu->power.disabled = 1; @@ -5872,7 +5872,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); int mode = *(int *)extra; int err; @@ -5900,7 +5900,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); int level = IPW_POWER_LEVEL(priv->power_mode); char *p = extra; @@ 
-5932,7 +5932,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); int mode = *(int *)extra; u8 band = 0, modulation = 0; @@ -5998,7 +5998,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); switch (priv->ieee->freq_band) { case IEEE80211_24GHZ_BAND: @@ -6046,7 +6046,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); int *parms = (int *)extra; int enable = (parms[0] > 0); @@ -6072,7 +6072,7 @@ struct iw_request_info *info, union iwreq_data *wrqu, char *extra) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); IPW_DEBUG_WX("RESET\n"); ipw_adapter_restart(priv); return 0; @@ -6185,7 +6185,7 @@ */ static struct iw_statistics *ipw_get_wireless_stats(struct net_device * dev) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); struct iw_statistics *wstats; wstats = &priv->wstats; @@ -6248,7 +6248,7 @@ static int ipw_net_open(struct net_device *dev) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); IPW_DEBUG_INFO("dev->open\n"); /* we should be verifying the device is ready to be opened */ if (!(priv->status & STATUS_RF_KILL_MASK) && @@ -6394,9 +6394,9 @@ } static int ipw_net_hard_start_xmit(struct ieee80211_txb *txb, - struct net_device *dev) + struct ieee80211_device *ieee) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_priv(ieee); unsigned long flags; IPW_DEBUG_TX("dev->xmit(%d bytes)\n", txb->payload_size); @@ -6406,7 +6406,7 @@ if (!(priv->status & STATUS_ASSOCIATED)) { IPW_DEBUG_INFO("Tx attempt while not 
associated.\n"); priv->ieee->stats.tx_carrier_errors++; - netif_stop_queue(dev); + netif_stop_queue(ieee80211_dev(ieee)); goto fail_unlock; } @@ -6422,7 +6422,7 @@ static struct net_device_stats *ipw_net_get_stats(struct net_device *dev) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); priv->ieee->stats.tx_packets = priv->tx_packets; priv->ieee->stats.rx_packets = priv->rx_packets; @@ -6436,7 +6436,7 @@ static int ipw_net_set_mac_address(struct net_device *dev, void *p) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); struct sockaddr *addr = p; if (!is_valid_ether_addr(addr->sa_data)) return -EADDRNOTAVAIL; @@ -6451,7 +6451,7 @@ static void ipw_ethtool_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info) { - struct ipw_priv *p = ieee80211_priv(dev); + struct ipw_priv *p = ieee80211_dev_to_priv(dev); char vers[64]; char date[32]; u32 len; @@ -6472,7 +6472,7 @@ static u32 ipw_ethtool_get_link(struct net_device *dev) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); return (priv->status & STATUS_ASSOCIATED) != 0; } @@ -6484,7 +6484,7 @@ static int ipw_ethtool_get_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, u8 *bytes) { - struct ipw_priv *p = ieee80211_priv(dev); + struct ipw_priv *p = ieee80211_dev_to_priv(dev); if (eeprom->offset + eeprom->len > CX2_EEPROM_IMAGE_SIZE) return -EINVAL; @@ -6496,7 +6496,7 @@ static int ipw_ethtool_set_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, u8 *bytes) { - struct ipw_priv *p = ieee80211_priv(dev); + struct ipw_priv *p = ieee80211_dev_to_priv(dev); int i; if (eeprom->offset + eeprom->len > CX2_EEPROM_IMAGE_SIZE) @@ -6633,10 +6633,10 @@ } -static void shim__set_security(struct net_device *dev, +static void shim__set_security(struct ieee80211_device *ieee, struct ieee80211_security *sec) { - struct ipw_priv *priv = 
ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_priv(ieee); int i; for (i = 0; i < 4; i++) { @@ -6874,7 +6874,7 @@ /* Called by register_netdev() */ static int ipw_net_init(struct net_device *dev) { - struct ipw_priv *priv = ieee80211_priv(dev); + struct ipw_priv *priv = ieee80211_dev_to_priv(dev); if (priv->status & STATUS_RF_KILL_SW) { IPW_WARNING("Radio disabled by module parameter.\n"); @@ -6952,19 +6952,21 @@ { int err = 0; struct net_device *net_dev; + struct ieee80211_device *ieee; void __iomem *base; u32 length, val; struct ipw_priv *priv; int band, modulation; - net_dev = alloc_ieee80211(sizeof(struct ipw_priv)); - if (net_dev == NULL) { + ieee = alloc_ieee80211(sizeof(struct ipw_priv)); + if (ieee == NULL) { err = -ENOMEM; goto out; } + net_dev = ieee80211_dev(ieee); - priv = ieee80211_priv(net_dev); - priv->ieee = netdev_priv(net_dev); + priv = ieee80211_priv(ieee); + priv->ieee = ieee; priv->net_dev = net_dev; priv->pci_dev = pdev; #ifdef CONFIG_IPW_DEBUG @@ -7160,7 +7162,7 @@ pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); out_free_ieee80211: - free_ieee80211(priv->net_dev); + free_ieee80211(priv->ieee); out: return err; } @@ -7202,7 +7204,7 @@ pci_release_regions(pdev); pci_disable_device(pdev); pci_set_drvdata(pdev, NULL); - free_ieee80211(priv->net_dev); + free_ieee80211(priv->ieee); #ifdef CONFIG_PM if (fw_loaded) { -- Jiri Benc SUSE Labs From jbenc@suse.cz Fri Jun 3 09:35:18 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 09:35:21 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53GZGXq002144 for ; Fri, 3 Jun 2005 09:35:17 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id 960F16282FC; Fri, 3 Jun 2005 18:34:18 +0200 (CEST) Date: Fri, 3 Jun 2005 18:34:18 +0200 From: Jiri Benc To: NetDev Cc: Jeff Garzik , Jirka Bohac Subject: [6/9] ieee80211: ethernet 
independency Message-ID: <20050603183418.58c47b0c@griffin.suse.cz> In-Reply-To: <20050603182625.64d33be3@griffin.suse.cz> References: <20050603182625.64d33be3@griffin.suse.cz> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2032 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 38022 Lines: 1183 Makes the 802.11 layer independent of ethernet. (The previous implementation had the ethernet headers built by the ethernet layer and then parsed them and rebuilt them into 802.11 headers.) Signed-off-by: Jiri Benc Signed-off-by: Jirka Bohac Index: netdev/include/linux/netdevice.h =================================================================== --- netdev.orig/include/linux/netdevice.h 2005-06-01 11:05:01.000000000 +0200 +++ netdev/include/linux/netdevice.h 2005-06-03 13:21:00.000000000 +0200 @@ -83,13 +83,18 @@ * used. */ -#if !defined(CONFIG_AX25) && !defined(CONFIG_AX25_MODULE) && !defined(CONFIG_TR) +#if !defined(CONFIG_AX25) && !defined(CONFIG_AX25_MODULE) && !defined(CONFIG_TR) \ + && !defined(CONFIG_IEEE80211) #define LL_MAX_HEADER 32 #else #if defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE) #define LL_MAX_HEADER 96 #else +#if defined(CONFIG_TR) #define LL_MAX_HEADER 48 +#else +#define LL_MAX_HEADER 38 +#endif #endif #endif Index: netdev/include/net/ieee80211.h =================================================================== --- netdev.orig/include/net/ieee80211.h 2005-06-03 13:20:46.000000000 +0200 +++ netdev/include/net/ieee80211.h 2005-06-03 13:21:00.000000000 +0200 @@ -20,7 +20,6 @@ */ #ifndef IEEE80211_H #define IEEE80211_H -#include /* ETH_ALEN */ #include /* ARRAY_SIZE */ #if WIRELESS_EXT < 17 @@ -42,25 +41,26 @@ WEP IV and ICV. 
(this interpretation suggested by Ramiro Barreiro) */ +#define IEEE80211_ALEN 6 #define IEEE80211_HLEN 30 #define IEEE80211_FRAME_LEN (IEEE80211_DATA_LEN + IEEE80211_HLEN) struct ieee80211_hdr { u16 frame_ctl; u16 duration_id; - u8 addr1[ETH_ALEN]; - u8 addr2[ETH_ALEN]; - u8 addr3[ETH_ALEN]; + u8 addr1[IEEE80211_ALEN]; + u8 addr2[IEEE80211_ALEN]; + u8 addr3[IEEE80211_ALEN]; u16 seq_ctl; - u8 addr4[ETH_ALEN]; + u8 addr4[IEEE80211_ALEN]; } __attribute__ ((packed)); struct ieee80211_hdr_3addr { u16 frame_ctl; u16 duration_id; - u8 addr1[ETH_ALEN]; - u8 addr2[ETH_ALEN]; - u8 addr3[ETH_ALEN]; + u8 addr1[IEEE80211_ALEN]; + u8 addr2[IEEE80211_ALEN]; + u8 addr3[IEEE80211_ALEN]; u16 seq_ctl; } __attribute__ ((packed)); @@ -233,7 +233,7 @@ #define ETH_P_PREAUTH 0x88C7 /* IEEE 802.11i pre-authentication */ #ifndef ETH_P_80211_RAW -#define ETH_P_80211_RAW (ETH_P_ECONET + 1) +#define ETH_P_80211_RAW 0x0003 #endif /* IEEE 802.11 defines */ @@ -246,11 +246,29 @@ u8 ssap; /* always 0xAA */ u8 ctrl; /* always 0x03 */ u8 oui[P80211_OUI_LEN]; /* organizational universal id */ + u16 type; /* packet type ID field */ } __attribute__ ((packed)); #define SNAP_SIZE sizeof(struct ieee80211_snap_hdr) +#define IEEE80211_SNAP_IS_RFC1042(snap) \ + ((snap)->oui[0] == 0 && (snap)->oui[1] == 0 && (snap)->oui[2] == 0) +#define IEEE80211_SNAP_IS_BRIDGE_TUNNEL(snap) \ + ((snap)->oui[0] == 0 && (snap)->oui[1] == 0 && (snap)->oui[2] == 0xf8) + +#define IEEE80211_FC_GET_TODS(hdr) \ + ((hdr)->frame_ctl & __constant_cpu_to_le16(IEEE80211_FCTL_TODS)) +#define IEEE80211_FC_GET_FROMDS(hdr) \ + ((hdr)->frame_ctl & __constant_cpu_to_le16(IEEE80211_FCTL_FROMDS)) +#define IEEE80211_GET_DADDR(hdr) \ + (IEEE80211_FC_GET_TODS(hdr) ? (hdr)->addr3 : (hdr)->addr1) +#define IEEE80211_GET_SADDR(hdr) \ + (IEEE80211_FC_GET_FROMDS(hdr) ? \ + (IEEE80211_FC_GET_TODS(hdr) ? (hdr)->addr4 : (hdr)->addr3) \ + : (hdr)->addr2) +/* IEEE80211_GET_xADDR do not work when both TODS and FROMDS are set. 
*/ + #define WLAN_FC_GET_TYPE(fc) ((fc) & IEEE80211_FCTL_FTYPE) #define WLAN_FC_GET_STYPE(fc) ((fc) & IEEE80211_FCTL_STYPE) @@ -395,8 +413,8 @@ unsigned int seq; unsigned int last_frag; struct sk_buff *skb; - u8 src_addr[ETH_ALEN]; - u8 dst_addr[ETH_ALEN]; + u8 src_addr[IEEE80211_ALEN]; + u8 dst_addr[IEEE80211_ALEN]; }; struct ieee80211_stats { @@ -507,7 +525,7 @@ u16 auth_sequence; u16 beacon_interval; u16 capability; - u8 current_ap[ETH_ALEN]; + u8 current_ap[IEEE80211_ALEN]; u16 listen_interval; struct { u16 association_id:14, reserved:2; @@ -537,7 +555,7 @@ struct ieee80211_assoc_request_frame { u16 capability; u16 listen_interval; - u8 current_ap[ETH_ALEN]; + u8 current_ap[IEEE80211_ALEN]; struct ieee80211_info_element info_element; } __attribute__ ((packed)); @@ -581,7 +599,7 @@ struct ieee80211_network { /* These entries are used to identify a unique network */ - u8 bssid[ETH_ALEN]; + u8 bssid[IEEE80211_ALEN]; u8 channel; /* Ensure null-terminated for any debug msgs */ u8 ssid[IW_ESSID_MAX_SIZE + 1]; @@ -625,12 +643,12 @@ #define MAC_ARG(x) ((u8*)(x))[0],((u8*)(x))[1],((u8*)(x))[2],((u8*)(x))[3],((u8*)(x))[4],((u8*)(x))[5] -extern inline int is_multicast_ether_addr(const u8 *addr) +extern inline int is_multicast_ieee80211_addr(const u8 *addr) { return ((addr[0] != 0xff) && (0x01 & addr[0])); } -extern inline int is_broadcast_ether_addr(const u8 *addr) +extern inline int is_broadcast_ieee80211_addr(const u8 *addr) { return ((addr[0] == 0xff) && (addr[1] == 0xff) && (addr[2] == 0xff) && \ (addr[3] == 0xff) && (addr[4] == 0xff) && (addr[5] == 0xff)); @@ -694,7 +712,7 @@ u16 fts; /* Fragmentation Threshold */ /* Association info */ - u8 bssid[ETH_ALEN]; + u8 bssid[IEEE80211_ALEN]; enum ieee80211_state state; @@ -783,7 +801,7 @@ return 0; } -extern inline int ieee80211_get_hdrlen(u16 fc) +extern inline int __ieee80211_get_hdrlen(u16 fc) { int hdrlen = IEEE80211_3ADDR_LEN; @@ -807,12 +825,29 @@ return hdrlen; } +#define ieee80211_get_hdrlen(hdr) 
__ieee80211_get_hdrlen(le16_to_cpu((hdr)->frame_ctl)) +#define IEEE80211_GET_DATA_HDR_LEN(hdr) \ + ((((hdr)->frame_ctl & \ + __constant_cpu_to_le16(IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) \ + == __constant_cpu_to_le16(IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) \ + ? IEEE80211_4ADDR_LEN : IEEE80211_3ADDR_LEN) +#define IEEE80211_GET_SNAP(hdr) \ + ((struct ieee80211_snap_hdr *) \ + ((u8 *)(hdr) + IEEE80211_GET_DATA_HDR_LEN(hdr))) + +extern inline int ieee80211_get_proto(struct ieee80211_hdr *header) +{ + struct ieee80211_snap_hdr *snap = IEEE80211_GET_SNAP(header); + return (snap->dsap == 0xaa && snap->ssap == 0xaa ? + ntohs(snap->type) : ETH_P_802_2); +} /* ieee80211.c */ extern void free_ieee80211(struct ieee80211_device *ieee); extern struct ieee80211_device *alloc_ieee80211(int sizeof_priv); +extern void ieee80211_setup(struct net_device *dev); extern int ieee80211_set_encryption(struct ieee80211_device *ieee); Index: netdev/net/ieee80211/ieee80211_rx.c =================================================================== --- netdev.orig/net/ieee80211/ieee80211_rx.c 2005-06-03 13:20:46.000000000 +0200 +++ netdev/net/ieee80211/ieee80211_rx.c 2005-06-03 13:21:00.000000000 +0200 @@ -41,11 +41,10 @@ struct ieee80211_rx_stats *rx_stats) { struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data; - u16 fc = le16_to_cpu(hdr->frame_ctl); skb->dev = ieee->dev; skb->mac.raw = skb->data; - skb_pull(skb, ieee80211_get_hdrlen(fc)); + skb_pull(skb, ieee80211_get_hdrlen(hdr)); skb->pkt_type = PACKET_OTHERHOST; skb->protocol = __constant_htons(ETH_P_80211_RAW); memset(skb->cb, 0, sizeof(skb->cb)); @@ -75,8 +74,8 @@ if (entry->skb != NULL && entry->seq == seq && (entry->last_frag + 1 == frag || frag == -1) && - memcmp(entry->src_addr, src, ETH_ALEN) == 0 && - memcmp(entry->dst_addr, dst, ETH_ALEN) == 0) + memcmp(entry->src_addr, src, IEEE80211_ALEN) == 0 && + memcmp(entry->dst_addr, dst, IEEE80211_ALEN) == 0) return entry; } @@ -103,7 +102,7 @@ sizeof(struct 
ieee80211_hdr) + 8 /* LLC */ + 2 /* alignment */ + - 8 /* WEP */ + ETH_ALEN /* WDS */); + 8 /* WEP */ + IEEE80211_ALEN /* WDS */); if (skb == NULL) return NULL; @@ -119,8 +118,8 @@ entry->seq = seq; entry->last_frag = frag; entry->skb = skb; - memcpy(entry->src_addr, hdr->addr2, ETH_ALEN); - memcpy(entry->dst_addr, hdr->addr1, ETH_ALEN); + memcpy(entry->src_addr, hdr->addr2, IEEE80211_ALEN); + memcpy(entry->dst_addr, hdr->addr1, IEEE80211_ALEN); } else { /* received a fragment of a frame for which the head fragment * should have already been received */ @@ -220,15 +219,6 @@ #endif -/* See IEEE 802.1H for LLC/SNAP encapsulation/decapsulation */ -/* Ethernet-II snap header (RFC1042 for most EtherTypes) */ -static unsigned char rfc1042_header[] = -{ 0xaa, 0xaa, 0x03, 0x00, 0x00, 0x00 }; -/* Bridge-Tunnel header (for EtherTypes ETH_P_AARP and ETH_P_IPX) */ -static unsigned char bridge_tunnel_header[] = -{ 0xaa, 0xaa, 0x03, 0x00, 0x00, 0xf8 }; -/* No encapsulation header if EtherType < 0x600 (=length) */ - /* Called by ieee80211_rx_frame_decrypt */ static int ieee80211_is_eapol_frame(struct ieee80211_device *ieee, struct sk_buff *skb) @@ -236,7 +226,6 @@ struct net_device *dev = ieee80211_dev(ieee); u16 fc, ethertype; struct ieee80211_hdr *hdr; - u8 *pos; if (skb->len < 24) return 0; @@ -247,12 +236,12 @@ /* check that the frame is unicast frame to us */ if ((fc & (IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) == IEEE80211_FCTL_TODS && - memcmp(hdr->addr1, dev->dev_addr, ETH_ALEN) == 0 && - memcmp(hdr->addr3, dev->dev_addr, ETH_ALEN) == 0) { + memcmp(hdr->addr1, dev->dev_addr, IEEE80211_ALEN) == 0 && + memcmp(hdr->addr3, dev->dev_addr, IEEE80211_ALEN) == 0) { /* ToDS frame with own addr BSSID and DA */ } else if ((fc & (IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) == IEEE80211_FCTL_FROMDS && - memcmp(hdr->addr1, dev->dev_addr, ETH_ALEN) == 0) { + memcmp(hdr->addr1, dev->dev_addr, IEEE80211_ALEN) == 0) { /* FromDS frame with own addr as DA */ } else return 0; @@ -261,8 
+250,7 @@ return 0; /* check for port access entity Ethernet type */ - pos = skb->data + 24; - ethertype = (pos[6] << 8) | pos[7]; + ethertype = ieee80211_get_proto(hdr); if (ethertype == ETH_P_PAE) return 1; @@ -281,7 +269,7 @@ return 0; hdr = (struct ieee80211_hdr *) skb->data; - hdrlen = ieee80211_get_hdrlen(le16_to_cpu(hdr->frame_ctl)); + hdrlen = ieee80211_get_hdrlen(hdr); #ifdef CONFIG_IEEE80211_CRYPT_TKIP if (ieee->tkip_countermeasures && @@ -326,7 +314,7 @@ return 0; hdr = (struct ieee80211_hdr *) skb->data; - hdrlen = ieee80211_get_hdrlen(le16_to_cpu(hdr->frame_ctl)); + hdrlen = ieee80211_get_hdrlen(hdr); atomic_inc(&crypt->refcnt); res = crypt->ops->decrypt_msdu(skb, keyidx, hdrlen, crypt->priv); @@ -342,6 +330,44 @@ } +unsigned short ieee80211_type_trans(struct sk_buff *skb, + struct ieee80211_device *ieee) +{ + struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data; + struct ieee80211_snap_hdr *snap; + int hdrlen; + u8 *daddr = IEEE80211_GET_DADDR(hdr); + unsigned short type; + + skb->mac.raw = skb->data; + + hdrlen = ieee80211_get_hdrlen(hdr); + snap = (struct ieee80211_snap_hdr *)(skb->data + hdrlen); + if (snap->dsap == 0xaa && snap->ssap == 0xaa && + ((IEEE80211_SNAP_IS_RFC1042(snap) && + snap->type != __constant_htons(ETH_P_AARP) && + snap->type != __constant_htons(ETH_P_IPX)) || + IEEE80211_SNAP_IS_BRIDGE_TUNNEL(snap))) { + type = snap->type; + skb_pull(skb, hdrlen + SNAP_SIZE); + } + else { + type = __constant_htons(ETH_P_802_2); + skb_pull(skb, hdrlen); + } + + skb->input_dev = ieee->dev; + if (is_broadcast_ieee80211_addr(daddr)) + skb->pkt_type = PACKET_BROADCAST; + else if (is_multicast_ieee80211_addr(daddr)) + skb->pkt_type = PACKET_MULTICAST; + else if (memcmp(daddr, ieee->dev->dev_addr, IEEE80211_ALEN)) + skb->pkt_type = PACKET_OTHERHOST; + + return type; +} + + /* All received frames are sent to this function. @skb contains the frame in * IEEE 802.11 format, i.e., in the format it was sent over air. 
* This function is called only as a tasklet (software IRQ). */ @@ -354,8 +380,6 @@ u16 fc, type, stype, sc; struct net_device_stats *stats; unsigned int frag; - u8 *payload; - u16 ethertype; #ifdef NOT_YET struct net_device *wds = NULL; struct sk_buff *skb2 = NULL; @@ -364,8 +388,8 @@ int from_assoc_ap = 0; void *sta = NULL; #endif - u8 dst[ETH_ALEN]; - u8 src[ETH_ALEN]; + u8 dst[IEEE80211_ALEN]; + u8 src[IEEE80211_ALEN]; struct ieee80211_crypt_data *crypt = NULL; int keyidx = 0; @@ -383,7 +407,7 @@ stype = WLAN_FC_GET_STYPE(fc); sc = le16_to_cpu(hdr->seq_ctl); frag = WLAN_GET_SEQ_FRAG(sc); - hdrlen = ieee80211_get_hdrlen(fc); + hdrlen = __ieee80211_get_hdrlen(fc); #ifdef NOT_YET #if WIRELESS_EXT > 15 @@ -479,22 +503,23 @@ switch (fc & (IEEE80211_FCTL_FROMDS | IEEE80211_FCTL_TODS)) { case IEEE80211_FCTL_FROMDS: - memcpy(dst, hdr->addr1, ETH_ALEN); - memcpy(src, hdr->addr3, ETH_ALEN); + memcpy(dst, hdr->addr1, IEEE80211_ALEN); + memcpy(src, hdr->addr3, IEEE80211_ALEN); break; case IEEE80211_FCTL_TODS: - memcpy(dst, hdr->addr3, ETH_ALEN); - memcpy(src, hdr->addr2, ETH_ALEN); + memcpy(dst, hdr->addr3, IEEE80211_ALEN); + memcpy(src, hdr->addr2, IEEE80211_ALEN); break; case IEEE80211_FCTL_FROMDS | IEEE80211_FCTL_TODS: if (skb->len < IEEE80211_4ADDR_LEN) goto rx_dropped; - memcpy(dst, hdr->addr3, ETH_ALEN); - memcpy(src, hdr->addr4, ETH_ALEN); + memcpy(dst, hdr->addr3, IEEE80211_ALEN); + memcpy(src, hdr->addr4, IEEE80211_ALEN); + /* FIXME: this is wrong */ break; case 0: - memcpy(dst, hdr->addr1, ETH_ALEN); - memcpy(src, hdr->addr2, ETH_ALEN); + memcpy(dst, hdr->addr1, IEEE80211_ALEN); + memcpy(src, hdr->addr2, IEEE80211_ALEN); break; } @@ -509,7 +534,7 @@ if (ieee->iw_mode == IW_MODE_MASTER && !wds && (fc & (IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) == IEEE80211_FCTL_FROMDS && ieee->stadev && - memcmp(hdr->addr2, ieee->assoc_ap_addr, ETH_ALEN) == 0) { + memcmp(hdr->addr2, ieee->assoc_ap_addr, IEEE80211_ALEN) == 0) { /* Frame from BSSID of the AP for which we are a 
client */ skb->dev = dev = ieee->stadev; stats = hostap_get_stats(dev); @@ -667,9 +692,6 @@ /* skb: hdr + (possible reassembled) full plaintext payload */ - payload = skb->data + hdrlen; - ethertype = (payload[6] << 8) | payload[7]; - #ifdef NOT_YET /* If IEEE 802.1X is used, check whether the port is authorized to send * the received frame. */ @@ -696,38 +718,6 @@ } #endif - /* convert hdr + possible LLC headers into Ethernet header */ - if (skb->len - hdrlen >= 8 && - ((memcmp(payload, rfc1042_header, SNAP_SIZE) == 0 && - ethertype != ETH_P_AARP && ethertype != ETH_P_IPX) || - memcmp(payload, bridge_tunnel_header, SNAP_SIZE) == 0)) { - /* remove RFC1042 or Bridge-Tunnel encapsulation and - * replace EtherType */ - skb_pull(skb, hdrlen + SNAP_SIZE); - memcpy(skb_push(skb, ETH_ALEN), src, ETH_ALEN); - memcpy(skb_push(skb, ETH_ALEN), dst, ETH_ALEN); - } else { - u16 len; - /* Leave Ethernet header part of hdr and full payload */ - skb_pull(skb, hdrlen); - len = htons(skb->len); - memcpy(skb_push(skb, 2), &len, 2); - memcpy(skb_push(skb, ETH_ALEN), src, ETH_ALEN); - memcpy(skb_push(skb, ETH_ALEN), dst, ETH_ALEN); - } - -#ifdef NOT_YET - if (wds && ((fc & (IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) == - IEEE80211_FCTL_TODS) && - skb->len >= ETH_HLEN + ETH_ALEN) { - /* Non-standard frame: get addr4 from its bogus location after - * the payload */ - memcpy(skb->data + ETH_ALEN, - skb->data + skb->len - ETH_ALEN, ETH_ALEN); - skb_trim(skb, skb->len - ETH_ALEN); - } -#endif - stats->rx_packets++; stats->rx_bytes += skb->len; @@ -753,7 +743,7 @@ if (skb2 != NULL) { /* send to wireless media */ - skb2->protocol = __constant_htons(ETH_P_802_3); + skb2->protocol = ieee80211_type_trans(skb2, ieee); skb2->mac.raw = skb2->nh.raw = skb2->data; /* skb2->nh.raw = skb2->data + ETH_HLEN; */ skb2->dev = dev; @@ -763,7 +753,7 @@ #endif if (skb) { - skb->protocol = eth_type_trans(skb, dev); + skb->protocol = ieee80211_type_trans(skb, ieee); memset(skb->cb, 0, sizeof(skb->cb)); 
skb->dev = dev; skb->ip_summed = CHECKSUM_NONE; /* 802.11 crc not sufficient */ @@ -820,7 +810,7 @@ u8 i; /* Pull out fixed field data */ - memcpy(network->bssid, beacon->header.addr3, ETH_ALEN); + memcpy(network->bssid, beacon->header.addr3, IEEE80211_ALEN); network->capability = beacon->capability; network->last_scanned = jiffies; network->time_stamp[0] = beacon->time_stamp[0]; @@ -848,7 +838,7 @@ while (left >= sizeof(struct ieee80211_info_element_hdr)) { if (sizeof(struct ieee80211_info_element_hdr) + info_element->len > left) { IEEE80211_DEBUG_SCAN("SCAN: parse failed: info_element->len + 2 > left : info_element->len+2=%d left=%d.\n", - info_element->len + sizeof(struct ieee80211_info_element), + info_element->len + (int)sizeof(struct ieee80211_info_element), left); return 1; } @@ -1016,7 +1006,7 @@ * as one network */ return ((src->ssid_len == dst->ssid_len) && (src->channel == dst->channel) && - !memcmp(src->bssid, dst->bssid, ETH_ALEN) && + !memcmp(src->bssid, dst->bssid, IEEE80211_ALEN) && !memcmp(src->ssid, dst->ssid, src->ssid_len)); } Index: netdev/net/ieee80211/ieee80211_module.c =================================================================== --- netdev.orig/net/ieee80211/ieee80211_module.c 2005-06-03 13:20:46.000000000 +0200 +++ netdev/net/ieee80211/ieee80211_module.c 2005-06-03 13:21:00.000000000 +0200 @@ -47,7 +47,6 @@ #include #include #include -#include #include #include @@ -102,24 +101,22 @@ { struct ieee80211_device *ieee; struct net_device *dev; - int alloc_size; + int alloc_size; int err; IEEE80211_DEBUG_INFO("Initializing...\n"); - alloc_size = ((sizeof(struct ieee80211_device) + NETDEV_ALIGN_CONST) - & ~NETDEV_ALIGN_CONST) - + sizeof_priv; - dev = alloc_etherdev(alloc_size); + alloc_size = ((sizeof(struct ieee80211_device) + NETDEV_ALIGN_CONST) + & ~NETDEV_ALIGN_CONST) + + sizeof_priv; + dev = alloc_netdev(alloc_size, "wlan%d", ieee80211_setup); if (!dev) { - IEEE80211_ERROR("Unable to network device.\n"); + IEEE80211_ERROR("Unable to 
allocate network device.\n"); goto failed; } ieee = netdev_priv(dev); ieee->dev = dev; ieee->priv = ieee80211_priv(ieee); - - dev->hard_start_xmit = ieee80211_xmit; err = ieee80211_networks_allocate(ieee); if (err) { Index: netdev/net/ieee80211/ieee80211_tx.c =================================================================== --- netdev.orig/net/ieee80211/ieee80211_tx.c 2005-06-03 13:20:46.000000000 +0200 +++ netdev/net/ieee80211/ieee80211_tx.c 2005-06-03 13:21:00.000000000 +0200 @@ -83,16 +83,6 @@ Total: 8 non-data bytes -802.3 Ethernet Data Frame - - ,-----------------------------------------. -Bytes | 6 | 6 | 2 | Variable | 4 | - |-------|-------|------|-----------|------| -Desc. | Dest. | Source| Type | IP Packet | fcs | - | MAC | MAC | | | | - `-----------------------------------------' -Total: 18 non-data bytes - In the event that fragmentation is required, the incoming payload is split into N parts of size ieee->fts. The first fragment contains the SNAP header and the remaining packets are just data. @@ -103,56 +93,8 @@ encryption it will take 3 frames. With WEP it will take 4 frames as the payload of each frame is reduced to 492 bytes. -* SKB visualization -* -* ,- skb->data -* | -* | ETHERNET HEADER ,-<-- PAYLOAD -* | | 14 bytes from skb->data -* | 2 bytes for Type --> ,T. | (sizeof ethhdr) -* | | | | -* |,-Dest.--. ,--Src.---. 
| | | -* | 6 bytes| | 6 bytes | | | | -* v | | | | | | -* 0 | v 1 | v | v 2 -* 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 -* ^ | ^ | ^ | -* | | | | | | -* | | | | `T' <---- 2 bytes for Type -* | | | | -* | | '---SNAP--' <-------- 6 bytes for SNAP -* | | -* `-IV--' <-------------------- 4 bytes for IV (WEP) -* -* SNAP HEADER -* */ -static u8 P802_1H_OUI[P80211_OUI_LEN] = { 0x00, 0x00, 0xf8 }; -static u8 RFC1042_OUI[P80211_OUI_LEN] = { 0x00, 0x00, 0x00 }; - -static inline int ieee80211_put_snap(u8 *data, u16 h_proto) -{ - struct ieee80211_snap_hdr *snap; - u8 *oui; - - snap = (struct ieee80211_snap_hdr *)data; - snap->dsap = 0xaa; - snap->ssap = 0xaa; - snap->ctrl = 0x03; - - if (h_proto == 0x8137 || h_proto == 0x80f3) - oui = P802_1H_OUI; - else - oui = RFC1042_OUI; - snap->oui[0] = oui[0]; - snap->oui[1] = oui[1]; - snap->oui[2] = oui[2]; - - *(u16 *)(data + SNAP_SIZE) = htons(h_proto); - - return SNAP_SIZE + sizeof(u16); -} static inline int ieee80211_encrypt_fragment( struct ieee80211_device *ieee, @@ -247,19 +189,16 @@ struct net_device *dev) { struct ieee80211_device *ieee = netdev_priv(dev); + struct ieee80211_hdr *header = (struct ieee80211_hdr *)skb->data; struct ieee80211_txb *txb = NULL; struct ieee80211_hdr *frag_hdr; int i, bytes_per_frag, nr_frags, bytes_last_frag, frag_size; unsigned long flags; struct net_device_stats *stats = &ieee->stats; - int ether_type, encrypt; + int type, encrypt; int bytes, fc, hdr_len; struct sk_buff *skb_frag; - struct ieee80211_hdr header = { /* Ensure zero initialized */ - .duration_id = 0, - .seq_ctl = 0 - }; - u8 dest[ETH_ALEN], src[ETH_ALEN]; + u8 *dest; struct ieee80211_crypt_data* crypt; @@ -268,76 +207,48 @@ /* If there is no driver handler to take the TXB, dont' bother * creating it... 
*/ if (!ieee->hard_start_xmit) { - printk(KERN_WARNING "%s: No xmit handler.\n", - dev->name); + if (printk_ratelimit()) + printk(KERN_WARNING "%s: No xmit handler.\n", + dev->name); goto success; } - if (unlikely(skb->len < SNAP_SIZE + sizeof(u16))) { - printk(KERN_WARNING "%s: skb too small (%d).\n", - dev->name, skb->len); - goto success; - } - - ether_type = ntohs(((struct ethhdr *)skb->data)->h_proto); + type = ieee80211_get_proto(header); + dest = IEEE80211_GET_DADDR(header); + hdr_len = ieee80211_get_hdrlen(header); crypt = ieee->crypt[ieee->tx_keyidx]; - encrypt = !(ether_type == ETH_P_PAE && ieee->ieee802_1x) && + encrypt = !(type == ETH_P_PAE && ieee->ieee802_1x) && ieee->host_encrypt && crypt && crypt->ops; if (!encrypt && ieee->ieee802_1x && - ieee->drop_unencrypted && ether_type != ETH_P_PAE) { + ieee->drop_unencrypted && type != ETH_P_PAE) { stats->tx_dropped++; goto success; } #ifdef CONFIG_IEEE80211_DEBUG - if (crypt && !encrypt && ether_type == ETH_P_PAE) { - struct eapol *eap = (struct eapol *)(skb->data + - sizeof(struct ethhdr) - SNAP_SIZE - sizeof(u16)); + if (crypt && !encrypt && type == ETH_P_PAE) { + struct eapol *eap = (struct eapol *)(skb->data + hdr_len); IEEE80211_DEBUG_EAP("TX: IEEE 802.11 EAPOL frame: %s\n", eap_get_type(eap->type)); } #endif - /* Save source and destination addresses */ - memcpy(&dest, skb->data, ETH_ALEN); - memcpy(&src, skb->data+ETH_ALEN, ETH_ALEN); - - /* Advance the SKB to the start of the payload */ - skb_pull(skb, sizeof(struct ethhdr)); - /* Determine total amount of storage required for TXB packets */ - bytes = skb->len + SNAP_SIZE + sizeof(u16); + bytes = skb->len - hdr_len; + fc = le16_to_cpu(header->frame_ctl); if (encrypt) - fc = IEEE80211_FTYPE_DATA | IEEE80211_STYPE_DATA | - IEEE80211_FCTL_WEP; - else - fc = IEEE80211_FTYPE_DATA | IEEE80211_STYPE_DATA; + fc |= IEEE80211_FCTL_WEP; - if (ieee->iw_mode == IW_MODE_INFRA) { - fc |= IEEE80211_FCTL_TODS; - /* To DS: Addr1 = BSSID, Addr2 = SA, - Addr3 = DA */ - 
memcpy(&header.addr1, ieee->bssid, ETH_ALEN); - memcpy(&header.addr2, &src, ETH_ALEN); - memcpy(&header.addr3, &dest, ETH_ALEN); - } else if (ieee->iw_mode == IW_MODE_ADHOC) { - /* not From/To DS: Addr1 = DA, Addr2 = SA, - Addr3 = BSSID */ - memcpy(&header.addr1, dest, ETH_ALEN); - memcpy(&header.addr2, src, ETH_ALEN); - memcpy(&header.addr3, ieee->bssid, ETH_ALEN); - } - header.frame_ctl = cpu_to_le16(fc); - hdr_len = IEEE80211_3ADDR_LEN; + header->frame_ctl = cpu_to_le16(fc); /* Determine fragmentation size based on destination (multicast * and broadcast are not fragmented) */ - if (is_multicast_ether_addr(dest) || - is_broadcast_ether_addr(dest)) + if (is_multicast_ieee80211_addr(dest) || + is_broadcast_ieee80211_addr(dest)) frag_size = MAX_FRAG_THRESHOLD; else frag_size = ieee->fts; @@ -346,7 +257,7 @@ * this stack is providing the full 802.11 header, one will * eventually be affixed to this fragment -- so we must account for * it when determining the amount of payload space. */ - bytes_per_frag = frag_size - IEEE80211_3ADDR_LEN; + bytes_per_frag = frag_size - hdr_len; if (ieee->config & (CFG_IEEE80211_COMPUTE_FCS | CFG_IEEE80211_RESERVE_FCS)) bytes_per_frag -= IEEE80211_FCS_LEN; @@ -377,6 +288,8 @@ txb->encrypted = encrypt; txb->payload_size = bytes; + skb_pull(skb, hdr_len); + for (i = 0; i < nr_frags; i++) { skb_frag = txb->fragments[i]; @@ -384,7 +297,7 @@ skb_reserve(skb_frag, crypt->ops->extra_prefix_len); frag_hdr = (struct ieee80211_hdr *)skb_put(skb_frag, hdr_len); - memcpy(frag_hdr, &header, hdr_len); + memcpy(frag_hdr, header, hdr_len); /* If this is not the last fragment, then add the MOREFRAGS * bit to the frame control */ @@ -397,14 +310,6 @@ bytes = bytes_last_frag; } - /* Put a SNAP header on the first fragment */ - if (i == 0) { - ieee80211_put_snap( - skb_put(skb_frag, SNAP_SIZE + sizeof(u16)), - ether_type); - bytes -= SNAP_SIZE + sizeof(u16); - } - memcpy(skb_put(skb_frag, bytes), skb->data, bytes); /* Advance the SKB... 
*/ Index: netdev/net/ieee80211/ieee80211_wx.c =================================================================== --- netdev.orig/net/ieee80211/ieee80211_wx.c 2005-06-03 13:20:46.000000000 +0200 +++ netdev/net/ieee80211/ieee80211_wx.c 2005-06-03 13:21:00.000000000 +0200 @@ -53,7 +53,7 @@ /* First entry *MUST* be the AP MAC address */ iwe.cmd = SIOCGIWAP; iwe.u.ap_addr.sa_family = ARPHRD_ETHER; - memcpy(iwe.u.ap_addr.sa_data, network->bssid, ETH_ALEN); + memcpy(iwe.u.ap_addr.sa_data, network->bssid, IEEE80211_ALEN); start = iwe_stream_add_event(start, stop, &iwe, IW_EV_ADDR_LEN); /* Remaining entries will be displayed in the order we provide them */ Index: netdev/net/ieee80211/ieee80211_crypt_ccmp.c =================================================================== --- netdev.orig/net/ieee80211/ieee80211_crypt_ccmp.c 2005-06-01 11:05:14.000000000 +0200 +++ netdev/net/ieee80211/ieee80211_crypt_ccmp.c 2005-06-03 13:21:00.000000000 +0200 @@ -17,7 +17,6 @@ #include #include #include -#include #include #include #include @@ -156,7 +155,7 @@ * Dlen */ b0[0] = 0x59; b0[1] = qc; - memcpy(b0 + 2, hdr->addr2, ETH_ALEN); + memcpy(b0 + 2, hdr->addr2, IEEE80211_ALEN); memcpy(b0 + 8, pn, CCMP_PN_LEN); b0[14] = (dlen >> 8) & 0xff; b0[15] = dlen & 0xff; @@ -173,13 +172,13 @@ aad[1] = aad_len & 0xff; aad[2] = pos[0] & 0x8f; aad[3] = pos[1] & 0xc7; - memcpy(aad + 4, hdr->addr1, 3 * ETH_ALEN); + memcpy(aad + 4, hdr->addr1, 3 * IEEE80211_ALEN); pos = (u8 *) &hdr->seq_ctl; aad[22] = pos[0] & 0x0f; aad[23] = 0; /* all bits masked */ memset(aad + 24, 0, 8); if (a4_included) - memcpy(aad + 24, hdr->addr4, ETH_ALEN); + memcpy(aad + 24, hdr->addr4, IEEE80211_ALEN); if (qc_included) { aad[a4_included ? 
30 : 24] = qc; /* rest of QC masked */ Index: netdev/net/ieee80211/ieee80211_crypt_tkip.c =================================================================== --- netdev.orig/net/ieee80211/ieee80211_crypt_tkip.c 2005-06-01 11:05:14.000000000 +0200 +++ netdev/net/ieee80211/ieee80211_crypt_tkip.c 2005-06-03 13:21:00.000000000 +0200 @@ -17,7 +17,6 @@ #include #include #include -#include #include #include @@ -461,20 +460,20 @@ switch (le16_to_cpu(hdr11->frame_ctl) & (IEEE80211_FCTL_FROMDS | IEEE80211_FCTL_TODS)) { case IEEE80211_FCTL_TODS: - memcpy(hdr, hdr11->addr3, ETH_ALEN); /* DA */ - memcpy(hdr + ETH_ALEN, hdr11->addr2, ETH_ALEN); /* SA */ + memcpy(hdr, hdr11->addr3, IEEE80211_ALEN); /* DA */ + memcpy(hdr + IEEE80211_ALEN, hdr11->addr2, IEEE80211_ALEN); /* SA */ break; case IEEE80211_FCTL_FROMDS: - memcpy(hdr, hdr11->addr1, ETH_ALEN); /* DA */ - memcpy(hdr + ETH_ALEN, hdr11->addr3, ETH_ALEN); /* SA */ + memcpy(hdr, hdr11->addr1, IEEE80211_ALEN); /* DA */ + memcpy(hdr + IEEE80211_ALEN, hdr11->addr3, IEEE80211_ALEN); /* SA */ break; case IEEE80211_FCTL_FROMDS | IEEE80211_FCTL_TODS: - memcpy(hdr, hdr11->addr3, ETH_ALEN); /* DA */ - memcpy(hdr + ETH_ALEN, hdr11->addr4, ETH_ALEN); /* SA */ + memcpy(hdr, hdr11->addr3, IEEE80211_ALEN); /* DA */ + memcpy(hdr + IEEE80211_ALEN, hdr11->addr4, IEEE80211_ALEN); /* SA */ break; case 0: - memcpy(hdr, hdr11->addr1, ETH_ALEN); /* DA */ - memcpy(hdr + ETH_ALEN, hdr11->addr2, ETH_ALEN); /* SA */ + memcpy(hdr, hdr11->addr1, IEEE80211_ALEN); /* DA */ + memcpy(hdr + IEEE80211_ALEN, hdr11->addr2, IEEE80211_ALEN); /* SA */ break; } @@ -521,7 +520,7 @@ else ev.flags |= IW_MICFAILURE_PAIRWISE; ev.src_addr.sa_family = ARPHRD_ETHER; - memcpy(ev.src_addr.sa_data, hdr->addr2, ETH_ALEN); + memcpy(ev.src_addr.sa_data, hdr->addr2, IEEE80211_ALEN); memset(&wrqu, 0, sizeof(wrqu)); wrqu.data.length = sizeof(ev); wireless_send_event(dev, IWEVMICHAELMICFAILURE, &wrqu, (char *) &ev); Index: netdev/net/ieee80211/Makefile 
=================================================================== --- netdev.orig/net/ieee80211/Makefile 2005-06-01 11:05:14.000000000 +0200 +++ netdev/net/ieee80211/Makefile 2005-06-03 13:21:00.000000000 +0200 @@ -5,6 +5,7 @@ obj-$(CONFIG_IEEE80211_CRYPT_TKIP) += ieee80211_crypt_tkip.o ieee80211-objs := \ ieee80211_module.o \ + ieee80211_proto.o \ ieee80211_tx.o \ ieee80211_rx.o \ ieee80211_wx.o Index: netdev/net/ieee80211/ieee80211_proto.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ netdev/net/ieee80211/ieee80211_proto.c 2005-06-03 13:21:00.000000000 +0200 @@ -0,0 +1,239 @@ +/******************************************************************************* + + Copyright (c) 2005 Jiri Benc and Jirka Bohac + Copyright (c) 2004 Intel Corporation. All rights reserved. + (Contact: James P. Ketrenos ) + + Sponsored by SuSE. + + This program is free software; you can redistribute it and/or modify it + under the terms of version 2 of the GNU General Public License as + published by the Free Software Foundation. + + This program is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. + + You should have received a copy of the GNU General Public License along with + this program; if not, write to the Free Software Foundation, Inc., 59 + Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
+ +*******************************************************************************/ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +static int ieee80211_change_mtu(struct net_device *dev, int new_mtu) +{ + if ((new_mtu < 68) || (new_mtu > IEEE80211_DATA_LEN - 8 - SNAP_SIZE)) + return -EINVAL; + dev->mtu = new_mtu; + return 0; +} + + +static u8 P802_1H_OUI[P80211_OUI_LEN] = { 0x00, 0x00, 0xf8 }; +static u8 RFC1042_OUI[P80211_OUI_LEN] = { 0x00, 0x00, 0x00 }; + +static inline int __ieee80211_put_snap(u8 *data, u16 h_proto) +{ + struct ieee80211_snap_hdr *snap; + u8 *oui; + + snap = (struct ieee80211_snap_hdr *)data; + snap->dsap = 0xaa; + snap->ssap = 0xaa; + snap->ctrl = 0x03; + + if (h_proto == __constant_htons(ETH_P_IPX) || + h_proto == __constant_htons(ETH_P_AARP)) + oui = P802_1H_OUI; + else + oui = RFC1042_OUI; + snap->oui[0] = oui[0]; + snap->oui[1] = oui[1]; + snap->oui[2] = oui[2]; + + snap->type = h_proto; + + return SNAP_SIZE; +} + +static inline int ieee80211_put_snap(u8 *data, u16 h_proto) +{ + return __ieee80211_put_snap(data, htons(h_proto)); +} + +/* + * Create the IEEE 802.11 MAC header for an arbitrary protocol layer + * + * saddr=NULL means use device source address + * daddr=NULL means leave destination address (eg unresolved arp) + */ +static int ieee80211_header(struct sk_buff *skb, struct net_device *dev, + unsigned short type, void *daddr, void *saddr, unsigned len) +{ + struct ieee80211_device *ieee = netdev_priv(dev); + struct ieee80211_hdr *header; + int fc = IEEE80211_FTYPE_DATA | IEEE80211_STYPE_DATA; + int hdr_len = IEEE80211_3ADDR_LEN; + + if (type != ETH_P_802_3 && type != ETH_P_802_2) { + ieee80211_put_snap(skb_push(skb, SNAP_SIZE), type); + hdr_len += SNAP_SIZE; + } + + if (!saddr) saddr = dev->dev_addr; + header = (struct ieee80211_hdr *)skb_push(skb, IEEE80211_3ADDR_LEN); + header->duration_id = header->seq_ctl = 0; + if (ieee->iw_mode == IW_MODE_INFRA) { + fc |= IEEE80211_FCTL_TODS; 
+ /* To DS: Addr1 = BSSID, Addr2 = SA, + Addr3 = DA */ + memcpy(header->addr1, ieee->bssid, IEEE80211_ALEN); + memcpy(header->addr2, saddr, IEEE80211_ALEN); + if (daddr) + memcpy(header->addr3, daddr, IEEE80211_ALEN); + else + memset(header->addr3, 0, IEEE80211_ALEN); + } else if (ieee->iw_mode == IW_MODE_ADHOC) { + /* not From/To DS: Addr1 = DA, Addr2 = SA, + Addr3 = BSSID */ + if (daddr) + memcpy(header->addr1, daddr, IEEE80211_ALEN); + else + memset(header->addr1, 0, IEEE80211_ALEN); + memcpy(header->addr2, saddr, IEEE80211_ALEN); + memcpy(header->addr3, ieee->bssid, IEEE80211_ALEN); + } + header->frame_ctl = cpu_to_le16(fc); + + if (!daddr || (dev->flags & (IFF_LOOPBACK | IFF_NOARP))) + return -hdr_len; + return hdr_len; +} + +static int ieee80211_rebuild_header(struct sk_buff *skb) +{ + struct ieee80211_hdr *header = (struct ieee80211_hdr *)skb->data; + struct net_device *dev = skb->dev; + unsigned short type; + + type = ieee80211_get_proto(header); + + switch (type) { +#ifdef CONFIG_INET + case ETH_P_IP: + return arp_find(IEEE80211_GET_DADDR(header), skb); +#endif + default: + printk(KERN_DEBUG + "%s: unable to resolve type %X addresses.\n", + dev->name, type); + break; + } + + return 0; +} + +static int ieee80211_mac_addr(struct net_device *dev, void *p) +{ + struct sockaddr *addr = p; + + if (netif_running(dev)) + return -EBUSY; + memcpy(dev->dev_addr, addr->sa_data, dev->addr_len); + return 0; +} + +static int ieee80211_header_cache(struct neighbour *neigh, struct hh_cache *hh) +{ + struct net_device *dev = neigh->dev; + struct ieee80211_device *ieee = netdev_priv(dev); + unsigned short type = hh->hh_type; + struct ieee80211_hdr *header; + int fc = IEEE80211_FTYPE_DATA | IEEE80211_STYPE_DATA; + + if (type == __constant_htons(ETH_P_802_3) || + type == __constant_htons(ETH_P_802_2)) + return -1; + + header = (struct ieee80211_hdr *) + (((u8 *)hh->hh_data) + + (HH_DATA_OFF(IEEE80211_3ADDR_LEN + SNAP_SIZE))); + __ieee80211_put_snap((u8 *)header + 
IEEE80211_3ADDR_LEN, type); + + header->duration_id = header->seq_ctl = 0; + if (ieee->iw_mode == IW_MODE_INFRA) { + fc |= IEEE80211_FCTL_TODS; + /* To DS: Addr1 = BSSID, Addr2 = SA, + Addr3 = DA */ + memcpy(header->addr1, ieee->bssid, IEEE80211_ALEN); + memcpy(header->addr2, dev->dev_addr, IEEE80211_ALEN); + memcpy(header->addr3, neigh->ha, IEEE80211_ALEN); + } else if (ieee->iw_mode == IW_MODE_ADHOC) { + /* not From/To DS: Addr1 = DA, Addr2 = SA, + Addr3 = BSSID */ + memcpy(header->addr1, neigh->ha, IEEE80211_ALEN); + memcpy(header->addr2, dev->dev_addr, IEEE80211_ALEN); + memcpy(header->addr3, ieee->bssid, IEEE80211_ALEN); + } + header->frame_ctl = cpu_to_le16(fc); + + hh->hh_len = IEEE80211_3ADDR_LEN + SNAP_SIZE; + return 0; +} + +static void ieee80211_header_cache_update(struct hh_cache *hh, + struct net_device *dev, unsigned char *haddr) +{ + struct ieee80211_hdr *header; + + header = (struct ieee80211_hdr *) + (((u8 *)hh->hh_data) + + (HH_DATA_OFF(IEEE80211_3ADDR_LEN + SNAP_SIZE))); + memcpy(IEEE80211_GET_DADDR(header), haddr, dev->addr_len); +} + +static int ieee80211_header_parse(struct sk_buff *skb, unsigned char *haddr) +{ + struct ieee80211_hdr *header = (struct ieee80211_hdr *)skb->data; + + memcpy(haddr, IEEE80211_GET_SADDR(header), IEEE80211_ALEN); + return IEEE80211_ALEN; +} + + +void ieee80211_setup(struct net_device *dev) +{ + dev->change_mtu = ieee80211_change_mtu; + dev->hard_header = ieee80211_header; + dev->rebuild_header = ieee80211_rebuild_header; + dev->set_mac_address = ieee80211_mac_addr; + dev->hard_header_cache = ieee80211_header_cache; + dev->header_cache_update = ieee80211_header_cache_update; + dev->hard_header_parse = ieee80211_header_parse; + + dev->hard_start_xmit = ieee80211_xmit; + + dev->type = ARPHRD_ETHER; + dev->hard_header_len = IEEE80211_3ADDR_LEN + SNAP_SIZE; + dev->mtu = IEEE80211_DATA_LEN - 8 - SNAP_SIZE; + dev->addr_len = IEEE80211_ALEN; + dev->tx_queue_len = 1000; + dev->flags = IFF_BROADCAST | IFF_MULTICAST; + + 
memset(dev->broadcast, 0xFF, IEEE80211_ALEN); +} + + +EXPORT_SYMBOL(ieee80211_setup); -- Jiri Benc SUSE Labs From jbenc@suse.cz Fri Jun 3 09:36:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 09:36:29 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53GaPXq002608 for ; Fri, 3 Jun 2005 09:36:25 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id 052A06282FC; Fri, 3 Jun 2005 18:35:27 +0200 (CEST) Date: Fri, 3 Jun 2005 18:35:26 +0200 From: Jiri Benc To: NetDev Cc: Jeff Garzik , Jirka Bohac Subject: [7/9] ipw: fix after "ieee80211: ethernet independency" Message-ID: <20050603183526.0effd2b0@griffin.suse.cz> In-Reply-To: <20050603182625.64d33be3@griffin.suse.cz> References: <20050603182625.64d33be3@griffin.suse.cz> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2033 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 1866 Lines: 59 Fixes ipw2200 after making the ieee80211 layer independent of ethernet. 
Signed-off-by: Jiri Benc Signed-off-by: Jirka Bohac Index: netdev/drivers/net/wireless/ipw2200.c =================================================================== --- netdev.orig/drivers/net/wireless/ipw2200.c 2005-05-31 18:25:53.000000000 +0200 +++ netdev/drivers/net/wireless/ipw2200.c 2005-05-31 18:32:18.000000000 +0200 @@ -4920,8 +4920,8 @@ ETH_ALEN) || !memcmp(header->addr3, priv->bssid, ETH_ALEN) || - is_broadcast_ether_addr(header->addr1) || - is_multicast_ether_addr(header->addr1); + is_broadcast_ieee80211_addr(header->addr1) || + is_multicast_ieee80211_addr(header->addr1); break; case IW_MODE_INFRA: @@ -4932,8 +4932,8 @@ !memcmp(header->addr1, priv->net_dev->dev_addr, ETH_ALEN) || - is_broadcast_ether_addr(header->addr1) || - is_multicast_ether_addr(header->addr1); + is_broadcast_ieee80211_addr(header->addr1) || + is_multicast_ieee80211_addr(header->addr1); break; } @@ -6285,8 +6285,8 @@ switch (priv->ieee->iw_mode) { case IW_MODE_ADHOC: hdr_len = IEEE80211_3ADDR_LEN; - unicast = !is_broadcast_ether_addr(hdr->addr1) && - !is_multicast_ether_addr(hdr->addr1); + unicast = !is_broadcast_ieee80211_addr(hdr->addr1) && + !is_multicast_ieee80211_addr(hdr->addr1); id = ipw_find_station(priv, hdr->addr1); if (id == IPW_INVALID_STATION) { id = ipw_add_station(priv, hdr->addr1); @@ -6301,8 +6301,8 @@ case IW_MODE_INFRA: default: - unicast = !is_broadcast_ether_addr(hdr->addr3) && - !is_multicast_ether_addr(hdr->addr3); + unicast = !is_broadcast_ieee80211_addr(hdr->addr3) && + !is_multicast_ieee80211_addr(hdr->addr3); hdr_len = IEEE80211_3ADDR_LEN; id = 0; break; -- Jiri Benc SUSE Labs From jbenc@suse.cz Fri Jun 3 09:37:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 09:37:21 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53GbGXq003065 for ; Fri, 3 Jun 2005 09:37:16 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR 
ESMTP Mailer) with ESMTP id 022006282FC; Fri, 3 Jun 2005 18:36:18 +0200 (CEST) Date: Fri, 3 Jun 2005 18:36:17 +0200 From: Jiri Benc To: NetDev Cc: Jeff Garzik , Jirka Bohac Subject: [8/9] ieee80211: add sequence numbers Message-ID: <20050603183617.7903c5a0@griffin.suse.cz> In-Reply-To: <20050603182625.64d33be3@griffin.suse.cz> References: <20050603182625.64d33be3@griffin.suse.cz> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2034 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 2286 Lines: 72 Adds sequence numbers to IEEE 802.11 headers. Signed-off-by: Jiri Benc Signed-off-by: Jirka Bohac Index: netdev/include/net/ieee80211.h =================================================================== --- netdev.orig/include/net/ieee80211.h 2005-06-03 13:21:00.000000000 +0200 +++ netdev/include/net/ieee80211.h 2005-06-03 13:21:06.000000000 +0200 @@ -711,6 +711,8 @@ unsigned int frag_next_idx; u16 fts; /* Fragmentation Threshold */ + u16 seq_number; /* sequence number in transmitted frames */ + /* Association info */ u8 bssid[IEEE80211_ALEN]; Index: netdev/net/ieee80211/ieee80211_module.c =================================================================== --- netdev.orig/net/ieee80211/ieee80211_module.c 2005-06-03 13:21:00.000000000 +0200 +++ netdev/net/ieee80211/ieee80211_module.c 2005-06-03 13:21:06.000000000 +0200 @@ -128,6 +128,7 @@ /* Default fragmentation threshold is maximum payload size */ ieee->fts = DEFAULT_FTS; + ieee->seq_number = 0; ieee->scan_age = DEFAULT_MAX_SCAN_AGE; ieee->open_wep = 1; Index: netdev/net/ieee80211/ieee80211_tx.c =================================================================== --- netdev.orig/net/ieee80211/ieee80211_tx.c 2005-06-03 13:21:00.000000000 +0200 +++ 
netdev/net/ieee80211/ieee80211_tx.c 2005-06-03 13:21:06.000000000 +0200 @@ -276,6 +276,13 @@ else bytes_last_frag = bytes_per_frag; + if (nr_frags > 16) { + /* Should never happen */ + printk(KERN_WARNING "%s: Fragmentation threshold too low\n", + dev->name); + goto failed; + } + /* When we allocate the TXB we allocate enough space for the reserve * and full fragment bytes (bytes_per_frag doesn't include prefix, * postfix, header, FCS, etc.) */ @@ -299,6 +306,8 @@ frag_hdr = (struct ieee80211_hdr *)skb_put(skb_frag, hdr_len); memcpy(frag_hdr, header, hdr_len); + frag_hdr->seq_ctl = cpu_to_le16(ieee->seq_number | i); + /* If this is not the last fragment, then add the MOREFRAGS * bit to the frame control */ if (i != nr_frags - 1) { @@ -323,7 +332,7 @@ (CFG_IEEE80211_COMPUTE_FCS | CFG_IEEE80211_RESERVE_FCS)) skb_put(skb_frag, 4); } - + ieee->seq_number += 0x10; success: spin_unlock_irqrestore(&ieee->lock, flags); -- Jiri Benc SUSE Labs From jbenc@suse.cz Fri Jun 3 09:38:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 09:38:30 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53GcPXq003682 for ; Fri, 3 Jun 2005 09:38:25 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id C06036282FC; Fri, 3 Jun 2005 18:37:26 +0200 (CEST) Date: Fri, 3 Jun 2005 18:37:26 +0200 From: Jiri Benc To: NetDev Cc: Jeff Garzik , Jirka Bohac Subject: [9/9] ieee80211: ETH_P_802_11 ethertype Message-ID: <20050603183726.482a91d2@griffin.suse.cz> In-Reply-To: <20050603182625.64d33be3@griffin.suse.cz> References: <20050603182625.64d33be3@griffin.suse.cz> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2035 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: 
netdev-bounce@oss.sgi.com X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 3775 Lines: 110 Introduced new ETH_P_802_11 ethertype. Fixed ieee80211_type_trans() to return ETH_P_802_11 in case of non-data frame. Signed-off-by: Jiri Benc Signed-off-by: Jirka Bohac Index: netdev/include/linux/if_ether.h =================================================================== --- netdev.orig/include/linux/if_ether.h 2005-06-01 11:04:59.000000000 +0200 +++ netdev/include/linux/if_ether.h 2005-06-03 13:21:15.000000000 +0200 @@ -92,6 +92,7 @@ #define ETH_P_ECONET 0x0018 /* Acorn Econet */ #define ETH_P_HDLC 0x0019 /* HDLC frames */ #define ETH_P_ARCNET 0x001A /* 1A for ArcNet :-) */ +#define ETH_P_802_11 0x001B /* 802.11 frames */ /* * This is an Ethernet frame header. Index: netdev/include/net/ieee80211.h =================================================================== --- netdev.orig/include/net/ieee80211.h 2005-06-03 13:21:10.000000000 +0200 +++ netdev/include/net/ieee80211.h 2005-06-03 13:21:15.000000000 +0200 @@ -232,10 +232,6 @@ #define ETH_P_PREAUTH 0x88C7 /* IEEE 802.11i pre-authentication */ -#ifndef ETH_P_80211_RAW -#define ETH_P_80211_RAW 0x0003 -#endif - /* IEEE 802.11 defines */ #define P80211_OUI_LEN 3 Index: netdev/net/ieee80211/ieee80211_rx.c =================================================================== --- netdev.orig/net/ieee80211/ieee80211_rx.c 2005-06-03 13:21:00.000000000 +0200 +++ netdev/net/ieee80211/ieee80211_rx.c 2005-06-03 13:21:15.000000000 +0200 @@ -46,7 +46,7 @@ skb->mac.raw = skb->data; skb_pull(skb, ieee80211_get_hdrlen(hdr)); skb->pkt_type = PACKET_OTHERHOST; - skb->protocol = __constant_htons(ETH_P_80211_RAW); + skb->protocol = __constant_htons(ETH_P_802_11); memset(skb->cb, 0, sizeof(skb->cb)); netif_rx(skb); } @@ -338,22 +338,33 @@ int hdrlen; u8 *daddr = IEEE80211_GET_DADDR(hdr); unsigned short type; + u16 fc; skb->mac.raw = skb->data; - hdrlen = ieee80211_get_hdrlen(hdr); - snap = (struct 
ieee80211_snap_hdr *)(skb->data + hdrlen); - if (snap->dsap == 0xaa && snap->ssap == 0xaa && - ((IEEE80211_SNAP_IS_RFC1042(snap) && - snap->type != __constant_htons(ETH_P_AARP) && - snap->type != __constant_htons(ETH_P_IPX)) || - IEEE80211_SNAP_IS_BRIDGE_TUNNEL(snap))) { - type = snap->type; - skb_pull(skb, hdrlen + SNAP_SIZE); + fc = le16_to_cpu(hdr->frame_ctl); + if (WLAN_FC_GET_TYPE(fc) == IEEE80211_FTYPE_DATA && + WLAN_FC_GET_STYPE(fc) == IEEE80211_STYPE_DATA) { + hdrlen = __ieee80211_get_hdrlen(fc); + snap = (struct ieee80211_snap_hdr *)(skb->data + hdrlen); + if (snap->dsap == 0xaa && snap->ssap == 0xaa && + ((IEEE80211_SNAP_IS_RFC1042(snap) && + snap->type != __constant_htons(ETH_P_AARP) && + snap->type != __constant_htons(ETH_P_IPX)) || + IEEE80211_SNAP_IS_BRIDGE_TUNNEL(snap))) { + type = snap->type; + skb_pull(skb, hdrlen + SNAP_SIZE); + } + else { + type = __constant_htons(ETH_P_802_2); + skb_pull(skb, hdrlen); + } } else { - type = __constant_htons(ETH_P_802_2); - skb_pull(skb, hdrlen); + /* If the type isn't data we want to keep the 802.11 header + * in place. 
+ */ + type = __constant_htons(ETH_P_802_11); } skb->input_dev = ieee->dev; Index: netdev/net/ieee80211/ieee80211_proto.c =================================================================== --- netdev.orig/net/ieee80211/ieee80211_proto.c 2005-06-03 13:21:00.000000000 +0200 +++ netdev/net/ieee80211/ieee80211_proto.c 2005-06-03 13:21:15.000000000 +0200 @@ -87,6 +87,8 @@ int fc = IEEE80211_FTYPE_DATA | IEEE80211_STYPE_DATA; int hdr_len = IEEE80211_3ADDR_LEN; + if (type == ETH_P_802_11) + return 0; if (type != ETH_P_802_3 && type != ETH_P_802_2) { ieee80211_put_snap(skb_push(skb, SNAP_SIZE), type); hdr_len += SNAP_SIZE; -- Jiri Benc SUSE Labs From mitch.a.williams@intel.com Fri Jun 3 10:45:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 10:45:45 -0700 (PDT) Received: from orsfmr003.jf.intel.com (fmr18.intel.com [134.134.136.17]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53HjeXq009108 for ; Fri, 3 Jun 2005 10:45:41 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr003.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j53HhWV5003802; Fri, 3 Jun 2005 17:43:32 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j53HhWSc004792; Fri, 3 Jun 2005 17:43:32 GMT Received: from mawilli1-desk2.amr.corp.intel.com (mawilli1-desk2.amr.corp.intel.com [134.134.3.124]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j53HhWSL028048; Fri, 3 Jun 2005 10:43:32 -0700 Date: Fri, 3 Jun 2005 10:43:32 -0700 From: Mitch Williams X-X-Sender: mawilli1@mawilli1-desk2.amr.corp.intel.com To: jamal cc: "David S. 
Miller" , "Ronciak, John" , jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: <1117765954.6095.49.camel@localhost.localdomain> Message-ID: References: <468F3FDA28AA87429AD807992E22D07E0450BFDB@orsmsx408> <20050602.171812.48807872.davem@davemloft.net> <1117765954.6095.49.camel@localhost.localdomain> ReplyTo: "Mitch Williams" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2037 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mitch.a.williams@intel.com Precedence: bulk X-list: netdev Content-Length: 4153 Lines: 88 On Thu, 2 Jun 2005, jamal wrote: > > Heres what i think i saw as a flow of events: > Someone posted a theory that if you happen to reduce the weight > (iirc the reduction was via a shift) then the DRR would give less CPU > time cycle to the driver - Whats the big suprise there? thats DRR design > intent. Well, that was me. Or at least I was the original poster on this thread. But my theory (if you can call it that) really wasn't about CPU time. I spent several weeks in our lab with the somewhat nebulous task of "look at Linux performance". And what I found was, to me, counterintuitive: reducing weight improved performance, sometimes significantly. > > Stephen has a patch which allows people to reduce the weight. > DRR provides fairness. If you have 10 NICs coming at different wire > rates, the weights provide a fairness quota without caring about what > those speeds are. So it doesnt make any sense IMO to have the weight > based on what the NIC speed is. Infact i claim it is _nonsense_. You > dont need to factor speed. And the claim that DRR is not real world > is blasphemous. OK, well, call me a blasphemer (against whom?). 
I'm not really saying that the DRR algorithm is not real-world, but rather that NAPI as currently implemented has some significant performance limitations. In my mind, there are two major problems with NAPI as it stands today. First, at Gigabit and higher speeds, the default settings don't allow the driver to process received packets in a timely manner. This causes dropped packets due to lack of receive resources. Lowering the weight can fix this, at least in a single-adapter environment. Second, at 10Mbps and 100Mbps, modern processors are just too fast for the network. The NAPI polling loop runs so much quicker than the wire speed that only one or two packets are processed per softirq -- which effectively puts the adapter back in interrupt mode. Because of this, you can easily bog down a very fast box with relatively slow traffic, just due to the massive number of interrupts generated. My original post (and patch) were to address the first issue. By using the shift value on the quota, I effectively lowered the weight for every driver in the system. Stephen sent out a patch that allowed you to adjust each driver's weight individually. My testing has shown that, as expected, you can achieve the same performance gain either way. In a multiple-adapter environment, you need to adjust the weight of all drivers together to fix the dropped packets issue. Lowering the weight on one adapter won't help it if the other interfaces are still taking up a lot of time in their receive loops. My patch gave you one knob to twiddle that would correct this issue. Stephen's patch gave you one knob for each adapter, but now you need to twiddle them all to see any benefit. The second issue currently has no fix. What is needed is a way for the driver to request a delayed poll, possibly based on line speed. If we could wait, say, 8 packet times before polling, we could significantly reduce the number of interrupts the system has to deal with, at the cost of higher latency. 
We haven't had time to investigate this at all, but the need is clearly present -- we've had customer calls about this issue. > > Having said that: > I have a feeling that issue which is which is being waded around is the > amount that the softirq chews in the CPU (unfortunately a well known > issue) and to some extent the packet flow a specific driver chews > depending on the path it takes. I fiddled with this concept a little bit, but didn't see much performance gain by doing so. But it may be something that we can go back and look at. Either way, I think the netdev community needs to look critically at NAPI, and make some changes. Network performance in 2.6.12-rcWhatever is pretty poor. 2.4.30 beats it handily, and it really shouldn't be that way. > This, however, does not eradicate the need for DRR and is absolutely not > driver specific. Agreed. All of the changes I've experimented with at the NAPI level have affected performance similarly on multiple drivers. -Mitch From john.ronciak@intel.com Fri Jun 3 10:43:12 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 10:43:21 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53HhCXq008752 for ; Fri, 3 Jun 2005 10:43:12 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j53HepFT003163; Fri, 3 Jun 2005 17:40:51 GMT Received: from orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j53HeadQ026719; Fri, 3 Jun 2005 17:40:48 GMT Received: from orsmsx331.amr.corp.intel.com ([192.168.65.56]) by orsmsxvs040.jf.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005060310404812872 ; Fri, 03 Jun 2005 10:40:48 -0700 Received: from 
orsmsx408.amr.corp.intel.com ([192.168.65.52]) by orsmsx331.amr.corp.intel.com with Microsoft SMTPSVC(6.0.3790.211); Fri, 3 Jun 2005 10:40:48 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: RE: RFC: NAPI packet weighting patch Date: Fri, 3 Jun 2005 10:40:47 -0700 Message-ID: <468F3FDA28AA87429AD807992E22D07E0450BFE6@orsmsx408> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: RFC: NAPI packet weighting patch Thread-Index: AcVn0fHLt/WdosjHQo2U4D6fkFIrvwAkDmig From: "Ronciak, John" To: "David S. Miller" Cc: , , , "Williams, Mitch A" , , , "Venkatesan, Ganesh" , "Brandeburg, Jesse" X-OriginalArrivalTime: 03 Jun 2005 17:40:48.0018 (UTC) FILETIME=[60830F20:01C56863] X-Scanned-By: MIMEDefang 2.44 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j53HhCXq008752 X-archive-position: 2036 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: john.ronciak@intel.com Precedence: bulk X-list: netdev Content-Length: 899 Lines: 23 > What more do you need other than checking the statistics counter? The > drop statistics (the ones we care about) are incremented in real time > by the ->poll() code, so it's not like we have to trigger some > asynchronous event to get a current version of the number. > I think that there is some more confusion here. I'm talking about frames dropped by the Ethernet controller at the hardware level (no descriptor available). This for example is happening now with our driver with the weight set to 64. This is also what started us looking into what was going on with the weight. I don't see how the NAPI code to dynamically adjust the weight could easily get the hardware stats number to know if frames are being dropped or not. Sorry if I caused the confusion here. 
Mitch is working on a response to Jamal's last mail trying to level set what we are seeing and doing. Cheers, John From Robert.Olsson@data.slu.se Fri Jun 3 11:10:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 11:10:15 -0700 (PDT) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53IA3Xq010457 for ; Fri, 3 Jun 2005 11:10:04 -0700 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j53I8mkB020710; Fri, 3 Jun 2005 20:08:48 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id 143BBEE3F0; Fri, 3 Jun 2005 20:08:48 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17056.40112.39108.32685@robur.slu.se> Date: Fri, 3 Jun 2005 20:08:48 +0200 To: "Ronciak, John" Cc: "David S. Miller" , , , , "Williams, Mitch A" , , , "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: RE: RFC: NAPI packet weighting patch In-Reply-To: <468F3FDA28AA87429AD807992E22D07E0450BFE6@orsmsx408> References: <468F3FDA28AA87429AD807992E22D07E0450BFE6@orsmsx408> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-archive-position: 2038 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Content-Length: 1102 Lines: 25 Ronciak, John writes: > > What more do you need other than checking the statistics counter? The > > drop statistics (the ones we care about) are incremented in real time > > by the ->poll() code, so it's not like we have to trigger some > > asynchronous event to get a current version of the number. > > > > I think that there is some more confusion here. I'm talking about > frames dropped by the Ethernet controller at the hardware level (no > descriptor available). 
This for example is happening now with our > driver with the weight set to 64. This is also what started us looking > into what was going on with the weight. I don't see how the NAPI code > to dynamically adjust the weight could easily get the hardware stats > number to know if frames are being dropped or not. Sorry if I caused > the confusion here. It's not obvious that weight is to blame for frames dropped. I would look into RX ring size in relation to HW mitigation. And of course if you system is very loaded the RX softirq gives room for other jobs and frames get dropped Cheers. --ro From john.ronciak@intel.com Fri Jun 3 11:21:21 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 11:21:33 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53ILIXq011580 for ; Fri, 3 Jun 2005 11:21:21 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j53IJ3FT010376; Fri, 3 Jun 2005 18:19:03 GMT Received: from orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j53IIcdm021983; Fri, 3 Jun 2005 18:19:03 GMT Received: from orsmsx331.amr.corp.intel.com ([192.168.65.56]) by orsmsxvs040.jf.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005060311190319676 ; Fri, 03 Jun 2005 11:19:03 -0700 Received: from orsmsx408.amr.corp.intel.com ([192.168.65.52]) by orsmsx331.amr.corp.intel.com with Microsoft SMTPSVC(6.0.3790.211); Fri, 3 Jun 2005 11:19:03 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: RE: RFC: NAPI packet weighting patch Date: Fri, 3 Jun 2005 11:19:02 -0700 Message-ID: 
<468F3FDA28AA87429AD807992E22D07E0450BFE8@orsmsx408> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: RFC: NAPI packet weighting patch Thread-Index: AcVoZ1hMIfRnNfjORXaqb2xRuIo6IAAAQaHw From: "Ronciak, John" To: "Robert Olsson" Cc: "David S. Miller" , , , , "Williams, Mitch A" , , "Venkatesan, Ganesh" , "Brandeburg, Jesse" X-OriginalArrivalTime: 03 Jun 2005 18:19:03.0501 (UTC) FILETIME=[B8B9F7D0:01C56868] X-Scanned-By: MIMEDefang 2.44 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j53ILIXq011580 X-archive-position: 2039 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: john.ronciak@intel.com Precedence: bulk X-list: netdev Content-Length: 584 Lines: 16 > It's not obvious that weight is to blame for frames dropped. I would > look into RX ring size in relation to HW mitigation. > And of course if you system is very loaded the RX softirq gives room > for other jobs and frames get dropped > With the same system (fairly high end with nothing major running on it) we got rid of the dropped frames by just reducing the weight from 64. So the weight did have something to do with the dropped frames. Maybe other factors as well, but in static tests like this it sure looks like the 64 value is wrong in some cases. 
Cheers, John From greearb@candelatech.com Fri Jun 3 11:34:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 11:34:26 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53IYEXq012436 for ; Fri, 3 Jun 2005 11:34:15 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j53J6U5I003158; Fri, 3 Jun 2005 12:06:31 -0700 Message-ID: <42A0A25C.8000503@candelatech.com> Date: Fri, 03 Jun 2005 11:33:00 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "Ronciak, John" CC: Robert Olsson , "David S. Miller" , jdmason@us.ibm.com, shemminger@osdl.org, hadi@cyberus.ca, "Williams, Mitch A" , netdev@oss.sgi.com, "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: Re: RFC: NAPI packet weighting patch References: <468F3FDA28AA87429AD807992E22D07E0450BFE8@orsmsx408> In-Reply-To: <468F3FDA28AA87429AD807992E22D07E0450BFE8@orsmsx408> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2040 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 1320 Lines: 37 Ronciak, John wrote: >> It's not obvious that weight is to blame for frames dropped. I would >> look into RX ring size in relation to HW mitigation. >> And of course if you system is very loaded the RX softirq gives room >> for other jobs and frames get dropped >> > > With the same system (fairly high end with nothing major running on it) > we got rid of the dropped frames by just reducing the weight for 64. 
So > the weight did have something to do with the dropped frames. Maybe > other factors as well, but in static tests like this it sure looks like > the 64 value is wrong is some cases. Is this implying that having the NAPI poll do less work per poll of the driver actually increases performance? I would have guessed that the opposite would be true. Maybe the poll is disabling the IRQs on the NIC for too long, or something like that? For e1000, are you using larger than the default 256 receive descriptors? I have seen that increasing these descriptors helps decrease drops by a small percentage. Have you tried increasing the netdev-backlog setting to see if that fixes the problem (while leaving the weight at the default)? What packet sizes and speeds are you using for your tests? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From davem@davemloft.net Fri Jun 3 11:39:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 11:39:33 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53IdTXq013121 for ; Fri, 3 Jun 2005 11:39:29 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DeH3h-0001to-BE; Fri, 03 Jun 2005 11:38:17 -0700 Date: Fri, 03 Jun 2005 11:38:17 -0700 (PDT) Message-Id: <20050603.113817.74562842.davem@davemloft.net> To: mitch.a.williams@intel.com Cc: hadi@cyberus.ca, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. 
Miller" In-Reply-To: References: <20050602.171812.48807872.davem@davemloft.net> <1117765954.6095.49.camel@localhost.localdomain> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2041 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1504 Lines: 33 From: Mitch Williams Date: Fri, 3 Jun 2005 10:43:32 -0700 > In my mind, there are two major problems with NAPI as it stands today. > First, at Gigabit and higher speeds, the default settings don't allow the > driver to process received packets in a timely manner. This causes > dropped packets due to lack of receive resources. Lowering the weight can > fix this, at least in a single-adapter environment. I really don't see how changing the weight can change things in the single adapter case. When we hit the quota, we just loop and process more packets. It doesn't fundamentally change anything about how the NAPI code operates. Please investigate what exactly is happening. I have a few theories. First, is it the case that with a lower weight we drop out of the loop because 'jiffies' advanced one tick? Some simple instrumentation in net/core/dev.c:net_rx_action() would show what's going on. Actually, we keep this statistic via netdev_rx_stat, so just cat /proc/net/softnet_stat to see whether "time_squeeze" is being incremented when dev->weight is 64 in your tests. Next, I don't think "budget" in that function is going down to zero, that's set to 300 by default. If the quota is consumed, the device is just added right back to the tail of the poll_list, and if it's the only device active we jump right back into its ->poll() routine over and over until there is no more pending work in the device or we hit the "jiffies - start_time > 1" test. 
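[Editorial sketch, not part of the original message.] The loop described above can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the kernel code: the function name `net_rx_action_model`, the `packets_per_jiffy` stand-in for the "jiffies - start_time > 1" test, and the idea that every packet costs the same amount of time are all hypothetical simplifications. Only the default budget of 300 and the re-queue-at-tail behaviour come from the message.

```python
from collections import deque


def net_rx_action_model(pending, weight, budget=300, packets_per_jiffy=1000):
    """Toy model of one softirq run over a list of devices.

    pending: packets waiting on each device's RX ring (one entry per device).
    weight: per-device quota granted on each visit (same for all devices here).
    budget: global packet budget for the run (300 per the message above).
    packets_per_jiffy: hypothetical stand-in for the jiffies time limit.

    Returns (processed_per_device, time_squeezed).
    """
    poll_list = deque(range(len(pending)))
    remaining = list(pending)
    processed = [0] * len(pending)
    work_done = 0
    while poll_list:
        # Give up when the budget is spent or the modelled time limit passed.
        if budget <= 0 or work_done > packets_per_jiffy:
            return processed, True
        dev = poll_list.popleft()
        # One ->poll() call: bounded by quota, pending work, and the budget.
        n = min(weight, remaining[dev], budget)
        remaining[dev] -= n
        processed[dev] += n
        budget -= n
        work_done += n
        # Quota consumed but work remains: back to the tail of the poll list.
        if remaining[dev] > 0:
            poll_list.append(dev)
    return processed, False
```

In this model a single device gets the same total work per run whether the weight is 64 or 16; only the number of `->poll()` calls differs. That is the reasoning behind the claim that weight alone should not matter in the single-adapter case, and why the time limit is the first suspect.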
From hadi@cyberus.ca Fri Jun 3 11:44:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 11:44:03 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53IhxXq013815 for ; Fri, 3 Jun 2005 11:43:59 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DeH8M-0003zL-Lx for netdev@oss.sgi.com; Fri, 03 Jun 2005 14:43:06 -0400 Received: from [216.209.86.2] (helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DeH8J-0005ai-51; Fri, 03 Jun 2005 14:43:03 -0400 Subject: Re: RFC: NAPI packet weighting patch From: jamal Reply-To: hadi@cyberus.ca To: Mitch Williams Cc: "David S. Miller" , "Ronciak, John" , jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse" In-Reply-To: References: <468F3FDA28AA87429AD807992E22D07E0450BFDB@orsmsx408> <20050602.171812.48807872.davem@davemloft.net> <1117765954.6095.49.camel@localhost.localdomain> Content-Type: text/plain Organization: unknown Date: Fri, 03 Jun 2005 14:42:30 -0400 Message-Id: <1117824150.6071.34.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2042 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 5233 Lines: 124 On Fri, 2005-03-06 at 10:43 -0700, Mitch Williams wrote: > > On Thu, 2 Jun 2005, jamal wrote: > > > > Heres what i think i saw as a flow of events: > > Someone posted a theory that if you happen to reduce the weight > > (iirc the reduction was via a shift) then the DRR would give less CPU > > time cycle to the driver - Whats the big suprise there? thats DRR design > > intent. > > Well, that was me. Or at least I was the original poster on this thread. 
> But my theory (if you can call it that) really wasn't about CPU time. I > spent several weeks in our lab with the somewhat nebulous task of "look at > Linux performance". And what I found was, to me, counterintuitive: > reducing weight improved performance, sometimes significantly. > When you reduce the weight, the system is spending less time in the softirq processing packets before the softirq yields. If this gives more opportunity to your app to run, then the performance will go up. Is this what you are seeing? > OK, well, call me a blasphemer (against whom?). > I'm not really saying > that the DRR algorithm is not real-world, but rather that NAPI as > currently implemented has some significant performance limitations. > And we need to be fair and investigate why. > In my mind, there are two major problems with NAPI as it stands today. > First, at Gigabit and higher speeds, the default settings don't allow the > driver to process received packets in a timely manner. What do you mean by timely? > This causes > dropped packets due to lack of receive resources. Lowering the weight can > fix this, at least in a single-adapter environment. > If you know your workload you could tune the weight. Additionally, you could tune the softirq using nice. > Second, at 10Mbps and 100Mbps, modern processors are just too fast for the > network. The NAPI polling loop runs so much quicker than the wire speed > that only one or two packets are processed per softirq -- which > effectively puts the adapter back in interrupt mode. Because of this, you > can easily bog down a very fast box with relatively slow traffic, just due > to the massive number of interrupts generated. > Massive is an overstatement. The issue is really IO. If you process one packet in each interrupt then NAPI does add extra IO costs at "low" traffic levels. Note that this is also a known issue - reference the threads from way back from people like Manfred Spraul and recently from the SGI folks. 
IO unfortunately hasn't kept up with CPU speeds; hardware vendors such as your company have been busy making processors faster but forgetting about IO and RAM latencies. PCI-E seems promising from what i have heard; interim PCI-E bridging to PCI-X is, from what i have heard, worse in its IO performance. > My original post (and patch) were to address the first issue. By using > the shift value on the quota, I effectively lowered the weight for every > driver in the system. Stephen sent out a patch that allowed you to > adjust each driver's weight individually. My testing has shown that, as > expected, you can achieve the same performance gain either way. > Ok, glad to hear that's resolved. > In a multiple-adapter environment, you need to adjust the weight of all > drivers together to fix the dropped packets issue. Lowering the weight on > one adapter won't help it if the other interfaces are still taking up a > lot of time in their receive loops. > > My patch gave you one knob to twiddle that would correct this issue. > Stephen's patch gave you one knob for each adapter, but now you need to > twiddle them all to see any benefit. > makes sense > The second issue currently has no fix. What is needed is a way for the > driver to request a delayed poll, possibly based on line speed. If we > could wait, say, 8 packet times before polling, we could significantly > reduce the number of interrupts the system has to deal with, at the cost > of higher latency. We haven't had time to investigate this at all, but > the need is clearly present -- we've had customer calls about this issue. > I can believe you (note it has to do with IO costs though) having seen how horrific MMIO numbers are on faster processors. Talk to Jesse, he has seen a little program from Lennert/Robert/Harald that does MMIO measurements. It seems the trend is that as CPUs get faster, IO gets more expensive in both cpu cycles as well as absolute time. 
The solution to this issue is to be found in mitigation at the moment in conjunction with NAPI. The SGI folks have made some real progress with recent patches from Davem and Michael Chan on tg3. I have been experimenting with some patches but they introduce unacceptable jitter in latency. So lets summarize it this way: This is something that needs to be resolved - but whatever solution needs to be generic. > Either way, I think the netdev community needs to look critically at NAPI, > and make some changes. I think what you call as the second issue needs a solution. Mitigation is the only generic solution at the moment. > Network performance in 2.6.12-rcWhatever is > pretty poor. 2.4.30 beats it handily, and it really shouldn't be that > way. > Are you using NAPI as well on 2.4.30? cheers, jamal From davem@davemloft.net Fri Jun 3 11:51:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 11:51:07 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53Ip3Xq014729 for ; Fri, 3 Jun 2005 11:51:03 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DeHEs-0001vW-KJ; Fri, 03 Jun 2005 11:49:50 -0700 Date: Fri, 03 Jun 2005 11:49:50 -0700 (PDT) Message-Id: <20050603.114950.119242486.davem@davemloft.net> To: greearb@candelatech.com Cc: john.ronciak@intel.com, Robert.Olsson@data.slu.se, jdmason@us.ibm.com, shemminger@osdl.org, hadi@cyberus.ca, mitch.a.williams@intel.com, netdev@oss.sgi.com, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. 
Miller" In-Reply-To: <42A0A25C.8000503@candelatech.com> References: <468F3FDA28AA87429AD807992E22D07E0450BFE8@orsmsx408> <42A0A25C.8000503@candelatech.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2044 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1080 Lines: 28 From: Ben Greear Date: Fri, 03 Jun 2005 11:33:00 -0700 > Is this implying that having the NAPI poll do less work per poll > of the driver actually increases performance? I would have guessed that > the opposite would be true. Exactly my thoughts as well :) > Maybe the poll is disabling the IRQs on the NIC for too long, or something > like that? In a reply I just sent out to this thread, I postulate that the jiffies check is hitting earlier with a lower weight value, a quick look at /proc/net/softnet_stat during their testing will confirm or deny this theory. It could also just be a simple bug in the dev->quota accounting somewhere. Note that, in all of this, I do not have any objections to providing a way to configure the dev->weight values. I will be applying Stephen Hemminger's patches. But I think we MUST find out the reason for the observed behavior, especially in the single-adapter case since the result is so illogical. We could find an important bug in the NAPI implementation, or learn something new about how NAPI behaves. 
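[Editorial sketch, not part of the original message.] The "quick look at /proc/net/softnet_stat" suggested above can be scripted. The snippet below is a minimal sketch that assumes the usual layout of that file: one row per CPU, whitespace-separated hexadecimal fields, with the first three being packets processed, packets dropped, and time_squeeze. The function name `parse_softnet_stat` and the sample text are hypothetical; the sample values are made up for illustration and are not measurements from this thread.

```python
def parse_softnet_stat(text):
    """Parse the contents of /proc/net/softnet_stat.

    Assumes each non-empty row holds hex fields whose first three columns
    are: total packets processed, packets dropped, time_squeeze count.
    Returns one dict per row, in file order.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        total, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append({
            "row": len(rows),
            "total": total,
            "dropped": dropped,
            "time_squeeze": time_squeeze,
        })
    return rows


# Made-up sample input, two rows (for example, two CPUs).
SAMPLE = (
    "0000a1b2 00000000 0000001f 00000000 00000000\n"
    "00000010 00000002 00000000 00000000 00000000\n"
)

if __name__ == "__main__":
    for row in parse_softnet_stat(SAMPLE):
        print(row)
```

On a live system the text would come from reading the file itself; taking one snapshot before and one after a test run and comparing the `time_squeeze` values shows whether the time limit is being hit at a given weight.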
From fubar@us.ibm.com Fri Jun 3 11:50:08 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 11:50:15 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53Io8Xq014546 for ; Fri, 3 Jun 2005 11:50:08 -0700 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e35.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j53In9MK501054 for ; Fri, 3 Jun 2005 14:49:09 -0400 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by d03relay04.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j53In96g177088 for ; Fri, 3 Jun 2005 12:49:09 -0600 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j53In8aT012300 for ; Fri, 3 Jun 2005 12:49:09 -0600 Received: from death.nxdomain.ibm.com (lig32-225-151-29.us.lig-dial.ibm.com [32.225.151.29]) by d03av02.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j53Imu67011588; Fri, 3 Jun 2005 12:48:57 -0600 Received: from death.nxdomain.ibm.com (localhost [127.0.0.1]) by death.nxdomain.ibm.com (8.12.8/8.12.8) with ESMTP id j53ImVse031367; Fri, 3 Jun 2005 11:48:51 -0700 Received: from death (fubar@localhost) by death.nxdomain.ibm.com (8.12.8/8.12.8/Submit) with ESMTP id j53ImAwZ031354; Fri, 3 Jun 2005 11:48:30 -0700 Message-Id: <200506031848.j53ImAwZ031354@death.nxdomain.ibm.com> To: netdev@oss.sgi.com, bonding-devel@lists.sourceforge.net Subject: [PATCH 2.6.12-rc5] bonding: documentation update X-Mailer: MH-E 7.83; nmh 1.0.4; GNU Emacs 21.3.1 Date: Fri, 03 Jun 2005 11:48:09 -0700 From: Jay Vosburgh X-archive-position: 2043 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: fubar@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 57102 Lines: 1266 Documentation update: added some more configuration info, (hopefully) better examples, 
updated some out of date info, and a bonus pass through ispell to banish the "paramters." -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com Signed-off-by: Jay Vosburgh diff -ur linux-2.6.12-rc5/Documentation/networking/bonding.txt linux-doc/Documentation/networking/bonding.txt --- linux-2.6.12-rc5/Documentation/networking/bonding.txt 2005-06-03 11:29:04.394823672 -0700 +++ linux-doc/Documentation/networking/bonding.txt 2005-06-03 11:29:41.143237064 -0700 @@ -1,5 +1,7 @@ - Linux Ethernet Bonding Driver HOWTO + Linux Ethernet Bonding Driver HOWTO + + Latest update: 2 June 2005 Initial release : Thomas Davis Corrections, HA extensions : 2000/10/03-15 : @@ -11,15 +13,22 @@ Reorganized and updated Feb 2005 by Jay Vosburgh -Note : ------- +Introduction +============ + + The Linux bonding driver provides a method for aggregating +multiple network interfaces into a single logical "bonded" interface. +The behavior of the bonded interfaces depends upon the mode; generally +speaking, modes provide either hot standby or load balancing services. +Additionally, link integrity monitoring may be performed. -The bonding driver originally came from Donald Becker's beowulf patches for -kernel 2.0. It has changed quite a bit since, and the original tools from -extreme-linux and beowulf sites will not work with this version of the driver. + The bonding driver originally came from Donald Becker's +beowulf patches for kernel 2.0. It has changed quite a bit since, and +the original tools from extreme-linux and beowulf sites will not work +with this version of the driver. -For new versions of the driver, patches for older kernels and the updated -userspace tools, please follow the links at the end of this file. + For new versions of the driver, updated userspace tools, and +who to ask for help, please follow the links at the end of this file. Table of Contents ================= @@ -30,9 +39,13 @@ 3. 
Configuring Bonding Devices 3.1 Configuration with sysconfig support +3.1.1 Using DHCP with sysconfig +3.1.2 Configuring Multiple Bonds with sysconfig 3.2 Configuration with initscripts support +3.2.1 Using DHCP with initscripts +3.2.2 Configuring Multiple Bonds with initscripts 3.3 Configuring Bonding Manually -3.4 Configuring Multiple Bonds +3.3.1 Configuring Multiple Bonds Manually 5. Querying Bonding Configuration 5.1 Bonding Configuration @@ -56,21 +69,28 @@ 11. Promiscuous mode -12. High Availability Information +12. Configuring Bonding for High Availability 12.1 High Availability in a Single Switch Topology -12.1.1 Bonding Mode Selection for Single Switch Topology -12.1.2 Link Monitoring for Single Switch Topology 12.2 High Availability in a Multiple Switch Topology -12.2.1 Bonding Mode Selection for Multiple Switch Topology -12.2.2 Link Monitoring for Multiple Switch Topology -12.3 Switch Behavior Issues for High Availability +12.2.1 HA Bonding Mode Selection for Multiple Switch Topology +12.2.2 HA Link Monitoring for Multiple Switch Topology + +13. Configuring Bonding for Maximum Throughput +13.1 Maximum Throughput in a Single Switch Topology +13.1.1 MT Bonding Mode Selection for Single Switch Topology +13.1.2 MT Link Monitoring for Single Switch Topology +13.2 Maximum Throughput in a Multiple Switch Topology +13.2.1 MT Bonding Mode Selection for Multiple Switch Topology +13.2.2 MT Link Monitoring for Multiple Switch Topology -13. Hardware Specific Considerations -13.1 IBM BladeCenter +14. Switch Behavior Issues -14. Frequently Asked Questions +15. Hardware Specific Considerations +15.1 IBM BladeCenter -15. Resources and Links +16. Frequently Asked Questions + +17. Resources and Links 1. 
Bonding Driver Installation @@ -86,16 +106,10 @@ 1.1 Configure and build the kernel with bonding ----------------------------------------------- - The latest version of the bonding driver is available in the + The current version of the bonding driver is available in the drivers/net/bonding subdirectory of the most recent kernel source -(which is available on http://kernel.org). - - Prior to the 2.4.11 kernel, the bonding driver was maintained -largely outside the kernel tree; patches for some earlier kernels are -available on the bonding sourceforge site, although those patches are -still several years out of date. Most users will want to use either -the most recent kernel from kernel.org or whatever kernel came with -their distro. +(which is available on http://kernel.org). Most users "rolling their +own" will want to use the most recent kernel from kernel.org. Configure kernel with "make menuconfig" (or "make xconfig" or "make config"), then select "Bonding driver support" in the "Network @@ -103,8 +117,8 @@ driver as module since it is currently the only way to pass parameters to the driver or configure more than one bonding device. - Build and install the new kernel and modules, then proceed to -step 2. + Build and install the new kernel and modules, then continue +below to install ifenslave. 1.2 Install ifenslave Control Utility ------------------------------------- @@ -147,9 +161,9 @@ Options for the bonding driver are supplied as parameters to the bonding module at load time. They may be given as command line arguments to the insmod or modprobe command, but are usually specified -in either the /etc/modprobe.conf configuration file, or in a -distro-specific configuration file (some of which are detailed in the -next section). +in either the /etc/modules.conf or /etc/modprobe.conf configuration +file, or in a distro-specific configuration file (some of which are +detailed in the next section). The available bonding driver parameters are listed below. 
If a parameter is not specified the default value is used. When initially @@ -162,34 +176,34 @@ support at least miimon, so there is really no reason not to use it. Options with textual values will accept either the text name - or, for backwards compatibility, the option value. E.g., - "mode=802.3ad" and "mode=4" set the same mode. +or, for backwards compatibility, the option value. E.g., +"mode=802.3ad" and "mode=4" set the same mode. The parameters are as follows: arp_interval - Specifies the ARP monitoring frequency in milli-seconds. If - ARP monitoring is used in a load-balancing mode (mode 0 or 2), - the switch should be configured in a mode that evenly - distributes packets across all links - such as round-robin. If - the switch is configured to distribute the packets in an XOR + Specifies the ARP link monitoring frequency in milliseconds. + If ARP monitoring is used in an etherchannel compatible mode + (modes 0 and 2), the switch should be configured in a mode + that evenly distributes packets across all links. If the + switch is configured to distribute the packets in an XOR fashion, all replies from the ARP targets will be received on the same link which could cause the other team members to - fail. ARP monitoring should not be used in conjunction with - miimon. A value of 0 disables ARP monitoring. The default + fail. ARP monitoring should not be used in conjunction with + miimon. A value of 0 disables ARP monitoring. The default value is 0. arp_ip_target - Specifies the ip addresses to use when arp_interval is > 0. - These are the targets of the ARP request sent to determine the - health of the link to the targets. Specify these values in - ddd.ddd.ddd.ddd format. Multiple ip adresses must be - seperated by a comma. At least one IP address must be given - for ARP monitoring to function. The maximum number of targets - that can be specified is 16. The default value is no IP - addresses. 
+ Specifies the IP addresses to use as ARP monitoring peers when + arp_interval is > 0. These are the targets of the ARP request + sent to determine the health of the link to the targets. + Specify these values in ddd.ddd.ddd.ddd format. Multiple IP + addresses must be separated by a comma. At least one IP + address must be given for ARP monitoring to function. The + maximum number of targets that can be specified is 16. The + default value is no IP addresses. downdelay @@ -207,11 +221,13 @@ are: slow or 0 - Request partner to transmit LACPDUs every 30 seconds (default) + Request partner to transmit LACPDUs every 30 seconds fast or 1 Request partner to transmit LACPDUs every 1 second + The default is slow. + max_bonds Specifies the number of bonding devices to create for this @@ -221,10 +237,11 @@ miimon - Specifies the frequency in milli-seconds that MII link - monitoring will occur. A value of zero disables MII link - monitoring. A value of 100 is a good starting point. The - use_carrier option, below, affects how the link state is + Specifies the MII link monitoring frequency in milliseconds. + This determines how often the link state of each slave is + inspected for link failures. A value of zero disables MII + link monitoring. A value of 100 is a good starting point. + The use_carrier option, below, affects how the link state is determined. See the High Availability section for additional information. The default value is 0. @@ -270,7 +287,7 @@ duplex settings. Utilizes all slaves in the active aggregator according to the 802.3ad specification. - Pre-requisites: + Prerequisites: 1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave. @@ -333,7 +350,7 @@ When a link is reconnected or a new slave joins the bond the receive traffic is redistributed among all - active slaves in the bond by intiating ARP Replies + active slaves in the bond by initiating ARP Replies with the selected mac address to each of the clients. 
The updelay parameter (detailed below) must be set to a value equal or greater than the switch's @@ -448,8 +465,9 @@ slave devices. On SLES 9, this is most easily done by running the yast2 sysconfig configuration utility. The goal is for to create an ifcfg-id file for each slave device. The simplest way to accomplish -this is to configure the devices for DHCP. The name of the -configuration file for each device will be of the form: +this is to configure the devices for DHCP (this is only to get the +file ifcfg-id file created; see below for some issues with DHCP). The +name of the configuration file for each device will be of the form: ifcfg-id-xx:xx:xx:xx:xx:xx @@ -459,7 +477,7 @@ Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been created, it is necessary to edit the configuration files for the slave devices (the MAC addresses correspond to those of the slave devices). -Before editing, the file will contain muliple lines, and will look +Before editing, the file will contain multiple lines, and will look something like this: BOOTPROTO='dhcp' @@ -501,11 +519,6 @@ Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK values with the appropriate values for your network. - Note that configuring the bonding device with BOOTPROTO='dhcp' -does not work; the scripts attempt to obtain the device address from -DHCP prior to adding any of the slave devices. Without active slaves, -the DHCP requests are not sent to the network. - The STARTMODE specifies when the device is brought online. The possible values are: @@ -544,7 +557,7 @@ Note that the network control script (/sbin/ifdown) will remove the bonding module as part of the network shutdown processing, so it is not necessary to remove the module by hand if, e.g., the -module paramters have changed. +module parameters have changed. 
Also, at this writing, YaST/YaST2 will not manage bonding devices (they do not show bonding interfaces on its list of network @@ -559,12 +572,37 @@ Note that the template does not document the various BONDING_ settings described above, but does describe many of the other options. +3.1.1 Using DHCP with sysconfig +------------------------------- + + Under sysconfig, configuring a device with BOOTPROTO='dhcp' +will cause it to query DHCP for its IP address information. At this +writing, this does not function for bonding devices; the scripts +attempt to obtain the device address from DHCP prior to adding any of +the slave devices. Without active slaves, the DHCP requests are not +sent to the network. + +3.1.2 Configuring Multiple Bonds with sysconfig +----------------------------------------------- + + The sysconfig network initialization system is capable of +handling multiple bonding devices. All that is necessary is for each +bonding instance to have an appropriately configured ifcfg-bondX file +(as described above). Do not specify the "max_bonds" parameter to any +instance of bonding, as this will confuse sysconfig. If you require +multiple bonding devices with identical parameters, create multiple +ifcfg-bondX files. + + Because the sysconfig scripts supply the bonding module +options in the ifcfg-bondX file, it is not necessary to add them to +the system /etc/modules.conf or /etc/modprobe.conf configuration file. + 3.2 Configuration with initscripts support ------------------------------------------ This section applies to distros using a version of initscripts with bonding support, for example, Red Hat Linux 9 or Red Hat -Enterprise Linux version 3. On these systems, the network +Enterprise Linux version 3 or 4. On these systems, the network initialization scripts have some knowledge of bonding, and can be configured to control bonding devices. 
@@ -614,10 +652,11 @@ Be sure to change the networking specific lines (IPADDR, NETMASK, NETWORK and BROADCAST) to match your network configuration. - Finally, it is necessary to edit /etc/modules.conf to load the -bonding module when the bond0 interface is brought up. The following -sample lines in /etc/modules.conf will load the bonding module, and -select its options: + Finally, it is necessary to edit /etc/modules.conf (or +/etc/modprobe.conf, depending upon your distro) to load the bonding +module with your desired options when the bond0 interface is brought +up. The following lines in /etc/modules.conf (or modprobe.conf) will +load the bonding module, and select its options: alias bond0 bonding options bond0 mode=balance-alb miimon=100 @@ -629,6 +668,33 @@ will restart the networking subsystem and your bond link should be now up and running. +3.2.1 Using DHCP with initscripts +--------------------------------- + + Recent versions of initscripts (the version supplied with +Fedora Core 3 and Red Hat Enterprise Linux 4 is reported to work) do +have support for assigning IP information to bonding devices via DHCP. + + To configure bonding for DHCP, configure it as described +above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp" +and add a line consisting of "TYPE=Bonding". Note that the TYPE value +is case sensitive. + +3.2.2 Configuring Multiple Bonds with initscripts +------------------------------------------------- + + At this writing, the initscripts package does not directly +support loading the bonding driver multiple times, so the process for +doing so is the same as described in the "Configuring Multiple Bonds +Manually" section, below. + + NOTE: It has been observed that some Red Hat supplied kernels +are apparently unable to rename modules at load time (the "-obonding1" +part). Attempts to pass that option to modprobe will produce an +"Operation not permitted" error. 
This has been reported on some +Fedora Core kernels, and has been seen on RHEL 4 as well. On kernels +exhibiting this problem, it will be impossible to configure multiple +bonds with differing parameters. 3.3 Configuring Bonding Manually -------------------------------- @@ -638,10 +704,11 @@ knowledge of bonding. One such distro is SuSE Linux Enterprise Server version 8. - The general methodology for these systems is to place the -bonding module parameters into /etc/modprobe.conf, then add modprobe -and/or ifenslave commands to the system's global init script. The -name of the global init script differs; for sysconfig, it is + The general method for these systems is to place the bonding +module parameters into /etc/modules.conf or /etc/modprobe.conf (as +appropriate for the installed distro), then add modprobe and/or +ifenslave commands to the system's global init script. The name of +the global init script differs; for sysconfig, it is /etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local. For example, if you wanted to make a simple bond of two e100 @@ -649,7 +716,7 @@ reboots, edit the appropriate file (/etc/init.d/boot.local or /etc/rc.d/rc.local), and add the following: -modprobe bonding -obond0 mode=balance-alb miimon=100 +modprobe bonding mode=balance-alb miimon=100 modprobe e100 ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up ifenslave bond0 eth0 @@ -657,11 +724,7 @@ Replace the example bonding module parameters and bond0 network configuration (IP address, netmask, etc) with the appropriate -values for your configuration. The above example loads the bonding -module with the name "bond0," this simplifies the naming if multiple -bonding modules are loaded (each successive instance of the module is -given a different name, and the module instance names match the -bonding interface names). +values for your configuration. Unfortunately, this method will not provide support for the ifup and ifdown scripts on the bond devices. 
To reload the bonding @@ -684,20 +747,23 @@ the following: # ifconfig bond0 down -# rmmod bond0 +# rmmod bonding # rmmod e100 Again, for convenience, it may be desirable to create a script with these commands. -3.4 Configuring Multiple Bonds ------------------------------- +3.3.1 Configuring Multiple Bonds Manually +----------------------------------------- This section contains information on configuring multiple -bonding devices with differing options. If you require multiple -bonding devices, but all with the same options, see the "max_bonds" -module paramter, documented above. +bonding devices with differing options for those systems whose network +initialization scripts lack support for configuring multiple bonds. + + If you require multiple bonding devices, but all with the same +options, you may wish to use the "max_bonds" module parameter, +documented above. To create multiple bonding devices with differing options, it is necessary to load the bonding driver multiple times. Note that @@ -724,11 +790,16 @@ miimon of 100. The second instance is named "bond1" and creates the bond1 device in balance-alb mode with an miimon of 50. + In some circumstances (typically with older distributions), +the above does not work, and the second bonding instance never sees +its options. In that case, the second options line can be substituted +as follows: + +install bonding1 /sbin/modprobe bonding -obond1 mode=balance-alb miimon=50 + This may be repeated any number of times, specifying a new and -unique name in place of bond0 or bond1 for each instance. +unique name in place of bond1 for each subsequent instance. - When the appropriate module paramters are in place, then -configure bonding according to the instructions for your distro. 5. Querying Bonding Configuration ================================= @@ -846,8 +917,8 @@ self generated packets. 
For reasons of simplicity, and to support the use of adapters -that can do VLAN hardware acceleration offloding, the bonding -interface declares itself as fully hardware offloaing capable, it gets +that can do VLAN hardware acceleration offloading, the bonding +interface declares itself as fully hardware offloading capable, it gets the add_vid/kill_vid notifications to gather the necessary information, and it propagates those actions to the slaves. In case of mixed adapter types, hardware accelerated tagged packets that @@ -880,7 +951,7 @@ matches the hardware address of the VLAN interfaces. Note that changing a VLAN interface's HW address would set the -underlying device -- i.e. the bonding interface -- to promiscouos +underlying device -- i.e. the bonding interface -- to promiscuous mode, which might not be what you want. @@ -923,7 +994,7 @@ an additional target (or several) increases the reliability of the ARP monitoring. - Multiple ARP targets must be seperated by commas as follows: + Multiple ARP targets must be separated by commas as follows: # example options for ARP monitoring with three targets alias bond0 bonding @@ -1045,7 +1116,7 @@ This will, when loading the bonding module, rather than performing the normal action, instead execute the provided command. This command loads the device drivers in the order needed, then calls -modprobe with --ingore-install to cause the normal action to then take +modprobe with --ignore-install to cause the normal action to then take place. Full documentation on this can be found in the modprobe.conf and modprobe manual pages. @@ -1130,14 +1201,14 @@ common to enable promiscuous mode on the device, so that all traffic is seen (instead of seeing only traffic destined for the local host). The bonding driver handles promiscuous mode changes to the bonding -master device (e.g., bond0), and propogates the setting to the slave +master device (e.g., bond0), and propagates the setting to the slave devices. 
For the balance-rr, balance-xor, broadcast, and 802.3ad modes, -the promiscuous mode setting is propogated to all slaves. +the promiscuous mode setting is propagated to all slaves. For the active-backup, balance-tlb and balance-alb modes, the -promiscuous mode setting is propogated only to the active slave. +promiscuous mode setting is propagated only to the active slave. For balance-tlb mode, the active slave is the slave currently receiving inbound traffic. @@ -1148,46 +1219,182 @@ For the active-backup, balance-tlb and balance-alb modes, when the active slave changes (e.g., due to a link failure), the -promiscuous setting will be propogated to the new active slave. +promiscuous setting will be propagated to the new active slave. -12. High Availability Information -================================= +12. Configuring Bonding for High Availability +============================================= High Availability refers to configurations that provide maximum network availability by having redundant or backup devices, -links and switches between the host and the rest of the world. - - There are currently two basic methods for configuring to -maximize availability. They are dependent on the network topology and -the primary goal of the configuration, but in general, a configuration -can be optimized for maximum available bandwidth, or for maximum -network availability. +links or switches between the host and the rest of the world. The +goal is to provide the maximum availability of network connectivity +(i.e., the network always works), even though other configurations +could provide higher throughput. 
12.1 High Availability in a Single Switch Topology -------------------------------------------------- - If two hosts (or a host and a switch) are directly connected -via multiple physical links, then there is no network availability -penalty for optimizing for maximum bandwidth: there is only one switch -(or peer), so if it fails, you have no alternative access to fail over -to. - -Example 1 : host to switch (or other host) - - +----------+ +----------+ - | |eth0 eth0| switch | - | Host A +--------------------------+ or | - | +--------------------------+ other | - | |eth1 eth1| host | - +----------+ +----------+ + If two hosts (or a host and a single switch) are directly +connected via multiple physical links, then there is no availability +penalty to optimizing for maximum bandwidth. In this case, there is +only one switch (or peer), so if it fails, there is no alternative +access to fail over to. Additionally, the bonding load balance modes +support link monitoring of their members, so if individual links fail, +the load will be rebalanced across the remaining devices. + + See Section 13, "Configuring Bonding for Maximum Throughput" +for information on configuring bonding with one peer device. +12.2 High Availability in a Multiple Switch Topology +---------------------------------------------------- -12.1.1 Bonding Mode Selection for single switch topology --------------------------------------------------------- + With multiple switches, the configuration of bonding and the +network changes dramatically. In multiple switch topologies, there is +a trade off between network availability and usable bandwidth. 
+ + Below is a sample network, configured to maximize the +availability of the network: + + | | + |port3 port3| + +-----+----+ +-----+----+ + | |port2 ISL port2| | + | switch A +--------------------------+ switch B | + | | | | + +-----+----+ +-----++---+ + |port1 port1| + | +-------+ | + +-------------+ host1 +---------------+ + eth0 +-------+ eth1 + + In this configuration, there is a link between the two +switches (ISL, or inter switch link), and multiple ports connecting to +the outside world ("port3" on each switch). There is no technical +reason that this could not be extended to a third switch. + +12.2.1 HA Bonding Mode Selection for Multiple Switch Topology +------------------------------------------------------------- + + In a topology such as the example above, the active-backup and +broadcast modes are the only useful bonding modes when optimizing for +availability; the other modes require all links to terminate on the +same peer for them to behave rationally. + +active-backup: This is generally the preferred mode, particularly if + the switches have an ISL and play together well. If the + network configuration is such that one switch is specifically + a backup switch (e.g., has lower capacity, higher cost, etc), + then the primary option can be used to insure that the + preferred link is always used when it is available. + +broadcast: This mode is really a special purpose mode, and is suitable + only for very specific needs. For example, if the two + switches are not connected (no ISL), and the networks beyond + them are totally independent. In this case, if it is + necessary for some specific one-way traffic to reach both + independent networks, then the broadcast mode may be suitable. + +12.2.2 HA Link Monitoring Selection for Multiple Switch Topology +---------------------------------------------------------------- + + The choice of link monitoring ultimately depends upon your +switch. 
If the switch can reliably fail ports in response to other +failures, then either the MII or ARP monitors should work. For +example, in the above example, if the "port3" link fails at the remote +end, the MII monitor has no direct means to detect this. The ARP +monitor could be configured with a target at the remote end of port3, +thus detecting that failure without switch support. + + In general, however, in a multiple switch topology, the ARP +monitor can provide a higher level of reliability in detecting end to +end connectivity failures (which may be caused by the failure of any +individual component to pass traffic for any reason). Additionally, +the ARP monitor should be configured with multiple targets (at least +one for each switch in the network). This will insure that, +regardless of which switch is active, the ARP monitor has a suitable +target to query. + + +13. Configuring Bonding for Maximum Throughput +============================================== + +13.1 Maximizing Throughput in a Single Switch Topology +------------------------------------------------------ + + In a single switch configuration, the best method to maximize +throughput depends upon the application and network environment. The +various load balancing modes each have strengths and weaknesses in +different environments, as detailed below. + + For this discussion, we will break down the topologies into +two categories. Depending upon the destination of most traffic, we +categorize them into either "gatewayed" or "local" configurations. + + In a gatewayed configuration, the "switch" is acting primarily +as a router, and the majority of traffic passes through this router to +other networks. 
An example would be the following: + + + +----------+ +----------+ + | |eth0 port1| | to other networks + | Host A +---------------------+ router +-------------------> + | +---------------------+ | Hosts B and C are out + | |eth1 port2| | here somewhere + +----------+ +----------+ + + The router may be a dedicated router device, or another host +acting as a gateway. For our discussion, the important point is that +the majority of traffic from Host A will pass through the router to +some other network before reaching its final destination. + + In a gatewayed network configuration, although Host A may +communicate with many other systems, all of its traffic will be sent +and received via one other peer on the local network, the router. + + Note that the case of two systems connected directly via +multiple physical links is, for purposes of configuring bonding, the +same as a gatewayed configuration. In that case, it happens that all +traffic is destined for the "gateway" itself, not some other network +beyond the gateway. + + In a local configuration, the "switch" is acting primarily as +a switch, and the majority of traffic passes through this switch to +reach other stations on the same network. An example would be the +following: + + +----------+ +----------+ +--------+ + | |eth0 port1| +-------+ Host B | + | Host A +------------+ switch |port3 +--------+ + | +------------+ | +--------+ + | |eth1 port2| +------------------+ Host C | + +----------+ +----------+port4 +--------+ + + + Again, the switch may be a dedicated switch device, or another +host acting as a gateway. For our discussion, the important point is +that the majority of traffic from Host A is destined for other hosts +on the same local network (Hosts B and C in the above example). + + In summary, in a gatewayed configuration, traffic to and from +the bonded device will be to the same MAC level peer on the network +(the gateway itself, i.e., the router), regardless of its final +destination. 
In a local configuration, traffic flows directly to and +from the final destinations, thus, each destination (Host B, Host C) +will be addressed directly by their individual MAC addresses. + + This distinction between a gatewayed and a local network +configuration is important because many of the load balancing modes +available use the MAC addresses of the local network source and +destination to make load balancing decisions. The behavior of each +mode is described below. + + +13.1.1 MT Bonding Mode Selection for Single Switch Topology +----------------------------------------------------------- This configuration is the easiest to set up and to understand, although you will have to decide which bonding mode best suits your -needs. The tradeoffs for each mode are detailed below: +needs. The trade offs for each mode are detailed below: balance-rr: This mode is the only mode that will permit a single TCP/IP connection to stripe traffic across multiple @@ -1206,6 +1413,23 @@ interface's worth of throughput, even after adjusting tcp_reordering. + Note that this out of order delivery occurs when both the + sending and receiving systems are utilizing a multiple + interface bond. Consider a configuration in which a + balance-rr bond feeds into a single higher capacity network + channel (e.g., multiple 100Mb/sec ethernets feeding a single + gigabit ethernet via an etherchannel capable switch). In this + configuration, traffic sent from the multiple 100Mb devices to + a destination connected to the gigabit device will not see + packets out of order. However, traffic sent from the gigabit + device to the multiple 100Mb devices may or may not see + traffic out of order, depending upon the balance policy of the + switch. Many switches do not support any modes that stripe + traffic (instead choosing a port based upon IP or MAC level + addresses); for those devices, traffic flowing from the + gigabit device to the many 100Mb devices will only utilize one + interface. 
+ If you are utilizing protocols other than TCP/IP, UDP for example, and your application can tolerate out of order delivery, then this mode can allow for single stream datagram @@ -1220,16 +1444,21 @@ connected to the same peer as the primary. In this case, a load balancing mode (with link monitoring) will provide the same level of network availability, but with increased - available bandwidth. On the plus side, it does not require - any configuration of the switch. + available bandwidth. On the plus side, active-backup mode + does not require any configuration of the switch, so it may + have value if the hardware available does not support any of + the load balance modes. balance-xor: This mode will limit traffic such that packets destined for specific peers will always be sent over the same interface. Since the destination is determined by the MAC - addresses involved, this may be desirable if you have a large - network with many hosts. It is likely to be suboptimal if all - your traffic is passed through a single router, however. As - with balance-rr, the switch ports need to be configured for + addresses involved, this mode works best in a "local" network + configuration (as described above), with destinations all on + the same local network. This mode is likely to be suboptimal + if all your traffic is passed through a single router (i.e., a + "gatewayed" network configuration, as described above). + + As with balance-rr, the switch ports need to be configured for "etherchannel" or "trunking." broadcast: Like active-backup, there is not much advantage to this @@ -1241,122 +1470,128 @@ protocol includes automatic configuration of the aggregates, so minimal manual configuration of the switch is needed (typically only to designate that some set of devices is - usable for 802.3ad). The 802.3ad standard also mandates that - frames be delivered in order (within certain limits), so in - general single connections will not see misordering of + available for 802.3ad). 
The 802.3ad standard also mandates + that frames be delivered in order (within certain limits), so + in general single connections will not see misordering of packets. The 802.3ad mode does have some drawbacks: the standard mandates that all devices in the aggregate operate at the same speed and duplex. Also, as with all bonding load balance modes other than balance-rr, no single connection will be able to utilize more than a single interface's worth of - bandwidth. Additionally, the linux bonding 802.3ad - implementation distributes traffic by peer (using an XOR of - MAC addresses), so in general all traffic to a particular - destination will use the same interface. Finally, the 802.3ad - mode mandates the use of the MII monitor, therefore, the ARP - monitor is not available in this mode. - -balance-tlb: This mode is also a good choice for this type of - topology. It has no special switch configuration - requirements, and balances outgoing traffic by peer, in a - vaguely intelligent manner (not a simple XOR as in balance-xor - or 802.3ad mode), so that unlucky MAC addresses will not all - "bunch up" on a single interface. Interfaces may be of - differing speeds. On the down side, in this mode all incoming - traffic arrives over a single interface, this mode requires - certain ethtool support in the network device driver of the - slave interfaces, and the ARP monitor is not available. - -balance-alb: This mode is everything that balance-tlb is, and more. It - has all of the features (and restrictions) of balance-tlb, and - will also balance incoming traffic from peers (as described in - the Bonding Module Options section, above). The only extra - down side to this mode is that the network device driver must - support changing the hardware address while the device is - open. + bandwidth. 
-12.1.2 Link Monitoring for Single Switch Topology -------------------------------------------------- + Additionally, the linux bonding 802.3ad implementation + distributes traffic by peer (using an XOR of MAC addresses), + so in a "gatewayed" configuration, all outgoing traffic will + generally use the same device. Incoming traffic may also end + up on a single device, but that is dependent upon the + balancing policy of the peer's 802.3ad implementation. In a + "local" configuration, traffic will be distributed across the + devices in the bond. + + Finally, the 802.3ad mode mandates the use of the MII monitor, + therefore, the ARP monitor is not available in this mode. + +balance-tlb: The balance-tlb mode balances outgoing traffic by peer. + Since the balancing is done according to MAC address, in a + "gatewayed" configuration (as described above), this mode will + send all traffic across a single device. However, in a + "local" network configuration, this mode balances multiple + local network peers across devices in a vaguely intelligent + manner (not a simple XOR as in balance-xor or 802.3ad mode), + so that mathematically unlucky MAC addresses (i.e., ones that + XOR to the same value) will not all "bunch up" on a single + interface. + + Unlike 802.3ad, interfaces may be of differing speeds, and no + special switch configuration is required. On the down side, + in this mode all incoming traffic arrives over a single + interface, this mode requires certain ethtool support in the + network device driver of the slave interfaces, and the ARP + monitor is not available. + +balance-alb: This mode is everything that balance-tlb is, and more. + It has all of the features (and restrictions) of balance-tlb, + and will also balance incoming traffic from local network + peers (as described in the Bonding Module Options section, + above).
+ + The only additional down side to this mode is that the network + device driver must support changing the hardware address while + the device is open. + +13.1.2 MT Link Monitoring for Single Switch Topology +---------------------------------------------------- The choice of link monitoring may largely depend upon which mode you choose to use. The more advanced load balancing modes do not support the use of the ARP monitor, and are thus restricted to using -the MII monitor (which does not provide as high a level of assurance -as the ARP monitor). - - -12.2 High Availability in a Multiple Switch Topology ----------------------------------------------------- - - With multiple switches, the configuration of bonding and the -network changes dramatically. In multiple switch topologies, there is -a tradeoff between network availability and usable bandwidth. - - Below is a sample network, configured to maximize the -availability of the network: - - | | - |port3 port3| - +-----+----+ +-----+----+ - | |port2 ISL port2| | - | switch A +--------------------------+ switch B | - | | | | - +-----+----+ +-----++---+ - |port1 port1| - | +-------+ | - +-------------+ host1 +---------------+ - eth0 +-------+ eth1 - - In this configuration, there is a link between the two -switches (ISL, or inter switch link), and multiple ports connecting to -the outside world ("port3" on each switch). There is no technical -reason that this could not be extended to a third switch. +the MII monitor (which does not provide as high a level of end to end +assurance as the ARP monitor). 
-12.2.1 Bonding Mode Selection for Multiple Switch Topology ----------------------------------------------------------- +13.2 Maximum Throughput in a Multiple Switch Topology +----------------------------------------------------- - In a topology such as this, the active-backup and broadcast -modes are the only useful bonding modes; the other modes require all -links to terminate on the same peer for them to behave rationally. - -active-backup: This is generally the preferred mode, particularly if - the switches have an ISL and play together well. If the - network configuration is such that one switch is specifically - a backup switch (e.g., has lower capacity, higher cost, etc), - then the primary option can be used to insure that the - preferred link is always used when it is available. + Multiple switches may be utilized to optimize for throughput +when they are configured in parallel as part of an isolated network +between two or more systems, for example: + + +-----------+ + | Host A | + +-+---+---+-+ + | | | + +--------+ | +---------+ + | | | + +------+---+ +-----+----+ +-----+----+ + | Switch A | | Switch B | | Switch C | + +------+---+ +-----+----+ +-----+----+ + | | | + +--------+ | +---------+ + | | | + +-+---+---+-+ + | Host B | + +-----------+ + + In this configuration, the switches are isolated from one +another. One reason to employ a topology such as this is for an +isolated network with many hosts (a cluster configured for high +performance, for example), using multiple smaller switches can be more +cost effective than a single larger switch, e.g., on a network with 24 +hosts, three 24 port switches can be significantly less expensive than +a single 72 port switch. + + If access beyond the network is required, an individual host +can be equipped with an additional network device connected to an +external network; this host then additionally acts as a gateway. 
-broadcast: This mode is really a special purpose mode, and is suitable - only for very specific needs. For example, if the two - switches are not connected (no ISL), and the networks beyond - them are totally independant. In this case, if it is - necessary for some specific one-way traffic to reach both - independent networks, then the broadcast mode may be suitable. - -12.2.2 Link Monitoring Selection for Multiple Switch Topology +13.2.1 MT Bonding Mode Selection for Multiple Switch Topology ------------------------------------------------------------- - The choice of link monitoring ultimately depends upon your -switch. If the switch can reliably fail ports in response to other -failures, then either the MII or ARP monitors should work. For -example, in the above example, if the "port3" link fails at the remote -end, the MII monitor has no direct means to detect this. The ARP -monitor could be configured with a target at the remote end of port3, -thus detecting that failure without switch support. - - In general, however, in a multiple switch topology, the ARP -monitor can provide a higher level of reliability in detecting link -failures. Additionally, it should be configured with multiple targets -(at least one for each switch in the network). This will insure that, -regardless of which switch is active, the ARP monitor has a suitable -target to query. - + In actual practice, the bonding mode typically employed in +configurations of this type is balance-rr. Historically, in this +network configuration, the usual caveats about out of order packet +delivery are mitigated by the use of network adapters that do not do +any kind of packet coalescing (via the use of NAPI, or because the +device itself does not generate interrupts until some number of +packets has arrived). When employed in this fashion, the balance-rr +mode allows individual connections between two hosts to effectively +utilize greater than one interface's bandwidth. 
+ +13.2.2 MT Link Monitoring for Multiple Switch Topology +------------------------------------------------------ + + Again, in actual practice, the MII monitor is most often used +in this configuration, as performance is given preference over +availability. The ARP monitor will function in this topology, but its +advantages over the MII monitor are mitigated by the volume of probes +needed as the number of systems involved grows (remember that each +host in the network is configured with bonding). -12.3 Switch Behavior Issues for High Availability -------------------------------------------------- +14. Switch Behavior Issues +-------------------------- - You may encounter issues with the timing of link up and down -reporting by the switch. + Some switches exhibit undesirable behavior with regard to the +timing of link up and down reporting by the switch. First, when a link comes up, some switches may indicate that the link is up (carrier available), but not pass traffic over the @@ -1370,30 +1605,31 @@ Second, some switches may "bounce" the link state one or more times while a link is changing state. This occurs most commonly while the switch is initializing. Again, an appropriate updelay value may -help, but note that if all links are down, then updelay is ignored -when any link becomes active (the slave closest to completing its -updelay is chosen). +help. Note that when a bonding interface has no active links, the -driver will immediately reuse the first link that goes up, even if -updelay parameter was specified. If there are slave interfaces -waiting for the updelay timeout to expire, the interface that first -went into that state will be immediately reused. This reduces down -time of the network if the value of updelay has been overestimated. +driver will immediately reuse the first link that goes up, even if the +updelay parameter has been specified (the updelay is ignored in this +case). 
If there are slave interfaces waiting for the updelay timeout +to expire, the interface that first went into that state will be +immediately reused. This reduces down time of the network if the +value of updelay has been overestimated, and since this occurs only in +cases with no connectivity, there is no additional penalty for +ignoring the updelay. In addition to the concerns about switch timings, if your switches take a long time to go into backup mode, it may be desirable to not activate a backup interface immediately after a link goes down. Failover may be delayed via the downdelay bonding module option. -13. Hardware Specific Considerations +15. Hardware Specific Considerations ==================================== This section contains additional information for configuring bonding on specific hardware platforms, or for interfacing bonding with particular switches or other devices. -13.1 IBM BladeCenter +15.1 IBM BladeCenter -------------------- This applies to the JS20 and similar systems. @@ -1407,12 +1643,12 @@ -------------------------------- All JS20s come with two Broadcom Gigabit Ethernet ports -integrated on the planar. In the BladeCenter chassis, the eth0 port -of all JS20 blades is hard wired to I/O Module #1; similarly, all eth1 -ports are wired to I/O Module #2. An add-on Broadcom daughter card -can be installed on a JS20 to provide two more Gigabit Ethernet ports. -These ports, eth2 and eth3, are wired to I/O Modules 3 and 4, -respectively. +integrated on the planar (that's "motherboard" in IBM-speak). In the +BladeCenter chassis, the eth0 port of all JS20 blades is hard wired to +I/O Module #1; similarly, all eth1 ports are wired to I/O Module #2. +An add-on Broadcom daughter card can be installed on a JS20 to provide +two more Gigabit Ethernet ports. These ports, eth2 and eth3, are +wired to I/O Modules 3 and 4, respectively. 
Each I/O Module may contain either a switch or a passthrough module (which allows ports to be directly connected to an external @@ -1432,29 +1668,30 @@ of ways, this discussion will be confined to describing basic configurations. - Normally, Ethernet Switch Modules (ESM) are used in I/O + Normally, Ethernet Switch Modules (ESMs) are used in I/O modules 1 and 2. In this configuration, the eth0 and eth1 ports of a JS20 will be connected to different internal switches (in the respective I/O modules). - An optical passthru module (OPM) connects the I/O module -directly to an external switch. By using OPMs in I/O module #1 and -#2, the eth0 and eth1 interfaces of a JS20 can be redirected to the -outside world and connected to a common external switch. - - Depending upon the mix of ESM and OPM modules, the network -will appear to bonding as either a single switch topology (all OPM -modules) or as a multiple switch topology (one or more ESM modules, -zero or more OPM modules). It is also possible to connect ESM modules -together, resulting in a configuration much like the example in "High -Availability in a multiple switch topology." - -Requirements for specifc modes ------------------------------- - - The balance-rr mode requires the use of OPM modules for -devices in the bond, all connected to an common external switch. That -switch must be configured for "etherchannel" or "trunking" on the + A passthrough module (OPM or CPM, optical or copper, +passthrough module) connects the I/O module directly to an external +switch. By using PMs in I/O module #1 and #2, the eth0 and eth1 +interfaces of a JS20 can be redirected to the outside world and +connected to a common external switch. + + Depending upon the mix of ESMs and PMs, the network will +appear to bonding as either a single switch topology (all PMs) or as a +multiple switch topology (one or more ESMs, zero or more PMs). 
It is +also possible to connect ESMs together, resulting in a configuration +much like the example in "High Availability in a Multiple Switch +Topology," above. + +Requirements for specific modes +------------------------------- + + The balance-rr mode requires the use of passthrough modules +for devices in the bond, all connected to an common external switch. +That switch must be configured for "etherchannel" or "trunking" on the appropriate ports, as is usual for balance-rr. The balance-alb and balance-tlb modes will function with @@ -1484,17 +1721,18 @@ Other concerns -------------- - The Serial Over LAN link is established over the primary + The Serial Over LAN (SoL) link is established over the primary ethernet (eth0) only, therefore, any loss of link to eth0 will result in losing your SoL connection. It will not fail over with other -network traffic. +network traffic, as the SoL system is beyond the control of the +bonding driver. It may be desirable to disable spanning tree on the switch (either the internal Ethernet Switch Module, or an external switch) to -avoid fail-over delays issues when using bonding. +avoid fail-over delay issues when using bonding. -14. Frequently Asked Questions +16. Frequently Asked Questions ============================== 1. Is it SMP safe? @@ -1505,8 +1743,8 @@ 2. What type of cards will work with it? Any Ethernet type cards (you can even mix cards - a Intel -EtherExpress PRO/100 and a 3com 3c905b, for example). They need not -be of the same speed. +EtherExpress PRO/100 and a 3com 3c905b, for example). For most modes, +devices need not be of the same speed. 3. How many bonding devices can I have? @@ -1524,11 +1762,12 @@ disabled. The active-backup mode will fail over to a backup link, and other modes will ignore the failed link. The link will continue to be monitored, and should it recover, it will rejoin the bond (in whatever -manner is appropriate for the mode). See the section on High -Availability for additional information. 
+manner is appropriate for the mode). See the sections on High +Availability and the documentation for each mode for additional +information. Link monitoring can be enabled via either the miimon or -arp_interval paramters (described in the module paramters section, +arp_interval parameters (described in the module parameters section, above). In general, miimon monitors the carrier state as sensed by the underlying network device, and the arp monitor (arp_interval) monitors connectivity to another host on the local network. @@ -1536,7 +1775,7 @@ If no link monitoring is configured, the bonding driver will be unable to detect link failures, and will assume that all links are always available. This will likely result in lost packets, and a -resulting degredation of performance. The precise performance loss +resulting degradation of performance. The precise performance loss depends upon the bonding mode and network configuration. 6. Can bonding be used for High Availability? @@ -1550,12 +1789,12 @@ In the basic balance modes (balance-rr and balance-xor), it works with any system that supports etherchannel (also called trunking). Most managed switches currently available have such -support, and many unmananged switches as well. +support, and many unmanaged switches as well. The advanced balance modes (balance-tlb and balance-alb) do not have special switch requirements, but do need device drivers that support specific features (described in the appropriate section under -module paramters, above). +module parameters, above). In 802.3ad mode, it works with with systems that support IEEE 802.3ad Dynamic Link Aggregation. Most managed and many unmanaged @@ -1565,17 +1804,19 @@ 8. Where does a bonding device get its MAC address from? - If not explicitly configured with ifconfig, the MAC address of -the bonding device is taken from its first slave device. 
This MAC -address is then passed to all following slaves and remains persistent -(even if the the first slave is removed) until the bonding device is -brought down or reconfigured. + If not explicitly configured (with ifconfig or ip link), the +MAC address of the bonding device is taken from its first slave +device. This MAC address is then passed to all following slaves and +remains persistent (even if the the first slave is removed) until the +bonding device is brought down or reconfigured. If you wish to change the MAC address, you can set it with -ifconfig: +ifconfig or ip link: # ifconfig bond0 hw ether 00:11:22:33:44:55 +# ip link set bond0 address 66:77:88:99:aa:bb + The MAC address can be also changed by bringing down/up the device and then changing its slaves (or their order): @@ -1591,23 +1832,28 @@ then restore the MAC addresses that the slaves had before they were enslaved. -15. Resources and Links +16. Resources and Links ======================= The latest version of the bonding driver can be found in the latest version of the linux kernel, found on http://kernel.org +The latest version of this document can be found in either the latest +kernel source (named Documentation/networking/bonding.txt), or on the +bonding sourceforge site: + +http://www.sourceforge.net/projects/bonding + Discussions regarding the bonding driver take place primarily on the bonding-devel mailing list, hosted at sourceforge.net. If you have -questions or problems, post them to the list. +questions or problems, post them to the list. The list address is: bonding-devel@lists.sourceforge.net -https://lists.sourceforge.net/lists/listinfo/bonding-devel - -There is also a project site on sourceforge. 
+ The administrative interface (to subscribe or unsubscribe) can +be found at: -http://www.sourceforge.net/projects/bonding +https://lists.sourceforge.net/lists/listinfo/bonding-devel Donald Becker's Ethernet Drivers and diag programs may be found at : - http://www.scyld.com/network/ From greearb@candelatech.com Fri Jun 3 12:00:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 12:00:44 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53J0fXq016275 for ; Fri, 3 Jun 2005 12:00:41 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j53JX55I003518; Fri, 3 Jun 2005 12:33:05 -0700 Message-ID: <42A0A897.5080006@candelatech.com> Date: Fri, 03 Jun 2005 11:59:35 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: john.ronciak@intel.com, Robert.Olsson@data.slu.se, jdmason@us.ibm.com, shemminger@osdl.org, hadi@cyberus.ca, mitch.a.williams@intel.com, netdev@oss.sgi.com, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch References: <468F3FDA28AA87429AD807992E22D07E0450BFE8@orsmsx408> <42A0A25C.8000503@candelatech.com> <20050603.114950.119242486.davem@davemloft.net> In-Reply-To: <20050603.114950.119242486.davem@davemloft.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2045 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 1039 Lines: 32 David S. 
Miller wrote: > From: Ben Greear >>Maybe the poll is disabling the IRQs on the NIC for too long, or something >>like that? > > > In a reply I just sent out to this thread, I postulate that the > jiffies check is hitting earlier with a lower weight value, a quick > look at /proc/net/softnet_stat during their testing will confirm or > deny this theory. That would basically just decrease the work done in the NAPI poll though, so I don't see how that could be the problem, since the 'solution' was to force less work to be done. > It could also just be a simple bug in the dev->quota accounting > somewhere. > > Note that, in all of this, I do not have any objections to providing > a way to configure the dev->weight values. I will be applying Stephen > Hemminger's patches. Good. The more knobs the merrier, so long as they are at least somewhat documented and default to good sane values :) Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From davem@davemloft.net Fri Jun 3 12:02:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 12:02:42 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53J2cXq016818 for ; Fri, 3 Jun 2005 12:02:38 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DeHQ6-0001wa-TJ; Fri, 03 Jun 2005 12:01:26 -0700 Date: Fri, 03 Jun 2005 12:01:26 -0700 (PDT) Message-Id: <20050603.120126.41874584.davem@davemloft.net> To: hadi@cyberus.ca Cc: mitch.a.williams@intel.com, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. 
Miller" In-Reply-To: <1117824150.6071.34.camel@localhost.localdomain> References: <1117765954.6095.49.camel@localhost.localdomain> <1117824150.6071.34.camel@localhost.localdomain> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2046 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1271 Lines: 34 From: jamal Date: Fri, 03 Jun 2005 14:42:30 -0400 > When you reduce the weight, the system is spending less time in the > softirq processing packets before softirq yields. If this gives more > opportunity to your app to run, then the performance will go up. > Is this what you are seeing? Jamal, this is my current theory as well, we hit the jiffies check. It is the only logical explanation I can come up with for the single adapter case. There are some ways we can mitigate this. Here is one idea off the top of my head. When the jiffies check is hit, lower the weight of the most recently polled device towards some minimum (perhaps divide by two). If we successfully poll without hitting the jiffies check, make a small increment of the weight up to some limit. It is Van Jacobson TCP congestion avoidance applied to NAPI :-) Just a simple AIMD (Additive Increase, Multiplicative Decrease). So, hitting the jiffies work limit is congestion, and the cause of the congestion is the most recently polled device. In this regime, what the driver currently specifies as "->weight" is actually the maximum we'll use in the congestion control algorithm. And we can choose some constant minimum, something like "8" ought to work well. Comments?
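The AIMD idea in the message above can be sketched in a few lines of C. This is only an illustration of the scheme as described, not kernel code: the struct, the function names, and the step size of 1 are all hypothetical, and the floor of 8 is simply the value floated in the mail. The driver-supplied "->weight" is treated as the ceiling.

```c
#include <assert.h>

/* All names below are hypothetical; nothing here is taken from the kernel. */
#define AIMD_MIN_WEIGHT  8	/* constant floor suggested in the mail */
#define AIMD_WEIGHT_STEP 1	/* additive increase per clean poll (assumed) */

struct aimd_weight {
	int max_weight;		/* what the driver specifies as ->weight */
	int cur_weight;		/* weight to use for the next poll */
};

/* Start at the driver's weight, which also acts as the upper limit. */
static void aimd_weight_init(struct aimd_weight *w, int driver_weight)
{
	assert(driver_weight >= AIMD_MIN_WEIGHT);
	w->max_weight = driver_weight;
	w->cur_weight = driver_weight;
}

/* The jiffies check was hit: treat it as congestion caused by the most
 * recently polled device and halve its weight, down to the floor.
 */
static void aimd_weight_congested(struct aimd_weight *w)
{
	w->cur_weight /= 2;
	if (w->cur_weight < AIMD_MIN_WEIGHT)
		w->cur_weight = AIMD_MIN_WEIGHT;
}

/* The poll completed without hitting the jiffies check: make a small
 * increment, never exceeding the driver-specified maximum.
 */
static void aimd_weight_poll_ok(struct aimd_weight *w)
{
	w->cur_weight += AIMD_WEIGHT_STEP;
	if (w->cur_weight > w->max_weight)
		w->cur_weight = w->max_weight;
}
```

In such a scheme the softirq loop would presumably call aimd_weight_congested() for the device it had just polled when the time limit trips, and aimd_weight_poll_ok() otherwise. Where exactly those hooks would sit in the real receive path is left open here.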
From davem@davemloft.net Fri Jun 3 12:04:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 12:04:07 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53J44Xq017423 for ; Fri, 3 Jun 2005 12:04:04 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DeHRZ-0001x7-H4; Fri, 03 Jun 2005 12:02:57 -0700 Date: Fri, 03 Jun 2005 12:02:57 -0700 (PDT) Message-Id: <20050603.120257.21929814.davem@davemloft.net> To: greearb@candelatech.com Cc: john.ronciak@intel.com, Robert.Olsson@data.slu.se, jdmason@us.ibm.com, shemminger@osdl.org, hadi@cyberus.ca, mitch.a.williams@intel.com, netdev@oss.sgi.com, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: <42A0A897.5080006@candelatech.com> References: <42A0A25C.8000503@candelatech.com> <20050603.114950.119242486.davem@davemloft.net> <42A0A897.5080006@candelatech.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2047 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 581 Lines: 14 From: Ben Greear Date: Fri, 03 Jun 2005 11:59:35 -0700 > David S. Miller wrote: > > In a reply I just sent out to this thread, I postulate that the > > jiffies check is hitting earlier with a lower weight value, a quick > > look at /proc/net/softnet_stat during their testing will confirm or > > deny this theory. > > That would basically just decrease the work done in the NAPI poll though, > so I don't see how that could be the problem, since the 'solution' was to > force less work to be done. 
It allows his application to get onto the CPU faster. From davem@davemloft.net Fri Jun 3 12:27:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 12:27:07 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53JR2Xq022784 for ; Fri, 3 Jun 2005 12:27:03 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DeHnq-00028w-Pk; Fri, 03 Jun 2005 12:25:58 -0700 Date: Fri, 03 Jun 2005 12:25:58 -0700 (PDT) Message-Id: <20050603.122558.88474819.davem@davemloft.net> To: netdev@oss.sgi.com CC: mchan@broadcom.com Subject: [PATCH]: Tigon3 new NAPI locking v2 From: "David S. Miller" X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2048 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 24870 Lines: 915 This version incorporates two bug fixes from Michael. 1) Check the mailbox register for 0x1 while polling on the COMPLETE state bit. 2) Remove the BUG_ON() check in tg3_restart_ints(), it can legally and harmlessly occur. Point #2 may want some refinements, but this patch below is good enough for testing. If someone (please please, pretty please) could be adventurous enough to attempt this kind of change for e1000, that would be great. Thanks. [TG3]: Eliminate all hw IRQ handler spinlocks. Move all driver spinlocks to be taken at sw IRQ context only. This fixes the skb_copy() we were doing with hw IRQs disabled (which is illegal and triggers a BUG() with HIGHMEM enabled). It also simplifies the locking all over the driver tremendously. 
We accomplish this feat by creating a special sequence to synchronize with the hw IRQ handler using a 2-bit atomic state. Signed-off-by: David S. Miller --- 1/drivers/net/tg3.c.~1~ 2005-06-03 12:11:40.000000000 -0700 +++ 2/drivers/net/tg3.c 2005-06-03 12:15:34.000000000 -0700 @@ -337,12 +337,10 @@ static struct { static void tg3_write_indirect_reg32(struct tg3 *tp, u32 off, u32 val) { if ((tp->tg3_flags & TG3_FLAG_PCIX_TARGET_HWBUG) != 0) { - unsigned long flags; - - spin_lock_irqsave(&tp->indirect_lock, flags); + spin_lock_bh(&tp->indirect_lock); pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off); pci_write_config_dword(tp->pdev, TG3PCI_REG_DATA, val); - spin_unlock_irqrestore(&tp->indirect_lock, flags); + spin_unlock_bh(&tp->indirect_lock); } else { writel(val, tp->regs + off); if ((tp->tg3_flags & TG3_FLAG_5701_REG_WRITE_BUG) != 0) @@ -353,12 +351,10 @@ static void tg3_write_indirect_reg32(str static void _tw32_flush(struct tg3 *tp, u32 off, u32 val) { if ((tp->tg3_flags & TG3_FLAG_PCIX_TARGET_HWBUG) != 0) { - unsigned long flags; - - spin_lock_irqsave(&tp->indirect_lock, flags); + spin_lock_bh(&tp->indirect_lock); pci_write_config_dword(tp->pdev, TG3PCI_REG_BASE_ADDR, off); pci_write_config_dword(tp->pdev, TG3PCI_REG_DATA, val); - spin_unlock_irqrestore(&tp->indirect_lock, flags); + spin_unlock_bh(&tp->indirect_lock); } else { void __iomem *dest = tp->regs + off; writel(val, dest); @@ -398,28 +394,24 @@ static inline void _tw32_tx_mbox(struct static void tg3_write_mem(struct tg3 *tp, u32 off, u32 val) { - unsigned long flags; - - spin_lock_irqsave(&tp->indirect_lock, flags); + spin_lock_bh(&tp->indirect_lock); pci_write_config_dword(tp->pdev, TG3PCI_MEM_WIN_BASE_ADDR, off); pci_write_config_dword(tp->pdev, TG3PCI_MEM_WIN_DATA, val); /* Always leave this as zero. 
*/ pci_write_config_dword(tp->pdev, TG3PCI_MEM_WIN_BASE_ADDR, 0); - spin_unlock_irqrestore(&tp->indirect_lock, flags); + spin_unlock_bh(&tp->indirect_lock); } static void tg3_read_mem(struct tg3 *tp, u32 off, u32 *val) { - unsigned long flags; - - spin_lock_irqsave(&tp->indirect_lock, flags); + spin_lock_bh(&tp->indirect_lock); pci_write_config_dword(tp->pdev, TG3PCI_MEM_WIN_BASE_ADDR, off); pci_read_config_dword(tp->pdev, TG3PCI_MEM_WIN_DATA, val); /* Always leave this as zero. */ pci_write_config_dword(tp->pdev, TG3PCI_MEM_WIN_BASE_ADDR, 0); - spin_unlock_irqrestore(&tp->indirect_lock, flags); + spin_unlock_bh(&tp->indirect_lock); } static void tg3_disable_ints(struct tg3 *tp) @@ -443,7 +435,7 @@ static void tg3_enable_ints(struct tg3 * tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, (tp->last_tag << 24)); tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); - + tp->irq_state = 0; tg3_cond_int(tp); } @@ -2578,7 +2570,7 @@ static void tg3_tx(struct tg3 *tp) sw_idx = NEXT_TX(sw_idx); } - dev_kfree_skb_irq(skb); + dev_kfree_skb(skb); } tp->tx_cons = sw_idx; @@ -2884,11 +2876,8 @@ static int tg3_poll(struct net_device *n { struct tg3 *tp = netdev_priv(netdev); struct tg3_hw_status *sblk = tp->hw_status; - unsigned long flags; int done; - spin_lock_irqsave(&tp->lock, flags); - /* handle link change and other phy events */ if (!(tp->tg3_flags & (TG3_FLAG_USE_LINKCHG_REG | @@ -2896,7 +2885,9 @@ static int tg3_poll(struct net_device *n if (sblk->status & SD_STATUS_LINK_CHG) { sblk->status = SD_STATUS_UPDATED | (sblk->status & ~SD_STATUS_LINK_CHG); + spin_lock(&tp->lock); tg3_setup_phy(tp, 0); + spin_unlock(&tp->lock); } } @@ -2907,8 +2898,6 @@ static int tg3_poll(struct net_device *n spin_unlock(&tp->tx_lock); } - spin_unlock_irqrestore(&tp->lock, flags); - /* run RX thread, within the bounds set by NAPI. 
* All RX "locking" is done by ensuring outside * code synchronizes with dev->poll() @@ -2933,15 +2922,62 @@ static int tg3_poll(struct net_device *n /* if no more work, tell net stack and NIC we're done */ done = !tg3_has_work(tp); if (done) { - spin_lock_irqsave(&tp->lock, flags); + spin_lock(&tp->lock); __netif_rx_complete(netdev); tg3_restart_ints(tp); - spin_unlock_irqrestore(&tp->lock, flags); + spin_unlock(&tp->lock); } return (done ? 0 : 1); } +static void tg3_irq_quiesce(struct tg3 *tp) +{ + BUG_ON(test_bit(TG3_IRQSTATE_SYNC, &tp->irq_state)); + + set_bit(TG3_IRQSTATE_SYNC, &tp->irq_state); + smp_mb(); + tw32(GRC_LOCAL_CTRL, + tp->grc_local_ctrl | GRC_LCLCTRL_SETINT); + + while (!test_bit(TG3_IRQSTATE_COMPLETE, &tp->irq_state)) { + u32 val = tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); + + if (val == 0x00000001) + break; + + cpu_relax(); + } +} + +static inline int tg3_irq_sync(struct tg3 *tp) +{ + if (test_bit(TG3_IRQSTATE_SYNC, &tp->irq_state)) { + set_bit(TG3_IRQSTATE_COMPLETE, &tp->irq_state); + return 1; + } + return 0; +} + +/* Fully shutdown all tg3 driver activity elsewhere in the system. + * If irq_sync is non-zero, then the IRQ handler must be synchronized + * with as well. Most of the time, this is not necessary except when + * shutting down the device. + */ +static inline void tg3_full_lock(struct tg3 *tp, int irq_sync) +{ + if (irq_sync) + tg3_irq_quiesce(tp); + spin_lock_bh(&tp->lock); + spin_lock(&tp->tx_lock); +} + +static inline void tg3_full_unlock(struct tg3 *tp) +{ + spin_unlock(&tp->tx_lock); + spin_unlock_bh(&tp->lock); +} + /* MSI ISR - No need to check for interrupt sharing and no need to * flush status block and interrupt mailbox. PCI ordering rules * guarantee that MSI will arrive after the status block. 
@@ -2951,9 +2987,6 @@ static irqreturn_t tg3_msi(int irq, void struct net_device *dev = dev_id; struct tg3 *tp = netdev_priv(dev); struct tg3_hw_status *sblk = tp->hw_status; - unsigned long flags; - - spin_lock_irqsave(&tp->lock, flags); /* * Writing any value to intr-mbox-0 clears PCI INTA# and @@ -2964,6 +2997,8 @@ static irqreturn_t tg3_msi(int irq, void */ tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x00000001); tp->last_tag = sblk->status_tag; + if (tg3_irq_sync(tp)) + goto out; sblk->status &= ~SD_STATUS_UPDATED; if (likely(tg3_has_work(tp))) netif_rx_schedule(dev); /* schedule NAPI poll */ @@ -2972,9 +3007,7 @@ static irqreturn_t tg3_msi(int irq, void tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, tp->last_tag << 24); } - - spin_unlock_irqrestore(&tp->lock, flags); - +out: return IRQ_RETVAL(1); } @@ -2983,11 +3016,8 @@ static irqreturn_t tg3_interrupt(int irq struct net_device *dev = dev_id; struct tg3 *tp = netdev_priv(dev); struct tg3_hw_status *sblk = tp->hw_status; - unsigned long flags; unsigned int handled = 1; - spin_lock_irqsave(&tp->lock, flags); - /* In INTx mode, it is possible for the interrupt to arrive at * the CPU before the status block posted prior to the interrupt. 
* Reading the PCI State register will confirm whether the @@ -3004,6 +3034,8 @@ static irqreturn_t tg3_interrupt(int irq */ tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x00000001); + if (tg3_irq_sync(tp)) + goto out; sblk->status &= ~SD_STATUS_UPDATED; if (likely(tg3_has_work(tp))) netif_rx_schedule(dev); /* schedule NAPI poll */ @@ -3018,9 +3050,7 @@ static irqreturn_t tg3_interrupt(int irq } else { /* shared interrupt */ handled = 0; } - - spin_unlock_irqrestore(&tp->lock, flags); - +out: return IRQ_RETVAL(handled); } @@ -3029,11 +3059,8 @@ static irqreturn_t tg3_interrupt_tagged( struct net_device *dev = dev_id; struct tg3 *tp = netdev_priv(dev); struct tg3_hw_status *sblk = tp->hw_status; - unsigned long flags; unsigned int handled = 1; - spin_lock_irqsave(&tp->lock, flags); - /* In INTx mode, it is possible for the interrupt to arrive at * the CPU before the status block posted prior to the interrupt. * Reading the PCI State register will confirm whether the @@ -3051,6 +3078,8 @@ static irqreturn_t tg3_interrupt_tagged( tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x00000001); tp->last_tag = sblk->status_tag; + if (tg3_irq_sync(tp)) + goto out; sblk->status &= ~SD_STATUS_UPDATED; if (likely(tg3_has_work(tp))) netif_rx_schedule(dev); /* schedule NAPI poll */ @@ -3065,9 +3094,7 @@ static irqreturn_t tg3_interrupt_tagged( } else { /* shared interrupt */ handled = 0; } - - spin_unlock_irqrestore(&tp->lock, flags); - +out: return IRQ_RETVAL(handled); } @@ -3106,8 +3133,7 @@ static void tg3_reset_task(void *_data) tg3_netif_stop(tp); - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 1); restart_timer = tp->tg3_flags2 & TG3_FLG2_RESTART_TIMER; tp->tg3_flags2 &= ~TG3_FLG2_RESTART_TIMER; @@ -3117,8 +3143,7 @@ static void tg3_reset_task(void *_data) tg3_netif_start(tp); - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); if (restart_timer) mod_timer(&tp->timer, jiffies + 1); @@ -3224,39 +3249,21 
@@ static int tg3_start_xmit(struct sk_buff unsigned int i; u32 len, entry, base_flags, mss; int would_hit_hwbug; - unsigned long flags; len = skb_headlen(skb); /* No BH disabling for tx_lock here. We are running in BH disabled * context and TX reclaim runs via tp->poll inside of a software - * interrupt. Rejoice! - * - * Actually, things are not so simple. If we are to take a hw - * IRQ here, we can deadlock, consider: - * - * CPU1 CPU2 - * tg3_start_xmit - * take tp->tx_lock - * tg3_timer - * take tp->lock - * tg3_interrupt - * spin on tp->lock - * spin on tp->tx_lock - * - * So we really do need to disable interrupts when taking - * tx_lock here. + * interrupt. Furthermore, IRQ processing runs lockless so we have + * no IRQ context deadlocks to worry about either. Rejoice! */ - local_irq_save(flags); - if (!spin_trylock(&tp->tx_lock)) { - local_irq_restore(flags); + if (!spin_trylock(&tp->tx_lock)) return NETDEV_TX_LOCKED; - } /* This is a hard error, log it. */ if (unlikely(TX_BUFFS_AVAIL(tp) <= (skb_shinfo(skb)->nr_frags + 1))) { netif_stop_queue(dev); - spin_unlock_irqrestore(&tp->tx_lock, flags); + spin_unlock(&tp->tx_lock); printk(KERN_ERR PFX "%s: BUG! 
Tx Ring full when queue awake!\n", dev->name); return NETDEV_TX_BUSY; @@ -3421,7 +3428,7 @@ static int tg3_start_xmit(struct sk_buff out_unlock: mmiowb(); - spin_unlock_irqrestore(&tp->tx_lock, flags); + spin_unlock(&tp->tx_lock); dev->trans_start = jiffies; @@ -3455,8 +3462,8 @@ static int tg3_change_mtu(struct net_dev } tg3_netif_stop(tp); - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + + tg3_full_lock(tp, 1); tg3_halt(tp, RESET_KIND_SHUTDOWN, 1); @@ -3466,8 +3473,7 @@ static int tg3_change_mtu(struct net_dev tg3_netif_start(tp); - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); return 0; } @@ -5088,9 +5094,9 @@ static int tg3_set_mac_addr(struct net_d memcpy(dev->dev_addr, addr->sa_data, dev->addr_len); - spin_lock_irq(&tp->lock); + spin_lock_bh(&tp->lock); __tg3_set_mac_addr(tp); - spin_unlock_irq(&tp->lock); + spin_unlock_bh(&tp->lock); return 0; } @@ -5802,10 +5808,8 @@ static void tg3_periodic_fetch_stats(str static void tg3_timer(unsigned long __opaque) { struct tg3 *tp = (struct tg3 *) __opaque; - unsigned long flags; - spin_lock_irqsave(&tp->lock, flags); - spin_lock(&tp->tx_lock); + spin_lock(&tp->lock); if (!(tp->tg3_flags & TG3_FLAG_TAGGED_STATUS)) { /* All of this garbage is because when using non-tagged @@ -5822,8 +5826,7 @@ static void tg3_timer(unsigned long __op if (!(tr32(WDMAC_MODE) & WDMAC_MODE_ENABLE)) { tp->tg3_flags2 |= TG3_FLG2_RESTART_TIMER; - spin_unlock(&tp->tx_lock); - spin_unlock_irqrestore(&tp->lock, flags); + spin_unlock(&tp->lock); schedule_work(&tp->reset_task); return; } @@ -5891,8 +5894,7 @@ static void tg3_timer(unsigned long __op tp->asf_counter = tp->asf_multiplier; } - spin_unlock(&tp->tx_lock); - spin_unlock_irqrestore(&tp->lock, flags); + spin_unlock(&tp->lock); tp->timer.expires = jiffies + tp->timer_offset; add_timer(&tp->timer); @@ -6007,14 +6009,12 @@ static int tg3_test_msi(struct tg3 *tp) /* Need to reset the chip because the MSI cycle may have terminated * with Master Abort. 
*/ - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 1); tg3_halt(tp, RESET_KIND_SHUTDOWN, 1); err = tg3_init_hw(tp); - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); if (err) free_irq(tp->pdev->irq, dev); @@ -6027,14 +6027,12 @@ static int tg3_open(struct net_device *d struct tg3 *tp = netdev_priv(dev); int err; - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 0); tg3_disable_ints(tp); tp->tg3_flags &= ~TG3_FLAG_INIT_COMPLETE; - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); /* The placement of this call is tied * to the setup and use of Host TX descriptors. @@ -6081,8 +6079,7 @@ static int tg3_open(struct net_device *d return err; } - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 0); err = tg3_init_hw(tp); if (err) { @@ -6106,8 +6103,7 @@ static int tg3_open(struct net_device *d tp->timer.function = tg3_timer; } - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); if (err) { free_irq(tp->pdev->irq, dev); @@ -6123,8 +6119,7 @@ static int tg3_open(struct net_device *d err = tg3_test_msi(tp); if (err) { - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 0); if (tp->tg3_flags2 & TG3_FLG2_USING_MSI) { pci_disable_msi(tp->pdev); @@ -6134,22 +6129,19 @@ static int tg3_open(struct net_device *d tg3_free_rings(tp); tg3_free_consistent(tp); - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); return err; } } - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 0); add_timer(&tp->timer); tp->tg3_flags |= TG3_FLAG_INIT_COMPLETE; tg3_enable_ints(tp); - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); netif_start_queue(dev); @@ -6395,8 +6387,7 @@ static int tg3_close(struct net_device * del_timer_sync(&tp->timer); - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 1); #if 0 
tg3_dump_state(tp); #endif @@ -6410,8 +6401,7 @@ static int tg3_close(struct net_device * TG3_FLAG_GOT_SERDES_FLOWCTL); netif_carrier_off(tp->dev); - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); free_irq(tp->pdev->irq, dev); if (tp->tg3_flags2 & TG3_FLG2_USING_MSI) { @@ -6448,16 +6438,15 @@ static unsigned long calc_crc_errors(str if (!(tp->tg3_flags2 & TG3_FLG2_PHY_SERDES) && (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5700 || GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)) { - unsigned long flags; u32 val; - spin_lock_irqsave(&tp->lock, flags); + spin_lock_bh(&tp->lock); if (!tg3_readphy(tp, 0x1e, &val)) { tg3_writephy(tp, 0x1e, val | 0x8000); tg3_readphy(tp, 0x14, &val); } else val = 0; - spin_unlock_irqrestore(&tp->lock, flags); + spin_unlock_bh(&tp->lock); tp->phy_crc_errors += val; @@ -6719,11 +6708,9 @@ static void tg3_set_rx_mode(struct net_d { struct tg3 *tp = netdev_priv(dev); - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 0); __tg3_set_rx_mode(dev); - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); } #define TG3_REGDUMP_LEN (32 * 1024) @@ -6745,8 +6732,7 @@ static void tg3_get_regs(struct net_devi memset(p, 0, TG3_REGDUMP_LEN); - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 0); #define __GET_REG32(reg) (*(p)++ = tr32(reg)) #define GET_REG32_LOOP(base,len) \ @@ -6796,8 +6782,7 @@ do { p = (u32 *)(orig_p + (reg)); \ #undef GET_REG32_LOOP #undef GET_REG32_1 - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); } static int tg3_get_eeprom_len(struct net_device *dev) @@ -6973,8 +6958,7 @@ static int tg3_set_settings(struct net_d return -EINVAL; } - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 0); tp->link_config.autoneg = cmd->autoneg; if (cmd->autoneg == AUTONEG_ENABLE) { @@ -6990,8 +6974,7 @@ static int tg3_set_settings(struct net_d if (netif_running(dev)) tg3_setup_phy(tp, 
1); - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); return 0; } @@ -7027,12 +7010,12 @@ static int tg3_set_wol(struct net_device !(tp->tg3_flags & TG3_FLAG_SERDES_WOL_CAP)) return -EINVAL; - spin_lock_irq(&tp->lock); + spin_lock_bh(&tp->lock); if (wol->wolopts & WAKE_MAGIC) tp->tg3_flags |= TG3_FLAG_WOL_ENABLE; else tp->tg3_flags &= ~TG3_FLAG_WOL_ENABLE; - spin_unlock_irq(&tp->lock); + spin_unlock_bh(&tp->lock); return 0; } @@ -7072,7 +7055,7 @@ static int tg3_nway_reset(struct net_dev if (!netif_running(dev)) return -EAGAIN; - spin_lock_irq(&tp->lock); + spin_lock_bh(&tp->lock); r = -EINVAL; tg3_readphy(tp, MII_BMCR, &bmcr); if (!tg3_readphy(tp, MII_BMCR, &bmcr) && @@ -7080,7 +7063,7 @@ static int tg3_nway_reset(struct net_dev tg3_writephy(tp, MII_BMCR, bmcr | BMCR_ANRESTART); r = 0; } - spin_unlock_irq(&tp->lock); + spin_unlock_bh(&tp->lock); return r; } @@ -7111,8 +7094,7 @@ static int tg3_set_ringparam(struct net_ if (netif_running(dev)) tg3_netif_stop(tp); - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 0); tp->rx_pending = ering->rx_pending; @@ -7128,8 +7110,7 @@ static int tg3_set_ringparam(struct net_ tg3_netif_start(tp); } - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); return 0; } @@ -7150,8 +7131,8 @@ static int tg3_set_pauseparam(struct net if (netif_running(dev)) tg3_netif_stop(tp); - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 1); + if (epause->autoneg) tp->tg3_flags |= TG3_FLAG_PAUSE_AUTONEG; else @@ -7170,8 +7151,8 @@ static int tg3_set_pauseparam(struct net tg3_init_hw(tp); tg3_netif_start(tp); } - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + + tg3_full_unlock(tp); return 0; } @@ -7192,12 +7173,12 @@ static int tg3_set_rx_csum(struct net_de return 0; } - spin_lock_irq(&tp->lock); + spin_lock_bh(&tp->lock); if (data) tp->tg3_flags |= TG3_FLAG_RX_CHECKSUMS; else tp->tg3_flags &= ~TG3_FLAG_RX_CHECKSUMS; - 
spin_unlock_irq(&tp->lock); + spin_unlock_bh(&tp->lock); return 0; } @@ -7719,8 +7700,7 @@ static void tg3_self_test(struct net_dev if (netif_running(dev)) tg3_netif_stop(tp); - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 1); tg3_halt(tp, RESET_KIND_SUSPEND, 1); tg3_nvram_lock(tp); @@ -7742,14 +7722,14 @@ static void tg3_self_test(struct net_dev data[4] = 1; } - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); + if (tg3_test_interrupt(tp) != 0) { etest->flags |= ETH_TEST_FL_FAILED; data[5] = 1; } - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + + tg3_full_lock(tp, 0); tg3_halt(tp, RESET_KIND_SHUTDOWN, 1); if (netif_running(dev)) { @@ -7757,8 +7737,8 @@ static void tg3_self_test(struct net_dev tg3_init_hw(tp); tg3_netif_start(tp); } - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + + tg3_full_unlock(tp); } } @@ -7779,9 +7759,9 @@ static int tg3_ioctl(struct net_device * if (tp->tg3_flags2 & TG3_FLG2_PHY_SERDES) break; /* We have no PHY */ - spin_lock_irq(&tp->lock); + spin_lock_bh(&tp->lock); err = tg3_readphy(tp, data->reg_num & 0x1f, &mii_regval); - spin_unlock_irq(&tp->lock); + spin_unlock_bh(&tp->lock); data->val_out = mii_regval; @@ -7795,9 +7775,9 @@ static int tg3_ioctl(struct net_device * if (!capable(CAP_NET_ADMIN)) return -EPERM; - spin_lock_irq(&tp->lock); + spin_lock_bh(&tp->lock); err = tg3_writephy(tp, data->reg_num & 0x1f, data->val_in); - spin_unlock_irq(&tp->lock); + spin_unlock_bh(&tp->lock); return err; @@ -7813,28 +7793,24 @@ static void tg3_vlan_rx_register(struct { struct tg3 *tp = netdev_priv(dev); - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 0); tp->vlgrp = grp; /* Update RX_MODE_KEEP_VLAN_TAG bit in RX_MODE register. 
*/ __tg3_set_rx_mode(dev); - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); } static void tg3_vlan_rx_kill_vid(struct net_device *dev, unsigned short vid) { struct tg3 *tp = netdev_priv(dev); - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 0); if (tp->vlgrp) tp->vlgrp->vlan_devices[vid] = NULL; - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); } #endif @@ -10141,24 +10117,19 @@ static int tg3_suspend(struct pci_dev *p del_timer_sync(&tp->timer); - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 1); tg3_disable_ints(tp); - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); netif_device_detach(dev); - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 0); tg3_halt(tp, RESET_KIND_SHUTDOWN, 1); - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); err = tg3_set_power_state(tp, pci_choose_state(pdev, state)); if (err) { - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 0); tg3_init_hw(tp); @@ -10168,8 +10139,7 @@ static int tg3_suspend(struct pci_dev *p netif_device_attach(dev); tg3_netif_start(tp); - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); } return err; @@ -10192,8 +10162,7 @@ static int tg3_resume(struct pci_dev *pd netif_device_attach(dev); - spin_lock_irq(&tp->lock); - spin_lock(&tp->tx_lock); + tg3_full_lock(tp, 0); tg3_init_hw(tp); @@ -10204,8 +10173,7 @@ static int tg3_resume(struct pci_dev *pd tg3_netif_start(tp); - spin_unlock(&tp->tx_lock); - spin_unlock_irq(&tp->lock); + tg3_full_unlock(tp); return 0; } --- 1/drivers/net/tg3.h.~1~ 2005-06-03 12:11:44.000000000 -0700 +++ 2/drivers/net/tg3.h 2005-06-03 12:12:03.000000000 -0700 @@ -2006,17 +2006,33 @@ struct tg3_ethtool_stats { struct tg3 { /* begin "general, frequently-used members" cacheline section */ + /* If the IRQ handler (which runs lockless) needs 
to be + * quiesced, the following bitmask state is used. The + * SYNC bit is set by non-IRQ context code to initiate + * the quiescence. The setter of this bit also forces + * an interrupt to run via the GRC misc host control + * register. + * + * The IRQ handler notes this, disables interrupts, and + * sets the COMPLETE bit. At this point the SYNC bit + * setter can be assured that interrupts will no longer + * get run. + * + * In this way all SMP driver locks are never acquired + * in hw IRQ context, only sw IRQ context or lower. + */ + unsigned long irq_state; +#define TG3_IRQSTATE_SYNC 0 +#define TG3_IRQSTATE_COMPLETE 1 + /* SMP locking strategy: * * lock: Held during all operations except TX packet * processing. * - * tx_lock: Held during tg3_start_xmit{,_4gbug} and tg3_tx + * tx_lock: Held during tg3_start_xmit and tg3_tx * - * If you want to shut up all asynchronous processing you must - * acquire both locks, 'lock' taken before 'tx_lock'. IRQs must - * be disabled to take 'lock' but only softirq disabling is - * necessary for acquisition of 'tx_lock'. + * Both of these locks are to be held with BH safety. 
*/ spinlock_t lock; spinlock_t indirect_lock; From mitch.a.williams@intel.com Fri Jun 3 12:30:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 12:30:29 -0700 (PDT) Received: from orsfmr003.jf.intel.com (fmr18.intel.com [134.134.136.17]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53JUOXq023660 for ; Fri, 3 Jun 2005 12:30:24 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr003.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j53JSBV5021340; Fri, 3 Jun 2005 19:28:12 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j53JSBSc001696; Fri, 3 Jun 2005 19:28:11 GMT Received: from mawilli1-desk2.amr.corp.intel.com (mawilli1-desk2.amr.corp.intel.com [134.134.3.124]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j53JSASL003143; Fri, 3 Jun 2005 12:28:10 -0700 Date: Fri, 3 Jun 2005 12:28:10 -0700 From: Mitch Williams X-X-Sender: mawilli1@mawilli1-desk2.amr.corp.intel.com To: "David S. 
Miller" cc: hadi@cyberus.ca, mitch.a.williams@intel.com, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: <20050603.120126.41874584.davem@davemloft.net> Message-ID: References: <1117765954.6095.49.camel@localhost.localdomain> <1117824150.6071.34.camel@localhost.localdomain> <20050603.120126.41874584.davem@davemloft.net> ReplyTo: "Mitch Williams" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2049 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mitch.a.williams@intel.com Precedence: bulk X-list: netdev Content-Length: 2334 Lines: 57 On Fri, 3 Jun 2005, David S. Miller wrote: > From: jamal > Date: Fri, 03 Jun 2005 14:42:30 -0400 > > > When you reduce the weight, the system is spending less time in the > > softirq processing packets before softirq yields. If this gives more > > opportunity to your app to run, then the performance will go up. > > Is this what you are seeing? > > Jamal, this is my current theory as well, we hit the jiffies > check. Well, I hate to mess up your guys' theories, but the real reason is simpler: hardware receive resources, specifically descriptors and buffers. In a typical NAPI polling loop, the driver processes receive packets until it either hits the quota or runs out of packets. Then, at the end of the loop, it returns all of those now-free receive resources back to the hardware. With a heavy receive load, the hardware will run out of receive descriptors in the time it takes the driver/NAPI/stack to process 64 packets. So it drops them on the floor. And, as we know, dropped packets are A Bad Thing. 
By reducing the driver weight, we cause the driver to give receive
resources back to the hardware more often, which prevents dropped packets.

As Ben Greer noticed, increasing the number of descriptors can help with
this issue.  But it really can't eliminate the problem -- once the ring
is full, it doesn't matter how big it is, it's still full.

In my testing (Dual 2.8GHz Xeon, PCI-X bus, Gigabit network, 10 clients),
I was able to completely eliminate dropped packets in most cases by
reducing the driver weight down to about 20.

Now for some speculation:

Aside from dropped packets, I saw continued performance gain with even
lower weights, with the sweet spot (on a single adapter) being about 8 to
10.  I don't have a definite answer for why this is happening, but my
theory is that it's latency.  Packets are processed more often, meaning
they spend less time sitting in hardware-owned buffers, which means they
get to the stack quicker, which means less latency.  But I'm happy to
admit I might be wrong with this theory.

Nevertheless, the effect exists, and I've seen it on drivers other than
just e1000.  (And, no, I'm not allowed to say which other drivers I've
used, or give specific numbers, or our lawyers will string me up by my
toes.)

Anybody else got a theory?
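The descriptor arithmetic described above can be sketched with a toy model. This is only an illustration under made-up assumptions: one packet arrives per tick, the driver handles one packet per tick, and processed descriptors go back to the hardware only at the end of a poll. The name `simulate_drops`, the ring size, and the backlog are hypothetical and are not taken from e1000 or any other real driver.

```c
/* Toy model of an RX ring; not driver code, all names are made up. */
struct rx_ring {
	int free;	/* empty descriptors owned by the hardware */
	int filled;	/* descriptors holding packets for the driver */
	int held;	/* processed descriptors not yet returned */
};

/*
 * Run the model for `ticks` steps and return the number of packets
 * the "hardware" had to drop because no free descriptor was left.
 */
static int simulate_drops(int ring_size, int backlog, int weight, int ticks)
{
	struct rx_ring r = { ring_size - backlog, backlog, 0 };
	int in_poll = 0;	/* packets handled in the current poll */
	int dropped = 0;

	for (int t = 0; t < ticks; t++) {
		/* Hardware side: use a free descriptor or drop. */
		if (r.free > 0) {
			r.free--;
			r.filled++;
		} else {
			dropped++;
		}

		/* Driver side: process one pending packet. */
		if (r.filled > 0) {
			r.filled--;
			r.held++;
			in_poll++;
		}

		/* End of poll (quota hit or ring drained): only now
		 * are the processed descriptors given back. */
		if (in_poll == weight || r.filled == 0) {
			r.free += r.held;
			r.held = 0;
			in_poll = 0;
		}
	}
	return dropped;
}
```

In this model, an 80-entry ring that starts with a 40-packet backlog drops packets at a weight of 64 but not at 16 or 20. A larger ring only raises the backlog at which the same thing happens.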
-Mitch From hadi@cyberus.ca Fri Jun 3 12:42:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 12:42:25 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53JgJXq024746 for ; Fri, 3 Jun 2005 12:42:22 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DeI2o-0000q7-UJ for netdev@oss.sgi.com; Fri, 03 Jun 2005 15:41:26 -0400 Received: from [216.209.86.2] (helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DeI2l-0007BV-3a; Fri, 03 Jun 2005 15:41:23 -0400 Subject: Re: RFC: NAPI packet weighting patch From: jamal Reply-To: hadi@cyberus.ca To: "David S. Miller" Cc: mitch.a.williams@intel.com, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com In-Reply-To: <20050603.120126.41874584.davem@davemloft.net> References: <1117765954.6095.49.camel@localhost.localdomain> <1117824150.6071.34.camel@localhost.localdomain> <20050603.120126.41874584.davem@davemloft.net> Content-Type: text/plain Organization: unknown Date: Fri, 03 Jun 2005 15:40:50 -0400 Message-Id: <1117827650.6071.59.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2050 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 1976 Lines: 55 On Fri, 2005-03-06 at 12:01 -0700, David S. Miller wrote: > From: jamal > Date: Fri, 03 Jun 2005 14:42:30 -0400 > > > When you reduce the weight, the system is spending less time in the > > softirq processing packets before softirq yields. If this gives more > > opportunity to your app to run, then the performance will go up. > > Is this what you are seeing? 
>
> Jamal, this is my current theory as well, we hit the jiffies
> check.
>

I think you are more than likely right. If we can instrument it Mitch
could check it out. Mitch would you like to try something that will
instrument this? I know I have seen this behavior but it was when I was
playing with some system that had a real small HZ.

> It is the only logical explanation I can come up with for the
> single adapter case.
>
> There are some ways we can mitigate this.  Here is one idea
> off the top of my head.
>
> When the jiffies check is hit, lower the weight of the most recently
> polled device towards some minimum (perhaps divide by two).  If we
> successfully poll without hitting the jiffies check, make a small
> increment of the weight up to some limit.
>

You probably wanna start high up first until you hit congestion and then
start lowering.

> It is Van Jacobson TCP congestion avoidance applied to NAPI :-)
>
> Just a simple AIMD (Additive Increase, Multiplicative Decrease).
> So, hitting the jiffies work limit is congestion, and the cause
> of the congestion is the most recently polled device.
>
> In this regime, what the driver currently specifies as "->weight"
> is actually the maximum we'll use in the congestion control
> algorithm.  And we can choose some constant minimum, something
> like "8" ought to work well.
>
> Comments?
>

In theory it looks good - but I think you end up defeating the fairness
factor. If you can narrow it down to which driver is causing congestion,
and only penalize that driver I think it would work well.
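The AIMD idea discussed above could look roughly like the following. This is a sketch only, not code from any posted patch: `struct poll_dev`, `aimd_update`, and `NAPI_WEIGHT_MIN` are hypothetical names, the additive step of 1 is an assumption, and the weight is assumed to start at the driver's maximum and only come down on congestion.

```c
#define NAPI_WEIGHT_MIN 8	/* the constant minimum floated above */

/* Hypothetical per-device state; not a real kernel structure. */
struct poll_dev {
	int max_weight;	/* what the driver specifies as ->weight */
	int cur_weight;	/* weight to use for the next poll */
};

/*
 * Call after each poll of the device.  `hit_jiffies_limit` is
 * non-zero when the softirq ran out of its time budget while this
 * device was the most recently polled one.
 */
static void aimd_update(struct poll_dev *d, int hit_jiffies_limit)
{
	if (hit_jiffies_limit) {
		/* Multiplicative decrease, clamped at the minimum. */
		d->cur_weight /= 2;
		if (d->cur_weight < NAPI_WEIGHT_MIN)
			d->cur_weight = NAPI_WEIGHT_MIN;
	} else if (d->cur_weight < d->max_weight) {
		/* Additive increase, capped at the driver's weight. */
		d->cur_weight++;
	}
}
```

Starting from 64, three congested polls in a row take the weight to 32, 16, and 8, where it stays until a poll completes without hitting the limit. Only the device that was being polled when the limit hit is penalized, which is the fairness concern raised above.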
cheers, jamal From hadi@cyberus.ca Fri Jun 3 13:01:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:01:05 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53K10Xq026708 for ; Fri, 3 Jun 2005 13:01:00 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DeIKt-0007Az-K1 for netdev@oss.sgi.com; Fri, 03 Jun 2005 16:00:07 -0400 Received: from [216.209.86.2] (helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DeIKp-0002Aj-Tt; Fri, 03 Jun 2005 16:00:04 -0400 Subject: Re: RFC: NAPI packet weighting patch From: jamal Reply-To: hadi@cyberus.ca To: Mitch Williams Cc: "David S. Miller" , john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com In-Reply-To: References: <1117765954.6095.49.camel@localhost.localdomain> <1117824150.6071.34.camel@localhost.localdomain> <20050603.120126.41874584.davem@davemloft.net> Content-Type: text/plain Organization: unknown Date: Fri, 03 Jun 2005 15:59:31 -0400 Message-Id: <1117828771.6071.77.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2051 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 1995 Lines: 51 On Fri, 2005-03-06 at 12:28 -0700, Mitch Williams wrote: > > On Fri, 3 Jun 2005, David S. Miller wrote: > > > From: jamal > > Date: Fri, 03 Jun 2005 14:42:30 -0400 > > > > > When you reduce the weight, the system is spending less time in the > > > softirq processing packets before softirq yields. If this gives more > > > opportunity to your app to run, then the performance will go up. > > > Is this what you are seeing? 
> >
> > Jamal, this is my current theory as well, we hit the jiffies
> > check.
>
> Well, I hate to mess up your guys' theories, but the real reason is
> simpler:  hardware receive resources, specifically descriptors and
> buffers.
>
> In a typical NAPI polling loop, the driver processes receive packets until
> it either hits the quota or runs out of packets.  Then, at the end of the
> loop, it returns all of those now-free receive resources back to the
> hardware.
>
> With a heavy receive load, the hardware will run out of receive
> descriptors in the time it takes the driver/NAPI/stack to process 64
> packets.  So it drops them on the floor.  And, as we know, dropped packets
> are A Bad Thing.
>
> By reducing the driver weight, we cause the driver to give receive
> resources back to the hardware more often, which prevents dropped packets.
>
> As Ben Greer noticed, increasing the number of descriptors can help with
> this issue.  But it really can't eliminate the problem -- once the ring
> is full, it doesn't matter how big it is, it's still full.
>
> In my testing (Dual 2.8GHz Xeon, PCI-X bus, Gigabit network, 10 clients),
> I was able to completely eliminate dropped packets in most cases by
> reducing the driver weight down to about 20.
>
> Now for some speculation:
>

What you said above is unfortunately also speculation ;->
But one that you could validate by putting in proper hooks.
For instance, try to restore a descriptor every time you pick one; for
an example of this, look at the sb1250 driver.
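The "restore a descriptor every time you pick one" experiment can be sketched with a toy model. The assumptions are invented for illustration: one packet arrives per tick, the driver handles one packet per tick, and `simulate_drops_refill_each` is a hypothetical name, not something from the sb1250 driver or any other driver.

```c
/*
 * Toy model, not driver code: each processed descriptor is handed
 * straight back to the hardware instead of being held until the
 * end of the poll, so the poll quota never enters the picture.
 */
static int simulate_drops_refill_each(int ring_size, int backlog, int ticks)
{
	int free_desc = ring_size - backlog;	/* empty, hardware-owned */
	int filled = backlog;			/* waiting for the driver */
	int dropped = 0;

	for (int t = 0; t < ticks; t++) {
		/* Hardware side: use a free descriptor or drop. */
		if (free_desc > 0) {
			free_desc--;
			filled++;
		} else {
			dropped++;
		}

		/* Driver side: process one packet and refill at once. */
		if (filled > 0) {
			filled--;
			free_desc++;
		}
	}
	return dropped;
}
```

With an 80-entry ring and a 40-packet backlog this model never drops, whatever the weight would have been. If a real driver shows the same behavior once it refills per packet, that would support the descriptor-starvation explanation over the jiffies one.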
cheers, jamal From Robert.Olsson@data.slu.se Fri Jun 3 13:18:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:19:00 -0700 (PDT) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53KIiXq027844 for ; Fri, 3 Jun 2005 13:18:45 -0700 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j53KHVFH031123; Fri, 3 Jun 2005 22:17:32 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id 985F5EE3F0; Fri, 3 Jun 2005 22:17:31 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17056.47835.583602.151291@robur.slu.se> Date: Fri, 3 Jun 2005 22:17:31 +0200 To: "Ronciak, John" Cc: "Robert Olsson" , "David S. Miller" , , , , "Williams, Mitch A" , , "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: RE: RFC: NAPI packet weighting patch In-Reply-To: <468F3FDA28AA87429AD807992E22D07E0450BFE8@orsmsx408> References: <468F3FDA28AA87429AD807992E22D07E0450BFE8@orsmsx408> X-Mailer: VM 7.18 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-archive-position: 2052 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Content-Length: 815 Lines: 23 Ronciak, John writes: > With the same system (fairly high end with nothing major running on it) > we got rid of the dropped frames by just reducing the weight for 64. So > the weight did have something to do with the dropped frames. Maybe > other factors as well, but in static tests like this it sure looks like > the 64 value is wrong is some cases. It is possible that a lower weight forced your driver to disable interrupts and do packet reception w/o interrupts often this is more efficient as we get rid intr. latency etc. 
Again I think weight should only be used for fairness and not to control
the threshold for when to disable interrupts.

You can test with a new policy in e1000_clean so you schedule for a new
poll if work_done (any pkts received) or tx_cleaned is true.

Cheers.
					--ro

From davem@davemloft.net Fri Jun 3 13:24:10 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:24:14 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53KO9Xq028589 for ; Fri, 3 Jun 2005 13:24:10 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DeIh0-0002Ce-4d; Fri, 03 Jun 2005 13:22:58 -0700
Date: Fri, 03 Jun 2005 13:22:57 -0700 (PDT)
Message-Id: <20050603.132257.23013342.davem@davemloft.net>
To: mitch.a.williams@intel.com
Cc: hadi@cyberus.ca, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com
Subject: Re: RFC: NAPI packet weighting patch
From: "David S. Miller"
In-Reply-To:
References: <1117824150.6071.34.camel@localhost.localdomain> <20050603.120126.41874584.davem@davemloft.net>
X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2053
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev
Content-Length: 785
Lines: 20

From: Mitch Williams
Date: Fri, 3 Jun 2005 12:28:10 -0700

> In a typical NAPI polling loop, the driver processes receive packets until
> it either hits the quota or runs out of packets.  Then, at the end of the
> loop, it returns all of those now-free receive resources back to the
> hardware.
>
> With a heavy receive load, the hardware will run out of receive
> descriptors in the time it takes the driver/NAPI/stack to process 64
> packets.  So it drops them on the floor.  And, as we know, dropped packets
> are A Bad Thing.

This is why you should replenish RX packets _IN_ your
RX packet receive processing, not via some tasklet
or other separate work processing context.

No wonder I never see this on tg3.

It is the only way to do this cleanly.

From jgarzik@pobox.com Fri Jun 3 13:24:10 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:24:16 -0700 (PDT)
Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53KO9Xq028590 for ; Fri, 3 Jun 2005 13:24:10 -0700
Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1DeIhC-00022M-Vr; Fri, 03 Jun 2005 20:23:11 +0000
Message-ID: <42A0BC2B.4020409@pobox.com>
Date: Fri, 03 Jun 2005 16:23:07 -0400
From: Jeff Garzik
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050328 Fedora/1.7.6-1.2.5
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: "David S. Miller"
CC: netdev@oss.sgi.com, mchan@broadcom.com
Subject: Re: [PATCH]: Tigon3 new NAPI locking v2
References: <20050603.122558.88474819.davem@davemloft.net>
In-Reply-To: <20050603.122558.88474819.davem@davemloft.net>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 2054
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jgarzik@pobox.com
Precedence: bulk
X-list: netdev
Content-Length: 3007
Lines: 97

David S. Miller wrote:
> [TG3]: Eliminate all hw IRQ handler spinlocks.
>
> Move all driver spinlocks to be taken at sw IRQ
> context only.
>
> This fixes the skb_copy() we were doing with hw
> IRQs disabled (which is illegal and triggers a
> BUG() with HIGHMEM enabled).  It also simplifies
> the locking all over the driver tremendously.
>
> We accomplish this feat by creating a special
> sequence to synchronize with the hw IRQ handler
> using a 2-bit atomic state.
>
> Signed-off-by: David S. Miller

overall, pretty spiffy :)

As further work, I would like to see how much (a lot? all?) of the timer
code could be moved into a workqueue, where we could kill the last of
the horrible-udelay loops in the driver.  Particularly awful is

	while (++tick < 195000) {
		status = tg3_fiber_aneg_smachine(tp, &aninfo);
		if (status == ANEG_DONE || status == ANEG_FAILED)
			break;

		udelay(1);
	}

where you could freeze a uniprocessor box (lock out everything but
interrupts) for over 1 second.  IOW, the slower the phy, the more these
slow-path delays can affect the overall system.

This is a MINOR, low priority issue; but long delays are uglies that
should be fixed, if it's relatively painless.

> +static void tg3_irq_quiesce(struct tg3 *tp)
> +{
> +	BUG_ON(test_bit(TG3_IRQSTATE_SYNC, &tp->irq_state));
> +
> +	set_bit(TG3_IRQSTATE_SYNC, &tp->irq_state);
> +	smp_mb();
> +	tw32(GRC_LOCAL_CTRL,
> +	     tp->grc_local_ctrl | GRC_LCLCTRL_SETINT);
> +
> +	while (!test_bit(TG3_IRQSTATE_COMPLETE, &tp->irq_state)) {
> +		u32 val = tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW);
> +
> +		if (val == 0x00000001)
> +			break;
> +
> +		cpu_relax();
> +	}
> +}

* This loop makes me nervous...  If there's a fault on the PCI bus or
  the hardware is unplugged, val will equal 0xffffffff.

* A few comments for normal humans like "force an interrupt" and "wait
  for interrupt handler to complete" might be nice.
* a BUG_ON(if-interrupts-are-disabled) line might be nice > +static inline int tg3_irq_sync(struct tg3 *tp) > +{ > + if (test_bit(TG3_IRQSTATE_SYNC, &tp->irq_state)) { > + set_bit(TG3_IRQSTATE_COMPLETE, &tp->irq_state); > + return 1; > + } > + return 0; > +} > + > +/* Fully shutdown all tg3 driver activity elsewhere in the system. > + * If irq_sync is non-zero, then the IRQ handler must be synchronized > + * with as well. Most of the time, this is not necessary except when > + * shutting down the device. > + */ > +static inline void tg3_full_lock(struct tg3 *tp, int irq_sync) > +{ > + if (irq_sync) > + tg3_irq_quiesce(tp); > + spin_lock_bh(&tp->lock); > + spin_lock(&tp->tx_lock); > +} Rather than an 'irq_sync' arg, my instinct would have been to create tg3_full_lock() and tg3_full_lock_sync(). This makes the action -much- more obvious to the reader, and since its inline doesn't cost anything (compiler's optimizer even does a tiny bit less work my way). Jeff From hadi@cyberus.ca Fri Jun 3 13:24:55 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:25:00 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53KOsXq029246 for ; Fri, 3 Jun 2005 13:24:54 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DeIi2-0007K2-72 for netdev@oss.sgi.com; Fri, 03 Jun 2005 16:24:02 -0400 Received: from [216.209.86.2] (helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DeIhy-0006OD-Jw; Fri, 03 Jun 2005 16:23:58 -0400 Subject: Re: RFC: NAPI packet weighting patch From: jamal Reply-To: hadi@cyberus.ca To: "David S. 
Miller" Cc: mitch.a.williams@intel.com, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com In-Reply-To: <1117827650.6071.59.camel@localhost.localdomain> References: <1117765954.6095.49.camel@localhost.localdomain> <1117824150.6071.34.camel@localhost.localdomain> <20050603.120126.41874584.davem@davemloft.net> <1117827650.6071.59.camel@localhost.localdomain> Content-Type: text/plain Organization: unknown Date: Fri, 03 Jun 2005 16:23:25 -0400 Message-Id: <1117830205.6071.81.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2055 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 636 Lines: 22 On Fri, 2005-03-06 at 15:40 -0400, jamal wrote: > On Fri, 2005-03-06 at 12:01 -0700, David S. Miller wrote: > > I think you are more than likely right. If we can instrument it Mitch > could check it out. Mitch would you like to try something that will > instrument this? I know i have seen this behavior but it was when i was > playing with some system that had a real small HZ. > Sorry, Its already there as Dave said in his email. Look for time_squeeze. Its the column i labeled XXXX below. 
-----
$ cat /proc/net/softnet_stat
0000f938 00000000 XXXXXXX 00000000 00000000 00000000 00000000 00000000
00000000
------

cheers,
jamal

From davem@davemloft.net Fri Jun 3 13:30:34 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:30:45 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53KUYXq030460 for ; Fri, 3 Jun 2005 13:30:34 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DeInD-0002DW-4v; Fri, 03 Jun 2005 13:29:23 -0700
Date: Fri, 03 Jun 2005 13:29:22 -0700 (PDT)
Message-Id: <20050603.132922.63997492.davem@davemloft.net>
To: mitch.a.williams@intel.com
Cc: hadi@cyberus.ca, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com
Subject: Re: RFC: NAPI packet weighting patch
From: "David S. Miller"
In-Reply-To: <20050603.132257.23013342.davem@davemloft.net>
References: <20050603.120126.41874584.davem@davemloft.net> <20050603.132257.23013342.davem@davemloft.net>
X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2056
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev
Content-Length: 1152
Lines: 33

From: "David S. Miller"
Date: Fri, 03 Jun 2005 13:22:57 -0700 (PDT)

> This is why you should replenish RX packets _IN_ your
> RX packet receive processing, not via some tasklet
> or other separate work processing context.
>
> No wonder I never see this on tg3.

Actually, the problem is slightly different.  E1000 processes
the full QUOTA of RX packets, _THEN_ replenishes with new RX buffers.
No wonder the chip runs out of RX descriptors.

You should replenish _AS_ you grab RX packets off the receive queue,
just as tg3 does.  This allows you to accomplish two things:

1) Keep up with the chip so that it does not starve, regardless
   of dev->weight setting or system load.

2) Make intelligent decisions when RX buffer allocation fails.

When we look at an RX descriptor in tg3 we never leave the descriptor
empty.  If allocation of the replacement RX buffer fails, we simply
ignore the RX packet we're looking at and give it back to the chip.

Every driver should implement this policy.  Drivers that do not do
things this way run into all kinds of RX ring chip starvation issues
like the ones you are seeing here.

From mitch.a.williams@intel.com Fri Jun 3 13:31:13 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:31:16 -0700 (PDT)
Received: from orsfmr002.jf.intel.com (fmr17.intel.com [134.134.136.16]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53KVCXq030621 for ; Fri, 3 Jun 2005 13:31:12 -0700
Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr002.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j53KSvYu018304; Fri, 3 Jun 2005 20:28:57 GMT
Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j53KSvdD000620; Fri, 3 Jun 2005 20:28:57 GMT
Received: from mawilli1-desk2.amr.corp.intel.com (mawilli1-desk2.amr.corp.intel.com [134.134.3.124]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j53KSvSL006410; Fri, 3 Jun 2005 13:28:57 -0700
Date: Fri, 3 Jun 2005 13:28:57 -0700
From: Mitch Williams
X-X-Sender: mawilli1@mawilli1-desk2.amr.corp.intel.com
To: jamal
cc: "David S.
Miller" , "Williams, Mitch A" , "Ronciak, John" , jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: <1117830205.6071.81.camel@localhost.localdomain> Message-ID: References: <1117765954.6095.49.camel@localhost.localdomain> <1117824150.6071.34.camel@localhost.localdomain> <20050603.120126.41874584.davem@davemloft.net> <1117827650.6071.59.camel@localhost.localdomain> <1117830205.6071.81.camel@localhost.localdomain> ReplyTo: "Mitch Williams" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2057 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mitch.a.williams@intel.com Precedence: bulk X-list: netdev Content-Length: 464 Lines: 21 On Fri, 3 Jun 2005, jamal wrote: > > Sorry, it's already there, as Dave said in his email. > Look for time_squeeze. It's the column I labeled XXXX below. > > ----- > $ cat /proc/net/softnet_stat > 0000f938 00000000 XXXXXXX 00000000 00000000 00000000 00000000 00000000 > 00000000 > ------ I might not be able to get into the lab today (they keep making me do work!), but I should be able to pop in Monday and take a look. Shouldn't take too long. 
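For reference, the time_squeeze lookup described above can be sketched in a few lines of C. This is only an illustration: it assumes each row of /proc/net/softnet_stat is a run of whitespace-separated hex fields with time_squeeze in the third column, as labeled in the quoted output. The function name softnet_time_squeeze is made up here and is not a kernel or libc interface.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: extract the third hex column (time_squeeze)
 * from one row of /proc/net/softnet_stat.
 * Returns 0 on success, -1 if the row does not start with three hex fields.
 */
int softnet_time_squeeze(const char *line, unsigned int *out)
{
    unsigned int processed, dropped, squeezed;

    assert(line != NULL && out != NULL);

    /* %x stops at the first non-hex character, so a placeholder
     * such as "XXXXXXX" makes the third conversion fail. */
    if (sscanf(line, "%x %x %x", &processed, &dropped, &squeezed) != 3)
        return -1;

    *out = squeezed;
    return 0;
}
```

A caller would read the file one row at a time (there is one row per CPU) and pass each row to this function; a value that keeps growing between two reads suggests the softirq is regularly running out of its time or packet budget.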
Thanks, Mitch From davem@davemloft.net Fri Jun 3 13:31:50 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:31:57 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53KVoXq031106 for ; Fri, 3 Jun 2005 13:31:50 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DeIoT-0002EE-9j; Fri, 03 Jun 2005 13:30:41 -0700 Date: Fri, 03 Jun 2005 13:30:41 -0700 (PDT) Message-Id: <20050603.133041.35664164.davem@davemloft.net> To: Robert.Olsson@data.slu.se Cc: john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, hadi@cyberus.ca, mitch.a.williams@intel.com, netdev@oss.sgi.com, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: <17056.47835.583602.151291@robur.slu.se> References: <468F3FDA28AA87429AD807992E22D07E0450BFE8@orsmsx408> <17056.47835.583602.151291@robur.slu.se> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2058 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 852 Lines: 22 From: Robert Olsson Date: Fri, 3 Jun 2005 22:17:31 +0200 > It is possible that a lower weight forced your driver to disable interrupts > and do packet reception w/o interrupts; often this is more efficient as > we get rid of intr. latency etc. > > Again I think weight should only be used for fairness and not to control the > threshold when to disable interrupts. > > You can test with a new policy in e1000_clean so you schedule for a new > poll if work_done (any pkts received) or tx_cleaned is true. I don't think this is it. 
What's happening is that E1000 pulls up to a full dev->quota of packets off the ring, and _THEN_ goes back and does RX buffer replenishing. It is very clear why E1000 runs out of RX descriptors with this kind of policy. I outlined a way to fix this in the E1000 driver in another email. From greearb@candelatech.com Fri Jun 3 13:32:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:32:08 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53KW1Xq031239 for ; Fri, 3 Jun 2005 13:32:01 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j53L4P5I004513; Fri, 3 Jun 2005 14:04:25 -0700 Message-ID: <42A0BDFE.1020607@candelatech.com> Date: Fri, 03 Jun 2005 13:30:54 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Mitch Williams CC: "David S. Miller" , hadi@cyberus.ca, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch References: <1117765954.6095.49.camel@localhost.localdomain> <1117824150.6071.34.camel@localhost.localdomain> <20050603.120126.41874584.davem@davemloft.net> In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2059 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 3639 Lines: 86 Mitch Williams wrote: > > On Fri, 3 Jun 2005, David S. 
Miller wrote: > > >>From: jamal >>Date: Fri, 03 Jun 2005 14:42:30 -0400 >> >> >>>When you reduce the weight, the system is spending less time in the >>>softirq processing packets before softirq yields. If this gives more >>>opportunity to your app to run, then the performance will go up. >>>Is this what you are seeing? >> >>Jamal, this is my current theory as well, we hit the jiffies >>check. > > > Well, I hate to mess up your guys' theories, but the real reason is > simpler: hardware receive resources, specifically descriptors and > buffers. > > In a typical NAPI polling loop, the driver processes receive packets until > it either hits the quota or runs out of packets. Then, at the end of the > loop, it returns all of those now-free receive resources back to the > hardware. > > With a heavy receive load, the hardware will run out of receive > descriptors in the time it takes the driver/NAPI/stack to process 64 > packets. So it drops them on the floor. And, as we know, dropped packets > are A Bad Thing. If it can fill up more than 190 RX descriptors in the time it takes NAPI to pull 64, then there is no possible way to not drop packets! How could NAPI ever keep up if what you say is true? > By reducing the driver weight, we cause the driver to give receive > resources back to the hardware more often, which prevents dropped packets. > > As Ben Greear noticed, increasing the number of descriptors can help with > this issue. But it really can't eliminate the problem -- once the ring > is full, it doesn't matter how big it is, it's still full. If you have 1024 rx descriptors, and the NAPI poll pulls off 64 at one time, I do not see how pulling off 20 could be any more useful. Either way, you have more than 900 other RX descriptors to be received. Even if you only have the default of 256 the NIC should be able to continue receiving packets with the other 190 or so descriptors while NAPI is doing its receive poll. 
If the buffers are often nearly used up, then the problem is that the NAPI poll cannot pull the packets fast enough, and again, I do not see how making it do more polls could make it able to pull packets from the NIC more efficiently. Maybe you could instrument the NAPI receive logic to see if there is some horrible waste of CPU and/or time when it tries to pull larger amounts of packets at once? A linear increase in work cannot explain what you are describing. > In my testing (Dual 2.8GHz Xeon, PCI-X bus, Gigabit network, 10 clients), > I was able to completely eliminate dropped packets in most cases by > reducing the driver weight down to about 20. At least tell us what type of traffic you are using? TCP with MTU sized packets, traffic-generator with 60 byte packets? Actual speed that you are running (aggregate)? Full-duplex traffic, or mostly uni-directional? packets-per-second you are receiving & transmitting when the drops occur? On a dual 2.8Ghz xeon system with PCI-X bus, with a quad-port Intel pro/1000 NIC I can run about 950Mbps of traffic, bi-directional, on two ports at the same time, and drop few or no packets. (MTU sized packets here). This is using a modified version of pktgen, btw. So, if you are seeing any amount of dropped pkts on a single NIC, especially if you are mostly doing uni-directional traffic, then I think the problem might be elsewhere, because the stock 2.6.11 and similar kernels can easily handle this amount of network traffic. 
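The descriptor arithmetic in this message can be checked with a small sketch. It assumes a ring that is fully stocked when the poll starts and a driver that hands buffers back to the hardware only after the poll has finished (the deferred policy under discussion). The function names are made up for illustration and do not come from any driver.

```c
#include <assert.h>

/*
 * Descriptors the NIC can still fill while one poll of `quota`
 * packets is in flight, assuming nothing is returned to the
 * hardware until the poll ends.
 */
unsigned int rx_headroom(unsigned int ring_size, unsigned int quota)
{
    assert(ring_size > 0);
    return quota >= ring_size ? 0 : ring_size - quota;
}

/*
 * Nonzero if `arrivals` packets showing up during a single poll
 * would exhaust the remaining descriptors under that assumption.
 */
int starves_during_poll(unsigned int ring_size, unsigned int quota,
                        unsigned int arrivals)
{
    return arrivals > rx_headroom(ring_size, quota);
}
```

With the figures quoted above, rx_headroom(256, 64) is 192 ("190 or so") and rx_headroom(1024, 64) is 960 ("more than 900"). Under this model the ring only runs dry within one poll if packets arrive more than three times faster than the poll drains them, which is the point of the objection.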
Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From davem@davemloft.net Fri Jun 3 13:32:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:32:49 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53KWjXq032093 for ; Fri, 3 Jun 2005 13:32:45 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DeIpJ-0002EY-N9; Fri, 03 Jun 2005 13:31:33 -0700 Date: Fri, 03 Jun 2005 13:31:33 -0700 (PDT) Message-Id: <20050603.133133.38710501.davem@davemloft.net> To: hadi@cyberus.ca Cc: mitch.a.williams@intel.com, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: <1117828771.6071.77.camel@localhost.localdomain> References: <20050603.120126.41874584.davem@davemloft.net> <1117828771.6071.77.camel@localhost.localdomain> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2060 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 498 Lines: 13 From: jamal Date: Fri, 03 Jun 2005 15:59:31 -0400 > But one that you could validate by putting proper hooks. As an example, > try to restore a descriptor every time you pick one - for an example of > this look at the sb1250 driver. Yes, this in my mind is exactly the problem. TG3 does this properly, as do several other drivers. You should never defer RX buffer replenishment, you should always do it as you grab packets off of the ring. You will starve the chip otherwise. 
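The policy described in this thread (replace each RX buffer as the packet is taken off the ring, and on allocation failure drop the packet and give the old buffer straight back to the chip) can be sketched as a toy user-space model. This is not tg3 or e1000 code: the structures, the sizes and the budgeted allocator are assumptions made up for illustration, and free() merely stands in for passing the packet up the stack.

```c
#include <assert.h>
#include <stdlib.h>

#define RX_RING_SIZE 8u     /* hypothetical ring size */
#define RX_BUF_SIZE  2048u  /* hypothetical buffer size */

struct rx_desc {
    void *buf;   /* buffer owned by this descriptor; never left NULL */
    int   done;  /* nonzero once the "chip" has written a packet */
};

struct rx_ring {
    struct rx_desc desc[RX_RING_SIZE];
    unsigned int next;       /* next descriptor the driver inspects */
    unsigned int delivered;  /* packets passed up */
    unsigned int dropped;    /* packets dropped on allocation failure */
};

/* Toy allocator that starts failing once its budget is used up. */
struct rx_pool { unsigned int budget; };

static void *rx_alloc(struct rx_pool *pool)
{
    if (pool->budget == 0)
        return NULL;
    pool->budget--;
    return malloc(RX_BUF_SIZE);
}

void rx_ring_init(struct rx_ring *ring)
{
    unsigned int i;

    for (i = 0; i < RX_RING_SIZE; i++) {
        ring->desc[i].buf = malloc(RX_BUF_SIZE);
        assert(ring->desc[i].buf != NULL);
        ring->desc[i].done = 0;
    }
    ring->next = ring->delivered = ring->dropped = 0;
}

/* Process up to `quota` completed descriptors, replenishing each one
 * before moving to the next. Returns the number processed. */
unsigned int rx_poll(struct rx_ring *ring, unsigned int quota,
                     struct rx_pool *pool)
{
    unsigned int work = 0;

    while (work < quota) {
        struct rx_desc *d = &ring->desc[ring->next];
        void *fresh;

        if (!d->done)
            break;                 /* no more completed packets */

        fresh = rx_alloc(pool);
        if (fresh != NULL) {
            free(d->buf);          /* stand-in for delivering the packet */
            d->buf = fresh;        /* descriptor is refilled immediately */
            ring->delivered++;
        } else {
            ring->dropped++;       /* drop the packet, keep the old buffer */
        }
        d->done = 0;               /* hand the descriptor back to the chip */
        ring->next = (ring->next + 1) % RX_RING_SIZE;
        work++;
    }
    return work;
}
```

The property worth noticing is that every descriptor the loop touches goes back to the chip with a valid buffer, whether or not the allocation succeeded, so memory pressure costs individual packets rather than ring capacity.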
From gwingerde@home.nl Fri Jun 3 13:39:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:39:34 -0700 (PDT) Received: from smtpq2.home.nl (smtpq2.home.nl [213.51.128.197]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53KdTXq000881 for ; Fri, 3 Jun 2005 13:39:29 -0700 Received: from [213.51.128.134] (port=56790 helo=smtp3.home.nl) by smtpq2.home.nl with esmtp (Exim 4.30) id 1DeIw3-0007qn-85; Fri, 03 Jun 2005 22:38:31 +0200 Received: from cc10088-a.ensch1.ov.home.nl ([217.123.128.105]:47093 helo=[192.168.14.1]) by smtp3.home.nl with esmtp (Exim 4.30) id 1DeIw1-0006TR-5a; Fri, 03 Jun 2005 22:38:29 +0200 Message-ID: <42A0BE19.3060503@home.nl> Date: Fri, 03 Jun 2005 22:31:21 +0200 From: Gertjan van Wingerde User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050322) X-Accept-Language: en-us, en MIME-Version: 1.0 To: netdev@oss.sgi.com CC: jgarzik@pobox.com Subject: [PATCH 1/2] ieee80211: Update generic definitions to latest specs - take #2 Content-Type: multipart/mixed; boundary="------------060209080801050100020902" X-AtHome-MailScanner-Information: Neem contact op met support@home.nl voor meer informatie X-AtHome-MailScanner: Found to be clean X-archive-position: 2062 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gwingerde@home.nl Precedence: bulk X-list: netdev Content-Length: 10875 Lines: 296 This is a multi-part message in MIME format. --------------060209080801050100020902 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi, Attached patch updates the definitions of the generic ieee80211 stack to the latest versions of the published 802.11x specification suite. Please review and apply. 
Signed-off-by: Gertjan van Wingerde Thanks, Gertjan van Wingerde --------------060209080801050100020902 Content-Type: text/plain; name="ieee80211-new-definitions.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="ieee80211-new-definitions.diff" Index: include/net/ieee80211.h =================================================================== --- 4b4ba76aa81b3627142787262fd2f8049dd3662d/include/net/ieee80211.h (mode:100644) +++ uncommitted/include/net/ieee80211.h (mode:100644) @@ -103,7 +103,7 @@ #define MAX_FRAG_THRESHOLD 2346U /* Frame control field constants */ -#define IEEE80211_FCTL_VERS 0x0002 +#define IEEE80211_FCTL_VERS 0x0003 #define IEEE80211_FCTL_FTYPE 0x000c #define IEEE80211_FCTL_STYPE 0x00f0 #define IEEE80211_FCTL_TODS 0x0100 @@ -111,8 +111,8 @@ #define IEEE80211_FCTL_MOREFRAGS 0x0400 #define IEEE80211_FCTL_RETRY 0x0800 #define IEEE80211_FCTL_PM 0x1000 -#define IEEE80211_FCTL_MOREDATA 0x2000 -#define IEEE80211_FCTL_WEP 0x4000 +#define IEEE80211_FCTL_MOREDATA 0x2000 +#define IEEE80211_FCTL_PROTECTED 0x4000 #define IEEE80211_FCTL_ORDER 0x8000 #define IEEE80211_FTYPE_MGMT 0x0000 @@ -131,6 +131,7 @@ #define IEEE80211_STYPE_DISASSOC 0x00A0 #define IEEE80211_STYPE_AUTH 0x00B0 #define IEEE80211_STYPE_DEAUTH 0x00C0 +#define IEEE80211_STYPE_ACTION 0x00D0 /* control */ #define IEEE80211_STYPE_PSPOLL 0x00A0 @@ -251,6 +252,7 @@ #define SNAP_SIZE sizeof(struct ieee80211_snap_hdr) +#define WLAN_FC_GET_VERS(fc) ((fc) & IEEE80211_FCTL_VERS) #define WLAN_FC_GET_TYPE(fc) ((fc) & IEEE80211_FCTL_FTYPE) #define WLAN_FC_GET_STYPE(fc) ((fc) & IEEE80211_FCTL_STYPE) @@ -271,6 +273,9 @@ #define WLAN_CAPABILITY_SHORT_PREAMBLE (1<<5) #define WLAN_CAPABILITY_PBCC (1<<6) #define WLAN_CAPABILITY_CHANNEL_AGILITY (1<<7) +#define WLAN_CAPABILITY_SPECTRUM_MGMT (1<<8) +#define WLAN_CAPABILITY_SHORT_SLOT_TIME (1<<10) +#define WLAN_CAPABILITY_OSSS_OFDM (1<<13) /* Status codes */ #define WLAN_STATUS_SUCCESS 0 @@ -285,9 +290,24 @@ #define 
WLAN_STATUS_AP_UNABLE_TO_HANDLE_NEW_STA 17 #define WLAN_STATUS_ASSOC_DENIED_RATES 18 /* 802.11b */ -#define WLAN_STATUS_ASSOC_DENIED_NOSHORT 19 +#define WLAN_STATUS_ASSOC_DENIED_NOSHORTPREAMBLE 19 #define WLAN_STATUS_ASSOC_DENIED_NOPBCC 20 #define WLAN_STATUS_ASSOC_DENIED_NOAGILITY 21 +/* 802.11h */ +#define WLAN_STATUS_ASSOC_DENIED_NOSPECTRUM 22 +#define WLAN_STATUS_ASSOC_REJECTED_BAD_POWER 23 +#define WLAN_STATUS_ASSOC_REJECTED_BAD_SUPP_CHAN 24 +/* 802.11g */ +#define WLAN_STATUS_ASSOC_DENIED_NOSHORTTIME 25 +#define WLAN_STATUS_ASSOC_DENIED_NODSSSOFDM 26 +/* 802.11i */ +#define WLAN_STATUS_INVALID_IE 40 +#define WLAN_STATUS_INVALID_GROUP_CIPHER 41 +#define WLAN_STATUS_INVALID_PAIRWISE_CIPHER 42 +#define WLAN_STATUS_INVALID_AKMP 43 +#define WLAN_STATUS_UNSUPP_RSN_VERSION 44 +#define WLAN_STATUS_INVALID_RSN_IE_CAP 45 +#define WLAN_STATUS_CIPHER_SUITE_REJECTED 46 /* Reason codes */ #define WLAN_REASON_UNSPECIFIED 1 @@ -299,6 +319,22 @@ #define WLAN_REASON_CLASS3_FRAME_FROM_NONASSOC_STA 7 #define WLAN_REASON_DISASSOC_STA_HAS_LEFT 8 #define WLAN_REASON_STA_REQ_ASSOC_WITHOUT_AUTH 9 +/* 802.11h */ +#define WLAN_REASON_DISASSOC_BAD_POWER 10 +#define WLAN_REASON_DISASSOC_BAD_SUPP_CHAN 11 +/* 802.11i */ +#define WLAN_REASON_INVALID_IE 13 +#define WLAN_REASON_MIC_FAILURE 14 +#define WLAN_REASON_4WAY_HANDSHAKE_TIMEOUT 15 +#define WLAN_REASON_GROUP_KEY_HANDSHAKE_TIMEOUT 16 +#define WLAN_REASON_IE_DIFFERENT 17 +#define WLAN_REASON_INVALID_GROUP_CIPHER 18 +#define WLAN_REASON_INVALID_PAIRWISE_CIPHER 19 +#define WLAN_REASON_INVALID_AKMP 20 +#define WLAN_REASON_UNSUPP_RSN_VERSION 21 +#define WLAN_REASON_INVALID_RSN_IE_CAP 22 +#define WLAN_REASON_IEEE8021X_FAILED 23 +#define WLAN_REASON_CIPHER_SUITE_REJECTED 24 #define IEEE80211_STATMASK_SIGNAL (1<<0) @@ -477,17 +513,32 @@ #define BEACON_PROBE_SSID_ID_POSITION 12 /* Management Frame Information Element Types */ -#define MFIE_TYPE_SSID 0 -#define MFIE_TYPE_RATES 1 -#define MFIE_TYPE_FH_SET 2 -#define MFIE_TYPE_DS_SET 3 -#define 
MFIE_TYPE_CF_SET 4 -#define MFIE_TYPE_TIM 5 -#define MFIE_TYPE_IBSS_SET 6 -#define MFIE_TYPE_CHALLENGE 16 -#define MFIE_TYPE_RSN 48 -#define MFIE_TYPE_RATES_EX 50 -#define MFIE_TYPE_GENERIC 221 +#define MFIE_TYPE_SSID 0 +#define MFIE_TYPE_RATES 1 +#define MFIE_TYPE_FH_SET 2 +#define MFIE_TYPE_DS_SET 3 +#define MFIE_TYPE_CF_SET 4 +#define MFIE_TYPE_TIM 5 +#define MFIE_TYPE_IBSS_SET 6 +#define MFIE_TYPE_COUNTRY 7 +#define MFIE_TYPE_HOP_PARAMS 8 +#define MFIE_TYPE_HOP_TABLE 9 +#define MFIE_TYPE_REQUEST 10 +#define MFIE_TYPE_CHALLENGE 16 +#define MFIE_TYPE_POWER_CONSTRAINT 32 +#define MFIE_TYPE_POWER_CAPABILITY 33 +#define MFIE_TYPE_TPC_REQUEST 34 +#define MFIE_TYPE_TPC_REPORT 35 +#define MFIE_TYPE_SUPP_CHANNELS 36 +#define MFIE_TYPE_CSA 37 +#define MFIE_TYPE_MEASURE_REQUEST 38 +#define MFIE_TYPE_MEASURE_REPORT 39 +#define MFIE_TYPE_QUIET 40 +#define MFIE_TYPE_IBSS_DFS 41 +#define MFIE_TYPE_ERP_INFO 42 +#define MFIE_TYPE_RSN 48 +#define MFIE_TYPE_RATES_EX 50 +#define MFIE_TYPE_GENERIC 221 struct ieee80211_info_element_hdr { u8 id; Index: net/ieee80211/ieee80211_rx.c =================================================================== --- 4b4ba76aa81b3627142787262fd2f8049dd3662d/net/ieee80211/ieee80211_rx.c (mode:100644) +++ uncommitted/net/ieee80211/ieee80211_rx.c (mode:100644) @@ -440,7 +440,7 @@ crypt->ops->decrypt_mpdu == NULL)) crypt = NULL; - if (!crypt && (fc & IEEE80211_FCTL_WEP)) { + if (!crypt && (fc & IEEE80211_FCTL_PROTECTED)) { /* This seems to be triggered by some (multicast?) 
* frames from other than current BSS, so just drop the * frames silently instead of filling system log with @@ -456,7 +456,7 @@ #ifdef NOT_YET if (type != WLAN_FC_TYPE_DATA) { if (type == WLAN_FC_TYPE_MGMT && stype == WLAN_FC_STYPE_AUTH && - fc & IEEE80211_FCTL_WEP && ieee->host_decrypt && + fc & IEEE80211_FCTL_PROTECTED && ieee->host_decrypt && (keyidx = hostap_rx_frame_decrypt(ieee, skb, crypt)) < 0) { printk(KERN_DEBUG "%s: failed to decrypt mgmt::auth " @@ -557,7 +557,7 @@ /* skb: hdr + (possibly fragmented, possibly encrypted) payload */ - if (ieee->host_decrypt && (fc & IEEE80211_FCTL_WEP) && + if (ieee->host_decrypt && (fc & IEEE80211_FCTL_PROTECTED) && (keyidx = ieee80211_rx_frame_decrypt(ieee, skb, crypt)) < 0) goto rx_dropped; @@ -565,7 +565,7 @@ /* skb: hdr + (possibly fragmented) plaintext payload */ // PR: FIXME: hostap has additional conditions in the "if" below: - // ieee->host_decrypt && (fc & IEEE80211_FCTL_WEP) && + // ieee->host_decrypt && (fc & IEEE80211_FCTL_PROTECTED) && if ((frag != 0 || (fc & IEEE80211_FCTL_MOREFRAGS))) { int flen; struct sk_buff *frag_skb = ieee80211_frag_cache_get(ieee, hdr); @@ -621,12 +621,12 @@ /* skb: hdr + (possible reassembled) full MSDU payload; possibly still * encrypted/authenticated */ - if (ieee->host_decrypt && (fc & IEEE80211_FCTL_WEP) && + if (ieee->host_decrypt && (fc & IEEE80211_FCTL_PROTECTED) && ieee80211_rx_frame_decrypt_msdu(ieee, skb, keyidx, crypt)) goto rx_dropped; hdr = (struct ieee80211_hdr *) skb->data; - if (crypt && !(fc & IEEE80211_FCTL_WEP) && !ieee->open_wep) { + if (crypt && !(fc & IEEE80211_FCTL_PROTECTED) && !ieee->open_wep) { if (/*ieee->ieee802_1x &&*/ ieee80211_is_eapol_frame(ieee, skb)) { #ifdef CONFIG_IEEE80211_DEBUG @@ -647,7 +647,7 @@ } #ifdef CONFIG_IEEE80211_DEBUG - if (crypt && !(fc & IEEE80211_FCTL_WEP) && + if (crypt && !(fc & IEEE80211_FCTL_PROTECTED) && ieee80211_is_eapol_frame(ieee, skb)) { struct eapol *eap = (struct eapol *)(skb->data + 24); @@ -656,7 +656,7 @@ } #endif - 
if (crypt && !(fc & IEEE80211_FCTL_WEP) && !ieee->open_wep && + if (crypt && !(fc & IEEE80211_FCTL_PROTECTED) && !ieee->open_wep && !ieee80211_is_eapol_frame(ieee, skb)) { IEEE80211_DEBUG_DROP( "dropped unencrypted RX data " Index: net/ieee80211/ieee80211_tx.c =================================================================== --- 4b4ba76aa81b3627142787262fd2f8049dd3662d/net/ieee80211/ieee80211_tx.c (mode:100644) +++ uncommitted/net/ieee80211/ieee80211_tx.c (mode:100644) @@ -314,7 +314,7 @@ if (encrypt) fc = IEEE80211_FTYPE_DATA | IEEE80211_STYPE_DATA | - IEEE80211_FCTL_WEP; + IEEE80211_FCTL_PROTECTED; else fc = IEEE80211_FTYPE_DATA | IEEE80211_STYPE_DATA; Index: drivers/net/wireless/atmel.c =================================================================== --- 4b4ba76aa81b3627142787262fd2f8049dd3662d/drivers/net/wireless/atmel.c (mode:100644) +++ uncommitted/drivers/net/wireless/atmel.c (mode:100644) @@ -867,7 +867,7 @@ header.duration_id = 0; header.seq_ctl = 0; if (priv->wep_is_on) - frame_ctl |= IEEE80211_FCTL_WEP; + frame_ctl |= IEEE80211_FCTL_PROTECTED; if (priv->operating_mode == IW_MODE_ADHOC) { memcpy(&header.addr1, skb->data, 6); memcpy(&header.addr2, dev->dev_addr, 6); @@ -1117,7 +1117,7 @@ /* probe for CRC use here if needed once five packets have arrived with the same crc status, we assume we know what's happening and stop probing */ if (priv->probe_crc) { - if (!priv->wep_is_on || !(frame_ctl & IEEE80211_FCTL_WEP)) { + if (!priv->wep_is_on || !(frame_ctl & IEEE80211_FCTL_PROTECTED)) { priv->do_rx_crc = probe_crc(priv, rx_packet_loc, msdu_size); } else { priv->do_rx_crc = probe_crc(priv, rx_packet_loc + 24, msdu_size - 24); @@ -1132,7 +1132,7 @@ } /* don't CRC header when WEP in use */ - if (priv->do_rx_crc && (!priv->wep_is_on || !(frame_ctl & IEEE80211_FCTL_WEP))) { + if (priv->do_rx_crc && (!priv->wep_is_on || !(frame_ctl & IEEE80211_FCTL_PROTECTED))) { crc = crc32_le(0xffffffff, (unsigned char *)&header, 24); } msdu_size -= 24; /* header */ @@ 
-2677,7 +2677,7 @@ auth.alg = cpu_to_le16(C80211_MGMT_AAN_SHAREDKEY); /* no WEP for authentication frames with TrSeqNo 1 */ if (priv->CurrentAuthentTransactionSeqNum != 1) - header.frame_ctl |= cpu_to_le16(IEEE80211_FCTL_WEP); + header.frame_ctl |= cpu_to_le16(IEEE80211_FCTL_PROTECTED); } else { auth.alg = cpu_to_le16(C80211_MGMT_AAN_OPENSYSTEM); } --------------060209080801050100020902-- From gwingerde@home.nl Fri Jun 3 13:39:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:39:38 -0700 (PDT) Received: from smtpq3.home.nl (smtpq3.home.nl [213.51.128.198]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53KdYXq000900 for ; Fri, 3 Jun 2005 13:39:34 -0700 Received: from [213.51.128.133] (port=52706 helo=smtp2.home.nl) by smtpq3.home.nl with esmtp (Exim 4.30) id 1DeIw6-0001k0-7U; Fri, 03 Jun 2005 22:38:34 +0200 Received: from cc10088-a.ensch1.ov.home.nl ([217.123.128.105]:47094 helo=[192.168.14.1]) by smtp2.home.nl with esmtp (Exim 4.30) id 1DeIw4-00051c-PM; Fri, 03 Jun 2005 22:38:32 +0200 Message-ID: <42A0BE1C.6080904@home.nl> Date: Fri, 03 Jun 2005 22:31:24 +0200 From: Gertjan van Wingerde User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050322) X-Accept-Language: en-us, en MIME-Version: 1.0 To: netdev@oss.sgi.com CC: jgarzik@pobox.com Subject: [PATCH 2/2] ieee80211: Update generic definitions to latest specs - take #2 Content-Type: multipart/mixed; boundary="------------080907000309050506010000" X-AtHome-MailScanner-Information: Neem contact op met support@home.nl voor meer informatie X-AtHome-MailScanner: Found to be clean X-archive-position: 2063 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gwingerde@home.nl Precedence: bulk X-list: netdev Content-Length: 7315 Lines: 213 This is a multi-part message in MIME format. 
--------------080907000309050506010000 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi, Attached patch cleans up the long lists of #defines for status codes, reason codes, and information elements. Signed-off-by: Gertjan van Wingerde Thanks, Gertjan van Wingerde --------------080907000309050506010000 Content-Type: text/plain; name="ieee80211-cleanup.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="ieee80211-cleanup.diff" Index: include/net/ieee80211.h =================================================================== --- eb77617da695526508e860d3775afc781de70dea/include/net/ieee80211.h (mode:100644) +++ uncommitted/include/net/ieee80211.h (mode:100644) @@ -278,63 +278,67 @@ #define WLAN_CAPABILITY_OSSS_OFDM (1<<13) /* Status codes */ -#define WLAN_STATUS_SUCCESS 0 -#define WLAN_STATUS_UNSPECIFIED_FAILURE 1 -#define WLAN_STATUS_CAPS_UNSUPPORTED 10 -#define WLAN_STATUS_REASSOC_NO_ASSOC 11 -#define WLAN_STATUS_ASSOC_DENIED_UNSPEC 12 -#define WLAN_STATUS_NOT_SUPPORTED_AUTH_ALG 13 -#define WLAN_STATUS_UNKNOWN_AUTH_TRANSACTION 14 -#define WLAN_STATUS_CHALLENGE_FAIL 15 -#define WLAN_STATUS_AUTH_TIMEOUT 16 -#define WLAN_STATUS_AP_UNABLE_TO_HANDLE_NEW_STA 17 -#define WLAN_STATUS_ASSOC_DENIED_RATES 18 -/* 802.11b */ -#define WLAN_STATUS_ASSOC_DENIED_NOSHORTPREAMBLE 19 -#define WLAN_STATUS_ASSOC_DENIED_NOPBCC 20 -#define WLAN_STATUS_ASSOC_DENIED_NOAGILITY 21 -/* 802.11h */ -#define WLAN_STATUS_ASSOC_DENIED_NOSPECTRUM 22 -#define WLAN_STATUS_ASSOC_REJECTED_BAD_POWER 23 -#define WLAN_STATUS_ASSOC_REJECTED_BAD_SUPP_CHAN 24 -/* 802.11g */ -#define WLAN_STATUS_ASSOC_DENIED_NOSHORTTIME 25 -#define WLAN_STATUS_ASSOC_DENIED_NODSSSOFDM 26 -/* 802.11i */ -#define WLAN_STATUS_INVALID_IE 40 -#define WLAN_STATUS_INVALID_GROUP_CIPHER 41 -#define WLAN_STATUS_INVALID_PAIRWISE_CIPHER 42 -#define WLAN_STATUS_INVALID_AKMP 43 -#define WLAN_STATUS_UNSUPP_RSN_VERSION 44 -#define WLAN_STATUS_INVALID_RSN_IE_CAP 45 -#define 
WLAN_STATUS_CIPHER_SUITE_REJECTED 46 +enum ieee80211_statuscode { + WLAN_STATUS_SUCCESS = 0, + WLAN_STATUS_UNSPECIFIED_FAILURE = 1, + WLAN_STATUS_CAPS_UNSUPPORTED = 10, + WLAN_STATUS_REASSOC_NO_ASSOC = 11, + WLAN_STATUS_ASSOC_DENIED_UNSPEC = 12, + WLAN_STATUS_NOT_SUPPORTED_AUTH_ALG = 13, + WLAN_STATUS_UNKNOWN_AUTH_TRANSACTION = 14, + WLAN_STATUS_CHALLENGE_FAIL = 15, + WLAN_STATUS_AUTH_TIMEOUT = 16, + WLAN_STATUS_AP_UNABLE_TO_HANDLE_NEW_STA = 17, + WLAN_STATUS_ASSOC_DENIED_RATES = 18, + /* 802.11b */ + WLAN_STATUS_ASSOC_DENIED_NOSHORTPREAMBLE = 19, + WLAN_STATUS_ASSOC_DENIED_NOPBCC = 20, + WLAN_STATUS_ASSOC_DENIED_NOAGILITY = 21, + /* 802.11h */ + WLAN_STATUS_ASSOC_DENIED_NOSPECTRUM = 22, + WLAN_STATUS_ASSOC_REJECTED_BAD_POWER = 23, + WLAN_STATUS_ASSOC_REJECTED_BAD_SUPP_CHAN = 24, + /* 802.11g */ + WLAN_STATUS_ASSOC_DENIED_NOSHORTTIME = 25, + WLAN_STATUS_ASSOC_DENIED_NODSSSOFDM = 26, + /* 802.11i */ + WLAN_STATUS_INVALID_IE = 40, + WLAN_STATUS_INVALID_GROUP_CIPHER = 41, + WLAN_STATUS_INVALID_PAIRWISE_CIPHER = 42, + WLAN_STATUS_INVALID_AKMP = 43, + WLAN_STATUS_UNSUPP_RSN_VERSION = 44, + WLAN_STATUS_INVALID_RSN_IE_CAP = 45, + WLAN_STATUS_CIPHER_SUITE_REJECTED = 46, +}; /* Reason codes */ -#define WLAN_REASON_UNSPECIFIED 1 -#define WLAN_REASON_PREV_AUTH_NOT_VALID 2 -#define WLAN_REASON_DEAUTH_LEAVING 3 -#define WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY 4 -#define WLAN_REASON_DISASSOC_AP_BUSY 5 -#define WLAN_REASON_CLASS2_FRAME_FROM_NONAUTH_STA 6 -#define WLAN_REASON_CLASS3_FRAME_FROM_NONASSOC_STA 7 -#define WLAN_REASON_DISASSOC_STA_HAS_LEFT 8 -#define WLAN_REASON_STA_REQ_ASSOC_WITHOUT_AUTH 9 -/* 802.11h */ -#define WLAN_REASON_DISASSOC_BAD_POWER 10 -#define WLAN_REASON_DISASSOC_BAD_SUPP_CHAN 11 -/* 802.11i */ -#define WLAN_REASON_INVALID_IE 13 -#define WLAN_REASON_MIC_FAILURE 14 -#define WLAN_REASON_4WAY_HANDSHAKE_TIMEOUT 15 -#define WLAN_REASON_GROUP_KEY_HANDSHAKE_TIMEOUT 16 -#define WLAN_REASON_IE_DIFFERENT 17 -#define WLAN_REASON_INVALID_GROUP_CIPHER 18 -#define 
WLAN_REASON_INVALID_PAIRWISE_CIPHER 19 -#define WLAN_REASON_INVALID_AKMP 20 -#define WLAN_REASON_UNSUPP_RSN_VERSION 21 -#define WLAN_REASON_INVALID_RSN_IE_CAP 22 -#define WLAN_REASON_IEEE8021X_FAILED 23 -#define WLAN_REASON_CIPHER_SUITE_REJECTED 24 +enum ieee80211_reasoncode { + WLAN_REASON_UNSPECIFIED = 1, + WLAN_REASON_PREV_AUTH_NOT_VALID = 2, + WLAN_REASON_DEAUTH_LEAVING = 3, + WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY = 4, + WLAN_REASON_DISASSOC_AP_BUSY = 5, + WLAN_REASON_CLASS2_FRAME_FROM_NONAUTH_STA = 6, + WLAN_REASON_CLASS3_FRAME_FROM_NONASSOC_STA = 7, + WLAN_REASON_DISASSOC_STA_HAS_LEFT = 8, + WLAN_REASON_STA_REQ_ASSOC_WITHOUT_AUTH = 9, + /* 802.11h */ + WLAN_REASON_DISASSOC_BAD_POWER = 10, + WLAN_REASON_DISASSOC_BAD_SUPP_CHAN = 11, + /* 802.11i */ + WLAN_REASON_INVALID_IE = 13, + WLAN_REASON_MIC_FAILURE = 14, + WLAN_REASON_4WAY_HANDSHAKE_TIMEOUT = 15, + WLAN_REASON_GROUP_KEY_HANDSHAKE_TIMEOUT = 16, + WLAN_REASON_IE_DIFFERENT = 17, + WLAN_REASON_INVALID_GROUP_CIPHER = 18, + WLAN_REASON_INVALID_PAIRWISE_CIPHER = 19, + WLAN_REASON_INVALID_AKMP = 20, + WLAN_REASON_UNSUPP_RSN_VERSION = 21, + WLAN_REASON_INVALID_RSN_IE_CAP = 22, + WLAN_REASON_IEEE8021X_FAILED = 23, + WLAN_REASON_CIPHER_SUITE_REJECTED = 24, +}; #define IEEE80211_STATMASK_SIGNAL (1<<0) @@ -513,32 +517,34 @@ #define BEACON_PROBE_SSID_ID_POSITION 12 /* Management Frame Information Element Types */ -#define MFIE_TYPE_SSID 0 -#define MFIE_TYPE_RATES 1 -#define MFIE_TYPE_FH_SET 2 -#define MFIE_TYPE_DS_SET 3 -#define MFIE_TYPE_CF_SET 4 -#define MFIE_TYPE_TIM 5 -#define MFIE_TYPE_IBSS_SET 6 -#define MFIE_TYPE_COUNTRY 7 -#define MFIE_TYPE_HOP_PARAMS 8 -#define MFIE_TYPE_HOP_TABLE 9 -#define MFIE_TYPE_REQUEST 10 -#define MFIE_TYPE_CHALLENGE 16 -#define MFIE_TYPE_POWER_CONSTRAINT 32 -#define MFIE_TYPE_POWER_CAPABILITY 33 -#define MFIE_TYPE_TPC_REQUEST 34 -#define MFIE_TYPE_TPC_REPORT 35 -#define MFIE_TYPE_SUPP_CHANNELS 36 -#define MFIE_TYPE_CSA 37 -#define MFIE_TYPE_MEASURE_REQUEST 38 -#define 
MFIE_TYPE_MEASURE_REPORT 39 -#define MFIE_TYPE_QUIET 40 -#define MFIE_TYPE_IBSS_DFS 41 -#define MFIE_TYPE_ERP_INFO 42 -#define MFIE_TYPE_RSN 48 -#define MFIE_TYPE_RATES_EX 50 -#define MFIE_TYPE_GENERIC 221 +enum ieee80211_mfie { + MFIE_TYPE_SSID = 0, + MFIE_TYPE_RATES = 1, + MFIE_TYPE_FH_SET = 2, + MFIE_TYPE_DS_SET = 3, + MFIE_TYPE_CF_SET = 4, + MFIE_TYPE_TIM = 5, + MFIE_TYPE_IBSS_SET = 6, + MFIE_TYPE_COUNTRY = 7, + MFIE_TYPE_HOP_PARAMS = 8, + MFIE_TYPE_HOP_TABLE = 9, + MFIE_TYPE_REQUEST = 10, + MFIE_TYPE_CHALLENGE = 16, + MFIE_TYPE_POWER_CONSTRAINT = 32, + MFIE_TYPE_POWER_CAPABILITY = 33, + MFIE_TYPE_TPC_REQUEST = 34, + MFIE_TYPE_TPC_REPORT = 35, + MFIE_TYPE_SUPP_CHANNELS = 36, + MFIE_TYPE_CSA = 37, + MFIE_TYPE_MEASURE_REQUEST = 38, + MFIE_TYPE_MEASURE_REPORT = 39, + MFIE_TYPE_QUIET = 40, + MFIE_TYPE_IBSS_DFS = 41, + MFIE_TYPE_ERP_INFO = 42, + MFIE_TYPE_RSN = 48, + MFIE_TYPE_RATES_EX = 50, + MFIE_TYPE_GENERIC = 221, +}; struct ieee80211_info_element_hdr { u8 id; --------------080907000309050506010000-- From gwingerde@home.nl Fri Jun 3 13:39:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:39:31 -0700 (PDT) Received: from smtpq2.home.nl (smtpq2.home.nl [213.51.128.197]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53KdRXq000874 for ; Fri, 3 Jun 2005 13:39:28 -0700 Received: from [213.51.128.134] (port=56778 helo=smtp3.home.nl) by smtpq2.home.nl with esmtp (Exim 4.30) id 1DeIvx-0007q0-A0; Fri, 03 Jun 2005 22:38:25 +0200 Received: from cc10088-a.ensch1.ov.home.nl ([217.123.128.105]:47092 helo=[192.168.14.1]) by smtp3.home.nl with esmtp (Exim 4.30) id 1DeIvv-0006Rf-Qs; Fri, 03 Jun 2005 22:38:23 +0200 Message-ID: <42A0BE13.3060509@home.nl> Date: Fri, 03 Jun 2005 22:31:15 +0200 From: Gertjan van Wingerde User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050322) X-Accept-Language: en-us, en MIME-Version: 1.0 To: netdev@oss.sgi.com CC: jgarzik@pobox.com Subject: [PATCH 0/2] ieee80211: Update generic definitions to latest specs - take 
#2 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-AtHome-MailScanner-Information: Neem contact op met support@home.nl voor meer informatie X-AtHome-MailScanner: Found to be clean X-archive-position: 2061 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gwingerde@home.nl Precedence: bulk X-list: netdev Content-Length: 381 Lines: 17 Hi, The following patches update the definitions of the generic ieee80211 stack to the latest versions of the published 802.11x specification suite, and clean up the long lists of defines. The set of patches is a resubmittal of my earlier patch, with the comments of Jiri Benc and Stephen Hemminger addressed. The patches need to be applied in order. Thanks, Gertjan van Wingerde From mchan@broadcom.com Fri Jun 3 13:48:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 13:48:29 -0700 (PDT) Received: from MMS1.broadcom.com (mms1.broadcom.com [216.31.210.17]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53KmPXq002813 for ; Fri, 3 Jun 2005 13:48:26 -0700 Received: from 10.10.64.121 by MMS1.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Fri, 03 Jun 2005 13:47:07 -0700 X-Server-Uuid: 146C3151-C1DE-4F71-9D02-C3BE503878DD Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Fri, 3 Jun 2005 13:47:06 -0700 Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BCA28027; Fri, 3 Jun 2005 13:47:04 -0700 (PDT) Received: from nt-irva-0741.brcm.ad.broadcom.com ( nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with ESMTP id NAA24325; Fri, 3 Jun 2005 13:47:03 -0700 (PDT) Received: from 10.7.18.177 ([10.7.18.177]) by NT-IRVA-0741.brcm.ad.broadcom.com 
([10.8.194.54]) with Microsoft Exchange Server HTTP-DAV ; Fri, 3 Jun 2005 20:47:03 +0000 Received: from rh4 by nt-irva-0741; 03 Jun 2005 12:49:29 -0700 Subject: Re: RFC: NAPI packet weighting patch From: "Michael Chan" To: "David S. Miller" cc: mitch.a.williams@intel.com, hadi@cyberus.ca, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com In-Reply-To: <20050603.132922.63997492.davem@davemloft.net> References: <20050603.120126.41874584.davem@davemloft.net> <20050603.132257.23013342.davem@davemloft.net> <20050603.132922.63997492.davem@davemloft.net> Date: Fri, 03 Jun 2005 12:49:29 -0700 Message-ID: <1117828169.4430.29.camel@rh4> MIME-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-3) X-WSS-ID: 6EBE1E412U45192377-01-01 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-archive-position: 2064 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mchan@broadcom.com Precedence: bulk X-list: netdev Content-Length: 590 Lines: 15 On Fri, 2005-06-03 at 13:29 -0700, David S. Miller wrote: > E1000 processes the full QUOTA of RX packets, > _THEN_ replenishes with new RX buffers. No wonder > the chip runs out of RX descriptors. > > You should replenish _AS_ you grab RX packets > off the receive queue, just as tg3 does. Yes, in tg3, rx buffers are replenished and put back into the ring as completed packets are taken off the ring. But we don't tell the chip about these new buffers until we get to the end of the loop, potentially after a full quota of packets. Doesn't this make the end result the same as e1000? 
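A rough way to see Michael's point is to model the two refill strategies and look only at what the chip has been told. The sketch below is a hypothetical model, not tg3 or e1000 code: struct rx_ring, publish() and both poll functions are invented for illustration, and publish() stands in for the single MMIO write of the producer index.

```c
#include <assert.h>

/* Hypothetical model of an RX ring; not taken from any real driver. */
struct rx_ring {
    unsigned int sw_prod;      /* buffers the driver has put back in the ring */
    unsigned int hw_prod;      /* producer index last written to the chip */
    unsigned int mmio_writes;  /* number of doorbell writes issued */
};

/* Stand-in for the MMIO write that tells the chip about new buffers. */
static void publish(struct rx_ring *r)
{
    r->hw_prod = r->sw_prod;
    r->mmio_writes++;
}

/* Refill one buffer per received packet, publish once after the loop.
 * Returns the most new buffers the chip could see while the loop ran. */
static unsigned int poll_refill_per_packet(struct rx_ring *r, unsigned int quota)
{
    unsigned int start = r->hw_prod;
    unsigned int seen = 0;
    unsigned int i;

    assert(quota > 0);
    for (i = 0; i < quota; i++) {
        r->sw_prod++;                      /* buffer is back in the ring ... */
        if (r->hw_prod - start > seen)     /* ... but the chip was not told */
            seen = r->hw_prod - start;
    }
    publish(r);
    return seen;
}

/* Process the whole quota first, then refill and publish in one go. */
static unsigned int poll_refill_at_end(struct rx_ring *r, unsigned int quota)
{
    unsigned int start = r->hw_prod;
    unsigned int seen = 0;
    unsigned int i;

    assert(quota > 0);
    for (i = 0; i < quota; i++) {
        if (r->hw_prod - start > seen)
            seen = r->hw_prod - start;
    }
    r->sw_prod += quota;
    publish(r);
    return seen;
}
```

In this model both functions return 0 and issue one doorbell write per poll, which is the equivalence Michael is asking about: until the producer index is written, the chip cannot use buffers that are already back in the ring.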
From buytenh@wantstofly.org Fri Jun 3 14:00:50 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 14:01:04 -0700 (PDT) Received: from xi.wantstofly.org (alephnull.demon.nl [212.238.201.82]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53L0nXq004267 for ; Fri, 3 Jun 2005 14:00:50 -0700 Received: by xi.wantstofly.org (Postfix, from userid 500) id 5F5B5945C8; Fri, 3 Jun 2005 22:59:45 +0200 (MEST) Date: Fri, 3 Jun 2005 22:59:45 +0200 From: Lennert Buytenhek To: Michael Chan Cc: "David S. Miller" , mitch.a.williams@intel.com, hadi@cyberus.ca, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch Message-ID: <20050603205944.GC20623@xi.wantstofly.org> References: <20050603.120126.41874584.davem@davemloft.net> <20050603.132257.23013342.davem@davemloft.net> <20050603.132922.63997492.davem@davemloft.net> <1117828169.4430.29.camel@rh4> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1117828169.4430.29.camel@rh4> User-Agent: Mutt/1.4.1i X-archive-position: 2065 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: buytenh@wantstofly.org Precedence: bulk X-list: netdev Content-Length: 873 Lines: 22 On Fri, Jun 03, 2005 at 12:49:29PM -0700, Michael Chan wrote: > > E1000 processes the full QUOTA of RX packets, > > _THEN_ replenishes with new RX buffers. No wonder > > the chip runs out of RX descriptors. > > > > You should replenish _AS_ you grab RX packets > > off the receive queue, just as tg3 does. > > Yes, in tg3, rx buffers are replenished and put back into the ring > as completed packets are taken off the ring. 
But we don't tell the > chip about these new buffers until we get to the end of the loop, > potentially after a full quota of packets. Which makes a lot more sense, since you'd rather do one MMIO write at the end of the loop than one per iteration, especially if your MMIO read (flush) latency is high. (Any subsequent MMIO read will have to flush out all pending writes, which'll be slow if there's a lot of writes still in the queue.) --L From edgar@edgar.se.axis.com Fri Jun 3 14:08:07 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 14:08:12 -0700 (PDT) Received: from miranda.se.axis.com (miranda.se.axis.com [193.13.178.8]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53L86Xq005258 for ; Fri, 3 Jun 2005 14:08:07 -0700 Received: from edgar.se.axis.com (edgar.se.axis.com [10.92.151.1]) by miranda.se.axis.com (8.12.9/8.12.9/Debian-5local0.1) with ESMTP id j53L71Nc014427 for ; Fri, 3 Jun 2005 23:07:01 +0200 Received: (qmail 3313 invoked by uid 400); 3 Jun 2005 23:07:01 +0200 Date: Fri, 3 Jun 2005 23:07:01 +0200 From: Edgar E Iglesias To: Lennert Buytenhek Cc: Michael Chan , "David S. 
Miller" , mitch.a.williams@intel.com, hadi@cyberus.ca, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch Message-ID: <20050603210701.GA3263@edgar.se.axis.com> References: <20050603.120126.41874584.davem@davemloft.net> <20050603.132257.23013342.davem@davemloft.net> <20050603.132922.63997492.davem@davemloft.net> <1117828169.4430.29.camel@rh4> <20050603205944.GC20623@xi.wantstofly.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20050603205944.GC20623@xi.wantstofly.org> User-Agent: Mutt/1.5.8i X-archive-position: 2066 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: edgar.iglesias@axis.com Precedence: bulk X-list: netdev Content-Length: 1307 Lines: 33 On Fri, Jun 03, 2005 at 10:59:45PM +0200, Lennert Buytenhek wrote: > On Fri, Jun 03, 2005 at 12:49:29PM -0700, Michael Chan wrote: > > > > E1000 processes the full QUOTA of RX packets, > > > _THEN_ replenishes with new RX buffers. No wonder > > > the chip runs out of RX descriptors. > > > > > > You should replenish _AS_ you grab RX packets > > > off the receive queue, just as tg3 does. > > > > Yes, in tg3, rx buffers are replenished and put back into the ring > > as completed packets are taken off the ring. But we don't tell the > > chip about these new buffers until we get to the end of the loop, > > potentially after a full quota of packets. > > Which makes a lot more sense, since you'd rather do one MMIO write > at the end of the loop than one per iteration, especially if your > MMIO read (flush) latency is high. (Any subsequent MMIO read will > have to flush out all pending writes, which'll be slow if there's > a lot of writes still in the queue.) 
> > > --L Maybe it would be better to put a fixed weight at this level, return the descriptors to the HW after every X packets. That way you can keep the NAPI weight at 64 (or what ever) and still give back descriptors to HW more often. Best regards -- Programmer Edgar E Iglesias 46.46.272.1946 From jdmason@us.ibm.com Fri Jun 3 14:13:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 14:13:32 -0700 (PDT) Received: from e34.co.us.ibm.com (e34.co.us.ibm.com [32.97.110.132]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53LDPXq005959 for ; Fri, 3 Jun 2005 14:13:26 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e34.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j53LCR8E032746 for ; Fri, 3 Jun 2005 17:12:27 -0400 Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j53LCRBm239632 for ; Fri, 3 Jun 2005 15:12:27 -0600 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j53LCQKD008289 for ; Fri, 3 Jun 2005 15:12:27 -0600 Received: from dyn95390157.austin.ibm.com (dyn95390157.austin.ibm.com [9.53.90.157]) by d03av03.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j53LCQQY008283; Fri, 3 Jun 2005 15:12:26 -0600 From: Jon Mason Organization: IBM To: "David S. 
Miller" Subject: Re: RFC: NAPI packet weighting patch Date: Fri, 3 Jun 2005 16:12:10 -0500 User-Agent: KMail/1.7.2 Cc: hadi@cyberus.ca, mitch.a.williams@intel.com, john.ronciak@intel.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com References: <20050603.120126.41874584.davem@davemloft.net> <1117828771.6071.77.camel@localhost.localdomain> <20050603.133133.38710501.davem@davemloft.net> In-Reply-To: <20050603.133133.38710501.davem@davemloft.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506031612.10456.jdmason@us.ibm.com> X-archive-position: 2067 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jdmason@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 767 Lines: 19 On Friday 03 June 2005 03:31 pm, David S. Miller wrote: > From: jamal > Date: Fri, 03 Jun 2005 15:59:31 -0400 > > > But one that you could validate by putting proper hooks. As an example, > > try to restore a descriptor every time you pick one - for an example of > > this look at the sb1250 driver. > > Yes, this in my mind is exactly the problem. TG3 does this > properly, as do several other drivers. > > You should never defer RX buffer replenishment, you should > always do it as you grab packets off of the ring. You will > starve the chip otherwise. e1000 isn't the only driver to do things this way. r8169, via-velocity, dl2k, and skge (and I'm sure many more). Might be nice to perform a driver audit to see what drivers do this. 
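Edgar's suggestion earlier in the thread (hand descriptors back to the hardware every X packets while leaving the NAPI weight alone) trades doorbell writes against how long refilled buffers stay invisible to the chip. The following is a minimal sketch of that trade-off under stated assumptions: rx_stats and rx_poll_batched() are hypothetical names, one buffer is refilled per received packet, and each "publish" is assumed to cost exactly one MMIO write.

```c
#include <assert.h>

/* Hypothetical accounting for one poll run; not taken from any driver. */
struct rx_stats {
    unsigned int mmio_writes;      /* doorbell writes issued */
    unsigned int max_unpublished;  /* most refilled buffers the chip had
                                    * not yet been told about */
};

/* Process 'quota' packets, refilling one buffer per packet and telling
 * the chip about new buffers every 'batch' packets (and once more at
 * the end if a partial batch is left over). */
static struct rx_stats rx_poll_batched(unsigned int quota, unsigned int batch)
{
    struct rx_stats st = { 0, 0 };
    unsigned int pending = 0;
    unsigned int i;

    assert(batch > 0);
    for (i = 0; i < quota; i++) {
        pending++;                         /* one buffer back in the ring */
        if (pending > st.max_unpublished)
            st.max_unpublished = pending;
        if (pending == batch) {            /* publish a full batch */
            st.mmio_writes++;
            pending = 0;
        }
    }
    if (pending)                           /* flush the partial batch */
        st.mmio_writes++;
    return st;
}
```

With a quota of 64, a batch of 64 gives one write with up to 64 buffers held back, a batch of 16 gives four writes with at most 16 held back, and a batch of 1 gives 64 writes. Which point on that curve is right depends on the MMIO cost of the platform, as Lennert notes.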
From tgr@postel.suug.ch Fri Jun 3 14:15:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 14:15:14 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53LFAXq006409 for ; Fri, 3 Jun 2005 14:15:10 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 6AE181C0EE; Fri, 3 Jun 2005 23:14:31 +0200 (CEST) Message-Id: <20050603211241.593114000@axs> Date: Fri, 03 Jun 2005 23:12:41 +0200 From: Thomas Graf To: davem@davemloft.net Cc: netdev@oss.sgi.com Subject: [PATCHSET] PKT_SCHED related fixes and a meta ematch completion X-archive-position: 2068 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 334 Lines: 10 Dave, The following patchset fixes some serious bugs that prevent the basic classifier and the meta ematch from working properly. Patch 2 adds a few new meta collectors for socket attributes which I'd like to have in 2.6.12 as well. If you think this is too intrusive (it isn't ;->) I'll resend patch 4 with offsets fixed. Thanks. 
From tgr@postel.suug.ch Fri Jun 3 14:15:13 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 14:15:19 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53LFCXq006459 for ; Fri, 3 Jun 2005 14:15:13 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 878551C0EE; Fri, 3 Jun 2005 23:14:36 +0200 (CEST) Message-Id: <20050603211315.521247000@axs> References: <20050603211241.593114000@axs> Date: Fri, 03 Jun 2005 23:12:42 +0200 From: Thomas Graf To: davem@davemloft.net Cc: netdev@oss.sgi.com Subject: [PATCH 1/4] [PKT_SCHED] Fix typo in NET_EMATCH_STACK help text Content-Disposition: inline; filename=fix_ematch_kconfig_typo X-archive-position: 2069 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 643 Lines: 20 Spotted by Geert Uytterhoeven . Signed-off-by: Thomas Graf Index: ematch/net/sched/Kconfig =================================================================== --- ematch.orig/net/sched/Kconfig +++ ematch/net/sched/Kconfig @@ -405,7 +405,7 @@ config NET_EMATCH_STACK ---help--- Size of the local stack variable used while evaluating the tree of ematches. Limits the depth of the tree, i.e. the number of - encapsulated precedences. Every level requires 4 bytes of addtional + encapsulated precedences. Every level requires 4 bytes of additional stack space. 
config NET_EMATCH_CMP From tgr@postel.suug.ch Fri Jun 3 14:15:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 14:15:22 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53LFIXq006541 for ; Fri, 3 Jun 2005 14:15:18 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 99C361C0EE; Fri, 3 Jun 2005 23:14:41 +0200 (CEST) Message-Id: <20050603211315.677553000@axs> References: <20050603211241.593114000@axs> Date: Fri, 03 Jun 2005 23:12:43 +0200 From: Thomas Graf To: davem@davemloft.net Cc: netdev@oss.sgi.com Subject: [PATCH 2/4] [PKT_SCHED] Allow socket attributes to be matched on via meta ematch Content-Disposition: inline; filename=ematch_meta_sk X-archive-position: 2070 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 10978 Lines: 385 Adds meta collectors for all socket attributes that make sense to be filtered upon. Some of them are only useful for debugging but having them doesn't hurt. Signed-off-by: Thomas Graf Index: ematch/net/sched/em_meta.c =================================================================== --- ematch.orig/net/sched/em_meta.c +++ ematch/net/sched/em_meta.c @@ -32,7 +32,7 @@ * +-----------+ +-----------+ * | | * ---> meta_ops[INT][INDEV](...) 
| - * | | + * | | * ----------- | * V V * +-----------+ +-----------+ @@ -70,6 +70,7 @@ #include #include #include +#include struct meta_obj { @@ -284,6 +285,214 @@ META_COLLECTOR(int_rtiif) } /************************************************************************** + * Socket Attributes + **************************************************************************/ + +#define SKIP_NONLOCAL(skb) \ + if (unlikely(skb->sk == NULL)) { \ + *err = -1; \ + return; \ + } + +META_COLLECTOR(int_sk_family) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_family; +} + +META_COLLECTOR(int_sk_state) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_state; +} + +META_COLLECTOR(int_sk_reuse) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_reuse; +} + +META_COLLECTOR(int_sk_bound_if) +{ + SKIP_NONLOCAL(skb); + /* No error if bound_dev_if is 0, legal userspace check */ + dst->value = skb->sk->sk_bound_dev_if; +} + +META_COLLECTOR(var_sk_bound_if) +{ + SKIP_NONLOCAL(skb); + + if (skb->sk->sk_bound_dev_if == 0) { + dst->value = (unsigned long) "any"; + dst->len = 3; + } else { + struct net_device *dev; + + dev = dev_get_by_index(skb->sk->sk_bound_dev_if); + *err = var_dev(dev, dst); + if (dev) + dev_put(dev); + } +} + +META_COLLECTOR(int_sk_refcnt) +{ + SKIP_NONLOCAL(skb); + dst->value = atomic_read(&skb->sk->sk_refcnt); +} + +META_COLLECTOR(int_sk_rcvbuf) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_rcvbuf; +} + +META_COLLECTOR(int_sk_shutdown) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_shutdown; +} + +META_COLLECTOR(int_sk_proto) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_protocol; +} + +META_COLLECTOR(int_sk_type) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_type; +} + +META_COLLECTOR(int_sk_rmem_alloc) +{ + SKIP_NONLOCAL(skb); + dst->value = atomic_read(&skb->sk->sk_rmem_alloc); +} + +META_COLLECTOR(int_sk_wmem_alloc) +{ + SKIP_NONLOCAL(skb); + dst->value = atomic_read(&skb->sk->sk_wmem_alloc); +} + +META_COLLECTOR(int_sk_omem_alloc) +{ + 
SKIP_NONLOCAL(skb); + dst->value = atomic_read(&skb->sk->sk_omem_alloc); +} + +META_COLLECTOR(int_sk_rcv_qlen) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_receive_queue.qlen; +} + +META_COLLECTOR(int_sk_snd_qlen) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_write_queue.qlen; +} + +META_COLLECTOR(int_sk_wmem_queued) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_wmem_queued; +} + +META_COLLECTOR(int_sk_fwd_alloc) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_forward_alloc; +} + +META_COLLECTOR(int_sk_sndbuf) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_sndbuf; +} + +META_COLLECTOR(int_sk_alloc) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_allocation; +} + +META_COLLECTOR(int_sk_route_caps) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_route_caps; +} + +META_COLLECTOR(int_sk_hashent) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_hashent; +} + +META_COLLECTOR(int_sk_lingertime) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_lingertime / HZ; +} + +META_COLLECTOR(int_sk_err_qlen) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_error_queue.qlen; +} + +META_COLLECTOR(int_sk_ack_bl) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_ack_backlog; +} + +META_COLLECTOR(int_sk_max_ack_bl) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_max_ack_backlog; +} + +META_COLLECTOR(int_sk_prio) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_priority; +} + +META_COLLECTOR(int_sk_rcvlowat) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_rcvlowat; +} + +META_COLLECTOR(int_sk_rcvtimeo) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_rcvtimeo / HZ; +} + +META_COLLECTOR(int_sk_sndtimeo) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_sndtimeo / HZ; +} + +META_COLLECTOR(int_sk_sendmsg_off) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_sndmsg_off; +} + +META_COLLECTOR(int_sk_write_pend) +{ + SKIP_NONLOCAL(skb); + dst->value = skb->sk->sk_write_pending; +} + 
+/************************************************************************** * Meta value collectors assignment table **************************************************************************/ @@ -293,41 +502,75 @@ struct meta_ops struct meta_value *, struct meta_obj *, int *); }; +#define META_ID(name) TCF_META_ID_##name +#define META_FUNC(name) { .get = meta_##name } + /* Meta value operations table listing all meta value collectors and * assigns them to a type and meta id. */ static struct meta_ops __meta_ops[TCF_META_TYPE_MAX+1][TCF_META_ID_MAX+1] = { [TCF_META_TYPE_VAR] = { - [TCF_META_ID_DEV] = { .get = meta_var_dev }, - [TCF_META_ID_INDEV] = { .get = meta_var_indev }, - [TCF_META_ID_REALDEV] = { .get = meta_var_realdev } + [META_ID(DEV)] = META_FUNC(var_dev), + [META_ID(INDEV)] = META_FUNC(var_indev), + [META_ID(REALDEV)] = META_FUNC(var_realdev), + [META_ID(SK_BOUND_IF)] = META_FUNC(var_sk_bound_if), }, [TCF_META_TYPE_INT] = { - [TCF_META_ID_RANDOM] = { .get = meta_int_random }, - [TCF_META_ID_LOADAVG_0] = { .get = meta_int_loadavg_0 }, - [TCF_META_ID_LOADAVG_1] = { .get = meta_int_loadavg_1 }, - [TCF_META_ID_LOADAVG_2] = { .get = meta_int_loadavg_2 }, - [TCF_META_ID_DEV] = { .get = meta_int_dev }, - [TCF_META_ID_INDEV] = { .get = meta_int_indev }, - [TCF_META_ID_REALDEV] = { .get = meta_int_realdev }, - [TCF_META_ID_PRIORITY] = { .get = meta_int_priority }, - [TCF_META_ID_PROTOCOL] = { .get = meta_int_protocol }, - [TCF_META_ID_SECURITY] = { .get = meta_int_security }, - [TCF_META_ID_PKTTYPE] = { .get = meta_int_pkttype }, - [TCF_META_ID_PKTLEN] = { .get = meta_int_pktlen }, - [TCF_META_ID_DATALEN] = { .get = meta_int_datalen }, - [TCF_META_ID_MACLEN] = { .get = meta_int_maclen }, + [META_ID(RANDOM)] = META_FUNC(int_random), + [META_ID(LOADAVG_0)] = META_FUNC(int_loadavg_0), + [META_ID(LOADAVG_1)] = META_FUNC(int_loadavg_1), + [META_ID(LOADAVG_2)] = META_FUNC(int_loadavg_2), + [META_ID(DEV)] = META_FUNC(int_dev), + [META_ID(INDEV)] = META_FUNC(int_indev), 
+ [META_ID(REALDEV)] = META_FUNC(int_realdev), + [META_ID(PRIORITY)] = META_FUNC(int_priority), + [META_ID(PROTOCOL)] = META_FUNC(int_protocol), + [META_ID(SECURITY)] = META_FUNC(int_security), + [META_ID(PKTTYPE)] = META_FUNC(int_pkttype), + [META_ID(PKTLEN)] = META_FUNC(int_pktlen), + [META_ID(DATALEN)] = META_FUNC(int_datalen), + [META_ID(MACLEN)] = META_FUNC(int_maclen), #ifdef CONFIG_NETFILTER - [TCF_META_ID_NFMARK] = { .get = meta_int_nfmark }, + [META_ID(NFMARK)] = META_FUNC(int_nfmark), #endif - [TCF_META_ID_TCINDEX] = { .get = meta_int_tcindex }, + [META_ID(TCINDEX)] = META_FUNC(int_tcindex), #ifdef CONFIG_NET_CLS_ACT - [TCF_META_ID_TCVERDICT] = { .get = meta_int_tcverd }, - [TCF_META_ID_TCCLASSID] = { .get = meta_int_tcclassid }, + [META_ID(TCVERDICT)] = META_FUNC(int_tcverd), + [META_ID(TCCLASSID)] = META_FUNC(int_tcclassid), #endif #ifdef CONFIG_NET_CLS_ROUTE - [TCF_META_ID_RTCLASSID] = { .get = meta_int_rtclassid }, + [META_ID(RTCLASSID)] = META_FUNC(int_rtclassid), #endif - [TCF_META_ID_RTIIF] = { .get = meta_int_rtiif } + [META_ID(RTIIF)] = META_FUNC(int_rtiif), + [META_ID(SK_FAMILY)] = META_FUNC(int_sk_family), + [META_ID(SK_STATE)] = META_FUNC(int_sk_state), + [META_ID(SK_REUSE)] = META_FUNC(int_sk_reuse), + [META_ID(SK_BOUND_IF)] = META_FUNC(int_sk_bound_if), + [META_ID(SK_REFCNT)] = META_FUNC(int_sk_refcnt), + [META_ID(SK_RCVBUF)] = META_FUNC(int_sk_rcvbuf), + [META_ID(SK_SNDBUF)] = META_FUNC(int_sk_sndbuf), + [META_ID(SK_SHUTDOWN)] = META_FUNC(int_sk_shutdown), + [META_ID(SK_PROTO)] = META_FUNC(int_sk_proto), + [META_ID(SK_TYPE)] = META_FUNC(int_sk_type), + [META_ID(SK_RMEM_ALLOC)] = META_FUNC(int_sk_rmem_alloc), + [META_ID(SK_WMEM_ALLOC)] = META_FUNC(int_sk_wmem_alloc), + [META_ID(SK_OMEM_ALLOC)] = META_FUNC(int_sk_omem_alloc), + [META_ID(SK_WMEM_QUEUED)] = META_FUNC(int_sk_wmem_queued), + [META_ID(SK_RCV_QLEN)] = META_FUNC(int_sk_rcv_qlen), + [META_ID(SK_SND_QLEN)] = META_FUNC(int_sk_snd_qlen), + [META_ID(SK_ERR_QLEN)] = 
META_FUNC(int_sk_err_qlen), + [META_ID(SK_FORWARD_ALLOCS)] = META_FUNC(int_sk_fwd_alloc), + [META_ID(SK_ALLOCS)] = META_FUNC(int_sk_alloc), + [META_ID(SK_ROUTE_CAPS)] = META_FUNC(int_sk_route_caps), + [META_ID(SK_HASHENT)] = META_FUNC(int_sk_hashent), + [META_ID(SK_LINGERTIME)] = META_FUNC(int_sk_lingertime), + [META_ID(SK_ACK_BACKLOG)] = META_FUNC(int_sk_ack_bl), + [META_ID(SK_MAX_ACK_BACKLOG)] = META_FUNC(int_sk_max_ack_bl), + [META_ID(SK_PRIO)] = META_FUNC(int_sk_prio), + [META_ID(SK_RCVLOWAT)] = META_FUNC(int_sk_rcvlowat), + [META_ID(SK_RCVTIMEO)] = META_FUNC(int_sk_rcvtimeo), + [META_ID(SK_SNDTIMEO)] = META_FUNC(int_sk_sndtimeo), + [META_ID(SK_SENDMSG_OFF)] = META_FUNC(int_sk_sendmsg_off), + [META_ID(SK_WRITE_PENDING)] = META_FUNC(int_sk_write_pend), } }; Index: ematch/include/linux/tc_ematch/tc_em_meta.h =================================================================== --- ematch.orig/include/linux/tc_ematch/tc_em_meta.h +++ ematch/include/linux/tc_ematch/tc_em_meta.h @@ -56,6 +56,36 @@ enum TCF_META_ID_TCCLASSID, TCF_META_ID_RTCLASSID, TCF_META_ID_RTIIF, + TCF_META_ID_SK_FAMILY, + TCF_META_ID_SK_STATE, + TCF_META_ID_SK_REUSE, + TCF_META_ID_SK_BOUND_IF, + TCF_META_ID_SK_REFCNT, + TCF_META_ID_SK_SHUTDOWN, + TCF_META_ID_SK_PROTO, + TCF_META_ID_SK_TYPE, + TCF_META_ID_SK_RCVBUF, + TCF_META_ID_SK_RMEM_ALLOC, + TCF_META_ID_SK_WMEM_ALLOC, + TCF_META_ID_SK_OMEM_ALLOC, + TCF_META_ID_SK_WMEM_QUEUED, + TCF_META_ID_SK_RCV_QLEN, + TCF_META_ID_SK_SND_QLEN, + TCF_META_ID_SK_ERR_QLEN, + TCF_META_ID_SK_FORWARD_ALLOCS, + TCF_META_ID_SK_SNDBUF, + TCF_META_ID_SK_ALLOCS, + TCF_META_ID_SK_ROUTE_CAPS, + TCF_META_ID_SK_HASHENT, + TCF_META_ID_SK_LINGERTIME, + TCF_META_ID_SK_ACK_BACKLOG, + TCF_META_ID_SK_MAX_ACK_BACKLOG, + TCF_META_ID_SK_PRIO, + TCF_META_ID_SK_RCVLOWAT, + TCF_META_ID_SK_RCVTIMEO, + TCF_META_ID_SK_SNDTIMEO, + TCF_META_ID_SK_SENDMSG_OFF, + TCF_META_ID_SK_WRITE_PENDING, __TCF_META_ID_MAX }; #define TCF_META_ID_MAX (__TCF_META_ID_MAX - 1) From tgr@postel.suug.ch Fri Jun 
3 14:15:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 14:15:31 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53LFNXq006648 for ; Fri, 3 Jun 2005 14:15:23 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id ABF7A1C0EE; Fri, 3 Jun 2005 23:14:46 +0200 (CEST) Message-Id: <20050603211315.818843000@axs> References: <20050603211241.593114000@axs> Date: Fri, 03 Jun 2005 23:12:44 +0200 From: Thomas Graf To: davem@davemloft.net Cc: netdev@oss.sgi.com Subject: [PATCH 3/4] [PKT_SCHED] Dump classification result for basic classifier Content-Disposition: inline; filename=cls_basic_dump_classid X-archive-position: 2071 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 581 Lines: 19 Signed-off-by: Thomas Graf Index: ematch/net/sched/cls_basic.c =================================================================== --- ematch.orig/net/sched/cls_basic.c +++ ematch/net/sched/cls_basic.c @@ -261,6 +261,9 @@ static int basic_dump(struct tcf_proto * rta = (struct rtattr *) b; RTA_PUT(skb, TCA_OPTIONS, 0, NULL); + if (f->res.classid) + RTA_PUT_U32(skb, TCA_BASIC_CLASSID, f->res.classid); + if (tcf_exts_dump(skb, &f->exts, &basic_ext_map) < 0 || tcf_em_tree_dump(skb, &f->ematches, TCA_BASIC_EMATCHES) < 0) goto rtattr_failure; From tgr@postel.suug.ch Fri Jun 3 14:15:29 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 14:15:34 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53LFSXq006744 for ; Fri, 3 Jun 2005 14:15:29 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id CA15F1C0EE; Fri, 3 Jun 2005 23:14:51 +0200 (CEST) Message-Id: <20050603211315.972265000@axs> References: <20050603211241.593114000@axs> Date: 
Fri, 03 Jun 2005 23:12:45 +0200 From: Thomas Graf To: davem@davemloft.net Cc: netdev@oss.sgi.com Subject: [PATCH 4/4] [PKT_SCHED] Fix numeric comparison in meta ematch Content-Disposition: inline; filename=meta_compare_fix X-archive-position: 2072 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 648 Lines: 23 This patch is brought to you by the department of applied stupidity. Signed-off-by: Thomas Graf Index: ematch/net/sched/em_meta.c =================================================================== --- ematch.orig/net/sched/em_meta.c +++ ematch/net/sched/em_meta.c @@ -639,9 +639,9 @@ static int meta_int_compare(struct meta_ /* Let gcc optimize it, the unlikely is not really based on * some numbers but jump free code for mismatches seems * more logical. */ - if (unlikely(a == b)) + if (unlikely(a->value == b->value)) return 0; - else if (a < b) + else if (a->value < b->value) return -1; else return 1; From mchan@broadcom.com Fri Jun 3 14:34:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 14:34:36 -0700 (PDT) Received: from MMS2.broadcom.com (mms2.broadcom.com [216.31.210.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53LYUXq009935 for ; Fri, 3 Jun 2005 14:34:30 -0700 Received: from 10.10.64.121 by MMS2.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Fri, 03 Jun 2005 14:33:11 -0700 X-Server-Uuid: 1F20ACF3-9CAF-44F7-AB47-F294E2D5B4EA Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Fri, 3 Jun 2005 14:33:10 -0700 Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BCB13804; Fri, 3 Jun 2005 14:32:57 -0700 (PDT) Received: from nt-irva-0741.brcm.ad.broadcom.com ( 
nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with ESMTP id OAA10245; Fri, 3 Jun 2005 14:32:57 -0700 (PDT) Received: from 10.7.18.177 ([10.7.18.177]) by NT-IRVA-0741.brcm.ad.broadcom.com ([10.8.194.54]) with Microsoft Exchange Server HTTP-DAV ; Fri, 3 Jun 2005 21:32:56 +0000 Received: from rh4 by nt-irva-0741; 03 Jun 2005 13:35:22 -0700 Subject: Re: RFC: NAPI packet weighting patch From: "Michael Chan" To: "Lennert Buytenhek" cc: "David S. Miller" , mitch.a.williams@intel.com, hadi@cyberus.ca, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com In-Reply-To: <20050603205944.GC20623@xi.wantstofly.org> References: <20050603.120126.41874584.davem@davemloft.net> <20050603.132257.23013342.davem@davemloft.net> <20050603.132922.63997492.davem@davemloft.net> <1117828169.4430.29.camel@rh4> <20050603205944.GC20623@xi.wantstofly.org> Date: Fri, 03 Jun 2005 13:35:22 -0700 Message-ID: <1117830922.4430.44.camel@rh4> MIME-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-3) X-WSS-ID: 6EBE131D1VO5004533-01-01 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-archive-position: 2073 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mchan@broadcom.com Precedence: bulk X-list: netdev Content-Length: 1119 Lines: 22 On Fri, 2005-06-03 at 22:59 +0200, Lennert Buytenhek wrote: > On Fri, Jun 03, 2005 at 12:49:29PM -0700, Michael Chan wrote: > > > Yes, in tg3, rx buffers are replenished and put back into the ring > > as completed packets are taken off the ring. But we don't tell the > > chip about these new buffers until we get to the end of the loop, > > potentially after a full quota of packets. 
> > Which makes a lot more sense, since you'd rather do one MMIO write > at the end of the loop than one per iteration, especially if your > MMIO read (flush) latency is high. (Any subsequent MMIO read will > have to flush out all pending writes, which'll be slow if there's > a lot of writes still in the queue.) > I agree on the merit of issuing only one IO at the end. What I'm saying is that doing so will make it similar to e1000 where all the buffers are replenished at the end. Isn't that so or am I missing something? By the way, in tg3 there is a buffer replenishment threshold programmed to the chip and is currently set at rx_pending / 8 (200/8 = 25). This means that the chip will replenish 25 rx buffers at a time. From shemminger@osdl.org Fri Jun 3 14:38:05 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 14:38:07 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53Lc5Xq010567 for ; Fri, 3 Jun 2005 14:38:05 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j53Lb3jA000331 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Fri, 3 Jun 2005 14:37:03 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j53Lb2gH015308; Fri, 3 Jun 2005 14:37:02 -0700 Date: Fri, 3 Jun 2005 14:37:02 -0700 From: Stephen Hemminger To: Adrian Bunk , Baruch Even Cc: Andrew Morton , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 2.6.12-rc5-mm2: "bic unavailable using TCP reno" messages Message-ID: <20050603143702.0422101d@dxpl.pdx.osdl.net> In-Reply-To: <20050602203823.GI4992@stusta.de> References: <20050601022824.33c8206e.akpm@osdl.org> <20050602121511.GE4992@stusta.de> <429F1079.5070701@ev-en.org> <20050602103805.6beb4f4e@dxpl.pdx.osdl.net> <20050602203823.GI4992@stusta.de> Organization: Open Source Development Lab 
X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 2074 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 6680 Lines: 232 Here is what I am working on as better way to make the sysctl selection. I am not totally happy with the way the default congestion control value is determined by the load order. But it does seem good that if you load "tcp_xxx" module and it registers it becomes the default. Index: 2.6.12-rc5-tcp3/include/net/tcp.h =================================================================== --- 2.6.12-rc5-tcp3.orig/include/net/tcp.h +++ 2.6.12-rc5-tcp3/include/net/tcp.h @@ -1242,6 +1242,8 @@ extern int tcp_register_congestion_contr extern void tcp_unregister_congestion_control(struct tcp_congestion_ops *type); extern void tcp_init_congestion_control(struct tcp_sock *tp); extern void tcp_release_congestion_control(struct tcp_sock *tp); +extern int tcp_set_congestion_control(const char *name); +extern void tcp_get_congestion_control(char *name); extern struct tcp_congestion_ops tcp_reno; extern u32 tcp_reno_ssthresh(struct tcp_sock *tp); Index: 2.6.12-rc5-tcp3/net/ipv4/tcp_cong.c =================================================================== --- 2.6.12-rc5-tcp3.orig/net/ipv4/tcp_cong.c +++ 2.6.12-rc5-tcp3/net/ipv4/tcp_cong.c @@ -13,8 +13,6 @@ #include #include -char sysctl_tcp_congestion_control[TCP_CA_NAME_MAX] = "bic"; - static DEFINE_SPINLOCK(tcp_cong_list_lock); static LIST_HEAD(tcp_cong_list); @@ -23,7 +21,7 @@ static struct tcp_congestion_ops *tcp_ca { struct 
tcp_congestion_ops *e; - list_for_each_entry_rcu(e, &tcp_cong_list, list) { + list_for_each_entry(e, &tcp_cong_list, list) { if (strcmp(e->name, name) == 0) return e; } @@ -46,7 +44,7 @@ int tcp_register_congestion_control(stru return -EINVAL; } - spin_lock_irq(&tcp_cong_list_lock); + spin_lock(&tcp_cong_list_lock); if (tcp_ca_find(ca->name)) { printk(KERN_NOTICE "TCP %s already registered\n", ca->name); ret = -EEXIST; @@ -54,7 +52,7 @@ int tcp_register_congestion_control(stru list_add_rcu(&ca->list, &tcp_cong_list); printk(KERN_INFO "TCP %s registered\n", ca->name); } - spin_unlock_irq(&tcp_cong_list_lock); + spin_unlock(&tcp_cong_list_lock); return ret; } @@ -69,7 +67,6 @@ EXPORT_SYMBOL_GPL(tcp_register_congestio void tcp_unregister_congestion_control(struct tcp_congestion_ops *ca) { spin_lock(&tcp_cong_list_lock); - BUG_ON(!tcp_ca_find(ca->name)); list_del_rcu(&ca->list); spin_unlock(&tcp_cong_list_lock); } @@ -78,34 +75,22 @@ EXPORT_SYMBOL_GPL(tcp_unregister_congest /* Assign choice of congestion control. */ void tcp_init_congestion_control(struct tcp_sock *tp) { - const char *cong_proto = sysctl_tcp_congestion_control; struct tcp_congestion_ops *ca; rcu_read_lock(); - ca = tcp_ca_find(cong_proto); -#ifdef CONFIG_KMOD - if (!ca) { - /* autoload and try again */ - rcu_read_unlock(); - request_module("tcp_%s", cong_proto); - rcu_read_lock(); - - ca = tcp_ca_find(cong_proto); - } -#endif - - /* If selection doesn't exist or is being removed use Reno */ - if (!ca || !try_module_get(ca->owner)) { - if (net_ratelimit()) - printk(KERN_WARNING "%s unavailable using TCP reno\n", - cong_proto); - ca = &tcp_reno; - } - tp->ca_ops = ca; - rcu_read_unlock(); + tp->ca_ops = NULL; + list_for_each_entry_rcu(ca, &tcp_cong_list, list) { + if (try_module_get(ca->owner)) { + tp->ca_ops = ca; + break; + } - if (ca->init) - ca->init(tp); + } + + /* We will always have reno to fallback on. 
*/ + if (tp->ca_ops->init) + tp->ca_ops->init(tp); + rcu_read_unlock(); } EXPORT_SYMBOL(tcp_init_congestion_control); @@ -122,6 +107,36 @@ void tcp_release_congestion_control(stru } } +/* Used by sysctl to change default congestion control */ +int tcp_set_congestion_control(const char *name) +{ + struct tcp_congestion_ops *ca; + int ret = -ENOENT; + + spin_lock(&tcp_cong_list_lock); + ca = tcp_ca_find(name); + if (ca) { + list_move(&ca->list, &tcp_cong_list); + ret = 0; + } + spin_unlock(&tcp_cong_list_lock); + + return ret; +} + +/* Get current default congestion control */ +void tcp_get_congestion_control(char *name) +{ + struct tcp_congestion_ops *ca; + /* We will always have reno... */ + BUG_ON(list_empty(&tcp_cong_list)); + + rcu_read_lock(); + ca = list_entry(tcp_cong_list.next, struct tcp_congestion_ops, list); + strncpy(name, ca->name, TCP_CA_NAME_MAX); + rcu_read_unlock(); +} + /* * TCP Reno congestion control * This is special case used for fallback as well. Index: 2.6.12-rc5-tcp3/net/ipv4/sysctl_net_ipv4.c =================================================================== --- 2.6.12-rc5-tcp3.orig/net/ipv4/sysctl_net_ipv4.c +++ 2.6.12-rc5-tcp3/net/ipv4/sysctl_net_ipv4.c @@ -48,9 +48,6 @@ extern int inet_peer_maxttl; extern int inet_peer_gc_mintime; extern int inet_peer_gc_maxtime; -/* From tcp_input.c */ -extern char sysctl_tcp_congestion_control[TCP_CA_NAME_MAX]; - #ifdef CONFIG_SYSCTL static int tcp_retr1_max = 255; static int ip_local_port_range_min[] = { 1, 1 }; @@ -120,6 +117,52 @@ static int ipv4_sysctl_forward_strategy( return 1; } +static int proc_tcp_congestion_control(ctl_table *ctl, int write, struct file * filp, + void __user *buffer, size_t *lenp, loff_t *ppos) +{ + char val[TCP_CA_NAME_MAX]; + ctl_table tbl = { + .data = val, + .maxlen = TCP_CA_NAME_MAX, + }; + int ret; + + tcp_get_congestion_control(val); + + ret = proc_dostring(&tbl, write, filp, buffer, lenp, ppos); + if (write && ret == 0) { + ret = tcp_set_congestion_control(val);
+#ifdef CONFIG_KMOD + if (ret == -ENOENT) { + request_module("tcp_%s", val); + ret = tcp_set_congestion_control(val); + } +#endif + } + return ret; +} + +int sysctl_tcp_congestion_control(ctl_table *table, int __user *name, int nlen, + void __user *oldval, size_t __user *oldlenp, + void __user *newval, size_t newlen, + void **context) +{ + char val[TCP_CA_NAME_MAX]; + ctl_table tbl = { + .data = val, + .maxlen = TCP_CA_NAME_MAX, + }; + int ret; + + tcp_get_congestion_control(val); + ret = sysctl_string(&tbl, name, nlen, oldval, oldlenp, newval, newlen, + context); + if (ret == 0 && newval && newlen) + ret = tcp_set_congestion_control(val); + return ret; +} + + ctl_table ipv4_table[] = { { .ctl_name = NET_IPV4_TCP_TIMESTAMPS, @@ -624,11 +667,10 @@ ctl_table ipv4_table[] = { { .ctl_name = NET_TCP_CONG_CONTROL, .procname = "tcp_congestion_control", - .data = &sysctl_tcp_congestion_control, - .maxlen = TCP_CA_NAME_MAX, .mode = 0644, - .proc_handler = &proc_dostring, - .strategy = &sysctl_string, + .maxlen = TCP_CA_NAME_MAX, + .proc_handler = &proc_tcp_congestion_control, + .strategy = &sysctl_tcp_congestion_control, }, { .ctl_name = 0 } From hadi@cyberus.ca Fri Jun 3 15:31:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 15:31:42 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53MVUXq013872 for ; Fri, 3 Jun 2005 15:31:31 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DeKgW-0004KQ-O0 for netdev@oss.sgi.com; Fri, 03 Jun 2005 18:30:36 -0400 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DeKgR-0007c5-Gc; Fri, 03 Jun 2005 18:30:31 -0400 Subject: Re: RFC: NAPI packet weighting patch From: jamal Reply-To: hadi@cyberus.ca To: Michael Chan Cc: Lennert Buytenhek , "David S. 
Miller" , mitch.a.williams@intel.com, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com In-Reply-To: <1117830922.4430.44.camel@rh4> References: <20050603.120126.41874584.davem@davemloft.net> <20050603.132257.23013342.davem@davemloft.net> <20050603.132922.63997492.davem@davemloft.net> <1117828169.4430.29.camel@rh4> <20050603205944.GC20623@xi.wantstofly.org> <1117830922.4430.44.camel@rh4> Content-Type: text/plain Organization: unknown Date: Fri, 03 Jun 2005 18:29:58 -0400 Message-Id: <1117837798.6266.25.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2075 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 1139 Lines: 29 On Fri, 2005-03-06 at 13:35 -0700, Michael Chan wrote: > On Fri, 2005-06-03 at 22:59 +0200, Lennert Buytenhek wrote: > > Which makes a lot more sense, since you'd rather do one MMIO write > > at the end of the loop than one per iteration, especially if your > > MMIO read (flush) latency is high. (Any subsequent MMIO read will > > have to flush out all pending writes, which'll be slow if there's > > a lot of writes still in the queue.) > > > I agree on the merit of issuing only one IO at the end. What I'm saying > is that doing so will make it similar to e1000 where all the buffers are > replenished at the end. Isn't that so or am I missing something? > I think the main issue would be a lot less CPU used in your case (because of the single MMIO). > By the way, in tg3 there is a buffer replenishment threshold programmed > to the chip and is currently set at rx_pending / 8 (200/8 = 25). This > means that the chip will replenish 25 rx buffers at a time. 
> So when you write the MMIO, 25 buffers are replenished or is this auto magically happening in the background? Sounds like a neat feature either way. cheers, jamal From baruch@ev-en.org Fri Jun 3 15:33:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 15:33:33 -0700 (PDT) Received: from galon.ev-en.org (rrcs-24-123-59-149.central.biz.rr.com [24.123.59.149]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53MXUXq014198 for ; Fri, 3 Jun 2005 15:33:30 -0700 Received: by galon.ev-en.org (Postfix, from userid 105) id 176AF11A953; Sat, 4 Jun 2005 01:32:30 +0300 (IDT) Received: from [10.220.3.66] (hamilton.nuim.ie [149.157.192.252]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by galon.ev-en.org (Postfix) with ESMTP id 52CA111A951; Sat, 4 Jun 2005 01:32:25 +0300 (IDT) Message-ID: <42A0DA78.2040804@ev-en.org> Date: Fri, 03 Jun 2005 23:32:24 +0100 From: Baruch Even User-Agent: Debian Thunderbird 1.0.2 (X11/20050331) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Stephen Hemminger Cc: Adrian Bunk , Andrew Morton , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 2.6.12-rc5-mm2: "bic unavailable using TCP reno" messages References: <20050601022824.33c8206e.akpm@osdl.org> <20050602121511.GE4992@stusta.de> <429F1079.5070701@ev-en.org> <20050602103805.6beb4f4e@dxpl.pdx.osdl.net> <20050602203823.GI4992@stusta.de> <20050603143702.0422101d@dxpl.pdx.osdl.net> In-Reply-To: <20050603143702.0422101d@dxpl.pdx.osdl.net> X-Enigmail-Version: 0.91.0.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-archive-position: 2076 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: baruch@ev-en.org Precedence: bulk X-list: netdev Content-Length: 1091 Lines: 33 Stephen Hemminger wrote: > Here is what I am working on as better way to make the sysctl selection. 
> I am not totally happy with the way the default congestion control value is determined > by the load order. But it does seem good that if you load "tcp_xxx" module and it > registers it becomes the default. Looks good. > @@ -120,6 +117,52 @@ static int ipv4_sysctl_forward_strategy( > return 1; > } > > +static int proc_tcp_congestion_control(ctl_table *ctl, int write, struct file * filp, > + void __user *buffer, size_t *lenp, loff_t *ppos) > +{ > + char val[TCP_CA_NAME_MAX]; > + ctl_table tbl = { > + .data = val, > + .maxlen = TCP_CA_NAME_MAX, > + }; > + int ret; > + > + tcp_get_congestion_control(val); Maybe we should call this tcp_get_current_congestion_control(); the current name implies (to me) that you give it a name and it returns the ca struct. get_current might also just return the current one and the strcpy can be done here. Otherwise you should probably document tcp_get_congestion_control() to say what size of string it accepts. Baruch From mmporter@cox.net Fri Jun 3 15:44:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 15:44:32 -0700 (PDT) Received: from fed1rmmtao04.cox.net (fed1rmmtao04.cox.net [68.230.241.35]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53MiSXq015431 for ; Fri, 3 Jun 2005 15:44:28 -0700 Received: from liberty.homelinux.org ([68.2.41.86]) by fed1rmmtao04.cox.net (InterMail vM.6.01.04.00 201-2131-118-20041027) with ESMTP id <20050603224327.ZYDV23392.fed1rmmtao04.cox.net@liberty.homelinux.org>; Fri, 3 Jun 2005 18:43:27 -0400 Received: (from mmporter@localhost) by liberty.homelinux.org (8.9.3/8.9.3/Debian 8.9.3-21) id PAA01451; Fri, 3 Jun 2005 15:43:25 -0700 Date: Fri, 3 Jun 2005 15:43:25 -0700 From: Matt Porter To: Stephen Hemminger Cc: torvalds@osdl.org, akpm@osdl.org, jgarzik@pobox.com, linux-kernel@vger.kernel.org, linuxppc-embedded@ozlabs.org, netdev@oss.sgi.com Subject: Re: [PATCH][5/5] RapidIO support: net driver over messaging Message-ID: <20050603154324.I32392@cox.net>
References: <20050602140359.B24818@cox.net> <20050602141247.C24818@cox.net> <20050602141946.D24818@cox.net> <20050602142509.E24818@cox.net> <20050602143404.F24818@cox.net> <20050602150543.7e4326b6@dxpl.pdx.osdl.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20050602150543.7e4326b6@dxpl.pdx.osdl.net>; from shemminger@osdl.org on Thu, Jun 02, 2005 at 03:05:43PM -0700 X-archive-position: 2077 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mporter@kernel.crashing.org Precedence: bulk X-list: netdev Content-Length: 5146 Lines: 155 On Thu, Jun 02, 2005 at 03:05:43PM -0700, Stephen Hemminger wrote: > How much is this like ethernet? does it still do ARP? It's nothing like Ethernet; the only relation is that an Ethernet network driver is easy to implement on top of raw message ports on a switched fabric network. It gives easy access to RIO messaging from userspace without inventing a new interface. ARP works by the driver emulating a broadcast over RIO by sending the same ARP packet to each node that is participating in the rionet. Nodes join/leave the rionet by sending RIO-specific doorbell messages to potential participants on the switched fabric. A table is kept to flag active participants such that a fast lookup can be made to translate the dst MAC address to a RIO device struct that is used to actually send the Ethernet packet encapsulated into a standard RIO message to the appropriate node(s). > Can it do promiscuous receive? No. > > +LIST_HEAD(rionet_peers); > > Does this have to be global? Nope, should be static. Fixing. > Not sure about the locking of this stuff, are you > relying on the RTNL? Yes, last I looked that was sufficient for all the entry points. I protect the driver-specific data (tx skb rings, etc.) with a private lock.
> > + > > +static int rionet_change_mtu(struct net_device *ndev, int new_mtu) > > +{ > > + struct rionet_private *rnet = ndev->priv; > > + > > + if (netif_msg_drv(rnet)) > > + printk(KERN_WARNING > > + "%s: rionet_change_mtu(): not implemented\n", DRV_NAME); > > + > > + return 0; > > +} > > If you can allow any mtu then don't need this at all. > Or if you are limited then better return an error for bad values. Ok, I do have a upper limit of 4082 as the RIO messages have a max 4096 byte payload. That's the default on open as well. I'll fix this up. > > +static void rionet_set_multicast_list(struct net_device *ndev) > > +{ > > + struct rionet_private *rnet = ndev->priv; > > + > > + if (netif_msg_drv(rnet)) > > + printk(KERN_WARNING > > + "%s: rionet_set_multicast_list(): not implemented\n", > > + DRV_NAME); > > +} > > If you can't handle it then just leave dev->set_multicast_list > as NULL and all attempts to add or delete will get -EINVAL Will do. It was a placeholder at one point when I thought I might emulate multicast in the driver...it's fallen down my priority list. > > + > > +static int rionet_open(struct net_device *ndev) > > +{ > > > > + /* Initialize inbound message ring */ > > + for (i = 0; i < RIONET_RX_RING_SIZE; i++) > > + rnet->rx_skb[i] = NULL; > > + rnet->rx_slot = 0; > > + rionet_rx_fill(ndev, 0); > > + > > + rnet->tx_slot = 0; > > + rnet->tx_cnt = 0; > > + rnet->ack_slot = 0; > > + > > + spin_lock_init(&rnet->lock); > > + > > + rnet->msg_enable = RIONET_DEFAULT_MSGLEVEL; > > Better to do all initialization of the per device data > in the place it is allocated (rio_setup_netdev) Right, will do. > > +static int rionet_ioctl(struct net_device *ndev, struct ifreq *rq, int cmd) > > +{ > > + return -EOPNOTSUPP; > > +} > > Unneeded, if dev->do_ioctl is NULL, then all private ioctl's will > return -EINVAL that is what you want. Ah, ok. Good, none of the MII stuff applies in this case. 
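A minimal user-space sketch of the MTU bounds check suggested above, assuming the 4082-byte ceiling given in the reply (4096-byte RIO message payload minus a 14-byte Ethernet header). The helper name rionet_check_mtu, the RIONET_* macros, and the 68-byte lower bound are illustrative assumptions, not part of the posted driver:

```c
#include <assert.h>
#include <errno.h>

/* Figures from the thread: a RIO message carries at most 4096 bytes,
 * and the driver sets its default MTU to RIO_MAX_MSG_SIZE - 14. */
#define RIO_MAX_MSG_SIZE	4096
#define RIONET_ETH_HLEN		14	/* hypothetical name for the "14" */
#define RIONET_MAX_MTU		(RIO_MAX_MSG_SIZE - RIONET_ETH_HLEN)

/* Assumption: 68 bytes, the smallest MTU an IPv4 link must support,
 * is used as the lower bound. The thread does not state one. */
#define RIONET_MIN_MTU		68

/* Hypothetical helper: returns 0 if new_mtu is acceptable,
 * -EINVAL otherwise, instead of silently accepting every value. */
int rionet_check_mtu(int new_mtu)
{
	if (new_mtu < RIONET_MIN_MTU || new_mtu > RIONET_MAX_MTU)
		return -EINVAL;
	return 0;
}

/* Usage example: the boundary values behave as described. */
void rionet_check_mtu_examples(void)
{
	assert(rionet_check_mtu(1500) == 0);
	assert(rionet_check_mtu(RIONET_MAX_MTU) == 0);
	assert(rionet_check_mtu(RIONET_MAX_MTU + 1) == -EINVAL);
}
```

In a real change_mtu handler the same check would presumably run before the new value is stored in the device's mtu field.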
> > +static u32 rionet_get_link(struct net_device *ndev) > > +{ > > + return netif_carrier_ok(ndev); > > +} > > Use ethtool_op_get_link Ok > > + /* Fill in the driver function table */ > > + ndev->open = &rionet_open; > > + ndev->hard_start_xmit = &rionet_start_xmit; > > + ndev->stop = &rionet_close; > > + ndev->get_stats = &rionet_stats; > > + ndev->change_mtu = &rionet_change_mtu; > > + ndev->set_mac_address = &rionet_set_mac_address; > > + ndev->set_multicast_list = &rionet_set_multicast_list; > > + ndev->do_ioctl = &rionet_ioctl; > > + SET_ETHTOOL_OPS(ndev, &rionet_ethtool_ops); > > + > > + ndev->mtu = RIO_MAX_MSG_SIZE - 14; > > + > > + SET_MODULE_OWNER(ndev); > > Can you set any ndev->features to get better performance. > Can you take >32bit data addresses? then set HIGHDMA > You are doing your own locking, can you use LLTX? > Does the hardware support scatter gather? Some of these get tricky. In general, rionet could support SG and with driver help we can flag IP_CSUM. In practice, the current generation MPC85xx HW on my development system have some problems with their message port dma queues. In short, their implementation is such that the arch-specific code is forced to do a copy of the skb on both tx and rx. Because of this, adding SG/IP_CSUM doesn't have any value yet...it'll make sense to add the additional features once we get a platform with better messaging hardware. HIGHDMA may not be suitable on all platforms. Since rionet sits on top of a hardware abstraction, it doesn't have full knowledge of the DMA capabilities of the hardware. We can eventually have some interfaces to the arch code to learn that info, but it's not there yet. I have to look into LLTX; I know what it stands for, but I'm not sure of the details. Do you have a good LLTX example reference? That said, my goal is to enable as many features as possible when we have hw to take advantage of them.
-Matt From kernel@linuxace.com Fri Jun 3 16:25:13 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 16:25:17 -0700 (PDT) Received: from linuxace.com (adsl-67-120-171-161.dsl.lsan03.pacbell.net [67.120.171.161]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j53NPCXq021135 for ; Fri, 3 Jun 2005 16:25:12 -0700 Received: (qmail 29381 invoked by uid 0); 3 Jun 2005 23:24:13 -0000 Date: Fri, 3 Jun 2005 16:24:13 -0700 From: Phil Oester To: netdev@oss.sgi.com Subject: Unitialized queue_lock oops? Message-ID: <20050603232413.GA29308@linuxace.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.1i X-archive-position: 2078 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kernel@linuxace.com Precedence: bulk X-list: netdev Content-Length: 2797 Lines: 73 In my ongoing attempts to migrate to anything higher than 2.6.10, I decided to retest 2.6.11-rc2 but back out the problematic LLTX patch. I also enabled spinlock debugging, and hit an odd BUG. Full oops output below, but the summary is: kernel BUG at include/asm/spinlock.h:92! which is here: BUG_ON(lock->magic != SPINLOCK_MAGIC); And we got there via dev_queue_xmit: /* Grab device queue */ spin_lock(&dev->queue_lock); -- no complaints yet, so queue_lock must be initialized here rc = q->enqueue(skb, q); qdisc_run(dev); -- qdisc_run drops queue_lock briefly - does it get mangled while it's dropped? spin_unlock(&dev->queue_lock); -- now we hit the BUG - queue_lock->magic != SPINLOCK_MAGIC. I know the proposed LLTX changes were meant to address a race while the queue_lock was dropped - is the above another illustration of the race potential? Phil kernel BUG at include/asm/spinlock.h:92!
invalid operand: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 EIP: 0060:[] Not tainted VLI EFLAGS: 00010217 (2.6.11-rc2) EIP is at _spin_unlock+0x24/0x30 eax: f7ae7ec0 ebx: f6d5ff00 ecx: f6d5ffbc edx: f7ae7ec0 esi: f7ae3800 edi: c4a45f50 ebp: c0333d64 esp: c0333d64 ds: 007b es: 007b ss: 0068 Process swapper (pid: 0, threadinfo=c0333000 task=c198aaf0) Stack: c0333d88 c023168a c0272eea f7ae3800 f7ae35bc 00000000 f590c89c f590c888 c63cc020 c0333da8 c0249873 c02497c0 f590c888 c4a45f50 00000000 00000004 00000002 c0333ddc c023b61e 00000000 f7ae3800 c0333dcc c02497c0 80000000 Call Trace: [] show_stack+0x7a/0x90 [] show_registers+0x14d/0x1b0 [] die+0xf9/0x180 [] do_invalid_op+0xa9/0xc0 [] error_code+0x2b/0x30 [] dev_queue_xmit+0x20a/0x290 [] ip_finish_output2+0xb3/0x1c0 [] nf_hook_slow+0xae/0xe0 [] ip_finish_output+0x1ee/0x200 [] ip_forward_finish+0x2c/0x50 [] nf_hook_slow+0xae/0xe0 [] ip_forward+0x19c/0x230 [] ip_rcv_finish+0x1b8/0x230 [] nf_hook_slow+0xae/0xe0 [] ip_rcv+0x3b5/0x470 [] netif_receive_skb+0x13a/0x190 [] e1000_clean_rx_irq+0x156/0x480 [] e1000_clean+0x45/0xf0 [] net_rx_action+0x90/0x130 [] __do_softirq+0xb8/0xd0 [] do_softirq+0x4d/0x60 ======================= [] do_IRQ+0x68/0xa0 [] common_interrupt+0x1a/0x20 [] cpu_idle+0x5f/0x70 [<00000000>] 0x0 [] 0xc198bfbc Code: 8d bc 27 00 00 00 00 55 89 c2 89 e5 81 78 04 ad 4e ad de 75 16 0f b6 02 84 c0 7f 05 c6 02 01 5d c3 0f 0b 5d 00 08 9b 29 c0 eb f1 <0f> 0b 5c 00 08 9b 29 c0 eb e0 89 f6 55 89 e5 f0 81 00 00 00 00 From buytenh@wantstofly.org Fri Jun 3 16:28:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 16:28:10 -0700 (PDT) Received: from xi.wantstofly.org (alephnull.demon.nl [212.238.201.82]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53NS2Xq021612 for ; Fri, 3 Jun 2005 16:28:03 -0700 Received: by xi.wantstofly.org (Postfix, from userid 500) id EA0A1945C8; Sat, 4 Jun 2005 01:26:56 +0200 (MEST) Date: Sat, 4 Jun 2005 01:26:56 +0200 From: Lennert Buytenhek To: Michael Chan Cc: 
"David S. Miller" , mitch.a.williams@intel.com, hadi@cyberus.ca, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch Message-ID: <20050603232656.GB21125@xi.wantstofly.org> References: <20050603.120126.41874584.davem@davemloft.net> <20050603.132257.23013342.davem@davemloft.net> <20050603.132922.63997492.davem@davemloft.net> <1117828169.4430.29.camel@rh4> <20050603205944.GC20623@xi.wantstofly.org> <1117830922.4430.44.camel@rh4> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1117830922.4430.44.camel@rh4> User-Agent: Mutt/1.4.1i X-archive-position: 2079 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: buytenh@wantstofly.org Precedence: bulk X-list: netdev Content-Length: 1520 Lines: 32 On Fri, Jun 03, 2005 at 01:35:22PM -0700, Michael Chan wrote: > > > Yes, in tg3, rx buffers are replenished and put back into the ring > > > as completed packets are taken off the ring. But we don't tell the > > > chip about these new buffers until we get to the end of the loop, > > > potentially after a full quota of packets. > > > > Which makes a lot more sense, since you'd rather do one MMIO write > > at the end of the loop than one per iteration, especially if your > > MMIO read (flush) latency is high. (Any subsequent MMIO read will > > have to flush out all pending writes, which'll be slow if there's > > a lot of writes still in the queue.) > > I agree on the merit of issuing only one IO at the end. What I'm saying > is that doing so will make it similar to e1000 where all the buffers are > replenished at the end. Isn't that so or am I missing something? I think you're right: for e1000 as well as tg3, the NIC cannot use the new RX buffers until the CPU breaks out of the poll loop. 
I don't understand why reducing the weight apparently makes the e1000 go faster. Perhaps as Robert said, the RX ring is not big enough and that's why feeding RX buffers back to the chip more aggressively might help prevent overruns? I would say that running with an N+64-entry RX ring and a weight of 64 should not show any worse behavior than running with an N+16-entry RX ring with a weight of 16. If anything, weight=64 should show _better_ performance than weight=16. Something else must be going on. --L From buytenh@wantstofly.org Fri Jun 3 16:31:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 16:31:25 -0700 (PDT) Received: from xi.wantstofly.org (alephnull.demon.nl [212.238.201.82]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53NVLXq022336 for ; Fri, 3 Jun 2005 16:31:21 -0700 Received: by xi.wantstofly.org (Postfix, from userid 500) id 004CA945C8; Sat, 4 Jun 2005 01:30:21 +0200 (MEST) Date: Sat, 4 Jun 2005 01:30:21 +0200 From: Lennert Buytenhek To: Edgar E Iglesias Cc: Michael Chan , "David S. 
Miller" , mitch.a.williams@intel.com, hadi@cyberus.ca, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch Message-ID: <20050603233021.GC21125@xi.wantstofly.org> References: <20050603.120126.41874584.davem@davemloft.net> <20050603.132257.23013342.davem@davemloft.net> <20050603.132922.63997492.davem@davemloft.net> <1117828169.4430.29.camel@rh4> <20050603205944.GC20623@xi.wantstofly.org> <20050603210701.GA3263@edgar.se.axis.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20050603210701.GA3263@edgar.se.axis.com> User-Agent: Mutt/1.4.1i X-archive-position: 2080 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: buytenh@wantstofly.org Precedence: bulk X-list: netdev Content-Length: 1295 Lines: 30 On Fri, Jun 03, 2005 at 11:07:01PM +0200, Edgar E Iglesias wrote: > > > Yes, in tg3, rx buffers are replenished and put back into the ring > > > as completed packets are taken off the ring. But we don't tell the > > > chip about these new buffers until we get to the end of the loop, > > > potentially after a full quota of packets. > > > > Which makes a lot more sense, since you'd rather do one MMIO write > > at the end of the loop than one per iteration, especially if your > > MMIO read (flush) latency is high. (Any subsequent MMIO read will > > have to flush out all pending writes, which'll be slow if there's > > a lot of writes still in the queue.) > > Maybe it would be better to put a fixed weight at this level, return > the descriptors to the HW after every X packets. That way you > can keep the NAPI weight at 64 (or what ever) and still give back > descriptors to HW more often. 
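The two refill strategies quoted above (one MMIO write at the end of the poll loop versus one after every X packets) can be compared with a small user-space model. This is only a sketch under stated assumptions: struct rx_ring_model, publish_producer_index() and poll_model() are hypothetical names, and nothing here reflects the actual tg3 or e1000 ring layout.

```c
#include <assert.h>

/* Hypothetical stand-in for an RX ring's refill state. */
struct rx_ring_model {
	unsigned int refilled;		/* buffers the CPU has put back in the ring */
	unsigned int published;		/* buffers the NIC has been told about */
	unsigned int mmio_writes;	/* producer-index register writes issued */
};

/* Models the MMIO write that tells the NIC about new RX buffers. */
static void publish_producer_index(struct rx_ring_model *r)
{
	r->published = r->refilled;
	r->mmio_writes++;
}

/*
 * Process `quota` packets, refilling one buffer per packet.
 * batch == 0: publish only once, at the end of the loop.
 * batch  > 0: additionally publish after every `batch` packets.
 * Returns the number of packets processed.
 */
unsigned int poll_model(struct rx_ring_model *r, unsigned int quota,
			unsigned int batch)
{
	unsigned int done;

	assert(r);
	for (done = 0; done < quota; done++) {
		r->refilled++;
		if (batch && (done + 1) % batch == 0)
			publish_producer_index(r);
	}
	/* Flush whatever has not been published yet. */
	if (r->published != r->refilled)
		publish_producer_index(r);
	return done;
}
```

With a quota of 64, the model issues one register write when batch is 0 and four when batch is 16; in both cases all 64 buffers end up published, so the only difference is how early the NIC learns about them.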
For this scheme to make any difference at all, the RX ring must be overflowing in the case where we refill the RX ring only once every 64 packets. If the RX ring _is_ overflowing but the system is otherwise capable of keeping up with the receive rate (i.e. the packet service times as seen by the NIC have a high variance), simply make the RX ring bigger. I don't see what's going on. --L From herbert@gondor.apana.org.au Fri Jun 3 16:47:43 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 16:47:51 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53NleXq023265 for ; Fri, 3 Jun 2005 16:47:42 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DeLru-00089e-00; Sat, 04 Jun 2005 09:46:26 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DeLrr-0005Fc-00; Sat, 04 Jun 2005 09:46:23 +1000 Date: Sat, 4 Jun 2005 09:46:23 +1000 To: "David S. Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050603234623.GA20088@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2081 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 1730 Lines: 45 Hi: I was looking at how we can move the IPsec input/output processing out of the critical section protected by the spin locks on the xfrm_state. This is useful because it would allow concurrent processing of IPsec packets for the same SA. It is also necessary if we're ever going to add support for asynchronous crypto to IPsec. 
The first requirement for this is that we need to stop using data that is shared across a single SA in the IPsec input/output routines. The biggest hurdle there as it stands is sgbuf in esp_data. This was introduced to reduce stack usage in esp_input/esp_output as sgbuf would consume up to 64 bytes of space. In order to move it back onto the stack (so we can run these things in parallel), I'm thinking of reducing the size of the scatterlist structure itself. The Crypto API doesn't need all the data contained in a scatterlist structure. For instance, it has no need for anything to do with DMA. When we implement hardware crypto (which might do DMA), they're going to have their own lists of descriptors so they can't use the scatterlist as is anyway. The skb_frag_t structure on the other hand is much more suited for our purpose. It is only half the size of scatterlist on i386. So what do you think about introducing a new crypto_frag structure which looks like this: struct crypto_frag { struct page *page; u16 offset; u16 length; }; We could then move sgbuf back into esp_input/esp_output at the cost of 32 bytes of stack. Is this stack cost acceptable? 
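The stack figure can be checked with a short sketch. It assumes an i386 layout, so the page pointer is modelled as a 32-bit integer to keep the arithmetic identical on any build host; the name crypto_frag_i386 and the entry count of 4 are assumptions (the count is inferred from the 64-byte and 32-byte figures above, not stated in the message).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the proposed crypto_frag as it would be laid out on i386. */
struct crypto_frag_i386 {
	uint32_t page;		/* stands in for struct page * (4 bytes on i386) */
	uint16_t offset;
	uint16_t length;
};

/* Assumed number of entries in the on-stack sgbuf. */
#define SGBUF_ENTRIES	4

/* Stack bytes consumed by an on-stack array of n entries. */
size_t sgbuf_stack_bytes(size_t n)
{
	/* 4 + 2 + 2 bytes, no padding needed. */
	assert(sizeof(struct crypto_frag_i386) == 8);
	return n * sizeof(struct crypto_frag_i386);
}
```

Under these assumptions sgbuf_stack_bytes(SGBUF_ENTRIES) comes to 32 bytes, half of what the same number of 16-byte scatterlist entries would need.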
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu 许志壬 Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Fri Jun 3 16:52:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 16:52:38 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j53NqXXq023918 for ; Fri, 3 Jun 2005 16:52:34 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DeLwq-0008BS-00; Sat, 04 Jun 2005 09:51:32 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DeLwo-0005GI-00; Sat, 04 Jun 2005 09:51:30 +1000 From: Herbert Xu To: kernel@linuxace.com (Phil Oester) Subject: Re: Unitialized queue_lock oops? Cc: netdev@oss.sgi.com Organization: Core In-Reply-To: <20050603232413.GA29308@linuxace.com> X-Newsgroups: apana.lists.os.linux.netdev User-Agent: tin/1.7.4-20040225 ("Benbecula") (UNIX) (Linux/2.4.27-hx-1-686-smp (i686)) Message-Id: Date: Sat, 04 Jun 2005 09:51:30 +1000 X-archive-position: 2082 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 634 Lines: 16 Phil Oester wrote: > > I know the proposed LLTX changes were meant to address a race while > the queue_lock was dropped - is the above another illustration of the > race potential? I'd say that either you're using a dodgy qdisc, or your hardware is just stuffed. That is, if you are using the default qdisc, you should start looking at replacing pieces of the hardware to find the problem.
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu 许志壬 Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From kernel@linuxace.com Fri Jun 3 17:01:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 17:01:47 -0700 (PDT) Received: from linuxace.com (adsl-67-120-171-161.dsl.lsan03.pacbell.net [67.120.171.161]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j5401iXq024784 for ; Fri, 3 Jun 2005 17:01:44 -0700 Received: (qmail 29514 invoked by uid 0); 4 Jun 2005 00:00:46 -0000 Date: Fri, 3 Jun 2005 17:00:46 -0700 From: Phil Oester To: Herbert Xu Cc: netdev@oss.sgi.com Subject: Re: Unitialized queue_lock oops? Message-ID: <20050604000046.GA29438@linuxace.com> References: <20050603232413.GA29308@linuxace.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i X-archive-position: 2083 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kernel@linuxace.com Precedence: bulk X-list: netdev Content-Length: 374 Lines: 9 On Sat, Jun 04, 2005 at 09:51:30AM +1000, Herbert Xu wrote: > I'd say that either you're using a dodgy qdisc, or your hardware is > just stuffed. That is, if you are using the default qdisc, you should > start looking at replacing pieces of the hardware to find the problem. Yes, default qdisc. Interesting that 2.6.10 is rock solid on the same hardware...oh well.
Phil From jgarzik@pobox.com Fri Jun 3 17:03:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 17:04:01 -0700 (PDT) Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5403wXq025385 for ; Fri, 3 Jun 2005 17:03:58 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1DeM7q-00026Z-Hb; Sat, 04 Jun 2005 00:02:54 +0000 Message-ID: <42A0EFAC.7070609@pobox.com> Date: Fri, 03 Jun 2005 20:02:52 -0400 From: Jeff Garzik User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050328 Fedora/1.7.6-1.2.5 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Herbert Xu CC: "David S. Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag References: <20050603234623.GA20088@gondor.apana.org.au> In-Reply-To: <20050603234623.GA20088@gondor.apana.org.au> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2084 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 954 Lines: 25 Herbert Xu wrote: > The Crypto API doesn't need all the data contained in a scatterlist > structure. For instance, it has no need for anything to do with DMA. > When we implement hardware crypto (which might do DMA), they're going > to have their own lists of descriptors so they can't use the scatterlist > as is anyway. I'm not sure I agree with this. A standard feature of struct scatterlist is having the DMA mappings right next to the kernel virtual address/length info. Drivers use the arch-specific DMA-mapped part of struct scatterlist to fill the hardware-specific descriptions with addresses and other info. 
Since you -will- have to DMA map buffers before passing them to hardware, it seems like struct scatterlist is much more appropriate than crypto_frag when dealing with hardware. For pure software implementations, I don't see why you can't just ignore the extra fields that each arch puts into struct scatterlist. Jeff From niv@us.ibm.com Fri Jun 3 17:21:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 17:21:19 -0700 (PDT) Received: from e34.co.us.ibm.com (e34.co.us.ibm.com [32.97.110.132]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j540L5Xq028640 for ; Fri, 3 Jun 2005 17:21:11 -0700 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e34.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j540K68E421694 for ; Fri, 3 Jun 2005 20:20:06 -0400 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by d03relay04.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j540K66g124576 for ; Fri, 3 Jun 2005 18:20:06 -0600 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j540K5tw020413 for ; Fri, 3 Jun 2005 18:20:05 -0600 Received: from [9.47.22.158] (dyn9047022158.beaverton.ibm.com [9.47.22.158]) by d03av02.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j540K5mB020403 for ; Fri, 3 Jun 2005 18:20:05 -0600 Message-ID: <42A0F3B4.1060601@us.ibm.com> Date: Fri, 03 Jun 2005 17:20:04 -0700 From: Nivedita Singhvi User-Agent: Mozilla Thunderbird 0.8 (X11/20041020) X-Accept-Language: en-us, en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Automated linux kernel testing results Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2085 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niv@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 1458 Lines: 39 For those who don't read lkml, I 
thought I'd point to Martin Bligh's post regarding automated testing being set up, since some people on this list were interested. http://marc.theaimsgroup.com/?l=linux-kernel&m=111775021327595&w=2 Networking tests are in plan... thanks, Nivedita -------------------------- OK, I've finally got this to the point where I can publish it. http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/regression_matrix.html Currently it builds and boots any mainline, -mjb, -mm kernel within about 15 minutes of release. runs dbench, tbench, kernbench, reaim and fsx. Currently I'm using a 4x AMD64 box, a 16x NUMA-Q, 4x NUMA-Q, 32x x440 (ia32) PPC64 Power 5 LPAR, PPC64 Power 4 LPAR, and PPC64 Power 4 bare metal system. The config files it uses are linked by the machine names in the column headers. Thanks to all the other IBM people who've worked on the ABAT test system that this stuff relies on - too many to list, but especially Andy, Adam, and Enrique, who have fixed endless bugs, and put up with my incessant bitching about it all not working as it should ;-) Clicking on the failure ones error codes should take you to somewhere vaguely helpful to diagnose it. Clicking on the job number just below that takes you to the info I'm publishing right now, which should include perf results and profiles, etc. I'll add graphs, etc later, comparing performance across kernels (I have them ... just not automated). 
From herbert@gondor.apana.org.au Fri Jun 3 17:35:46 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 17:35:51 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j540ZjXq029730 for ; Fri, 3 Jun 2005 17:35:46 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DeMcd-0008PT-00; Sat, 04 Jun 2005 10:34:43 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DeMcb-0005KZ-00; Sat, 04 Jun 2005 10:34:41 +1000 Date: Sat, 4 Jun 2005 10:34:41 +1000 To: Phil Oester Cc: netdev@oss.sgi.com Subject: Re: Unitialized queue_lock oops? Message-ID: <20050604003441.GA20471@gondor.apana.org.au> References: <20050603232413.GA29308@linuxace.com> <20050604000046.GA29438@linuxace.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050604000046.GA29438@linuxace.com> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2086 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 817 Lines: 20 On Fri, Jun 03, 2005 at 05:00:46PM -0700, Phil Oester wrote: > > Yes, default qdisc. Interesting that 2.6.10 is rock solid on the same > hardware...oh well. Well if you do have the time feel free to keep searching back to 2.6.10. Even though I'd say that this is most likely to turn out to be a hardware problem, there is no telling what you might find along the way. At least it might tell us what sort of hardware problems would result in only networking crashes :) If this were your average hardware problem I'd have expected to see crashes all over the place, especially under fs/ and mm/. 
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From kernel@linuxace.com Fri Jun 3 17:39:34 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 17:39:36 -0700 (PDT) Received: from linuxace.com (adsl-67-120-171-161.dsl.lsan03.pacbell.net [67.120.171.161]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j540dXXq030385 for ; Fri, 3 Jun 2005 17:39:33 -0700 Received: (qmail 29677 invoked by uid 0); 4 Jun 2005 00:38:35 -0000 Date: Fri, 3 Jun 2005 17:38:35 -0700 From: Phil Oester To: Herbert Xu Cc: netdev@oss.sgi.com Subject: Re: Unitialized queue_lock oops? Message-ID: <20050604003835.GA29635@linuxace.com> References: <20050603232413.GA29308@linuxace.com> <20050604000046.GA29438@linuxace.com> <20050604003441.GA20471@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050604003441.GA20471@gondor.apana.org.au> User-Agent: Mutt/1.4.1i X-archive-position: 2087 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kernel@linuxace.com Precedence: bulk X-list: netdev Content-Length: 874 Lines: 20 On Sat, Jun 04, 2005 at 10:34:41AM +1000, Herbert Xu wrote: > On Fri, Jun 03, 2005 at 05:00:46PM -0700, Phil Oester wrote: > > > > Yes, default qdisc. Interesting that 2.6.10 is rock solid on the same > > hardware...oh well. > > Well if you do have the time feel free to keep searching back to 2.6.10. > Even though I'd say that this is most likely to turn out to be a hardware > problem, there is no telling what you might find along the way. > > At least it might tell us what sort of hardware problems would result in > only networking crashes :) If this were your average hardware problem > I'd have expected to see crashes all over the place, especially under > fs/ and mm/. 
Ok, how bout next week I adjust OSPF costs to make my secondary firewall primary, and see if I still have problems? At least then we can put the hardware problem theory behind us... Phil From herbert@gondor.apana.org.au Fri Jun 3 17:43:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 17:43:30 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j540hNXq031001 for ; Fri, 3 Jun 2005 17:43:24 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DeMjm-0008VR-00; Sat, 04 Jun 2005 10:42:06 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DeMjh-0005Lr-00; Sat, 04 Jun 2005 10:42:01 +1000 Date: Sat, 4 Jun 2005 10:42:01 +1000 To: Jeff Garzik Cc: "David S. Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050604004201.GB20471@gondor.apana.org.au> References: <20050603234623.GA20088@gondor.apana.org.au> <42A0EFAC.7070609@pobox.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <42A0EFAC.7070609@pobox.com> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2088 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 1837 Lines: 42 Hi Jeff: On Fri, Jun 03, 2005 at 08:02:52PM -0400, Jeff Garzik wrote: > > A standard feature of struct scatterlist is having the DMA mappings > right next to the kernel virtual address/length info. Drivers use the > arch-specific DMA-mapped part of struct scatterlist to fill the > hardware-specific descriptions with addresses and other info. Agreed. 
> Since you -will- have to DMA map buffers before passing them to > hardware, it seems like struct scatterlist is much more appropriate than > crypto_frag when dealing with hardware. > > For pure software implementations, I don't see why you can't just ignore > the extra fields that each arch puts into struct scatterlist. It depends on who is going to do the mapping. When we implement hardware crypto, the DMA mapping will be done either by the crypto layer or under it by the driver itself. So the crypto layer is certainly going to need the scatterlist structure. However, the users of the crypto layer (such as IPsec/dmcrypt) don't have to know about DMA at all. Therefore the data structure between the users and the crypto layer itself doesn't have to carry DMA information. Compare this with the block layer. Between the users of the block layer and the block layer itself you have the bio_vec structure which carries no DMA information. The scatterlist structure only comes into play after DMA mapping has been carried out under the block layer. So this is really a sort of bio_vec for crypto structures. The objective here is to make the structure as compact as possible to allow users to allocate it on the stack most of the time. 
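To make the size argument concrete, here is a rough userspace sketch. It is not the proposed patch: struct crypto_frag_sketch, struct sg_sketch and both helper functions are hypothetical names invented for illustration, and the 16-bit offset/length fields assume a fragment never exceeds what fits in 16 bits. It only shows why dropping the DMA fields shrinks an on-stack array of fragments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-in for the kernel's struct page (assumption: only a pointer is needed). */
struct page;

/* Hypothetical compact fragment: just what a software cipher needs. */
struct crypto_frag_sketch {
	struct page *page;   /* page holding the data */
	uint16_t offset;     /* byte offset within the page */
	uint16_t length;     /* number of bytes in this fragment */
};

/* Simplified scatterlist-like entry: same information plus DMA mapping state. */
struct sg_sketch {
	struct page *page;
	unsigned int offset;
	unsigned int length;
	uint64_t dma_address;     /* bus address, filled in by DMA mapping */
	unsigned int dma_length;  /* mapped length, may differ from length */
};

/* Stack cost of an array of n compact fragments. */
static size_t frag_array_bytes(size_t n)
{
	return n * sizeof(struct crypto_frag_sketch);
}

/* Stack cost of an array of n scatterlist-like entries. */
static size_t sg_array_bytes(size_t n)
{
	return n * sizeof(struct sg_sketch);
}
```

With 4-byte pointers the compact layout above comes to 8 bytes per fragment; with 8-byte pointers it pads out to 16. Either way it is well under the scatterlist-like entry, which is the point of keeping DMA state below the crypto layer.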
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From mchan@broadcom.com Fri Jun 3 18:24:39 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 18:24:49 -0700 (PDT) Received: from MMS2.broadcom.com (mms2.broadcom.com [216.31.210.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j541OdXq000881 for ; Fri, 3 Jun 2005 18:24:39 -0700 Received: from 10.10.64.121 by MMS2.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Fri, 03 Jun 2005 18:23:23 -0700 X-Server-Uuid: 1F20ACF3-9CAF-44F7-AB47-F294E2D5B4EA Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Fri, 3 Jun 2005 18:23:22 -0700 Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BCC34449; Fri, 3 Jun 2005 18:23:11 -0700 (PDT) Received: from nt-irva-0741.brcm.ad.broadcom.com ( nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with ESMTP id SAA12263; Fri, 3 Jun 2005 18:23:11 -0700 (PDT) Received: from 10.7.18.177 ([10.7.18.177]) by NT-IRVA-0741.brcm.ad.broadcom.com ([10.8.194.54]) with Microsoft Exchange Server HTTP-DAV ; Sat, 4 Jun 2005 01:23:04 +0000 Received: from rh4 by nt-irva-0741; 03 Jun 2005 17:25:36 -0700 Subject: Re: RFC: NAPI packet weighting patch From: "Michael Chan" To: hadi@cyberus.ca cc: "Lennert Buytenhek" , "David S. 
Miller" , mitch.a.williams@intel.com, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com In-Reply-To: <1117837798.6266.25.camel@localhost.localdomain> References: <20050603.120126.41874584.davem@davemloft.net> <20050603.132257.23013342.davem@davemloft.net> <20050603.132922.63997492.davem@davemloft.net> <1117828169.4430.29.camel@rh4> <20050603205944.GC20623@xi.wantstofly.org> <1117830922.4430.44.camel@rh4> <1117837798.6266.25.camel@localhost.localdomain> Date: Fri, 03 Jun 2005 17:25:36 -0700 Message-ID: <1117844736.4430.51.camel@rh4> MIME-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-3) X-WSS-ID: 6EBFDD011VO5052745-01-01 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-archive-position: 2089 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mchan@broadcom.com Precedence: bulk X-list: netdev Content-Length: 693 Lines: 17 On Fri, 2005-06-03 at 18:29 -0400, jamal wrote: > On Fri, 2005-03-06 at 13:35 -0700, Michael Chan wrote: > > > By the way, in tg3 there is a buffer replenishment threshold programmed > > to the chip and is currently set at rx_pending / 8 (200/8 = 25). This > > means that the chip will replenish 25 rx buffers at a time. > > > > So when you write the MMIO, 25 buffers are replenished or is this auto > magically happening in the background? Sounds like a neat feature either > way. > The MMIO writes a cumulative producer index of new rx descriptors in the ring. As the chip requires new buffers for rx packets, it will DMA 25 of these rx descriptors at a time up to the producer index. 
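A toy model of the replenishment scheme described above, for anyone who wants to play with the numbers. This is not tg3 code: struct rx_ring_sketch and the three functions are hypothetical, the indices are plain cumulative counters rather than wrapping ring indices, and the assumption that the chip fetches a short final batch when fewer than the threshold are available is mine, not something stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the host/chip view of the rx descriptor ring. */
struct rx_ring_sketch {
	uint32_t producer;   /* cumulative descriptors posted by the host (MMIO write) */
	uint32_t fetched;    /* cumulative descriptors the chip has DMA'd so far */
	uint32_t threshold;  /* descriptors the chip pulls per batch */
};

/* Threshold is rx_pending / 8, e.g. 200 / 8 = 25. */
static void ring_init(struct rx_ring_sketch *r, uint32_t rx_pending)
{
	r->producer = 0;
	r->fetched = 0;
	r->threshold = rx_pending / 8;
}

/* Host side: a single MMIO write publishes the new cumulative producer index. */
static void host_post(struct rx_ring_sketch *r, uint32_t nbufs)
{
	r->producer += nbufs;
}

/*
 * Chip side: when it needs buffers it fetches one batch of descriptors,
 * never going past the producer index. Returns how many were fetched.
 */
static uint32_t chip_fetch_batch(struct rx_ring_sketch *r)
{
	uint32_t avail = r->producer - r->fetched;
	uint32_t n = avail < r->threshold ? avail : r->threshold;

	r->fetched += n;
	return n;
}
```

So the host can post any number of buffers with one write, and the batching happens on the chip side as it consumes them.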
From jmorris@redhat.com Fri Jun 3 21:40:50 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 21:40:53 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j544eoXq013446 for ; Fri, 3 Jun 2005 21:40:50 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j544dhgU003834; Sat, 4 Jun 2005 00:39:43 -0400 Received: from mail.boston.redhat.com (mail.boston.redhat.com [172.16.76.12]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j544dgO20153; Sat, 4 Jun 2005 00:39:42 -0400 Received: from thoron.boston.redhat.com (thoron.boston.redhat.com [172.16.80.63]) by mail.boston.redhat.com (8.12.8/8.12.8) with ESMTP id j544dfDj029248; Sat, 4 Jun 2005 00:39:42 -0400 Date: Sat, 4 Jun 2005 00:39:41 -0400 (EDT) From: James Morris X-X-Sender: jmorris@thoron.boston.redhat.com To: Herbert Xu cc: Jeff Garzik , "David S. Miller" , Linux Crypto Mailing List , Subject: Re: [RFC] Replace scatterlist with crypto_frag In-Reply-To: <20050604004201.GB20471@gondor.apana.org.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2090 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@redhat.com Precedence: bulk X-list: netdev Content-Length: 314 Lines: 15 On Sat, 4 Jun 2005, Herbert Xu wrote: > So this is really a sort of bio_vec for crypto structures. The objective > here is to make the structure as compact as possible to allow users to > allocate it on the stack most of the time. Seems like a good idea to me. 
- James -- James Morris From herbert@gondor.apana.org.au Fri Jun 3 21:53:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 21:53:19 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j544r4Xq014289 for ; Fri, 3 Jun 2005 21:53:05 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DeQd5-00012e-00; Sat, 04 Jun 2005 14:51:27 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DeQcx-0006a6-00; Sat, 04 Jun 2005 14:51:19 +1000 Date: Sat, 4 Jun 2005 14:51:19 +1000 To: James Morris Cc: Jeff Garzik , "David S. Miller" , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050604045119.GA25270@gondor.apana.org.au> References: <20050604004201.GB20471@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2091 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 782 Lines: 21 On Sat, Jun 04, 2005 at 12:39:41AM -0400, James Morris wrote: > On Sat, 4 Jun 2005, Herbert Xu wrote: > > > So this is really a sort of bio_vec for crypto structures. The objective > > here is to make the structure as compact as possible to allow users to > > allocate it on the stack most of the time. > > Seems like a good idea to me. Thanks James. What do you think about eating up 32 bytes on the stack in esp_input/esp_output? In fact, how did we come up with the number of four frags? Why wouldn't say two frags do for most users or perhaps even one? 
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From imipak@yahoo.com Fri Jun 3 22:02:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 22:02:24 -0700 (PDT) Received: from web31504.mail.mud.yahoo.com (web31504.mail.mud.yahoo.com [68.142.198.133]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j5452LXq015391 for ; Fri, 3 Jun 2005 22:02:22 -0700 Received: (qmail 9899 invoked by uid 60001); 4 Jun 2005 05:01:23 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:Received:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=UfBP4q3u6+tXOFdEMbW/rZmEjhIHiB6oK1XIRbqtuX5qLY2x0UEgNnJf5SY9iQFYWttGt+hrGCnrKC9WCszTo3YJphiSTY9d8PguECy22Uv9ARW68Qrfxb+pCOiTaFmymFcPdUEpPhoTv5+/Dlg7JvB+zZljnXp8dMyCU3uOql8= ; Message-ID: <20050604050123.9897.qmail@web31504.mail.mud.yahoo.com> Received: from [70.59.136.169] by web31504.mail.mud.yahoo.com via HTTP; Fri, 03 Jun 2005 22:01:23 PDT Date: Fri, 3 Jun 2005 22:01:23 -0700 (PDT) From: Jonathan Day Subject: Re: Automated linux kernel testing results To: Nivedita Singhvi , netdev@oss.sgi.com In-Reply-To: <42A0F3B4.1060601@us.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-archive-position: 2092 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: imipak@yahoo.com Precedence: bulk X-list: netdev Content-Length: 3157 Lines: 105 I am very impressed, especially as it sounds as though a lot more tests exist (he talks of only pushing small amounts of data to kernel.org) and a lot more are going to be added. 
It seems to me that there are a lot of disparate test suites out there - some test the APIs, some benchmark the performance, some validate the state at the end, some verify that the source obeys expected rules. What I have not (yet) seen is any work on relating the results. Is a bug in the design? The implementation? Some combination thereof? Is something correctly written but not functioning because something it depends on isn't working correctly? It would even be useful if we could cross-reference some of the benchmarks with the Linux graphing project, so that we could see how the complexity of the tested component differs between versions and variants. (A small degradation in performance, if related to a large increase in necessary sophistication, is not necessarily that bad. The same performance drop, if related to a massive simplification of the design, is an indication of a serious problem.) Test suites are necessary. Test suites are great. Anyone working on a test suite deserves many kudos and much praise. Test suites that are relatable enough that you can see the same problem from different angles -- those are worth their printout weight in gold. --- Nivedita Singhvi wrote: > For those who don't read lkml, I thought I'd point > to > Martin Bligh's post regarding automated testing > being > set up, since some people on this list were > interested. > > http://marc.theaimsgroup.com/?l=linux-kernel&m=111775021327595&w=2 > > Networking tests are in plan... > > thanks, > Nivedita > > -------------------------- > > OK, I've finally got this to the point where I can > publish it. > > http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/regression_matrix.html > > > Currently it builds and boots any mainline, -mjb, > -mm kernel within > about 15 minutes of release. runs dbench, tbench, > kernbench, reaim and fsx.
> Currently I'm using a 4x AMD64 box, a 16x NUMA-Q, 4x > NUMA-Q, 32x x440 > (ia32) > PPC64 Power 5 LPAR, PPC64 Power 4 LPAR, and PPC64 > Power 4 bare metal > system. > The config files it uses are linked by the machine > names in the column > headers. > > Thanks to all the other IBM people who've worked on > the ABAT test system > that this stuff relies on - too many to list, but > especially Andy, Adam, > and Enrique, who have fixed endless bugs, and put up > with my incessant > bitching about it all not working as it should ;-) > > Clicking on the failure ones error codes should take > you to somewhere > vaguely helpful to diagnose it. Clicking on the job > number just below > that takes you to the info I'm publishing right now, > which should > include perf results and profiles, etc. I'll add > graphs, etc later, > comparing performance across kernels (I have them > ... just not automated). > > > > __________________________________ Discover Yahoo! Find restaurants, movies, travel and more fun for the weekend. Check it out! 
http://discover.yahoo.com/weekend.html From jmorris@redhat.com Fri Jun 3 22:25:08 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 22:25:14 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j545P7Xq016566 for ; Fri, 3 Jun 2005 22:25:07 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j545NvbQ010698; Sat, 4 Jun 2005 01:23:57 -0400 Received: from mail.boston.redhat.com (mail.boston.redhat.com [172.16.76.12]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j545NuO23822; Sat, 4 Jun 2005 01:23:56 -0400 Received: from thoron.boston.redhat.com (thoron.boston.redhat.com [172.16.80.63]) by mail.boston.redhat.com (8.12.8/8.12.8) with ESMTP id j545NtDj031527; Sat, 4 Jun 2005 01:23:55 -0400 Date: Sat, 4 Jun 2005 01:23:55 -0400 (EDT) From: James Morris X-X-Sender: jmorris@thoron.boston.redhat.com To: Herbert Xu cc: Jeff Garzik , "David S. Miller" , Linux Crypto Mailing List , Subject: Re: [RFC] Replace scatterlist with crypto_frag In-Reply-To: <20050604045119.GA25270@gondor.apana.org.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2093 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmorris@redhat.com Precedence: bulk X-list: netdev Content-Length: 425 Lines: 19 On Sat, 4 Jun 2005, Herbert Xu wrote: > Thanks James. What do you think about eating up 32 bytes on the > stack in esp_input/esp_output? Sounds like a low price to pay, given the general overhead of ipsec. > In fact, how did we come up with the number of four frags? Why wouldn't > say two frags do for most users or perhaps even one? I don't know where that came from. 
- James -- James Morris From jm@jm.kir.nu Fri Jun 3 22:34:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 22:34:15 -0700 (PDT) Received: from jm.kir.nu (dsl017-049-110.sfo4.dsl.speakeasy.net [69.17.49.110]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j545YAXq017326 for ; Fri, 3 Jun 2005 22:34:11 -0700 Received: from jm by jm.kir.nu with local (Exim 4.43) id 1DeRDV-0002DE-Es; Fri, 03 Jun 2005 22:29:05 -0700 Date: Fri, 3 Jun 2005 22:29:05 -0700 From: Jouni Malinen To: Jiri Benc Cc: gwingerde@home.nl, netdev@oss.sgi.com, jbohac@suse.cz Subject: Re: [PATCH] ieee80211: Update generic definitions to latest specs. Message-ID: <20050604052905.GA8130@jm.kir.nu> References: <20050602190232.340996282D7@mail.suse.cz> <20050603113343.55d19cfc@griffin.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050603113343.55d19cfc@griffin.suse.cz> User-Agent: Mutt/1.5.8i X-archive-position: 2094 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jkmaline@cc.hut.fi Precedence: bulk X-list: netdev Content-Length: 557 Lines: 13 On Fri, Jun 03, 2005 at 11:33:43AM +0200, Jiri Benc wrote: > and so on. Also WLAN_STATUS_ASSOC_DENIED_NOSHORT seems to be acceptable > for me. That would be just asking for problems. IEEE 802.11 uses "short" in a number of terms, and two of them happen to be already part of capability negotiation (short preamble and short slot time); both have status codes for rejecting association. In other words, the constants/enums had better include PREAMBLE and SLOTTIME in the name.
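A small sketch of the naming point, assuming the status code values 19 (short preamble not supported) and 25 (short slot time not supported) as I recall them from the standard; please check them against the spec. The enum, its member names, and the helper function are hypothetical and are not the names from the patch under review.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical names: each constant spells out which "short" feature
 * the association was rejected for, so the two cannot be confused.
 */
enum wlan_status_sketch {
	WLAN_STATUS_SKETCH_ASSOC_DENIED_NOSHORTPREAMBLE = 19,
	WLAN_STATUS_SKETCH_ASSOC_DENIED_NOSHORTSLOTTIME = 25,
};

/* Map a status code to a readable reason; unknown codes fall through. */
static const char *wlan_status_sketch_str(uint16_t status)
{
	switch (status) {
	case WLAN_STATUS_SKETCH_ASSOC_DENIED_NOSHORTPREAMBLE:
		return "association denied: short preamble not supported";
	case WLAN_STATUS_SKETCH_ASSOC_DENIED_NOSHORTSLOTTIME:
		return "association denied: short slot time not supported";
	default:
		return "other status";
	}
}
```

A bare NOSHORT constant could only map to one of the two cases above, and a reader would have to look up the number to know which.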
-- Jouni Malinen PGP id EFC895FA From herbert@gondor.apana.org.au Fri Jun 3 22:35:12 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 22:35:22 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j545ZAXq017615 for ; Fri, 3 Jun 2005 22:35:11 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DeRI9-0001JB-00; Sat, 04 Jun 2005 15:33:53 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DeRI4-0001XZ-00; Sat, 04 Jun 2005 15:33:48 +1000 Date: Sat, 4 Jun 2005 15:33:48 +1000 To: James Morris Cc: Jeff Garzik , "David S. Miller" , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050604053348.GA5877@gondor.apana.org.au> References: <20050604045119.GA25270@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2095 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 708 Lines: 19 On Sat, Jun 04, 2005 at 01:23:55AM -0400, James Morris wrote: > On Sat, 4 Jun 2005, Herbert Xu wrote: > > > Thanks James. What do you think about eating up 32 bytes on the > > stack in esp_input/esp_output? > > Sounds like a low price to pay, given the general overhead of ipsec. I agree with you on the stack usage. 
BTW, we can now pump 5Gb/s through the Crypto API using a 1Ghz VIA CPU with the Padlock so encryption is no longer necessarily the slowest piece along the pipeline :) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From jm@jm.kir.nu Fri Jun 3 22:50:09 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 03 Jun 2005 22:50:14 -0700 (PDT) Received: from jm.kir.nu (dsl017-049-110.sfo4.dsl.speakeasy.net [69.17.49.110]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j545o9Xq018724 for ; Fri, 3 Jun 2005 22:50:09 -0700 Received: from jm by jm.kir.nu with local (Exim 4.43) id 1DeRT3-0002EJ-MY; Fri, 03 Jun 2005 22:45:09 -0700 Date: Fri, 3 Jun 2005 22:45:09 -0700 From: Jouni Malinen To: Jiri Benc Cc: NetDev , Jeff Garzik , Jirka Bohac Subject: Re: [6/9] ieee80211: ethernet independency Message-ID: <20050604054509.GB8130@jm.kir.nu> References: <20050603182625.64d33be3@griffin.suse.cz> <20050603183418.58c47b0c@griffin.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050603183418.58c47b0c@griffin.suse.cz> User-Agent: Mutt/1.5.8i X-archive-position: 2096 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jkmaline@cc.hut.fi Precedence: bulk X-list: netdev Content-Length: 1446 Lines: 29 On Fri, Jun 03, 2005 at 06:34:18PM +0200, Jiri Benc wrote: > Makes the 802.11 layer independent of ethernet. (The previous implementation > had the ethernet headers built by the ethernet layer and then parsed them and > rebuilt them into 802.11 headers.) Many (most?) parts of this change seems to be only for client (managed and ad-hoc) modes. Has anyone had chance to go through what would be needed for AP (master mode) and WDS links? What about extra bytes added for QoS information (IEEE 802.11e/WMM)? 
Are there places here that will not handle variable length header nicely? I haven't yet looked into details, but could someone explain what a user space program needs to do when receiving or sending packets with packet socket from a 802.11 netdev (e.g., ethertype=EAPOL)? Let's say in the "worst case" scenario: QoS enabled and pairwise keys configured and 4-address WDS link (i.e., 32-byte IEEE 802.11 header). Will the user space program need to parse (and generate) the IEEE 802.11 header, including the knowledge of four addresses and QoS data, and SNAP header? Packet socket with SOCK_DGRAM could otherwise be one way of doing this, but sockaddr_ll does not have places for many parameters.. Many of these questions are not really specifically related to this patch, but I haven't seen a good answer to these open areas (well, at least to me) so far. -- Jouni Malinen PGP id EFC895FA From jgarzik@pobox.com Sat Jun 4 01:39:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 01:39:34 -0700 (PDT) Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j548dQXq029891 for ; Sat, 4 Jun 2005 01:39:28 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1DeUAk-0002Ly-RQ; Sat, 04 Jun 2005 08:38:27 +0000 Message-ID: <42A16880.4030802@pobox.com> Date: Sat, 04 Jun 2005 04:38:24 -0400 From: Jeff Garzik User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050328 Fedora/1.7.6-1.2.5 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Linus Torvalds , Andrew Morton CC: Netdev , Linux Kernel Subject: [git patches] 2.6.x net driver fixes Content-Type: multipart/mixed; boundary="------------050200070305000207090100" X-archive-position: 2097 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com 
Precedence: bulk X-list: netdev Content-Length: 3914 Lines: 138 This is a multi-part message in MIME format. --------------050200070305000207090100 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Please pull from the 'misc-fixes' branch of rsync://rsync.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git to obtain r8169 and 3c574_cs fixes. diffstat/shortlog/patch attached. Jeff --------------050200070305000207090100 Content-Type: text/plain; name="netdev-2.6.txt" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="netdev-2.6.txt" drivers/net/pcmcia/3c574_cs.c | 3 +++ drivers/net/r8169.c | 31 +++++++++++++++++++++++++------ 2 files changed, 28 insertions(+), 6 deletions(-) : Automatic merge of /spare/repo/netdev-2.6 branch r8169-fix Automatic merge of rsync://rsync.kernel.org/.../torvalds/linux-2.6.git branch HEAD Daniel Ritz : 3c574_cs: disable interrupts in el3_close Francois Romieu : [PATCH] r8169: incoming frame length check diff --git a/drivers/net/pcmcia/3c574_cs.c b/drivers/net/pcmcia/3c574_cs.c --- a/drivers/net/pcmcia/3c574_cs.c +++ b/drivers/net/pcmcia/3c574_cs.c @@ -1274,6 +1274,9 @@ static int el3_close(struct net_device * spin_lock_irqsave(&lp->window_lock, flags); update_stats(dev); spin_unlock_irqrestore(&lp->window_lock, flags); + + /* force interrupts off */ + outw(SetIntrEnb | 0x0000, ioaddr + EL3_CMD); } link->open--; diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c --- a/drivers/net/r8169.c +++ b/drivers/net/r8169.c @@ -1585,8 +1585,8 @@ rtl8169_hw_start(struct net_device *dev) RTL_W8(ChipCmd, CmdTxEnb | CmdRxEnb); RTL_W8(EarlyTxThres, EarlyTxThld); - /* For gigabit rtl8169, MTU + header + CRC + VLAN */ - RTL_W16(RxMaxSize, tp->rx_buf_sz); + /* Low hurts. Let's disable the filtering. 
*/ + RTL_W16(RxMaxSize, 16383); /* Set Rx Config register */ i = rtl8169_rx_config | @@ -2127,6 +2127,11 @@ rtl8169_tx_interrupt(struct net_device * } } +static inline int rtl8169_fragmented_frame(u32 status) +{ + return (status & (FirstFrag | LastFrag)) != (FirstFrag | LastFrag); +} + static inline void rtl8169_rx_csum(struct sk_buff *skb, struct RxDesc *desc) { u32 opts1 = le32_to_cpu(desc->opts1); @@ -2177,27 +2182,41 @@ rtl8169_rx_interrupt(struct net_device * while (rx_left > 0) { unsigned int entry = cur_rx % NUM_RX_DESC; + struct RxDesc *desc = tp->RxDescArray + entry; u32 status; rmb(); - status = le32_to_cpu(tp->RxDescArray[entry].opts1); + status = le32_to_cpu(desc->opts1); if (status & DescOwn) break; if (status & RxRES) { - printk(KERN_INFO "%s: Rx ERROR!!!\n", dev->name); + printk(KERN_INFO "%s: Rx ERROR. status = %08x\n", + dev->name, status); tp->stats.rx_errors++; if (status & (RxRWT | RxRUNT)) tp->stats.rx_length_errors++; if (status & RxCRC) tp->stats.rx_crc_errors++; + rtl8169_mark_to_asic(desc, tp->rx_buf_sz); } else { - struct RxDesc *desc = tp->RxDescArray + entry; struct sk_buff *skb = tp->Rx_skbuff[entry]; int pkt_size = (status & 0x00001FFF) - 4; void (*pci_action)(struct pci_dev *, dma_addr_t, size_t, int) = pci_dma_sync_single_for_device; + /* + * The driver does not support incoming fragmented + * frames. They are seen as a symptom of over-mtu + * sized frames. 
+ */ + if (unlikely(rtl8169_fragmented_frame(status))) { + tp->stats.rx_dropped++; + tp->stats.rx_length_errors++; + rtl8169_mark_to_asic(desc, tp->rx_buf_sz); + goto move_on; + } + rtl8169_rx_csum(skb, desc); pci_dma_sync_single_for_cpu(tp->pci_dev, @@ -2224,7 +2243,7 @@ rtl8169_rx_interrupt(struct net_device * tp->stats.rx_bytes += pkt_size; tp->stats.rx_packets++; } - +move_on: cur_rx++; rx_left--; } --------------050200070305000207090100-- From johnpol@2ka.mipt.ru Sat Jun 4 02:56:55 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 02:56:59 -0700 (PDT) Received: from 2ka.mipt.ru (relay.2ka.mipt.ru [194.85.82.65]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j549urXq032618 for ; Sat, 4 Jun 2005 02:56:54 -0700 Received: from zanzibar.2ka.mipt.ru (zanzibar.2ka.mipt.ru [194.85.82.77]) by 2ka.mipt.ru (8.12.11/8.12.11) with ESMTP id j549u8I5018923; Sat, 4 Jun 2005 13:56:08 +0400 Date: Sat, 4 Jun 2005 13:55:35 +0400 From: Evgeniy Polyakov To: Herbert Xu Cc: "David S. 
Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com
Subject: Re: [RFC] Replace scatterlist with crypto_frag
Message-ID: <20050604135535.3cfb631f@zanzibar.2ka.mipt.ru>
In-Reply-To: <20050603234623.GA20088@gondor.apana.org.au>
References: <20050603234623.GA20088@gondor.apana.org.au>
Reply-To: johnpol@2ka.mipt.ru
Organization: MIPT
X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; i386-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.7.5 (2ka.mipt.ru [194.85.82.65]); Sat, 04 Jun 2005 13:56:09 +0400 (MSD)
X-archive-position: 2098
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: johnpol@2ka.mipt.ru
Precedence: bulk
X-list: netdev
Content-Length: 1547
Lines: 40

On Sat, 4 Jun 2005 09:46:23 +1000 Herbert Xu wrote:

> Hi:
>
> I was looking at how we can move the IPsec input/output processing out
> of the critical section protected by the spin locks on the xfrm_state.
> This is useful because it would allow concurrent processing of IPsec
> packets for the same SA. It is also necessary if we're ever going to
> add support for asynchronous crypto to IPsec.

Asynchronous schemes already work without any changes to the scatterlist
processing code. And you cannot easily move away from the SA lock due to
synchronization problems with the same tfm.
Existing asynchronous schemes do not use any shared object in the SA,
only the skb.

> The first requirement for this is that we need to stop using data that
> is shared across a single SA in the IPsec input/output routines. The
> biggest hurdle there as it stands is sgbuf in esp_data. This was
> introduced to reduce stack usage in esp_input/esp_output as sgbuf
> would consume up to 64 bytes of space.

No need to have it at all, I think.
> Cheers, > -- > Visit Openswan at http://www.openswan.org/ > Email: Herbert Xu ~{PmV>HI~} > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt > - > To unsubscribe from this list: send the line "unsubscribe linux-crypto" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Evgeniy Polyakov Only failure makes us experts. -- Theo de Raadt From herbert@gondor.apana.org.au Sat Jun 4 03:00:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 03:00:31 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j54A0OXq000783 for ; Sat, 4 Jun 2005 03:00:27 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DeVQf-0002ay-00; Sat, 04 Jun 2005 19:58:57 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DeVQc-0000Gp-00; Sat, 04 Jun 2005 19:58:54 +1000 Date: Sat, 4 Jun 2005 19:58:54 +1000 To: Evgeniy Polyakov Cc: "David S. Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050604095854.GA1003@gondor.apana.org.au> References: <20050603234623.GA20088@gondor.apana.org.au> <20050604135535.3cfb631f@zanzibar.2ka.mipt.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050604135535.3cfb631f@zanzibar.2ka.mipt.ru> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2099 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 681 Lines: 16 On Sat, Jun 04, 2005 at 01:55:35PM +0400, Evgeniy Polyakov wrote: > > processing code. 
And you can not easily move away of SA lock due to > synchronous problems with the same tfm. This is not true. The tfm context contains no shared state apart from the IV. As the IV can be specified through the *_iv functions, this allows crypto API users to process the same cipher tfm on two CPUs in parallel. If you don't believe me just wait for my upcoming patches to IPsec. -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From johnpol@2ka.mipt.ru Sat Jun 4 03:01:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 03:01:36 -0700 (PDT) Received: from 2ka.mipt.ru (relay.2ka.mipt.ru [194.85.82.65]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j54A1WXq001174 for ; Sat, 4 Jun 2005 03:01:33 -0700 Received: from zanzibar.2ka.mipt.ru (zanzibar.2ka.mipt.ru [194.85.82.77]) by 2ka.mipt.ru (8.12.11/8.12.11) with ESMTP id j54A0scj023143; Sat, 4 Jun 2005 14:00:54 +0400 Date: Sat, 4 Jun 2005 14:00:21 +0400 From: Evgeniy Polyakov To: Herbert Xu Cc: Jeff Garzik , "David S. 
Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050604140021.62259ad3@zanzibar.2ka.mipt.ru> In-Reply-To: <20050604004201.GB20471@gondor.apana.org.au> References: <20050603234623.GA20088@gondor.apana.org.au> <42A0EFAC.7070609@pobox.com> <20050604004201.GB20471@gondor.apana.org.au> Reply-To: johnpol@2ka.mipt.ru Organization: MIPT X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; i386-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.7.5 (2ka.mipt.ru [194.85.82.65]); Sat, 04 Jun 2005 14:00:54 +0400 (MSD) X-archive-position: 2100 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: johnpol@2ka.mipt.ru Precedence: bulk X-list: netdev Content-Length: 2582 Lines: 61 On Sat, 4 Jun 2005 10:42:01 +1000 Herbert Xu wrote: > Hi Jeff: > > On Fri, Jun 03, 2005 at 08:02:52PM -0400, Jeff Garzik wrote: > > > > A standard feature of struct scatterlist is having the DMA mappings > > right next to the kernel virtual address/length info. Drivers use the > > arch-specific DMA-mapped part of struct scatterlist to fill the > > hardware-specific descriptions with addresses and other info. > > Agreed. > > > Since you -will- have to DMA map buffers before passing them to > > hardware, it seems like struct scatterlist is much more appropriate than > > crypto_frag when dealing with hardware. > > > > For pure software implementations, I don't see why you can't just ignore > > the extra fields that each arch puts into struct scatterlist. > > It depends on who is going to do the mapping. When we implement hardware > crypto, the DMA mapping will be done either by the crypto layer or under > it by the driver itself. So the crypto layer is certainly going to need > the scatterlist structure. 
>
> However, the users of the crypto layer (such as IPsec/dmcrypt) don't have
> to know about DMA at all. Therefore the data structure between the users
> and the crypto layer itself doesn't have to carry DMA information.
>
> Compare this with the block layer. Between the users of the block layer
> and the block layer itself you have the bio_vec structure which carries
> no DMA information. The scatterlist structure only comes into play after
> DMA mapping has been carried out under the block layer.
>
> So this is really a sort of bio_vec for crypto structures. The objective
> here is to make the structure as compact as possible to allow users to
> allocate it on the stack most of the time.

As far as I remember, IPsec uses scatterlists specifically so that it
does _not_ have to remap from some inner structure to a scatterlist later.
The block layer was not designed that way because there is no easy
mapping from the block cache into a scatterlist, and bio_vec is used far
more widely than an SA, so removing the DMA address is suitable there.

> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu 许志壬
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> -
> To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

	Evgeniy Polyakov

Only failure makes us experts.
-- Theo de Raadt From johnpol@2ka.mipt.ru Sat Jun 4 03:18:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 03:18:53 -0700 (PDT) Received: from 2ka.mipt.ru (relay.2ka.mipt.ru [194.85.82.65]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j54AIhXq002296 for ; Sat, 4 Jun 2005 03:18:44 -0700 Received: from zanzibar.2ka.mipt.ru (zanzibar.2ka.mipt.ru [194.85.82.77]) by 2ka.mipt.ru (8.12.11/8.12.11) with ESMTP id j54AI5Op007818; Sat, 4 Jun 2005 14:18:05 +0400 Date: Sat, 4 Jun 2005 14:17:31 +0400 From: Evgeniy Polyakov To: Herbert Xu Cc: "David S. Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050604141731.37479347@zanzibar.2ka.mipt.ru> In-Reply-To: <20050604095854.GA1003@gondor.apana.org.au> References: <20050603234623.GA20088@gondor.apana.org.au> <20050604135535.3cfb631f@zanzibar.2ka.mipt.ru> <20050604095854.GA1003@gondor.apana.org.au> Reply-To: johnpol@2ka.mipt.ru Organization: MIPT X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; i386-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.7.5 (2ka.mipt.ru [194.85.82.65]); Sat, 04 Jun 2005 14:18:05 +0400 (MSD) X-archive-position: 2101 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: johnpol@2ka.mipt.ru Precedence: bulk X-list: netdev Content-Length: 3462 Lines: 127 On Sat, 4 Jun 2005 19:58:54 +1000 Herbert Xu wrote: > On Sat, Jun 04, 2005 at 01:55:35PM +0400, Evgeniy Polyakov wrote: > > > > processing code. And you can not easily move away of SA lock due to > > synchronous problems with the same tfm. > > This is not true. The tfm context contains no shared state apart > from the IV. 
As the IV can be specified through the *_iv functions, > this allows crypto API users to process the same cipher tfm on two > CPUs in parallel. > > If you don't believe me just wait for my upcoming patches to IPsec. Sure I believe you, in tfm there are no shared objects except data. But can we catch the situation when we encrypting the same skb? As far as I can see skb_cow_data() must take care of it. You are right, encrypting is safe. Here is part of esp_output() I use for acrypto. Static scaterlists are not used and new are dinamically allocated. @@ -95,7 +239,90 @@ esph->spi = x->id.spi; esph->seq_no = htonl(++x->replay.oseq); + +#ifdef CONFIG_ACRYPTO + do { + struct crypto_session_initializer ci; + struct crypto_data data; + struct scatterlist *sg; + struct crypto_session *s; + u8 *key, *iv; + + nfrags++; /* key */ + + if (esp->conf.ivlen) + nfrags++; + memset(&ci, 0, sizeof(ci)); + memset(&data, 0, sizeof(data)); + + ci.operation = CRYPTO_OP_ENCRYPT; + ci.mode = crypto_tfm_get_mode(tfm); + ci.type = crypto_tfm_get_type(tfm); + ci.priority = 0; + ci.callback = &esp4_async_callback; + + if (ci.mode == 0xffff || ci.type == 0xffff) + goto sync_crypto; + + sg = kmalloc(sizeof(struct scatterlist)*nfrags, GFP_ATOMIC); + if (!sg) + goto error; + skb_to_sgvec(skb, sg, esph->enc_data+esp->conf.ivlen-skb->data, clen); + data.sg_src = data.sg_dst = sg; + + key = kmalloc(crypto_tfm_alg_ivsize(tfm) + esp->conf.key_len, GFP_ATOMIC); + if (!key) + goto err_out_free_sg; + + iv = key + esp->conf.key_len; + + if (esp->conf.ivlen) { + data.sg_key = &sg[nfrags - 2]; + data.sg_iv = &sg[nfrags - 1]; + data.sg_key_num = data.sg_iv_num = 1; + } else { + data.sg_key = &sg[nfrags - 1]; + data.sg_iv = NULL; + data.sg_key_num = 1; + data.sg_iv_num = 0; + } + + data.sg_src_num = data.sg_dst_num = nfrags - data.sg_key_num - data.sg_iv_num; + + memcpy(key, esp->conf.key, esp->conf.key_len); + data.sg_key[0].offset = offset_in_page(key); + data.sg_key[0].length = esp->conf.key_len; + 
data.sg_key[0].page = virt_to_page(key); + + if (esp->conf.ivlen) { + memcpy(iv, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); + data.sg_iv[0].offset = offset_in_page(iv); + data.sg_iv[0].length = crypto_tfm_alg_ivsize(tfm); + data.sg_iv[0].page = virt_to_page(iv); + } + + data.priv = esp_output_async_prepare(x, skb); + if (!data.priv) + goto err_out_free_key; + + s = crypto_session_alloc(&ci, &data); + if (!s) + goto err_out_free_ea; + + return 0; + +err_out_free_ea: + kfree(data.priv); +err_out_free_key: + kfree(key); +err_out_free_sg: + kfree(sg); + goto sync_crypto; + } while (0); + +sync_crypto: +#endif if (esp->conf.ivlen) crypto_cipher_set_iv(tfm, esp->conf.ivec, crypto_tfm_alg_ivsize(tfm)); > -- > Visit Openswan at http://www.openswan.org/ > Email: Herbert Xu ~{PmV>HI~} > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt Evgeniy Polyakov Only failure makes us experts. -- Theo de Raadt From herbert@gondor.apana.org.au Sat Jun 4 03:23:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 03:23:31 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j54ANRXq002965 for ; Sat, 4 Jun 2005 03:23:28 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DeVn4-0002jP-00; Sat, 04 Jun 2005 20:22:06 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DeVn2-0000Ki-00; Sat, 04 Jun 2005 20:22:04 +1000 Date: Sat, 4 Jun 2005 20:22:04 +1000 To: Evgeniy Polyakov Cc: "David S. 
Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050604102204.GA1214@gondor.apana.org.au> References: <20050603234623.GA20088@gondor.apana.org.au> <20050604135535.3cfb631f@zanzibar.2ka.mipt.ru> <20050604095854.GA1003@gondor.apana.org.au> <20050604141731.37479347@zanzibar.2ka.mipt.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050604141731.37479347@zanzibar.2ka.mipt.ru> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2102 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 713 Lines: 19 On Sat, Jun 04, 2005 at 02:17:31PM +0400, Evgeniy Polyakov wrote: > > Static scaterlists are not used and new are dinamically allocated. That's precisely why we're having this discussion. We can now encrypt/decrypt a 1500 byte packet in 2us so the last thing we want is to impose additional latencies on the common case unless it's absolutely required. If we can shrink the structure used between IPsec and the crypto layer then we can allocate the sgbuf off the stack for 99% of the users. 
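A minimal userspace sketch of the stack-first pattern argued for here, not code from the thread: keep a small fixed array of compact fragment descriptors on the stack for the common case and fall back to the allocator only when a buffer needs more entries. The names `struct frag`, `STACK_FRAGS` and `describe_buffer` are hypothetical, `void *` stands in for the kernel's `struct page *`, and `malloc` stands in for `kmalloc(..., GFP_ATOMIC)`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the compact descriptor under discussion.
 * The kernel version would hold a struct page *, not a void *. */
struct frag {
	void    *page;
	uint16_t offset;
	uint16_t length;
};

/* Assumed common-case bound, taken from the "four entries" figure
 * mentioned in the discussion. */
#define STACK_FRAGS 4

/*
 * Split buf[0..len) into chunk-sized fragments and return the total
 * number of bytes described, or -1 if the fallback allocation fails.
 * *used_heap is set to 1 only when the on-stack array was too small.
 */
static long describe_buffer(unsigned char *buf, size_t len, size_t chunk,
			    int *used_heap)
{
	struct frag on_stack[STACK_FRAGS];
	struct frag *frags = on_stack;
	size_t nfrags, i;
	long total = 0;

	/* Each fragment length must fit in 16 bits. */
	assert(chunk > 0 && chunk <= UINT16_MAX);

	nfrags = (len + chunk - 1) / chunk;
	*used_heap = 0;

	if (nfrags > STACK_FRAGS) {
		/* Rare case: pay for a dynamic allocation. */
		frags = malloc(nfrags * sizeof(*frags));
		if (!frags)
			return -1;
		*used_heap = 1;
	}

	for (i = 0; i < nfrags; i++) {
		size_t off = i * chunk;
		size_t n = (len - off < chunk) ? len - off : chunk;

		frags[i].page = buf + off;
		frags[i].offset = 0;
		frags[i].length = (uint16_t)n;
		total += frags[i].length;
	}

	if (frags != on_stack)
		free(frags);
	return total;
}
```

With a 4096-byte chunk, a 1500-byte packet or a 16384-byte buffer stays on the stack path, while a 40000-byte buffer needs ten entries and takes the allocation path.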
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From johnpol@2ka.mipt.ru Sat Jun 4 03:30:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 03:30:59 -0700 (PDT) Received: from 2ka.mipt.ru (relay.2ka.mipt.ru [194.85.82.65]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j54AUqXq003664 for ; Sat, 4 Jun 2005 03:30:52 -0700 Received: from zanzibar.2ka.mipt.ru (zanzibar.2ka.mipt.ru [194.85.82.77]) by 2ka.mipt.ru (8.12.11/8.12.11) with ESMTP id j54AUE9g019299; Sat, 4 Jun 2005 14:30:14 +0400 Date: Sat, 4 Jun 2005 14:29:39 +0400 From: Evgeniy Polyakov To: Herbert Xu Cc: "David S. Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050604142939.4e2efc55@zanzibar.2ka.mipt.ru> In-Reply-To: <20050604102204.GA1214@gondor.apana.org.au> References: <20050603234623.GA20088@gondor.apana.org.au> <20050604135535.3cfb631f@zanzibar.2ka.mipt.ru> <20050604095854.GA1003@gondor.apana.org.au> <20050604141731.37479347@zanzibar.2ka.mipt.ru> <20050604102204.GA1214@gondor.apana.org.au> Reply-To: johnpol@2ka.mipt.ru Organization: MIPT X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; i386-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.7.5 (2ka.mipt.ru [194.85.82.65]); Sat, 04 Jun 2005 14:30:14 +0400 (MSD) X-archive-position: 2103 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: johnpol@2ka.mipt.ru Precedence: bulk X-list: netdev Content-Length: 1157 Lines: 32 On Sat, 4 Jun 2005 20:22:04 +1000 Herbert Xu wrote: > On Sat, Jun 04, 2005 at 02:17:31PM +0400, Evgeniy Polyakov wrote: > > > > Static scaterlists are not used and new are dinamically 
allocated.
>
> That's precisely why we're having this discussion. We can now
> encrypt/decrypt a 1500 byte packet in 2us so the last thing we
> want is to impose additional latencies on the common case unless
> it's absolutely required.
>
> If we can shrink the structure used between IPsec and the crypto
> layer then we can allocate the sgbuf off the stack for 99% of
> the users.

I do see that 4 sg entries are enough for 99% of the users; I even think
2 is enough - that covers 8kb, almost the largest 9kb jumbo frame seen.

But without sg we will save 4*sizeof(dma addr) - is it really a price?
For hardware we will need to remap it later...

> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu 许志壬
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

	Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

From herbert@gondor.apana.org.au Sat Jun 4 03:34:19 2005
Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 03:34:22 -0700 (PDT)
Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j54AYHXq004328 for ; Sat, 4 Jun 2005 03:34:18 -0700
Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DeVxU-0002ur-00; Sat, 04 Jun 2005 20:32:52 +1000
Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DeVxR-0000Mb-00; Sat, 04 Jun 2005 20:32:49 +1000
Date: Sat, 4 Jun 2005 20:32:49 +1000
To: Evgeniy Polyakov
Cc: "David S.
Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050604103249.GA1378@gondor.apana.org.au> References: <20050603234623.GA20088@gondor.apana.org.au> <20050604135535.3cfb631f@zanzibar.2ka.mipt.ru> <20050604095854.GA1003@gondor.apana.org.au> <20050604141731.37479347@zanzibar.2ka.mipt.ru> <20050604102204.GA1214@gondor.apana.org.au> <20050604142939.4e2efc55@zanzibar.2ka.mipt.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050604142939.4e2efc55@zanzibar.2ka.mipt.ru> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2104 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 738 Lines: 20 On Sat, Jun 04, 2005 at 02:29:39PM +0400, Evgeniy Polyakov wrote: > > But without sg we sill save 4*sizeof(dma addr) - is it really a price? We're also reducing the offset/length to 16 bits from 32 bits so we're shaving off half the size. > For hardware we will need to remap it later... Well we can't modify the supplied scatterlist structure in the crypto API anyway since we don't have exclusive ownership of it. Since we can't expect the user of the crypto API to do the mapping this space is basically wasted. 
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From johnpol@2ka.mipt.ru Sat Jun 4 03:42:17 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 03:42:22 -0700 (PDT) Received: from 2ka.mipt.ru (relay.2ka.mipt.ru [194.85.82.65]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j54AgGXq005163 for ; Sat, 4 Jun 2005 03:42:16 -0700 Received: from zanzibar.2ka.mipt.ru (zanzibar.2ka.mipt.ru [194.85.82.77]) by 2ka.mipt.ru (8.12.11/8.12.11) with ESMTP id j54AfYAh031712; Sat, 4 Jun 2005 14:41:35 +0400 Date: Sat, 4 Jun 2005 14:40:59 +0400 From: Evgeniy Polyakov To: Herbert Xu Cc: "David S. Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050604144059.2be84671@zanzibar.2ka.mipt.ru> In-Reply-To: <20050604103249.GA1378@gondor.apana.org.au> References: <20050603234623.GA20088@gondor.apana.org.au> <20050604135535.3cfb631f@zanzibar.2ka.mipt.ru> <20050604095854.GA1003@gondor.apana.org.au> <20050604141731.37479347@zanzibar.2ka.mipt.ru> <20050604102204.GA1214@gondor.apana.org.au> <20050604142939.4e2efc55@zanzibar.2ka.mipt.ru> <20050604103249.GA1378@gondor.apana.org.au> Reply-To: johnpol@2ka.mipt.ru Organization: MIPT X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; i386-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.7.5 (2ka.mipt.ru [194.85.82.65]); Sat, 04 Jun 2005 14:41:35 +0400 (MSD) X-archive-position: 2105 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: johnpol@2ka.mipt.ru Precedence: bulk X-list: netdev Content-Length: 1417 Lines: 39 On Sat, 4 Jun 2005 20:32:49 +1000 Herbert Xu wrote: > On Sat, Jun 04, 2005 at 02:29:39PM 
+0400, Evgeniy Polyakov wrote:
> >
> > But without sg we will save 4*sizeof(dma addr) - is it really a price?
>
> We're also reducing the offset/length to 16 bits from 32 bits so we're
> shaving off half the size.
>
> > For hardware we will need to remap it later...
>
> Well we can't modify the supplied scatterlist structure in the
> crypto API anyway since we don't have exclusive ownership of it.
> Since we can't expect the user of the crypto API to do the mapping
> this space is basically wasted.

So why not remove it completely?
Synchronous hardware (like VIA/Freescale processors) does not use
scatterlists at all, so it is not needed there.
CryptoAPI does not use half of the scatterlist structure.
The CryptoAPI design cannot be used with "interruptible" hardware like
HIFN, so for asynchronous hardware we need some kind of remapping anyway.
So why not just move to the new fragments Herbert introduced all over
the place in CryptoAPI?

But please do not remove skb_to_sgvec() :)

> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu 许志壬
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

	Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

From SRS0+ca62117fdd73a480a370+650+infradead.org+hch@pentafluge.srs.infradead.org Sat Jun 4 04:24:27 2005
Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 04:24:38 -0700 (PDT)
Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j54BOKXq011188 for ; Sat, 4 Jun 2005 04:24:27 -0700
Received: from hch by pentafluge.infradead.org with local (Exim 4.43 #1 (Red Hat Linux)) id 1DeWkE-0005BW-Ip; Sat, 04 Jun 2005 12:23:14 +0100
Date: Sat, 4 Jun 2005 12:23:14 +0100
From: Christoph Hellwig
To: Herbert Xu
Cc: "David S.
Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050604112314.GA19819@infradead.org> References: <20050603234623.GA20088@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050603234623.GA20088@gondor.apana.org.au> User-Agent: Mutt/1.4.1i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-archive-position: 2106 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: netdev Content-Length: 273 Lines: 10 On Sat, Jun 04, 2005 at 09:46:23AM +1000, Herbert Xu wrote: > struct crypto_frag { > struct page *page; > u16 offset; > u16 length; > }; we have this structure as skb_frag_struct and bio_vec already, care to use the same structure with a generic name for all of them? From herbert@gondor.apana.org.au Sat Jun 4 04:27:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 04:27:42 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j54BRYXq011662 for ; Sat, 4 Jun 2005 04:27:35 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DeWn4-0003EA-00; Sat, 04 Jun 2005 21:26:10 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DeWn0-0000TP-00; Sat, 04 Jun 2005 21:26:06 +1000 Date: Sat, 4 Jun 2005 21:26:06 +1000 To: Christoph Hellwig Cc: "David S. 
Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050604112606.GA1799@gondor.apana.org.au> References: <20050603234623.GA20088@gondor.apana.org.au> <20050604112314.GA19819@infradead.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050604112314.GA19819@infradead.org> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2107 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 936 Lines: 25 On Sat, Jun 04, 2005 at 12:23:14PM +0100, Christoph Hellwig wrote: > On Sat, Jun 04, 2005 at 09:46:23AM +1000, Herbert Xu wrote: > > struct crypto_frag { > > struct page *page; > > u16 offset; > > u16 length; > > }; > > we have this structure as skb_frag_struct and bio_vec already, care > to use the same structure with a generic name for all of them? I certainly would have no problems merging with skb_frag_struct. However, merging with bio_vec would mean that either bio_vec would have to drop down to 16-bit counters, or crypto_frag would have to move up to 32-bit counters. The latter is problematic because I'm trying to shrink the size enough so that we can squeeze four of these things onto the stack. 
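A rough userspace illustration of the size argument, not code from the thread: `struct compact_frag` mirrors the proposed `crypto_frag` layout with 16-bit fields, while `struct wide_entry` is an assumed model of a scatterlist-style entry with 32-bit offset/length and a DMA address (here a `uintptr_t`). Real `struct scatterlist` layouts are architecture specific, so the second struct and the helper `entries_in_budget` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the proposed layout; void * stands in for struct page *. */
struct compact_frag {
	void    *page;
	uint16_t offset;
	uint16_t length;
};

/* Assumed scatterlist-like entry: 32-bit offset/length plus a DMA
 * address that the users of the crypto layer never fill in. */
struct wide_entry {
	void     *page;
	uint32_t  offset;
	uintptr_t dma_address;
	uint32_t  length;
};

/* How many entries of a given size fit into a byte budget, for
 * example the 64 bytes quoted for sgbuf earlier in the thread. */
static size_t entries_in_budget(size_t budget, size_t entry_size)
{
	return entry_size ? budget / entry_size : 0;
}
```

On common 32-bit and 64-bit ABIs `sizeof(struct compact_frag)` should come out at half of `sizeof(struct wide_entry)`, so the same stack budget holds twice as many entries; moving the counters to 32 bits would give up part of that saving on 32-bit targets.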
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From SRS0+ca62117fdd73a480a370+650+infradead.org+hch@pentafluge.srs.infradead.org Sat Jun 4 04:59:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 05:00:00 -0700 (PDT) Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j54BxuXq013796 for ; Sat, 4 Jun 2005 04:59:57 -0700 Received: from hch by pentafluge.infradead.org with local (Exim 4.43 #1 (Red Hat Linux)) id 1DeXIj-0005IC-Ht; Sat, 04 Jun 2005 12:58:53 +0100 Date: Sat, 4 Jun 2005 12:58:53 +0100 From: Christoph Hellwig To: Herbert Xu Cc: Christoph Hellwig , "David S. Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050604115853.GA20335@infradead.org> References: <20050603234623.GA20088@gondor.apana.org.au> <20050604112314.GA19819@infradead.org> <20050604112606.GA1799@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050604112606.GA1799@gondor.apana.org.au> User-Agent: Mutt/1.4.1i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-archive-position: 2108 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: netdev Content-Length: 870 Lines: 21 On Sat, Jun 04, 2005 at 09:26:06PM +1000, Herbert Xu wrote: > On Sat, Jun 04, 2005 at 12:23:14PM +0100, Christoph Hellwig wrote: > > On Sat, Jun 04, 2005 at 09:46:23AM +1000, Herbert Xu wrote: > > > struct crypto_frag { > > > struct page *page; > > > u16 offset; > > > u16 length; > > > }; > > > > we have this structure as skb_frag_struct 
and bio_vec already, care
> > to use the same structure with a generic name for all of them?
>
> I certainly would have no problems merging with skb_frag_struct.
> However, merging with bio_vec would mean that either bio_vec would
> have to drop down to 16-bit counters, or crypto_frag would have to
> move up to 32-bit counters.

the usage of 16bit counters in bio_vec doesn't make sense, and if did
all others would have to move to 32bit as well (in case we started
supporting page sizes that aren't addressable by 16bits)

From kaber@trash.net Sat Jun 4 09:41:41 2005
Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 09:41:44 -0700 (PDT)
Received: from kaber.coreworks.de ([62.206.217.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j54GfeXq028725 for ; Sat, 4 Jun 2005 09:41:40 -0700
Received: from localhost ([127.0.0.1]) by kaber.coreworks.de with esmtp (Exim 4.50) id 1DebhT-00058G-BD; Sat, 04 Jun 2005 18:40:43 +0200
Message-ID: <42A1D98B.7030400@trash.net>
Date: Sat, 04 Jun 2005 18:40:43 +0200
From: Patrick McHardy
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.7) Gecko/20050420 Debian/1.7.7-2
X-Accept-Language: en
MIME-Version: 1.0
To: mailinglist.chris@gmail.com
CC: Andrew Morton , netdev@oss.sgi.com
Subject: Re: Fw: kernel 2.6 libipq kernel hang
References: <20050406155828.1584d7cd.akpm@osdl.org>
In-Reply-To: <20050406155828.1584d7cd.akpm@osdl.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2109
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: kaber@trash.net
Precedence: bulk
X-list: netdev
Content-Length: 688
Lines: 21

Andrew Morton wrote:
>
> Begin forwarded message:
>
> Date: Wed, 6 Apr 2005 15:12:05 -0400
> From: Mailing List
> To: linux-kernel@vger.kernel.org
> Subject: kernel 2.6 libipq kernel hang
>
> /sbin/iptables -t mangle -A POSTROUTING -d 192.168.3.0/24 -j QUEUE
> /sbin/iptables -t
mangle -A PREROUTING -s 192.168.3.0/24 -j QUEUE
>
> If anyone has any suggestions about what I am doing wrong in either
> the libipq program or the client or server programs, or any ideas
> about what is going on with netlink, please let me know.

Please try the latest -git; Harald fixed a bug that could cause a
deadlock when ip_queue was used in PRE_ROUTING.

Regards
Patrick

From akpm@osdl.org Sat Jun 4 19:52:40 2005
Received: with ECARTIS (v1.0.0; list netdev); Sat, 04 Jun 2005 19:52:47 -0700 (PDT)
Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j552qdXq025446 for ; Sat, 4 Jun 2005 19:52:39 -0700
Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j552pPjA030229 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sat, 4 Jun 2005 19:51:25 -0700
Received: from bix (shell0.pdx.osdl.net [10.9.0.31]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with SMTP id j552pOHd016709; Sat, 4 Jun 2005 19:51:24 -0700
Date: Sat, 4 Jun 2005 19:51:22 -0700
From: Andrew Morton
To: netdev@oss.sgi.com
Cc: Rommer
Subject: Fw: PROBLEM: tcp_output.c bug
Message-Id: <20050604195122.6a07abc7.akpm@osdl.org>
X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; i386-redhat-linux-gnu)
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="Multipart=_Sat__4_Jun_2005_19_51_22_-0700_Kp/TSOvd/GHsKqPd"
X-MIMEDefang-Filter: osdl$Revision: 1.109 $
X-Scanned-By: MIMEDefang 2.36
X-archive-position: 2110
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: akpm@osdl.org
Precedence: bulk
X-list: netdev
Content-Length: 59528
Lines: 2610

This is a multi-part message in MIME format.
--Multipart=_Sat__4_Jun_2005_19_51_22_-0700_Kp/TSOvd/GHsKqPd
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Begin forwarded message:

Date: Sun, 05 Jun 2005 04:25:43 +0300
From: Rommer
To: linux-kernel@vger.kernel.org
Subject: PROBLEM: tcp_output.c bug

[1.] My server reboots about once every 2 weeks because of a kernel bug
in tcp_output.c

[2.] The server reboots because /proc/sys/kernel/panic is set to 1, but
I identified the problem using the netconsole module. It is a
"kernel BUG at net/ipv4/tcp_output.c:919!"
I looked at the code on line 919 of tcp_output.c and found a BUG_ON
macro in the function tcp_retrans_try_collapse(...). I disabled calls to
this function by running:
echo 0 >/proc/sys/net/ipv4/tcp_retrans_collapse,
and the server has now worked fine for about 4 weeks.
I also looked at the code of this function in tcp_output.c from the
kernel 2.6.11.8 sources, and it is the same.

[3.] sh scripts/ver_linux
Linux us401.activeby.net 2.6.9 #4 SMP Fri Apr 22 16:46:30 EEST 2005 i686 i686 i386 GNU/Linux

Gnu C 3.3.2
Gnu make 3.79.1
binutils 2.14.90.0.6
util-linux 2.12
mount 2.12
module-init-tools 2.4.26
e2fsprogs 1.35
jfsutils 1.1.3
reiserfsprogs 2003------------->
reiser4progs line
pcmcia-cs 3.1.31
quota-tools 3.06.
PPP 2.4.1
isdn4k-utils 3.3
nfs-utils 1.0.6
Linux C Library 2.3.3
Dynamic linker (ldd) 2.3.3
Procps 3.2.0
Net-tools 1.60
Kbd 1.08
Sh-utils 5.2.1
Modules Loaded netconsole ipv6 ipt_TOS iptable_mangle ip_conntrack_ftp ip_conntrack_irc ipt_LOG ipt_limit ipt_multiport autofs ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables e100 mii ohci1394 ieee1394 sg scsi_mod parport_pc parport microcode loop thermal processor fan button battery ac ext3 jbd raid1

[4.] Part of the netconsole log

------------[ cut here ]------------
kernel BUG at net/ipv4/tcp_output.c:919!
invalid operand: 0000 [#1] SMP Modules linked in: netconsole ipv6 ipt_TOS iptable_mangle ip_conntrack_ftp ip_co nntrack_irc ipt_LOG ipt_limit ipt_multiport autofs ipt_REJECT ipt_state iptable_ filter ip_conntrack ip_tables e100 mii ohci1394 ieee1394 sg scsi_mod parport_pc microcode parport thermal fan loop processor button battery ext3 tcp_v4_rcv+0x71 c/0x980 nf_hook_slow+0xc9/0x100 [] ip_rcv_finish+0x0/0x2a0 [] ip_rcv+0x41c/0x4e0 [] ip_rcv_finish+0x0/0x2a0 [] [] do_gettimeofday+0x20/0xc0 netif_receive_skb+0x1df/0x2d0 e100_poll+0x5ac/0x620 [e100] [] [] net_rx_action+0x81/0x110 [] __do_softirq+0xba/0xd0 [] do_softirq+0x2d/0x30 [] do_IRQ+0x105/0x130 [] unknown_bootoption+0x0/0x180 [] common_interrupt+0x18/0x20 [] default_idle+0x0/0x40 [] unknown_bootoption+0x0/0x180 [] default_idle+0x2c/0x40 [] cpu_idle+0x3b/0x50 [] [] start_kernel+0x19d/0x1e0 Code: fe unknown_bootoption+0x0/0x180e9 7f ff ff c7 44 24 08 e1 72 28 c0 54 89 24 04 24 e8 89 1c b3 7e fc ff fe 3a 0f e9 ff 0b ff c9 02 d7 c0 ca 2d 0a e9 fe ff ff 97 03 c0 8b 83 May be this log damaged because of UDP [6.] I don't know what cause the kernel panic [7.] [7.2.] 
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 5
cpu MHz : 2807.502
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips : 5537.79

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 5
cpu MHz : 2807.502
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips : 5603.32

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 5
cpu MHz : 2807.502
cache size : 512 KB
physical id : 3
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips : 5603.32

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 5
cpu MHz : 2807.502
cache size : 512 KB
physical id : 3
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips : 5603.32

[7.3.]
cat /proc/modules
netconsole 3040 - - Live 0xf8cce000
ipv6 258208 - - Live 0xf8e05000
ipt_TOS 2216 - - Live 0xf8d0c000
iptable_mangle 2536 - - Live 0xf8ca4000
ip_conntrack_ftp 72628 - - Live 0xf8d47000
ip_conntrack_irc 71636 - - Live 0xf8d34000
ipt_LOG 6856 - - Live 0xf8ce7000
ipt_limit 2248 - - Live 0xf8cd0000
ipt_multiport 1736 - - Live 0xf8ca6000
autofs 17096 - - Live 0xf8d0e000
ipt_REJECT 6792 - - Live 0xf8ce4000
ipt_state 1640 - - Live 0xf8cde000
ip_conntrack 47300 - - Live 0xf8d16000
iptable_filter 2632 - - Live 0xf8821000
ip_tables 17120 - - Live 0xf8c94000
e100 34664 - - Live 0xf8cd4000
mii 4744 - - Live 0xf8c91000
ohci1394 35564 - - Live 0xf8c9a000
ieee1394 114680 - - Live 0xf8cea000
sg 38408 - - Live 0xf8c81000
scsi_mod 124780 - - Live 0xf8ca8000
parport_pc 26208 - - Live 0xf8c79000
parport 41544 - - Live 0xf885f000
microcode 7200 - - Live 0xf884e000
loop 15696 - - Live 0xf882b000
thermal 13008 - - Live 0xf8830000
processor 17824 - - Live 0xf8848000
fan 3692 - - Live 0xf8829000
button 6328 - - Live 0xf8802000
battery 9260 - - Live 0xf8825000
ac 4524 - - Live 0xf8805000
ext3 126024 - - Live 0xf886d000
jbd 65760 - - Live 0xf8836000
raid1 16936 - - Live 0xf881b000

[7.4.]
cat /proc/ioports
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
0400-047f : 0000:00:1f.0
  0400-0403 : PM1a_EVT_BLK
  0404-0405 : PM1a_CNT_BLK
  0408-040b : PM_TMR
  0428-042f : GPE0_BLK
0480-04bf : 0000:00:1f.0
0500-051f : 0000:00:1f.3
0cf8-0cff : PCI conf1
a000-afff : PCI Bus #01
  a000-a07f : 0000:01:00.0
b000-b03f : 0000:02:0b.0
  b000-b03f : e100
f000-f00f : 0000:00:1f.1
  f000-f007 : ide0
  f008-f00f : ide1

cat /proc/iomem
00000000-0009f7ff : System RAM
0009f800-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000cffff : Video ROM
000d0000-000d17ff : Adapter ROM
000f0000-000fffff : System ROM
00100000-7fedffff : System RAM
  00100000-002c15ac : Kernel code
  002c15ad-0038721f : Kernel data
7fee0000-7fee2fff : ACPI Non-volatile Storage
7fee3000-7feeffff : ACPI Tables
7fef0000-7fefffff : reserved
7ff00000-7ff003ff : 0000:00:1f.1
e0000000-efffffff : PCI Bus #01
  e0000000-efffffff : 0000:01:00.0
f0000000-f1ffffff : PCI Bus #01
  f1000000-f103ffff : 0000:01:00.0
f3000000-f301ffff : 0000:02:0b.0
  f3000000-f301ffff : e100
f3020000-f3020fff : 0000:02:0b.0
  f3020000-f3020fff : e100
f4000000-f43fffff : 0000:00:00.0
fec00000-ffffffff : reserved

[7.5.] /sbin/lspci -vvv
00:00.0 Host bridge: Intel Corp. 82875P Memory Controller Hub (rev 02)
        Subsystem: Asustek Computer, Inc.: Unknown device 80f6
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-

00:01.0 PCI bridge: Intel Corp. 82875P Processor to AGP Controller (rev 02) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- Reset- FastB2B-

00:1e.0 PCI bridge: Intel Corp.
82801BA/CA/DB/EB/ER Hub interface to PCI Bridge (rev c2) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- Reset- FastB2B- 00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge (rev 02) Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- Region 1: I/O ports at Region 2: I/O ports at Region 3: I/O ports at Region 4: I/O ports at f000 [size=16] Region 5: Memory at 7ff00000 (32-bit, non-prefetchable) [size=1K] 00:1f.3 SMBus: Intel Corp. 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02) Subsystem: Asustek Computer, Inc. P4P800 Mainboard Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- [disabled] [size=64K] Capabilities: [40] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [50] AGP version 2.0 Status: RQ=15 SBA- 64bit- FW- Rate=x1,x2,x4 Command: RQ=0 SBA- AGP- 64bit- FW- Rate= 02:0b.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0c) Subsystem: Intel Corp. EtherExpress PRO/100 S Desktop Adapter Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- [disabled] [size=64K] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 PME-Enable+ DSel=0 DScale=2 PME- [7.6.] 
cat /proc/scsi/scsi Attached devices: Kernel config attached -- Best regards, Roman --Multipart=_Sat__4_Jun_2005_19_51_22_-0700_Kp/TSOvd/GHsKqPd Content-Type: text/plain; name="config" Content-Disposition: attachment; filename="config" Content-Transfer-Encoding: 7bit # # Automatically generated make config: don't edit # Linux kernel version: 2.6.9 # Mon Nov 22 12:11:25 2004 # CONFIG_X86=y CONFIG_MMU=y CONFIG_UID16=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y # # Code maturity level options # CONFIG_EXPERIMENTAL=y CONFIG_CLEAN_COMPILE=y # # General setup # CONFIG_LOCALVERSION="" CONFIG_SWAP=y CONFIG_SYSVIPC=y # CONFIG_POSIX_MQUEUE is not set CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set CONFIG_SYSCTL=y # CONFIG_AUDIT is not set CONFIG_LOG_BUF_SHIFT=15 CONFIG_HOTPLUG=y # CONFIG_IKCONFIG is not set # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SHMEM=y # CONFIG_TINY_SHMEM is not set # # Loadable module support # CONFIG_MODULES=y # CONFIG_MODULE_UNLOAD is not set CONFIG_OBSOLETE_MODPARM=y CONFIG_MODVERSIONS=y CONFIG_KMOD=y # # Processor type and features # CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_ES7000 is not set # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set CONFIG_M686=y # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set # CONFIG_MPENTIUMM is not set # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8 is not set # CONFIG_MCRUSOE is not set # CONFIG_MWINCHIPC6 is not set 
# CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set # CONFIG_X86_GENERIC is not set CONFIG_X86_CMPXCHG=y CONFIG_X86_XADD=y CONFIG_X86_L1_CACHE_SHIFT=5 CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_X86_PPRO_FENCE=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_USE_PPRO_CHECKSUM=y # CONFIG_HPET_TIMER is not set CONFIG_SMP=y CONFIG_NR_CPUS=8 # CONFIG_SCHED_SMT is not set # CONFIG_PREEMPT is not set CONFIG_X86_LOCAL_APIC=y CONFIG_X86_IO_APIC=y CONFIG_X86_TSC=y CONFIG_X86_MCE=y # CONFIG_X86_MCE_NONFATAL is not set # CONFIG_X86_MCE_P4THERMAL is not set CONFIG_TOSHIBA=m CONFIG_I8K=m CONFIG_MICROCODE=m CONFIG_X86_MSR=m CONFIG_X86_CPUID=m # # Firmware Drivers # CONFIG_EDD=m # CONFIG_NOHIGHMEM is not set # CONFIG_HIGHMEM4G is not set CONFIG_HIGHMEM64G=y CONFIG_HIGHMEM=y CONFIG_X86_PAE=y # CONFIG_HIGHPTE is not set # CONFIG_MATH_EMULATION is not set CONFIG_MTRR=y # CONFIG_EFI is not set CONFIG_IRQBALANCE=y CONFIG_HAVE_DEC_LOCK=y # CONFIG_REGPARM is not set # # Power management options (ACPI, APM) # CONFIG_PM=y # CONFIG_PM_DEBUG is not set # CONFIG_SOFTWARE_SUSPEND is not set # # ACPI (Advanced Configuration and Power Interface) Support # CONFIG_ACPI=y CONFIG_ACPI_BOOT=y CONFIG_ACPI_INTERPRETER=y CONFIG_ACPI_SLEEP=y CONFIG_ACPI_SLEEP_PROC_FS=y CONFIG_ACPI_AC=m CONFIG_ACPI_BATTERY=m CONFIG_ACPI_BUTTON=m CONFIG_ACPI_FAN=m CONFIG_ACPI_PROCESSOR=m CONFIG_ACPI_THERMAL=m CONFIG_ACPI_ASUS=m CONFIG_ACPI_TOSHIBA=m CONFIG_ACPI_BLACKLIST_YEAR=0 # CONFIG_ACPI_DEBUG is not set CONFIG_ACPI_BUS=y CONFIG_ACPI_EC=y CONFIG_ACPI_POWER=y CONFIG_ACPI_PCI=y CONFIG_ACPI_SYSTEM=y # CONFIG_X86_PM_TIMER is not set # # APM (Advanced Power Management) BIOS Support # CONFIG_APM=y # CONFIG_APM_IGNORE_USER_SUSPEND is not set # CONFIG_APM_DO_ENABLE is not set CONFIG_APM_CPU_IDLE=y # CONFIG_APM_DISPLAY_BLANK is not set CONFIG_APM_RTC_IS_GMT=y # CONFIG_APM_ALLOW_INTS is not 
set # CONFIG_APM_REAL_MODE_POWER_OFF is not set # # CPU Frequency scaling # CONFIG_CPU_FREQ=y CONFIG_CPU_FREQ_PROC_INTF=y CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set CONFIG_CPU_FREQ_GOV_PERFORMANCE=y # CONFIG_CPU_FREQ_GOV_POWERSAVE is not set CONFIG_CPU_FREQ_GOV_USERSPACE=y # CONFIG_CPU_FREQ_GOV_ONDEMAND is not set CONFIG_CPU_FREQ_24_API=y CONFIG_CPU_FREQ_TABLE=y # # CPUFreq processor drivers # # CONFIG_X86_ACPI_CPUFREQ is not set CONFIG_X86_POWERNOW_K6=m CONFIG_X86_POWERNOW_K7=m CONFIG_X86_POWERNOW_K7_ACPI=y # CONFIG_X86_POWERNOW_K8 is not set # CONFIG_X86_GX_SUSPMOD is not set CONFIG_X86_SPEEDSTEP_CENTRINO=m CONFIG_X86_SPEEDSTEP_CENTRINO_TABLE=y # CONFIG_X86_SPEEDSTEP_CENTRINO_ACPI is not set CONFIG_X86_SPEEDSTEP_ICH=m # CONFIG_X86_SPEEDSTEP_SMI is not set CONFIG_X86_P4_CLOCKMOD=m CONFIG_X86_SPEEDSTEP_LIB=m # CONFIG_X86_SPEEDSTEP_RELAXED_CAP_CHECK is not set CONFIG_X86_LONGRUN=m CONFIG_X86_LONGHAUL=m # # Bus options (PCI, PCMCIA, EISA, MCA, ISA) # CONFIG_PCI=y # CONFIG_PCI_GOBIOS is not set # CONFIG_PCI_GOMMCONFIG is not set # CONFIG_PCI_GODIRECT is not set CONFIG_PCI_GOANY=y CONFIG_PCI_BIOS=y CONFIG_PCI_DIRECT=y CONFIG_PCI_MMCONFIG=y # CONFIG_PCI_MSI is not set # CONFIG_PCI_LEGACY_PROC is not set CONFIG_PCI_NAMES=y CONFIG_ISA=y CONFIG_EISA=y # CONFIG_EISA_VLB_PRIMING is not set CONFIG_EISA_PCI_EISA=y CONFIG_EISA_VIRTUAL_ROOT=y CONFIG_EISA_NAMES=y # CONFIG_MCA is not set # CONFIG_SCx200 is not set # # PCMCIA/CardBus support # CONFIG_PCMCIA=m # CONFIG_PCMCIA_DEBUG is not set # CONFIG_YENTA is not set # CONFIG_PD6729 is not set CONFIG_I82092=m CONFIG_I82365=m CONFIG_TCIC=m CONFIG_PCMCIA_PROBE=y # # PCI Hotplug Support # CONFIG_HOTPLUG_PCI=y # CONFIG_HOTPLUG_PCI_FAKE is not set CONFIG_HOTPLUG_PCI_COMPAQ=m # CONFIG_HOTPLUG_PCI_COMPAQ_NVRAM is not set CONFIG_HOTPLUG_PCI_IBM=m CONFIG_HOTPLUG_PCI_ACPI=m # CONFIG_HOTPLUG_PCI_ACPI_IBM is not set # CONFIG_HOTPLUG_PCI_CPCI is not set # CONFIG_HOTPLUG_PCI_PCIE is not set # 
CONFIG_HOTPLUG_PCI_SHPC is not set # # Executable file formats # CONFIG_BINFMT_ELF=y CONFIG_BINFMT_AOUT=m CONFIG_BINFMT_MISC=m # # Device Drivers # # # Generic Driver Options # CONFIG_STANDALONE=y CONFIG_PREVENT_FIRMWARE_BUILD=y CONFIG_FW_LOADER=m # CONFIG_DEBUG_DRIVER is not set # # Memory Technology Devices (MTD) # CONFIG_MTD=m # CONFIG_MTD_DEBUG is not set # CONFIG_MTD_PARTITIONS is not set CONFIG_MTD_CONCAT=m # # User Modules And Translation Layers # CONFIG_MTD_CHAR=m CONFIG_MTD_BLOCK=m CONFIG_MTD_BLOCK_RO=m CONFIG_FTL=m CONFIG_NFTL=m CONFIG_NFTL_RW=y CONFIG_INFTL=m # # RAM/ROM/Flash chip drivers # CONFIG_MTD_CFI=m CONFIG_MTD_JEDECPROBE=m CONFIG_MTD_GEN_PROBE=m # CONFIG_MTD_CFI_ADV_OPTIONS is not set CONFIG_MTD_MAP_BANK_WIDTH_1=y CONFIG_MTD_MAP_BANK_WIDTH_2=y CONFIG_MTD_MAP_BANK_WIDTH_4=y # CONFIG_MTD_MAP_BANK_WIDTH_8 is not set # CONFIG_MTD_MAP_BANK_WIDTH_16 is not set # CONFIG_MTD_MAP_BANK_WIDTH_32 is not set CONFIG_MTD_CFI_I1=y CONFIG_MTD_CFI_I2=y # CONFIG_MTD_CFI_I4 is not set # CONFIG_MTD_CFI_I8 is not set CONFIG_MTD_CFI_INTELEXT=m CONFIG_MTD_CFI_AMDSTD=m CONFIG_MTD_CFI_AMDSTD_RETRY=0 CONFIG_MTD_CFI_STAA=m CONFIG_MTD_CFI_UTIL=m CONFIG_MTD_RAM=m CONFIG_MTD_ROM=m CONFIG_MTD_ABSENT=m # # Mapping drivers for chip access # CONFIG_MTD_COMPLEX_MAPPINGS=y # CONFIG_MTD_PHYSMAP is not set CONFIG_MTD_SC520CDP=m CONFIG_MTD_SCx200_DOCFLASH=m CONFIG_MTD_AMD76XROM=m # CONFIG_MTD_ICHXROM is not set CONFIG_MTD_SCB2_FLASH=m CONFIG_MTD_L440GX=m CONFIG_MTD_PCI=m # # Self-contained MTD device drivers # CONFIG_MTD_PMC551=m # CONFIG_MTD_PMC551_BUGFIX is not set # CONFIG_MTD_PMC551_DEBUG is not set # CONFIG_MTD_SLRAM is not set # CONFIG_MTD_PHRAM is not set CONFIG_MTD_MTDRAM=m CONFIG_MTDRAM_TOTAL_SIZE=4096 CONFIG_MTDRAM_ERASE_SIZE=128 # CONFIG_MTD_BLKMTD is not set # # Disk-On-Chip Device Drivers # CONFIG_MTD_DOC2000=m # CONFIG_MTD_DOC2001 is not set CONFIG_MTD_DOC2001PLUS=m CONFIG_MTD_DOCPROBE=m CONFIG_MTD_DOCECC=m # CONFIG_MTD_DOCPROBE_ADVANCED is not set 
CONFIG_MTD_DOCPROBE_ADDRESS=0 # # NAND Flash Device Drivers # CONFIG_MTD_NAND=m # CONFIG_MTD_NAND_VERIFY_WRITE is not set CONFIG_MTD_NAND_IDS=m # CONFIG_MTD_NAND_DISKONCHIP is not set # # Parallel port support # CONFIG_PARPORT=m CONFIG_PARPORT_PC=m CONFIG_PARPORT_PC_CML1=m CONFIG_PARPORT_SERIAL=m # CONFIG_PARPORT_PC_FIFO is not set # CONFIG_PARPORT_PC_SUPERIO is not set CONFIG_PARPORT_PC_PCMCIA=m # CONFIG_PARPORT_OTHER is not set CONFIG_PARPORT_1284=y # # Plug and Play support # CONFIG_PNP=y # CONFIG_PNP_DEBUG is not set # # Protocols # CONFIG_ISAPNP=y # CONFIG_PNPBIOS is not set # # Block devices # CONFIG_BLK_DEV_FD=m CONFIG_BLK_DEV_XD=m CONFIG_PARIDE=m CONFIG_PARIDE_PARPORT=m # # Parallel IDE high-level drivers # CONFIG_PARIDE_PD=m CONFIG_PARIDE_PCD=m CONFIG_PARIDE_PF=m CONFIG_PARIDE_PT=m CONFIG_PARIDE_PG=m # # Parallel IDE protocol modules # CONFIG_PARIDE_ATEN=m CONFIG_PARIDE_BPCK=m CONFIG_PARIDE_BPCK6=m CONFIG_PARIDE_COMM=m CONFIG_PARIDE_DSTR=m CONFIG_PARIDE_FIT2=m CONFIG_PARIDE_FIT3=m CONFIG_PARIDE_EPAT=m CONFIG_PARIDE_EPATC8=y CONFIG_PARIDE_EPIA=m CONFIG_PARIDE_FRIQ=m CONFIG_PARIDE_FRPW=m CONFIG_PARIDE_KBIC=m CONFIG_PARIDE_KTTI=m CONFIG_PARIDE_ON20=m CONFIG_PARIDE_ON26=m CONFIG_BLK_CPQ_DA=m CONFIG_BLK_CPQ_CISS_DA=m CONFIG_CISS_SCSI_TAPE=y CONFIG_BLK_DEV_DAC960=m CONFIG_BLK_DEV_UMEM=m CONFIG_BLK_DEV_LOOP=m # CONFIG_BLK_DEV_CRYPTOLOOP is not set CONFIG_BLK_DEV_NBD=m # CONFIG_BLK_DEV_SX8 is not set # CONFIG_BLK_DEV_UB is not set CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_SIZE=8192 CONFIG_BLK_DEV_INITRD=y # CONFIG_LBD is not set # # ATA/ATAPI/MFM/RLL support # CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y # # Please see Documentation/ide.txt for help/info on IDE drives # # CONFIG_BLK_DEV_IDE_SATA is not set # CONFIG_BLK_DEV_HD_IDE is not set CONFIG_BLK_DEV_IDEDISK=y CONFIG_IDEDISK_MULTI_MODE=y CONFIG_BLK_DEV_IDECS=m CONFIG_BLK_DEV_IDECD=m CONFIG_BLK_DEV_IDETAPE=m CONFIG_BLK_DEV_IDEFLOPPY=y CONFIG_BLK_DEV_IDESCSI=m # CONFIG_IDE_TASK_IOCTL is not set # CONFIG_IDE_TASKFILE_IO is 
not set # # IDE chipset support/bugfixes # CONFIG_IDE_GENERIC=y CONFIG_BLK_DEV_CMD640=y # CONFIG_BLK_DEV_CMD640_ENHANCED is not set # CONFIG_BLK_DEV_IDEPNP is not set CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_SHARE_IRQ=y # CONFIG_BLK_DEV_OFFBOARD is not set CONFIG_BLK_DEV_GENERIC=y # CONFIG_BLK_DEV_OPTI621 is not set CONFIG_BLK_DEV_RZ1000=y CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_IDEDMA_FORCED is not set CONFIG_IDEDMA_PCI_AUTO=y # CONFIG_IDEDMA_ONLYDISK is not set CONFIG_BLK_DEV_AEC62XX=y CONFIG_BLK_DEV_ALI15X3=y # CONFIG_WDC_ALI15X3 is not set CONFIG_BLK_DEV_AMD74XX=y # CONFIG_BLK_DEV_ATIIXP is not set CONFIG_BLK_DEV_CMD64X=y CONFIG_BLK_DEV_TRIFLEX=y CONFIG_BLK_DEV_CY82C693=y # CONFIG_BLK_DEV_CS5520 is not set CONFIG_BLK_DEV_CS5530=y CONFIG_BLK_DEV_HPT34X=y # CONFIG_HPT34X_AUTODMA is not set CONFIG_BLK_DEV_HPT366=y # CONFIG_BLK_DEV_SC1200 is not set CONFIG_BLK_DEV_PIIX=y # CONFIG_BLK_DEV_NS87415 is not set CONFIG_BLK_DEV_PDC202XX_OLD=y # CONFIG_PDC202XX_BURST is not set CONFIG_BLK_DEV_PDC202XX_NEW=y CONFIG_PDC202XX_FORCE=y CONFIG_BLK_DEV_SVWKS=y CONFIG_BLK_DEV_SIIMAGE=y CONFIG_BLK_DEV_SIS5513=y CONFIG_BLK_DEV_SLC90E66=y # CONFIG_BLK_DEV_TRM290 is not set CONFIG_BLK_DEV_VIA82CXXX=y # CONFIG_IDE_ARM is not set # CONFIG_IDE_CHIPSETS is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set CONFIG_IDEDMA_AUTO=y # CONFIG_BLK_DEV_HD is not set # # SCSI device support # CONFIG_SCSI=m CONFIG_SCSI_PROC_FS=y # # SCSI support type (disk, tape, CD-ROM) # CONFIG_BLK_DEV_SD=m CONFIG_CHR_DEV_ST=m CONFIG_CHR_DEV_OSST=m CONFIG_BLK_DEV_SR=m CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_CHR_DEV_SG=m # # Some SCSI devices (e.g. 
CD jukebox) support multiple LUNs # # CONFIG_SCSI_MULTI_LUN is not set CONFIG_SCSI_CONSTANTS=y CONFIG_SCSI_LOGGING=y # # SCSI Transport Attributes # CONFIG_SCSI_SPI_ATTRS=m # CONFIG_SCSI_FC_ATTRS is not set # # SCSI low-level drivers # CONFIG_BLK_DEV_3W_XXXX_RAID=m # CONFIG_SCSI_3W_9XXX is not set CONFIG_SCSI_7000FASST=m CONFIG_SCSI_ACARD=m CONFIG_SCSI_AHA152X=m CONFIG_SCSI_AHA1542=m CONFIG_SCSI_AHA1740=m CONFIG_SCSI_AACRAID=m CONFIG_SCSI_AIC7XXX=m CONFIG_AIC7XXX_CMDS_PER_DEVICE=32 CONFIG_AIC7XXX_RESET_DELAY_MS=15000 # CONFIG_AIC7XXX_PROBE_EISA_VL is not set # CONFIG_AIC7XXX_DEBUG_ENABLE is not set CONFIG_AIC7XXX_DEBUG_MASK=0 # CONFIG_AIC7XXX_REG_PRETTY_PRINT is not set CONFIG_SCSI_AIC7XXX_OLD=m CONFIG_SCSI_AIC79XX=m CONFIG_AIC79XX_CMDS_PER_DEVICE=32 CONFIG_AIC79XX_RESET_DELAY_MS=15000 # CONFIG_AIC79XX_ENABLE_RD_STRM is not set # CONFIG_AIC79XX_DEBUG_ENABLE is not set CONFIG_AIC79XX_DEBUG_MASK=0 # CONFIG_AIC79XX_REG_PRETTY_PRINT is not set CONFIG_SCSI_DPT_I2O=m CONFIG_SCSI_IN2000=m # CONFIG_MEGARAID_NEWGEN is not set # CONFIG_MEGARAID_LEGACY is not set CONFIG_SCSI_SATA=y CONFIG_SCSI_SATA_SVW=m CONFIG_SCSI_ATA_PIIX=m # CONFIG_SCSI_SATA_NV is not set CONFIG_SCSI_SATA_PROMISE=m # CONFIG_SCSI_SATA_SX4 is not set CONFIG_SCSI_SATA_SIL=m # CONFIG_SCSI_SATA_SIS is not set CONFIG_SCSI_SATA_VIA=m # CONFIG_SCSI_SATA_VITESSE is not set CONFIG_SCSI_BUSLOGIC=m # CONFIG_SCSI_OMIT_FLASHPOINT is not set CONFIG_SCSI_DMX3191D=m CONFIG_SCSI_DTC3280=m CONFIG_SCSI_EATA=m CONFIG_SCSI_EATA_TAGGED_QUEUE=y # CONFIG_SCSI_EATA_LINKED_COMMANDS is not set CONFIG_SCSI_EATA_MAX_TAGS=16 CONFIG_SCSI_EATA_PIO=m CONFIG_SCSI_FUTURE_DOMAIN=m CONFIG_SCSI_GDTH=m CONFIG_SCSI_GENERIC_NCR5380=m # CONFIG_SCSI_GENERIC_NCR5380_MMIO is not set # CONFIG_SCSI_GENERIC_NCR53C400 is not set CONFIG_SCSI_IPS=m CONFIG_SCSI_INIA100=m CONFIG_SCSI_PPA=m CONFIG_SCSI_IMM=m # CONFIG_SCSI_IZIP_EPP16 is not set # CONFIG_SCSI_IZIP_SLOW_CTR is not set CONFIG_SCSI_NCR53C406A=m CONFIG_53C700_IO_MAPPED=y CONFIG_SCSI_SYM53C8XX_2=m 
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1 CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16 CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64 # CONFIG_SCSI_SYM53C8XX_IOMAPPED is not set # CONFIG_SCSI_IPR is not set CONFIG_SCSI_PAS16=m CONFIG_SCSI_PSI240I=m CONFIG_SCSI_QLOGIC_FAS=m CONFIG_SCSI_QLOGIC_ISP=m CONFIG_SCSI_QLOGIC_FC=m # CONFIG_SCSI_QLOGIC_FC_FIRMWARE is not set CONFIG_SCSI_QLOGIC_1280=m CONFIG_SCSI_QLA2XXX=m # CONFIG_SCSI_QLA21XX is not set # CONFIG_SCSI_QLA22XX is not set # CONFIG_SCSI_QLA2300 is not set # CONFIG_SCSI_QLA2322 is not set # CONFIG_SCSI_QLA6312 is not set # CONFIG_SCSI_QLA6322 is not set CONFIG_SCSI_SIM710=m CONFIG_SCSI_SYM53C416=m # CONFIG_SCSI_DC395x is not set CONFIG_SCSI_DC390T=m CONFIG_SCSI_T128=m CONFIG_SCSI_U14_34F=m # CONFIG_SCSI_U14_34F_TAGGED_QUEUE is not set # CONFIG_SCSI_U14_34F_LINKED_COMMANDS is not set CONFIG_SCSI_U14_34F_MAX_TAGS=8 CONFIG_SCSI_ULTRASTOR=m CONFIG_SCSI_NSP32=m CONFIG_SCSI_DEBUG=m # # PCMCIA SCSI adapter support # CONFIG_PCMCIA_AHA152X=m CONFIG_PCMCIA_FDOMAIN=m CONFIG_PCMCIA_NINJA_SCSI=m CONFIG_PCMCIA_QLOGIC=m # CONFIG_PCMCIA_SYM53C500 is not set # # Old CD-ROM drivers (not SCSI, not IDE) # # CONFIG_CD_NO_IDESCSI is not set # # Multi-device support (RAID and LVM) # CONFIG_MD=y CONFIG_BLK_DEV_MD=y CONFIG_MD_LINEAR=m CONFIG_MD_RAID0=m CONFIG_MD_RAID1=m # CONFIG_MD_RAID10 is not set CONFIG_MD_RAID5=m # CONFIG_MD_RAID6 is not set CONFIG_MD_MULTIPATH=m CONFIG_BLK_DEV_DM=m # CONFIG_DM_CRYPT is not set # CONFIG_DM_SNAPSHOT is not set # CONFIG_DM_MIRROR is not set # CONFIG_DM_ZERO is not set # # Fusion MPT device support # CONFIG_FUSION=m CONFIG_FUSION_MAX_SGE=40 CONFIG_FUSION_CTL=m CONFIG_FUSION_LAN=m # # IEEE 1394 (FireWire) support # CONFIG_IEEE1394=m # # Subsystem Options # # CONFIG_IEEE1394_VERBOSEDEBUG is not set # CONFIG_IEEE1394_OUI_DB is not set CONFIG_IEEE1394_EXTRA_CONFIG_ROMS=y CONFIG_IEEE1394_CONFIG_ROM_IP1394=y # # Device Drivers # # CONFIG_IEEE1394_PCILYNX is not set CONFIG_IEEE1394_OHCI1394=m # # Protocol Drivers # 
CONFIG_IEEE1394_VIDEO1394=m CONFIG_IEEE1394_SBP2=m CONFIG_IEEE1394_SBP2_PHYS_DMA=y CONFIG_IEEE1394_ETH1394=m CONFIG_IEEE1394_DV1394=m CONFIG_IEEE1394_RAWIO=m CONFIG_IEEE1394_CMP=m CONFIG_IEEE1394_AMDTP=m # # I2O device support # CONFIG_I2O=m # CONFIG_I2O_CONFIG is not set CONFIG_I2O_BLOCK=m CONFIG_I2O_SCSI=m CONFIG_I2O_PROC=m # # Networking support # CONFIG_NET=y # # Networking options # CONFIG_PACKET=y CONFIG_PACKET_MMAP=y CONFIG_NETLINK_DEV=y CONFIG_UNIX=y CONFIG_NET_KEY=m CONFIG_INET=y CONFIG_IP_MULTICAST=y CONFIG_IP_ADVANCED_ROUTER=y CONFIG_IP_MULTIPLE_TABLES=y CONFIG_IP_ROUTE_FWMARK=y CONFIG_IP_ROUTE_MULTIPATH=y CONFIG_IP_ROUTE_VERBOSE=y # CONFIG_IP_PNP is not set CONFIG_NET_IPIP=m CONFIG_NET_IPGRE=m CONFIG_NET_IPGRE_BROADCAST=y CONFIG_IP_MROUTE=y CONFIG_IP_PIMSM_V1=y CONFIG_IP_PIMSM_V2=y # CONFIG_ARPD is not set CONFIG_SYN_COOKIES=y CONFIG_INET_AH=m CONFIG_INET_ESP=m CONFIG_INET_IPCOMP=m CONFIG_INET_TUNNEL=m # # IP: Virtual Server Configuration # CONFIG_IP_VS=m # CONFIG_IP_VS_DEBUG is not set CONFIG_IP_VS_TAB_BITS=16 # # IPVS transport protocol load balancing support # # CONFIG_IP_VS_PROTO_TCP is not set # CONFIG_IP_VS_PROTO_UDP is not set # CONFIG_IP_VS_PROTO_ESP is not set # CONFIG_IP_VS_PROTO_AH is not set # # IPVS scheduler # CONFIG_IP_VS_RR=m CONFIG_IP_VS_WRR=m CONFIG_IP_VS_LC=m CONFIG_IP_VS_WLC=m CONFIG_IP_VS_LBLC=m CONFIG_IP_VS_LBLCR=m CONFIG_IP_VS_DH=m CONFIG_IP_VS_SH=m # CONFIG_IP_VS_SED is not set # CONFIG_IP_VS_NQ is not set # # IPVS application helper # CONFIG_IPV6=m # CONFIG_IPV6_PRIVACY is not set CONFIG_INET6_AH=m CONFIG_INET6_ESP=m CONFIG_INET6_IPCOMP=m CONFIG_INET6_TUNNEL=m # CONFIG_IPV6_TUNNEL is not set CONFIG_NETFILTER=y # CONFIG_NETFILTER_DEBUG is not set CONFIG_BRIDGE_NETFILTER=y # # IP: Netfilter Configuration # CONFIG_IP_NF_CONNTRACK=m # CONFIG_IP_NF_CT_ACCT is not set # CONFIG_IP_NF_CT_PROTO_SCTP is not set CONFIG_IP_NF_FTP=m CONFIG_IP_NF_IRC=m CONFIG_IP_NF_TFTP=m CONFIG_IP_NF_AMANDA=m CONFIG_IP_NF_QUEUE=m CONFIG_IP_NF_IPTABLES=m 
CONFIG_IP_NF_MATCH_LIMIT=m # CONFIG_IP_NF_MATCH_IPRANGE is not set CONFIG_IP_NF_MATCH_MAC=m CONFIG_IP_NF_MATCH_PKTTYPE=m CONFIG_IP_NF_MATCH_MARK=m CONFIG_IP_NF_MATCH_MULTIPORT=m CONFIG_IP_NF_MATCH_TOS=m CONFIG_IP_NF_MATCH_RECENT=m CONFIG_IP_NF_MATCH_ECN=m CONFIG_IP_NF_MATCH_DSCP=m CONFIG_IP_NF_MATCH_AH_ESP=m CONFIG_IP_NF_MATCH_LENGTH=m CONFIG_IP_NF_MATCH_TTL=m CONFIG_IP_NF_MATCH_TCPMSS=m CONFIG_IP_NF_MATCH_HELPER=m CONFIG_IP_NF_MATCH_STATE=m CONFIG_IP_NF_MATCH_CONNTRACK=m CONFIG_IP_NF_MATCH_OWNER=m # CONFIG_IP_NF_MATCH_PHYSDEV is not set # CONFIG_IP_NF_MATCH_ADDRTYPE is not set # CONFIG_IP_NF_MATCH_REALM is not set # CONFIG_IP_NF_MATCH_SCTP is not set # CONFIG_IP_NF_MATCH_COMMENT is not set CONFIG_IP_NF_FILTER=m CONFIG_IP_NF_TARGET_REJECT=m CONFIG_IP_NF_TARGET_LOG=m CONFIG_IP_NF_TARGET_ULOG=m CONFIG_IP_NF_TARGET_TCPMSS=m CONFIG_IP_NF_NAT=m CONFIG_IP_NF_NAT_NEEDED=y CONFIG_IP_NF_TARGET_MASQUERADE=m CONFIG_IP_NF_TARGET_REDIRECT=m # CONFIG_IP_NF_TARGET_NETMAP is not set # CONFIG_IP_NF_TARGET_SAME is not set # CONFIG_IP_NF_NAT_LOCAL is not set CONFIG_IP_NF_NAT_SNMP_BASIC=m CONFIG_IP_NF_NAT_IRC=m CONFIG_IP_NF_NAT_FTP=m CONFIG_IP_NF_NAT_TFTP=m CONFIG_IP_NF_NAT_AMANDA=m CONFIG_IP_NF_MANGLE=m CONFIG_IP_NF_TARGET_TOS=m CONFIG_IP_NF_TARGET_ECN=m CONFIG_IP_NF_TARGET_DSCP=m CONFIG_IP_NF_TARGET_MARK=m # CONFIG_IP_NF_TARGET_CLASSIFY is not set # CONFIG_IP_NF_RAW is not set CONFIG_IP_NF_ARPTABLES=m CONFIG_IP_NF_ARPFILTER=m CONFIG_IP_NF_ARP_MANGLE=m CONFIG_IP_NF_COMPAT_IPCHAINS=m CONFIG_IP_NF_COMPAT_IPFWADM=m # # IPv6: Netfilter Configuration # # CONFIG_IP6_NF_QUEUE is not set CONFIG_IP6_NF_IPTABLES=m CONFIG_IP6_NF_MATCH_LIMIT=m CONFIG_IP6_NF_MATCH_MAC=m CONFIG_IP6_NF_MATCH_RT=m CONFIG_IP6_NF_MATCH_OPTS=m CONFIG_IP6_NF_MATCH_FRAG=m CONFIG_IP6_NF_MATCH_HL=m CONFIG_IP6_NF_MATCH_MULTIPORT=m CONFIG_IP6_NF_MATCH_OWNER=m CONFIG_IP6_NF_MATCH_MARK=m CONFIG_IP6_NF_MATCH_IPV6HEADER=m CONFIG_IP6_NF_MATCH_AHESP=m CONFIG_IP6_NF_MATCH_LENGTH=m CONFIG_IP6_NF_MATCH_EUI64=m # 
CONFIG_IP6_NF_MATCH_PHYSDEV is not set CONFIG_IP6_NF_FILTER=m CONFIG_IP6_NF_TARGET_LOG=m CONFIG_IP6_NF_MANGLE=m CONFIG_IP6_NF_TARGET_MARK=m # CONFIG_IP6_NF_RAW is not set # # Bridge: Netfilter Configuration # # CONFIG_BRIDGE_NF_EBTABLES is not set CONFIG_XFRM=y CONFIG_XFRM_USER=m # # SCTP Configuration (EXPERIMENTAL) # # CONFIG_IP_SCTP is not set CONFIG_ATM=y CONFIG_ATM_CLIP=y # CONFIG_ATM_CLIP_NO_ICMP is not set CONFIG_ATM_LANE=m CONFIG_ATM_MPOA=m CONFIG_ATM_BR2684=m CONFIG_ATM_BR2684_IPFILTER=y CONFIG_BRIDGE=m CONFIG_VLAN_8021Q=m # CONFIG_DECNET is not set CONFIG_LLC=y # CONFIG_LLC2 is not set CONFIG_IPX=m # CONFIG_IPX_INTERN is not set CONFIG_ATALK=m CONFIG_DEV_APPLETALK=y CONFIG_LTPC=m CONFIG_COPS=m CONFIG_COPS_DAYNA=y CONFIG_COPS_TANGENT=y CONFIG_IPDDP=m CONFIG_IPDDP_ENCAP=y CONFIG_IPDDP_DECAP=y # CONFIG_X25 is not set # CONFIG_LAPB is not set CONFIG_NET_DIVERT=y # CONFIG_ECONET is not set CONFIG_WAN_ROUTER=m # CONFIG_NET_HW_FLOWCONTROL is not set # # QoS and/or fair queueing # CONFIG_NET_SCHED=y CONFIG_NET_SCH_CLK_JIFFIES=y # CONFIG_NET_SCH_CLK_GETTIMEOFDAY is not set # CONFIG_NET_SCH_CLK_CPU is not set CONFIG_NET_SCH_CBQ=m CONFIG_NET_SCH_HTB=m # CONFIG_NET_SCH_HFSC is not set # CONFIG_NET_SCH_ATM is not set CONFIG_NET_SCH_PRIO=m CONFIG_NET_SCH_RED=m CONFIG_NET_SCH_SFQ=m CONFIG_NET_SCH_TEQL=m CONFIG_NET_SCH_TBF=m CONFIG_NET_SCH_GRED=m CONFIG_NET_SCH_DSMARK=m # CONFIG_NET_SCH_NETEM is not set CONFIG_NET_SCH_INGRESS=m CONFIG_NET_QOS=y CONFIG_NET_ESTIMATOR=y CONFIG_NET_CLS=y CONFIG_NET_CLS_TCINDEX=m CONFIG_NET_CLS_ROUTE4=m CONFIG_NET_CLS_ROUTE=y CONFIG_NET_CLS_FW=m CONFIG_NET_CLS_U32=m # CONFIG_CLS_U32_PERF is not set # CONFIG_NET_CLS_IND is not set CONFIG_NET_CLS_RSVP=m CONFIG_NET_CLS_RSVP6=m # CONFIG_NET_CLS_ACT is not set CONFIG_NET_CLS_POLICE=y # # Network testing # # CONFIG_NET_PKTGEN is not set CONFIG_NETPOLL=y # CONFIG_NETPOLL_RX is not set # CONFIG_NETPOLL_TRAP is not set CONFIG_NET_POLL_CONTROLLER=y # CONFIG_HAMRADIO is not set CONFIG_IRDA=m # # IrDA 
protocols # CONFIG_IRLAN=m CONFIG_IRNET=m CONFIG_IRCOMM=m CONFIG_IRDA_ULTRA=y # # IrDA options # CONFIG_IRDA_CACHE_LAST_LSAP=y CONFIG_IRDA_FAST_RR=y # CONFIG_IRDA_DEBUG is not set # # Infrared-port device drivers # # # SIR device drivers # CONFIG_IRTTY_SIR=m # # Dongle support # CONFIG_DONGLE=y CONFIG_ESI_DONGLE=m CONFIG_ACTISYS_DONGLE=m CONFIG_TEKRAM_DONGLE=m CONFIG_LITELINK_DONGLE=m CONFIG_MA600_DONGLE=m CONFIG_GIRBIL_DONGLE=m CONFIG_MCP2120_DONGLE=m CONFIG_OLD_BELKIN_DONGLE=m CONFIG_ACT200L_DONGLE=m # # Old SIR device drivers # # # Old Serial dongle support # # # FIR device drivers # CONFIG_USB_IRDA=m # CONFIG_SIGMATEL_FIR is not set CONFIG_NSC_FIR=m CONFIG_WINBOND_FIR=m CONFIG_TOSHIBA_FIR=m CONFIG_SMC_IRCC_FIR=m CONFIG_ALI_FIR=m CONFIG_VLSI_FIR=m # CONFIG_VIA_FIR is not set # CONFIG_BT is not set CONFIG_NETDEVICES=y CONFIG_DUMMY=m CONFIG_BONDING=m CONFIG_EQUALIZER=m CONFIG_TUN=m CONFIG_ETHERTAP=m CONFIG_NET_SB1000=m # # ARCnet devices # # CONFIG_ARCNET is not set # # Ethernet (10 or 100Mbit) # CONFIG_NET_ETHERNET=y CONFIG_MII=m CONFIG_HAPPYMEAL=m CONFIG_SUNGEM=m CONFIG_NET_VENDOR_3COM=y CONFIG_EL1=m CONFIG_EL2=m CONFIG_ELPLUS=m CONFIG_EL16=m CONFIG_EL3=m CONFIG_3C515=m CONFIG_VORTEX=m CONFIG_TYPHOON=m CONFIG_LANCE=m CONFIG_NET_VENDOR_SMC=y CONFIG_WD80x3=m CONFIG_ULTRA=m CONFIG_ULTRA32=m CONFIG_SMC9194=m CONFIG_NET_VENDOR_RACAL=y CONFIG_NI52=m CONFIG_NI65=m # # Tulip family network device support # # CONFIG_NET_TULIP is not set CONFIG_AT1700=m CONFIG_DEPCA=m CONFIG_HP100=m CONFIG_NET_ISA=y CONFIG_E2100=m # CONFIG_EWRK3 is not set CONFIG_EEXPRESS=m CONFIG_EEXPRESS_PRO=m CONFIG_HPLAN_PLUS=m CONFIG_HPLAN=m CONFIG_LP486E=m CONFIG_ETH16I=m CONFIG_NE2000=m # CONFIG_ZNET is not set # CONFIG_SEEQ8005 is not set CONFIG_NET_PCI=y CONFIG_PCNET32=m CONFIG_AMD8111_ETH=m # CONFIG_AMD8111E_NAPI is not set CONFIG_ADAPTEC_STARFIRE=m # CONFIG_ADAPTEC_STARFIRE_NAPI is not set CONFIG_AC3200=m CONFIG_APRICOT=m CONFIG_B44=m # CONFIG_FORCEDETH is not set CONFIG_CS89x0=m CONFIG_DGRS=m 
CONFIG_EEPRO100=m # CONFIG_EEPRO100_PIO is not set CONFIG_E100=m # CONFIG_E100_NAPI is not set CONFIG_LNE390=m CONFIG_FEALNX=m CONFIG_NATSEMI=m CONFIG_NE2K_PCI=m CONFIG_NE3210=m CONFIG_ES3210=m CONFIG_8139CP=m CONFIG_8139TOO=m CONFIG_8139TOO_PIO=y # CONFIG_8139TOO_TUNE_TWISTER is not set CONFIG_8139TOO_8129=y # CONFIG_8139_OLD_RX_RESET is not set CONFIG_SIS900=m CONFIG_EPIC100=m CONFIG_SUNDANCE=m # CONFIG_SUNDANCE_MMIO is not set CONFIG_TLAN=m CONFIG_VIA_RHINE=m # CONFIG_VIA_RHINE_MMIO is not set # CONFIG_VIA_VELOCITY is not set CONFIG_NET_POCKET=y CONFIG_ATP=m CONFIG_DE600=m CONFIG_DE620=m # # Ethernet (1000 Mbit) # CONFIG_ACENIC=m # CONFIG_ACENIC_OMIT_TIGON_I is not set CONFIG_DL2K=m CONFIG_E1000=m CONFIG_E1000_NAPI=y CONFIG_NS83820=m CONFIG_HAMACHI=m CONFIG_YELLOWFIN=m CONFIG_R8169=m # CONFIG_R8169_NAPI is not set CONFIG_SK98LIN=m CONFIG_TIGON3=m # # Ethernet (10000 Mbit) # # CONFIG_IXGB is not set # CONFIG_S2IO is not set # # Token Ring devices # CONFIG_TR=y CONFIG_IBMTR=m CONFIG_IBMOL=m CONFIG_IBMLS=m CONFIG_3C359=m CONFIG_TMS380TR=m CONFIG_TMSPCI=m # CONFIG_SKISA is not set # CONFIG_PROTEON is not set CONFIG_ABYSS=m CONFIG_SMCTR=m # # Wireless LAN (non-hamradio) # CONFIG_NET_RADIO=y # # Obsolete Wireless cards support (pre-802.11) # CONFIG_STRIP=m CONFIG_ARLAN=m CONFIG_WAVELAN=m CONFIG_PCMCIA_WAVELAN=m CONFIG_PCMCIA_NETWAVE=m # # Wireless 802.11 Frequency Hopping cards support # CONFIG_PCMCIA_RAYCS=m # # Wireless 802.11b ISA/PCI cards support # CONFIG_AIRO=m CONFIG_HERMES=m CONFIG_PLX_HERMES=m # CONFIG_TMD_HERMES is not set CONFIG_PCI_HERMES=m # CONFIG_ATMEL is not set # # Wireless 802.11b Pcmcia/Cardbus cards support # CONFIG_PCMCIA_HERMES=m CONFIG_AIRO_CS=m # CONFIG_PCMCIA_WL3501 is not set # # Prism GT/Duette 802.11(a/b/g) PCI/Cardbus support # # CONFIG_PRISM54 is not set CONFIG_NET_WIRELESS=y # # PCMCIA network device support # CONFIG_NET_PCMCIA=y CONFIG_PCMCIA_3C589=m CONFIG_PCMCIA_3C574=m CONFIG_PCMCIA_FMVJ18X=m CONFIG_PCMCIA_PCNET=m 
CONFIG_PCMCIA_NMCLAN=m CONFIG_PCMCIA_SMC91C92=m CONFIG_PCMCIA_XIRC2PS=m CONFIG_PCMCIA_AXNET=m CONFIG_PCMCIA_IBMTR=m # # Wan interfaces # CONFIG_WAN=y CONFIG_HOSTESS_SV11=m CONFIG_COSA=m # CONFIG_DSCC4 is not set # CONFIG_LANMEDIA is not set CONFIG_SEALEVEL_4021=m # CONFIG_SYNCLINK_SYNCPPP is not set # CONFIG_HDLC is not set CONFIG_DLCI=m CONFIG_DLCI_COUNT=24 CONFIG_DLCI_MAX=8 CONFIG_SDLA=m CONFIG_WAN_ROUTER_DRIVERS=y CONFIG_CYCLADES_SYNC=m CONFIG_CYCLOMX_X25=y CONFIG_SBNI=m CONFIG_SBNI_MULTILINE=y # # ATM drivers # CONFIG_ATM_TCP=m CONFIG_ATM_LANAI=m CONFIG_ATM_ENI=m # CONFIG_ATM_ENI_DEBUG is not set # CONFIG_ATM_ENI_TUNE_BURST is not set CONFIG_ATM_FIRESTREAM=m CONFIG_ATM_ZATM=m # CONFIG_ATM_ZATM_DEBUG is not set CONFIG_ATM_NICSTAR=m CONFIG_ATM_NICSTAR_USE_SUNI=y CONFIG_ATM_NICSTAR_USE_IDT77105=y CONFIG_ATM_IDT77252=m # CONFIG_ATM_IDT77252_DEBUG is not set # CONFIG_ATM_IDT77252_RCV_ALL is not set CONFIG_ATM_IDT77252_USE_SUNI=y CONFIG_ATM_AMBASSADOR=m # CONFIG_ATM_AMBASSADOR_DEBUG is not set CONFIG_ATM_HORIZON=m # CONFIG_ATM_HORIZON_DEBUG is not set CONFIG_ATM_IA=m # CONFIG_ATM_IA_DEBUG is not set CONFIG_ATM_FORE200E_MAYBE=m CONFIG_ATM_FORE200E_PCA=y CONFIG_ATM_FORE200E_PCA_DEFAULT_FW=y # CONFIG_ATM_FORE200E_USE_TASKLET is not set CONFIG_ATM_FORE200E_TX_RETRY=16 CONFIG_ATM_FORE200E_DEBUG=0 CONFIG_ATM_FORE200E=m CONFIG_ATM_HE=m # CONFIG_ATM_HE_USE_SUNI is not set CONFIG_FDDI=y CONFIG_DEFXX=m CONFIG_SKFP=m # CONFIG_HIPPI is not set CONFIG_PLIP=m CONFIG_PPP=m CONFIG_PPP_MULTILINK=y CONFIG_PPP_FILTER=y CONFIG_PPP_ASYNC=m CONFIG_PPP_SYNC_TTY=m CONFIG_PPP_DEFLATE=m # CONFIG_PPP_BSDCOMP is not set CONFIG_PPPOE=m CONFIG_PPPOATM=m CONFIG_SLIP=m CONFIG_SLIP_COMPRESSED=y CONFIG_SLIP_SMART=y CONFIG_SLIP_MODE_SLIP6=y CONFIG_NET_FC=y CONFIG_SHAPER=m CONFIG_NETCONSOLE=m # # ISDN subsystem # CONFIG_ISDN=m # # Old ISDN4Linux # # CONFIG_ISDN_I4L is not set # # CAPI subsystem # CONFIG_ISDN_CAPI=m CONFIG_ISDN_DRV_AVMB1_VERBOSE_REASON=y CONFIG_ISDN_CAPI_MIDDLEWARE=y 
CONFIG_ISDN_CAPI_CAPI20=m CONFIG_ISDN_CAPI_CAPIFS_BOOL=y CONFIG_ISDN_CAPI_CAPIFS=m # # CAPI hardware drivers # # # Active AVM cards # # CONFIG_CAPI_AVM is not set # # Active Eicon DIVA Server cards # # CONFIG_CAPI_EICON is not set # # Telephony Support # CONFIG_PHONE=m CONFIG_PHONE_IXJ=m CONFIG_PHONE_IXJ_PCMCIA=m # # Input device support # CONFIG_INPUT=y # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y CONFIG_INPUT_MOUSEDEV_PSAUX=y CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 CONFIG_INPUT_JOYDEV=m # CONFIG_INPUT_TSDEV is not set CONFIG_INPUT_EVDEV=m # CONFIG_INPUT_EVBUG is not set # # Input I/O drivers # # CONFIG_GAMEPORT is not set CONFIG_SOUND_GAMEPORT=y CONFIG_SERIO=y CONFIG_SERIO_I8042=y CONFIG_SERIO_SERPORT=y # CONFIG_SERIO_CT82C710 is not set # CONFIG_SERIO_PARKBD is not set # CONFIG_SERIO_PCIPS2 is not set # CONFIG_SERIO_RAW is not set # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y # CONFIG_KEYBOARD_SUNKBD is not set # CONFIG_KEYBOARD_LKKBD is not set # CONFIG_KEYBOARD_XTKBD is not set # CONFIG_KEYBOARD_NEWTON is not set CONFIG_INPUT_MOUSE=y CONFIG_MOUSE_PS2=y # CONFIG_MOUSE_SERIAL is not set # CONFIG_MOUSE_INPORT is not set # CONFIG_MOUSE_LOGIBM is not set # CONFIG_MOUSE_PC110PAD is not set # CONFIG_MOUSE_VSXXXAA is not set # CONFIG_INPUT_JOYSTICK is not set # CONFIG_INPUT_TOUCHSCREEN is not set # CONFIG_INPUT_MISC is not set # # Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y CONFIG_SERIAL_NONSTANDARD=y CONFIG_ROCKETPORT=m CONFIG_CYCLADES=m # CONFIG_CYZ_INTR is not set CONFIG_SYNCLINK=m # CONFIG_SYNCLINKMP is not set CONFIG_N_HDLC=m CONFIG_STALDRV=y # # Serial drivers # CONFIG_SERIAL_8250=m # CONFIG_SERIAL_8250_CS is not set # CONFIG_SERIAL_8250_ACPI is not set CONFIG_SERIAL_8250_NR_UARTS=4 # CONFIG_SERIAL_8250_EXTENDED is not set # # Non-8250 serial port support # CONFIG_SERIAL_CORE=m CONFIG_UNIX98_PTYS=y CONFIG_LEGACY_PTYS=y CONFIG_LEGACY_PTY_COUNT=256 CONFIG_PRINTER=m 
CONFIG_LP_CONSOLE=y CONFIG_PPDEV=m CONFIG_TIPAR=m # # IPMI # CONFIG_IPMI_HANDLER=m # CONFIG_IPMI_PANIC_EVENT is not set CONFIG_IPMI_DEVICE_INTERFACE=m # CONFIG_IPMI_SI is not set CONFIG_IPMI_WATCHDOG=m # CONFIG_IPMI_POWEROFF is not set # # Watchdog Cards # CONFIG_WATCHDOG=y # CONFIG_WATCHDOG_NOWAYOUT is not set # # Watchdog Device Drivers # CONFIG_SOFT_WATCHDOG=m CONFIG_ACQUIRE_WDT=m CONFIG_ADVANTECH_WDT=m CONFIG_ALIM1535_WDT=m CONFIG_ALIM7101_WDT=m CONFIG_SC520_WDT=m CONFIG_EUROTECH_WDT=m CONFIG_IB700_WDT=m CONFIG_WAFER_WDT=m # CONFIG_I8XX_TCO is not set CONFIG_SC1200_WDT=m # CONFIG_SCx200_WDT is not set # CONFIG_60XX_WDT is not set # CONFIG_CPU5_WDT is not set # CONFIG_W83627HF_WDT is not set CONFIG_W83877F_WDT=m CONFIG_MACHZ_WDT=m # # ISA-based Watchdog Cards # CONFIG_PCWATCHDOG=m # CONFIG_MIXCOMWD is not set CONFIG_WDT=m # CONFIG_WDT_501 is not set # # PCI-based Watchdog Cards # # CONFIG_PCIPCWATCHDOG is not set CONFIG_WDTPCI=m # CONFIG_WDT_501_PCI is not set # # USB-based Watchdog Cards # # CONFIG_USBPCWATCHDOG is not set # CONFIG_HW_RANDOM is not set CONFIG_NVRAM=m CONFIG_RTC=y CONFIG_DTLK=m CONFIG_R3964=m # CONFIG_APPLICOM is not set CONFIG_SONYPI=m # # Ftape, the floppy tape device driver # CONFIG_AGP=m CONFIG_AGP_ALI=m CONFIG_AGP_ATI=m CONFIG_AGP_AMD=m # CONFIG_AGP_AMD64 is not set CONFIG_AGP_INTEL=m # CONFIG_AGP_INTEL_MCH is not set CONFIG_AGP_NVIDIA=m CONFIG_AGP_SIS=m CONFIG_AGP_SWORKS=m CONFIG_AGP_VIA=m # CONFIG_AGP_EFFICEON is not set CONFIG_DRM=y CONFIG_DRM_TDFX=m CONFIG_DRM_R128=m CONFIG_DRM_RADEON=m CONFIG_DRM_I810=m CONFIG_DRM_I830=m # CONFIG_DRM_I915 is not set CONFIG_DRM_MGA=m CONFIG_DRM_SIS=m # # PCMCIA character devices # CONFIG_SYNCLINK_CS=m CONFIG_MWAVE=m # CONFIG_RAW_DRIVER is not set # CONFIG_HPET is not set # CONFIG_HANGCHECK_TIMER is not set # # I2C support # CONFIG_I2C=m CONFIG_I2C_CHARDEV=m # # I2C Algorithms # CONFIG_I2C_ALGOBIT=m CONFIG_I2C_ALGOPCF=m # CONFIG_I2C_ALGOPCA is not set # # I2C Hardware Bus support # CONFIG_I2C_ALI1535=m # 
CONFIG_I2C_ALI1563 is not set CONFIG_I2C_ALI15X3=m CONFIG_I2C_AMD756=m # CONFIG_I2C_AMD8111 is not set CONFIG_I2C_I801=m CONFIG_I2C_I810=m CONFIG_I2C_ISA=m # CONFIG_I2C_NFORCE2 is not set CONFIG_I2C_PARPORT=m # CONFIG_I2C_PARPORT_LIGHT is not set CONFIG_I2C_PIIX4=m # CONFIG_I2C_PROSAVAGE is not set # CONFIG_I2C_SAVAGE4 is not set # CONFIG_SCx200_ACB is not set CONFIG_I2C_SIS5595=m # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set CONFIG_I2C_VIA=m CONFIG_I2C_VIAPRO=m CONFIG_I2C_VOODOO3=m # CONFIG_I2C_PCA_ISA is not set # # Hardware Sensors Chip support # CONFIG_I2C_SENSOR=m CONFIG_SENSORS_ADM1021=m CONFIG_SENSORS_ADM1025=m # CONFIG_SENSORS_ADM1031 is not set # CONFIG_SENSORS_ASB100 is not set CONFIG_SENSORS_DS1621=m # CONFIG_SENSORS_FSCHER is not set CONFIG_SENSORS_GL518SM=m CONFIG_SENSORS_IT87=m CONFIG_SENSORS_LM75=m # CONFIG_SENSORS_LM77 is not set CONFIG_SENSORS_LM78=m CONFIG_SENSORS_LM80=m # CONFIG_SENSORS_LM83 is not set # CONFIG_SENSORS_LM85 is not set # CONFIG_SENSORS_LM90 is not set # CONFIG_SENSORS_MAX1619 is not set CONFIG_SENSORS_SMSC47M1=m CONFIG_SENSORS_VIA686A=m CONFIG_SENSORS_W83781D=m # CONFIG_SENSORS_W83L785TS is not set # CONFIG_SENSORS_W83627HF is not set # # Other I2C Chip support # CONFIG_SENSORS_EEPROM=m CONFIG_SENSORS_PCF8574=m CONFIG_SENSORS_PCF8591=m # CONFIG_SENSORS_RTC8564 is not set # CONFIG_I2C_DEBUG_CORE is not set # CONFIG_I2C_DEBUG_ALGO is not set # CONFIG_I2C_DEBUG_BUS is not set # CONFIG_I2C_DEBUG_CHIP is not set # # Dallas's 1-wire bus # # CONFIG_W1 is not set # # Misc devices # # CONFIG_IBM_ASM is not set # # Multimedia devices # CONFIG_VIDEO_DEV=m # # Video For Linux # # # Video Adapters # CONFIG_VIDEO_BT848=m CONFIG_VIDEO_PMS=m CONFIG_VIDEO_BWQCAM=m CONFIG_VIDEO_CQCAM=m CONFIG_VIDEO_W9966=m CONFIG_VIDEO_CPIA=m CONFIG_VIDEO_CPIA_PP=m CONFIG_VIDEO_CPIA_USB=m # CONFIG_VIDEO_SAA5246A is not set CONFIG_VIDEO_SAA5249=m CONFIG_TUNER_3036=m CONFIG_VIDEO_STRADIS=m CONFIG_VIDEO_ZORAN=m CONFIG_VIDEO_ZORAN_BUZ=m 
CONFIG_VIDEO_ZORAN_DC10=m # CONFIG_VIDEO_ZORAN_DC30 is not set CONFIG_VIDEO_ZORAN_LML33=m # CONFIG_VIDEO_ZORAN_LML33R10 is not set # CONFIG_VIDEO_SAA7134 is not set # CONFIG_VIDEO_MXB is not set # CONFIG_VIDEO_DPC is not set # CONFIG_VIDEO_HEXIUM_ORION is not set # CONFIG_VIDEO_HEXIUM_GEMINI is not set # CONFIG_VIDEO_CX88 is not set # CONFIG_VIDEO_OVCAMCHIP is not set # # Radio Adapters # CONFIG_RADIO_CADET=m CONFIG_RADIO_RTRACK=m CONFIG_RADIO_RTRACK2=m CONFIG_RADIO_AZTECH=m CONFIG_RADIO_GEMTEK=m CONFIG_RADIO_GEMTEK_PCI=m CONFIG_RADIO_MAXIRADIO=m CONFIG_RADIO_MAESTRO=m CONFIG_RADIO_SF16FMI=m CONFIG_RADIO_SF16FMR2=m CONFIG_RADIO_TERRATEC=m CONFIG_RADIO_TRUST=m CONFIG_RADIO_TYPHOON=m CONFIG_RADIO_TYPHOON_PROC_FS=y CONFIG_RADIO_ZOLTRIX=m # # Digital Video Broadcasting Devices # # CONFIG_DVB is not set CONFIG_VIDEO_TUNER=m CONFIG_VIDEO_BUF=m CONFIG_VIDEO_BTCX=m CONFIG_VIDEO_IR=m # # Graphics support # CONFIG_FB=y CONFIG_FB_MODE_HELPERS=y # CONFIG_FB_CIRRUS is not set CONFIG_FB_PM2=m # CONFIG_FB_PM2_FIFO_DISCONNECT is not set # CONFIG_FB_CYBER2000 is not set # CONFIG_FB_ASILIANT is not set # CONFIG_FB_IMSTT is not set CONFIG_FB_VGA16=m CONFIG_FB_VESA=y CONFIG_VIDEO_SELECT=y CONFIG_FB_HGA=m # CONFIG_FB_HGA_ACCEL is not set CONFIG_FB_RIVA=m # CONFIG_FB_RIVA_I2C is not set # CONFIG_FB_RIVA_DEBUG is not set # CONFIG_FB_I810 is not set CONFIG_FB_MATROX=m CONFIG_FB_MATROX_MILLENIUM=y CONFIG_FB_MATROX_MYSTIQUE=y CONFIG_FB_MATROX_G450=y CONFIG_FB_MATROX_G100=y CONFIG_FB_MATROX_I2C=m CONFIG_FB_MATROX_MAVEN=m CONFIG_FB_MATROX_MULTIHEAD=y # CONFIG_FB_RADEON_OLD is not set CONFIG_FB_RADEON=m CONFIG_FB_RADEON_I2C=y # CONFIG_FB_RADEON_DEBUG is not set CONFIG_FB_ATY128=m CONFIG_FB_ATY=m CONFIG_FB_ATY_CT=y CONFIG_FB_ATY_GX=y # CONFIG_FB_ATY_XL_INIT is not set CONFIG_FB_SIS=m CONFIG_FB_SIS_300=y CONFIG_FB_SIS_315=y CONFIG_FB_NEOMAGIC=m # CONFIG_FB_KYRO is not set CONFIG_FB_3DFX=m # CONFIG_FB_3DFX_ACCEL is not set CONFIG_FB_VOODOO1=m # CONFIG_FB_TRIDENT is not set # CONFIG_FB_VIRTUAL is 
not set # # Console display driver support # CONFIG_VGA_CONSOLE=y CONFIG_MDA_CONSOLE=m CONFIG_DUMMY_CONSOLE=y # CONFIG_FRAMEBUFFER_CONSOLE is not set # # Logo configuration # # CONFIG_LOGO is not set # # Sound # CONFIG_SOUND=m # # Advanced Linux Sound Architecture # # CONFIG_SND is not set # # Open Sound System # # CONFIG_SOUND_PRIME is not set # # USB support # CONFIG_USB=m # CONFIG_USB_DEBUG is not set # # Miscellaneous USB options # CONFIG_USB_DEVICEFS=y # CONFIG_USB_BANDWIDTH is not set # CONFIG_USB_DYNAMIC_MINORS is not set # CONFIG_USB_SUSPEND is not set # CONFIG_USB_OTG is not set # # USB Host Controller Drivers # CONFIG_USB_EHCI_HCD=m # CONFIG_USB_EHCI_SPLIT_ISO is not set # CONFIG_USB_EHCI_ROOT_HUB_TT is not set # CONFIG_USB_OHCI_HCD is not set # CONFIG_USB_UHCI_HCD is not set # # USB Device Class drivers # CONFIG_USB_AUDIO=m # CONFIG_USB_BLUETOOTH_TTY is not set CONFIG_USB_MIDI=m CONFIG_USB_ACM=m CONFIG_USB_PRINTER=m CONFIG_USB_STORAGE=m # CONFIG_USB_STORAGE_DEBUG is not set # CONFIG_USB_STORAGE_RW_DETECT is not set CONFIG_USB_STORAGE_DATAFAB=y CONFIG_USB_STORAGE_FREECOM=y CONFIG_USB_STORAGE_ISD200=y CONFIG_USB_STORAGE_DPCM=y CONFIG_USB_STORAGE_HP8200e=y CONFIG_USB_STORAGE_SDDR09=y CONFIG_USB_STORAGE_SDDR55=y CONFIG_USB_STORAGE_JUMPSHOT=y # # USB Human Interface Devices (HID) # CONFIG_USB_HID=m CONFIG_USB_HIDINPUT=y # CONFIG_HID_FF is not set # CONFIG_USB_HIDDEV is not set # # USB HID Boot Protocol drivers # # CONFIG_USB_KBD is not set # CONFIG_USB_MOUSE is not set CONFIG_USB_AIPTEK=m CONFIG_USB_WACOM=m CONFIG_USB_KBTAB=m CONFIG_USB_POWERMATE=m # CONFIG_USB_MTOUCH is not set # CONFIG_USB_EGALAX is not set # CONFIG_USB_XPAD is not set # CONFIG_USB_ATI_REMOTE is not set # # USB Imaging devices # CONFIG_USB_MDC800=m CONFIG_USB_MICROTEK=m CONFIG_USB_HPUSBSCSI=m # # USB Multimedia devices # CONFIG_USB_DABUSB=m CONFIG_USB_VICAM=m CONFIG_USB_DSBR=m CONFIG_USB_IBMCAM=m CONFIG_USB_KONICAWC=m CONFIG_USB_OV511=m CONFIG_USB_SE401=m # CONFIG_USB_SN9C102 is not set 
CONFIG_USB_STV680=m # # USB Network adaptors # CONFIG_USB_CATC=m CONFIG_USB_KAWETH=m CONFIG_USB_PEGASUS=m CONFIG_USB_RTL8150=m CONFIG_USB_USBNET=m # # USB Host-to-Host Cables # CONFIG_USB_ALI_M5632=y CONFIG_USB_AN2720=y CONFIG_USB_BELKIN=y CONFIG_USB_GENESYS=y CONFIG_USB_NET1080=y CONFIG_USB_PL2301=y # # Intelligent USB Devices/Gadgets # CONFIG_USB_ARMLINUX=y CONFIG_USB_EPSON2888=y CONFIG_USB_ZAURUS=y CONFIG_USB_CDCETHER=y # # USB Network Adapters # CONFIG_USB_AX8817X=y # # USB port drivers # CONFIG_USB_USS720=m # # USB Serial Converter support # CONFIG_USB_SERIAL=m CONFIG_USB_SERIAL_GENERIC=y CONFIG_USB_SERIAL_BELKIN=m CONFIG_USB_SERIAL_DIGI_ACCELEPORT=m CONFIG_USB_SERIAL_EMPEG=m CONFIG_USB_SERIAL_FTDI_SIO=m CONFIG_USB_SERIAL_VISOR=m CONFIG_USB_SERIAL_IPAQ=m CONFIG_USB_SERIAL_IR=m CONFIG_USB_SERIAL_EDGEPORT=m CONFIG_USB_SERIAL_EDGEPORT_TI=m CONFIG_USB_SERIAL_KEYSPAN_PDA=m CONFIG_USB_SERIAL_KEYSPAN=m # CONFIG_USB_SERIAL_KEYSPAN_MPR is not set # CONFIG_USB_SERIAL_KEYSPAN_USA28 is not set CONFIG_USB_SERIAL_KEYSPAN_USA28X=y CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y # CONFIG_USB_SERIAL_KEYSPAN_USA19 is not set # CONFIG_USB_SERIAL_KEYSPAN_USA18X is not set CONFIG_USB_SERIAL_KEYSPAN_USA19W=y CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y CONFIG_USB_SERIAL_KEYSPAN_USA49W=y CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y CONFIG_USB_SERIAL_KLSI=m CONFIG_USB_SERIAL_KOBIL_SCT=m CONFIG_USB_SERIAL_MCT_U232=m CONFIG_USB_SERIAL_PL2303=m # CONFIG_USB_SERIAL_SAFE is not set CONFIG_USB_SERIAL_CYBERJACK=m CONFIG_USB_SERIAL_XIRCOM=m CONFIG_USB_SERIAL_OMNINET=m CONFIG_USB_EZUSB=y # # USB Miscellaneous drivers # # CONFIG_USB_EMI62 is not set # CONFIG_USB_EMI26 is not set CONFIG_USB_TIGL=m CONFIG_USB_AUERSWALD=m CONFIG_USB_RIO500=m # CONFIG_USB_LEGOTOWER is not set CONFIG_USB_LCD=m # CONFIG_USB_LED is not set # CONFIG_USB_CYTHERM is not set CONFIG_USB_SPEEDTOUCH=m # CONFIG_USB_PHIDGETSERVO is not set # CONFIG_USB_TEST is not set # # USB Gadget 
Support # # CONFIG_USB_GADGET is not set # # File systems # CONFIG_EXT2_FS=y # CONFIG_EXT2_FS_XATTR is not set CONFIG_EXT3_FS=m CONFIG_EXT3_FS_XATTR=y CONFIG_EXT3_FS_POSIX_ACL=y # CONFIG_EXT3_FS_SECURITY is not set CONFIG_JBD=m # CONFIG_JBD_DEBUG is not set CONFIG_FS_MBCACHE=y CONFIG_REISERFS_FS=m # CONFIG_REISERFS_CHECK is not set CONFIG_REISERFS_PROC_INFO=y # CONFIG_REISERFS_FS_XATTR is not set CONFIG_JFS_FS=m # CONFIG_JFS_POSIX_ACL is not set CONFIG_JFS_DEBUG=y # CONFIG_JFS_STATISTICS is not set CONFIG_FS_POSIX_ACL=y # CONFIG_XFS_FS is not set CONFIG_MINIX_FS=m CONFIG_ROMFS_FS=m CONFIG_QUOTA=y # CONFIG_QFMT_V1 is not set CONFIG_QFMT_V2=y CONFIG_QUOTACTL=y CONFIG_AUTOFS_FS=m CONFIG_AUTOFS4_FS=m # # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y CONFIG_JOLIET=y CONFIG_ZISOFS=y CONFIG_ZISOFS_FS=y CONFIG_UDF_FS=m CONFIG_UDF_NLS=y # # DOS/FAT/NT Filesystems # CONFIG_FAT_FS=m CONFIG_MSDOS_FS=m CONFIG_VFAT_FS=m CONFIG_FAT_DEFAULT_CODEPAGE=437 CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1" # CONFIG_NTFS_FS is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_SYSFS=y # CONFIG_DEVFS_FS is not set # CONFIG_DEVPTS_FS_XATTR is not set CONFIG_TMPFS=y # CONFIG_HUGETLBFS is not set # CONFIG_HUGETLB_PAGE is not set CONFIG_RAMFS=y # # Miscellaneous filesystems # # CONFIG_ADFS_FS is not set # CONFIG_AFFS_FS is not set CONFIG_HFS_FS=m CONFIG_HFSPLUS_FS=m CONFIG_BEFS_FS=m # CONFIG_BEFS_DEBUG is not set CONFIG_BFS_FS=m # CONFIG_EFS_FS is not set CONFIG_JFFS_FS=m CONFIG_JFFS_FS_VERBOSE=0 CONFIG_JFFS_PROC_FS=y CONFIG_JFFS2_FS=m CONFIG_JFFS2_FS_DEBUG=0 # CONFIG_JFFS2_FS_NAND is not set # CONFIG_JFFS2_COMPRESSION_OPTIONS is not set CONFIG_JFFS2_ZLIB=y CONFIG_JFFS2_RTIME=y # CONFIG_JFFS2_RUBIN is not set CONFIG_CRAMFS=m CONFIG_VXFS_FS=m # CONFIG_HPFS_FS is not set # CONFIG_QNX4FS_FS is not set CONFIG_SYSV_FS=m CONFIG_UFS_FS=m # CONFIG_UFS_FS_WRITE is not set # # Network File Systems # CONFIG_NFS_FS=m CONFIG_NFS_V3=y # CONFIG_NFS_V4 is not set CONFIG_NFS_DIRECTIO=y 
CONFIG_NFSD=m CONFIG_NFSD_V3=y # CONFIG_NFSD_V4 is not set CONFIG_NFSD_TCP=y CONFIG_LOCKD=m CONFIG_LOCKD_V4=y CONFIG_EXPORTFS=m CONFIG_SUNRPC=m # CONFIG_RPCSEC_GSS_KRB5 is not set # CONFIG_RPCSEC_GSS_SPKM3 is not set CONFIG_SMB_FS=m # CONFIG_SMB_NLS_DEFAULT is not set # CONFIG_CIFS is not set CONFIG_NCP_FS=m CONFIG_NCPFS_PACKET_SIGNING=y CONFIG_NCPFS_IOCTL_LOCKING=y CONFIG_NCPFS_STRONG=y CONFIG_NCPFS_NFS_NS=y CONFIG_NCPFS_OS2_NS=y CONFIG_NCPFS_SMALLDOS=y CONFIG_NCPFS_NLS=y CONFIG_NCPFS_EXTRAS=y CONFIG_CODA_FS=m # CONFIG_CODA_FS_OLD_API is not set CONFIG_AFS_FS=m CONFIG_RXRPC=m # # Partition Types # CONFIG_PARTITION_ADVANCED=y # CONFIG_ACORN_PARTITION is not set CONFIG_OSF_PARTITION=y # CONFIG_AMIGA_PARTITION is not set # CONFIG_ATARI_PARTITION is not set CONFIG_MAC_PARTITION=y CONFIG_MSDOS_PARTITION=y CONFIG_BSD_DISKLABEL=y CONFIG_MINIX_SUBPARTITION=y CONFIG_SOLARIS_X86_PARTITION=y CONFIG_UNIXWARE_DISKLABEL=y # CONFIG_LDM_PARTITION is not set CONFIG_SGI_PARTITION=y # CONFIG_ULTRIX_PARTITION is not set CONFIG_SUN_PARTITION=y # CONFIG_EFI_PARTITION is not set # # Native Language Support # CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" CONFIG_NLS_CODEPAGE_437=m CONFIG_NLS_CODEPAGE_737=m CONFIG_NLS_CODEPAGE_775=m CONFIG_NLS_CODEPAGE_850=m CONFIG_NLS_CODEPAGE_852=m CONFIG_NLS_CODEPAGE_855=m CONFIG_NLS_CODEPAGE_857=m CONFIG_NLS_CODEPAGE_860=m CONFIG_NLS_CODEPAGE_861=m CONFIG_NLS_CODEPAGE_862=m CONFIG_NLS_CODEPAGE_863=m CONFIG_NLS_CODEPAGE_864=m CONFIG_NLS_CODEPAGE_865=m CONFIG_NLS_CODEPAGE_866=m CONFIG_NLS_CODEPAGE_869=m CONFIG_NLS_CODEPAGE_936=m CONFIG_NLS_CODEPAGE_950=m CONFIG_NLS_CODEPAGE_932=m CONFIG_NLS_CODEPAGE_949=m CONFIG_NLS_CODEPAGE_874=m CONFIG_NLS_ISO8859_8=m CONFIG_NLS_CODEPAGE_1250=m CONFIG_NLS_CODEPAGE_1251=m # CONFIG_NLS_ASCII is not set CONFIG_NLS_ISO8859_1=m CONFIG_NLS_ISO8859_2=m CONFIG_NLS_ISO8859_3=m CONFIG_NLS_ISO8859_4=m CONFIG_NLS_ISO8859_5=m CONFIG_NLS_ISO8859_6=m CONFIG_NLS_ISO8859_7=m CONFIG_NLS_ISO8859_9=m CONFIG_NLS_ISO8859_13=m 
CONFIG_NLS_ISO8859_14=m CONFIG_NLS_ISO8859_15=m CONFIG_NLS_KOI8_R=m CONFIG_NLS_KOI8_U=m CONFIG_NLS_UTF8=m # # Profiling support # CONFIG_PROFILING=y CONFIG_OPROFILE=m # # Kernel hacking # CONFIG_DEBUG_KERNEL=y CONFIG_MAGIC_SYSRQ=y # CONFIG_DEBUG_SLAB is not set # CONFIG_DEBUG_SPINLOCK is not set # CONFIG_DEBUG_SPINLOCK_SLEEP is not set # CONFIG_DEBUG_HIGHMEM is not set # CONFIG_DEBUG_INFO is not set # CONFIG_FRAME_POINTER is not set CONFIG_EARLY_PRINTK=y # CONFIG_DEBUG_STACKOVERFLOW is not set # CONFIG_KPROBES is not set # CONFIG_DEBUG_STACK_USAGE is not set # CONFIG_DEBUG_PAGEALLOC is not set # CONFIG_4KSTACKS is not set # CONFIG_SCHEDSTATS is not set CONFIG_X86_FIND_SMP_CONFIG=y CONFIG_X86_MPPARSE=y # # Security options # # CONFIG_SECURITY is not set # # Cryptographic options # CONFIG_CRYPTO=y CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_NULL=m CONFIG_CRYPTO_MD4=m CONFIG_CRYPTO_MD5=m CONFIG_CRYPTO_SHA1=m CONFIG_CRYPTO_SHA256=m CONFIG_CRYPTO_SHA512=m # CONFIG_CRYPTO_WP512 is not set CONFIG_CRYPTO_DES=m CONFIG_CRYPTO_BLOWFISH=m # CONFIG_CRYPTO_TWOFISH is not set CONFIG_CRYPTO_SERPENT=m # CONFIG_CRYPTO_AES_586 is not set CONFIG_CRYPTO_CAST5=m # CONFIG_CRYPTO_CAST6 is not set # CONFIG_CRYPTO_TEA is not set # CONFIG_CRYPTO_ARC4 is not set # CONFIG_CRYPTO_KHAZAD is not set CONFIG_CRYPTO_DEFLATE=m # CONFIG_CRYPTO_MICHAEL_MIC is not set # CONFIG_CRYPTO_CRC32C is not set # CONFIG_CRYPTO_TEST is not set # # Library routines # CONFIG_CRC_CCITT=m CONFIG_CRC32=y # CONFIG_LIBCRC32C is not set CONFIG_ZLIB_INFLATE=y CONFIG_ZLIB_DEFLATE=m CONFIG_X86_SMP=y CONFIG_X86_HT=y CONFIG_X86_BIOS_REBOOT=y CONFIG_X86_TRAMPOLINE=y CONFIG_PC=y --Multipart=_Sat__4_Jun_2005_19_51_22_-0700_Kp/TSOvd/GHsKqPd-- From herbert@gondor.apana.org.au Sun Jun 5 01:03:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 05 Jun 2005 01:03:29 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5583LXq014420 for ; Sun, 5 
Jun 2005 01:03:22 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1Deq5A-0001D7-00; Sun, 05 Jun 2005 18:02:08 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1Deq54-00013u-00; Sun, 05 Jun 2005 18:02:02 +1000 From: Herbert Xu To: akpm@osdl.org (Andrew Morton) Subject: Re: Fw: PROBLEM: tcp_output.c bug Cc: netdev@oss.sgi.com, rommer@active.by Organization: Core In-Reply-To: <20050604195122.6a07abc7.akpm@osdl.org> X-Newsgroups: apana.lists.os.linux.netdev User-Agent: tin/1.7.4-20040225 ("Benbecula") (UNIX) (Linux/2.4.27-hx-1-686-smp (i686)) Message-Id: Date: Sun, 05 Jun 2005 18:02:02 +1000 X-archive-position: 2111 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 406 Lines: 12 Andrew Morton wrote: > > [3.] sh scripts/ver_linux > Linux us401.activeby.net 2.6.9 #4 SMP Fri Apr 22 16:46:30 EEST 2005 i686 i686 > i386 GNU/Linux This bug was fixed in 2.6.11. 
-- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From manfred@colorfullife.com Sun Jun 5 08:37:29 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 05 Jun 2005 08:37:33 -0700 (PDT) Received: from dbl.q-ag.de (dbl.q-ag.de [213.172.117.3]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j55FbRXq016711 for ; Sun, 5 Jun 2005 08:37:29 -0700 Received: from [127.0.0.2] (dbl [127.0.0.1]) by dbl.q-ag.de (8.13.3/8.13.3/Debian-6) with ESMTP id j55Fc30h032241; Sun, 5 Jun 2005 17:38:04 +0200 Message-ID: <42A31BEB.7030900@colorfullife.com> Date: Sun, 05 Jun 2005 17:36:11 +0200 From: Manfred Spraul User-Agent: Mozilla/5.0 (X11; U; Linux i686; fr-FR; rv:1.7.7) Gecko/20050417 Fedora/1.7.7-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Jeff Garzik CC: AAbdulla@nvidia.com, Netdev Subject: [PATCH] forcedeth: add two new pci ids Content-Type: multipart/mixed; boundary="------------060003070105090106070501" X-archive-position: 2112 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: manfred@colorfullife.com Precedence: bulk X-list: netdev Content-Length: 2536 Lines: 77 This is a multi-part message in MIME format. --------------060003070105090106070501 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi Jeff, Ayaz wrote a patch that adds two new PCI IDs to the forcedeth driver. Could you add it to your tree? I'm not sure it's worth sneaking it into 2.6.12, but it looks obviously correct (tm). 
-- Manfred Signed-Off-By: Manfred Spraul --------------060003070105090106070501 Content-Type: text/plain; name="patch-forcedeth-mcp51" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="patch-forcedeth-mcp51" --- 2.6/drivers/net/forcedeth.c 2005-05-16 19:45:54.000000000 +0200 +++ build-2.6/drivers/net/forcedeth.c 2005-05-16 19:52:59.000000000 +0200 @@ -82,6 +82,7 @@ * 0.31: 14 Nov 2004: ethtool support for getting/setting link * capabilities. * 0.32: 16 Apr 2005: RX_ERROR4 handling added. + * 0.33: 16 Mai 2005: Support for MCP51 added. * * Known bugs: * We suspect that on some hardware no TX done interrupts are generated. @@ -93,7 +94,7 @@ * DEV_NEED_TIMERIRQ will not harm you on sane hardware, only generating a few * superfluous timer interrupts from the nic. */ -#define FORCEDETH_VERSION "0.32" +#define FORCEDETH_VERSION "0.33" #define DRV_NAME "forcedeth" #include @@ -1998,7 +1999,9 @@ /* handle different descriptor versions */ if (pci_dev->device == PCI_DEVICE_ID_NVIDIA_NVENET_1 || pci_dev->device == PCI_DEVICE_ID_NVIDIA_NVENET_2 || - pci_dev->device == PCI_DEVICE_ID_NVIDIA_NVENET_3) + pci_dev->device == PCI_DEVICE_ID_NVIDIA_NVENET_3 || + pci_dev->device == PCI_DEVICE_ID_NVIDIA_NVENET_12 || + pci_dev->device == PCI_DEVICE_ID_NVIDIA_NVENET_13) np->desc_ver = DESC_VER_1; else np->desc_ver = DESC_VER_2; @@ -2256,6 +2259,20 @@ .subdevice = PCI_ANY_ID, .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ, }, + { /* MCP51 Ethernet Controller */ + .vendor = PCI_VENDOR_ID_NVIDIA, + .device = PCI_DEVICE_ID_NVIDIA_NVENET_12, + .subvendor = PCI_ANY_ID, + .subdevice = PCI_ANY_ID, + .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ, + }, + { /* MCP51 Ethernet Controller */ + .vendor = PCI_VENDOR_ID_NVIDIA, + .device = PCI_DEVICE_ID_NVIDIA_NVENET_13, + .subvendor = PCI_ANY_ID, + .subdevice = PCI_ANY_ID, + .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ, + }, {0,}, }; 
--------------060003070105090106070501-- From davem@davemloft.net Sun Jun 5 13:13:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 05 Jun 2005 13:13:38 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j55KDRXq001717 for ; Sun, 5 Jun 2005 13:13:27 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Df1T8-0000e7-JU; Sun, 05 Jun 2005 13:11:38 -0700 Date: Sun, 05 Jun 2005 13:11:38 -0700 (PDT) Message-Id: <20050605.131138.21611278.davem@davemloft.net> To: mchan@broadcom.com Cc: buytenh@wantstofly.org, mitch.a.williams@intel.com, hadi@cyberus.ca, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: <1117830922.4430.44.camel@rh4> References: <1117828169.4430.29.camel@rh4> <20050603205944.GC20623@xi.wantstofly.org> <1117830922.4430.44.camel@rh4> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2113 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1265 Lines: 30 From: "Michael Chan" Date: Fri, 03 Jun 2005 13:35:22 -0700 > I agree on the merit of issuing only one IO at the end. What I'm saying > is that doing so will make it similar to e1000 where all the buffers are > replenished at the end. Isn't that so or am I missing something? You're totally right. 
I guess we don't see the e1000 behavior due to any of the following: 1) we set the RX ring sizes larger by default 2) we set it larger than what the e1000 tests were done with 3) we process the RX ring faster and thus the chip can't catch up and exhaust the ring We use a default of 200 in tg3, and e1000 seems to use a default of 256. This actually points more to the fact that what you're actually doing to process the packet has a huge influence on whether the chip can catch up and exhaust the RX ring. How much software work does the netif_receive_skb() call entail, on average, for the given workload? That is why the exact test being run is important in analyzing reports such as these. If you're doing a TCP transfer, then netif_receive_skb() can be _VERY_ expensive per-call. If, on the other hand, you're routing tiny 64-byte packets or responding to simple ICMP echo requests, the per-call cost can be significantly lower. From glen.turner@aarnet.edu.au Sun Jun 5 13:30:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 05 Jun 2005 13:30:46 -0700 (PDT) Received: from clix.aarnet.edu.au (clix.aarnet.edu.au [192.94.63.10]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j55KUfXq002968 for ; Sun, 5 Jun 2005 13:30:42 -0700 Received: from [202.158.193.5] (andromache.adelaide.aarnet.edu.au [202.158.193.5]) (authenticated bits=0) by clix.aarnet.edu.au (8.12.8/8.12.8) with ESMTP id j55KTUpg008271 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Mon, 6 Jun 2005 06:29:31 +1000 Message-ID: <42A360A0.1040902@aarnet.edu.au> Date: Mon, 06 Jun 2005 05:59:20 +0930 From: Glen Turner Organization: Australia's Academic & Research Network User-Agent: Mozilla Thunderbird 1.0.2-1.3.3 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Andy Fleming CC: Stephen Hemminger , Netdev , Kumar Gala Subject: Re: RFC: PHY Abstraction Layer II References: <1107b64b01fb8e9a6c84359bb56881a6@freescale.com> <20050531105939.7486e071@dxpl.pdx.osdl.net> 
<92F1428A-0B26-428B-8C06-35C7E5B9EEE3@freescale.com> <20050601144123.2bc11c06@dxpl.pdx.osdl.net> <9A2D608A-D818-455B-96F4-ED42413556C0@freescale.com> In-Reply-To: <9A2D608A-D818-455B-96F4-ED42413556C0@freescale.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-MDSA: Yes X-Scanned-By: MIMEDefang 2.39 X-archive-position: 2114 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: glen.turner@aarnet.edu.au Precedence: bulk X-list: netdev Content-Length: 405 Lines: 12 Operationally, it would be very useful if the PHY printed the physical interface detail when detected (1000Base-LX, etc). Also, it would be nice to be able to retrieve PHY data independent of the interface status (eg, to retrieve asset serial numbers, GBIC make/models, etc). -- Glen Turner Tel: (08) 8303 3936 or +61 8 8303 3936 Australia's Academic & Research Network www.aarnet.edu.au From davem@davemloft.net Sun Jun 5 14:38:08 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 05 Jun 2005 14:38:17 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j55Lc7Xq005931 for ; Sun, 5 Jun 2005 14:38:08 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Df2nd-0001Uq-UL; Sun, 05 Jun 2005 14:36:53 -0700 Date: Sun, 05 Jun 2005 14:36:53 -0700 (PDT) Message-Id: <20050605.143653.75191476.davem@davemloft.net> To: mchan@broadcom.com Cc: hadi@cyberus.ca, buytenh@wantstofly.org, mitch.a.williams@intel.com, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. 
Miller" In-Reply-To: <1117844736.4430.51.camel@rh4> References: <1117830922.4430.44.camel@rh4> <1117837798.6266.25.camel@localhost.localdomain> <1117844736.4430.51.camel@rh4> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2115 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 4378 Lines: 126 To illustrate my most recent point (that packet processing cost on RX is variable, and at times highly so) I made some hacks to the tg3 driver to record how many system clock ticks each netif_receive_skb() call consumed. This clock on my sparc64 box updates at a rate of 12MHZ and is used for system time keeping. Anyways, here is a log from a stream transfer to this system. So the packet trace is heavily TCP receive bound. Here is a sample from this. 
I take a tick sample before the netif_receive_skb() call, take one afterwards, and record the difference between the two: [ 52 73 41 65 38 61 58 63 37 62 36 62 50 74 38 64 ] [ 37 63 39 62 36 64 36 61 50 75 38 64 39 65 37 62 ] [ 36 60 36 62 50 76 39 67 38 63 35 62 35 64 35 62 ] [ 62 74 41 65 37 62 37 63 36 61 39 62 52 75 38 66 ] [ 37 63 35 61 38 62 36 60 49 75 38 64 37 62 36 66 ] [ 42 62 36 62 48 76 38 64 35 62 40 63 36 60 36 63 ] [ 49 76 36 64 35 64 38 64 37 61 36 62 60 74 37 80 ] [ 43 69 36 65 36 62 37 62 54 77 42 66 37 64 35 60 ] [ 36 61 38 62 51 75 40 64 35 62 36 61 37 61 39 61 ] [ 51 76 38 64 35 63 36 63 38 62 37 63 49 76 39 64 ] [ 35 64 35 64 38 62 36 62 61 85 42 65 38 79 38 62 ] [ 36 61 35 64 49 77 37 63 38 64 36 60 37 62 36 60 ] [ 51 76 38 66 38 62 37 63 36 62 37 60 50 77 41 64 ] [ 36 60 36 60 36 61 37 61 50 78 39 66 37 63 36 62 ] [ 36 61 39 63 60 74 38 66 37 61 35 63 37 65 36 65 ] [ 48 76 38 65 36 64 41 64 36 60 35 61 49 76 39 66 ] [ 36 64 39 60 37 60 36 59 51 73 37 64 40 64 36 62 ] [ 37 61 35 62 50 78 39 67 38 63 35 61 36 63 36 61 ] [ 66 75 41 66 37 65 36 61 36 62 38 63 50 75 38 65 ] [ 37 63 36 62 38 63 36 63 49 76 38 64 38 63 40 64 ] [ 35 63 36 60 50 74 39 65 37 65 38 62 36 62 36 60 ] [ 51 75 37 66 39 65 37 62 37 62 38 61 67 72 39 65 ] [ 37 62 35 61 37 61 54 63 53 75 42 67 35 63 36 61 ] [ 36 65 39 62 53 75 38 64 36 63 35 62 38 63 36 61 ] [ 49 77 39 66 38 62 36 62 38 61 35 59 83 91 77 25 ] [ 22 22 22 24 21 21 21 20 21 35 67 24 50 47 67 39 ] [ 65 34 65 36 63 65 74 38 64 35 64 37 63 37 62 36 ] [ 61 51 75 38 67 39 63 35 64 37 62 36 61 50 74 37 ] [ 66 37 62 35 63 35 61 36 65 52 76 40 65 38 61 37 ] [ 62 36 61 40 64 63 71 40 62 36 64 36 63 36 61 39 ] [ 62 49 76 37 65 36 62 36 61 38 65 41 64 50 75 39 ] [ 67 37 62 37 63 36 62 38 61 69 153 70 140 200 737 67 ] Notice how the packet trail seems to bounce back and forth between taking ~30 ticks to taking ~60 ticks? The ~60 tick packets are the TCP data packets that make us output an ACK packet. 
So this makes it cost double of what it takes to process a TCP data packet for which we do not immediately generate an ACK. It pretty much shows that we need to have something other than a blank "COUNT" to represent the NAPI weight, and we should instead try to measure the real "work" actually consumed, via some time measurement and limit, to implement this stuff properly. BTW, here is the patch implementing this stuff. --- ./drivers/net/tg3.c.~1~ 2005-06-03 11:13:14.000000000 -0700 +++ ./drivers/net/tg3.c 2005-06-05 14:16:32.000000000 -0700 @@ -2836,7 +2836,17 @@ static int tg3_rx(struct tg3 *tp, int bu desc->err_vlan & RXD_VLAN_MASK); } else #endif + { + unsigned long t = get_cycles(); + unsigned int ent; + netif_receive_skb(skb); + t = get_cycles() - t; + + ent = tp->rx_log_ent; + tp->rx_log[ent] = (u32) t; + tp->rx_log_ent = ((ent + 1) & RX_LOG_MASK); + } tp->dev->last_rx = jiffies; received++; @@ -6609,6 +6619,28 @@ static struct net_device_stats *tg3_get_ stats->rx_crc_errors = old_stats->rx_crc_errors + calc_crc_errors(tp); + /* XXX Yes, I know, do this right. 
:-) */ + { + unsigned int ent, pos; + + printk("TG3: RX LOG, current ent[%d]\n", tp->rx_log_ent); + ent = tp->rx_log_ent - 512; + pos = 0; + while (ent != tp->rx_log_ent) { + if (!pos) printk("[ "); + + printk("%u ", tp->rx_log[ent]); + + if (++pos >= 16) { + printk("]\n"); + pos = 0; + } + ent = (ent + 1) & RX_LOG_MASK; + } + if (pos != 0) + printk("]\n"); + } + return stats; } --- ./drivers/net/tg3.h.~1~ 2005-06-03 11:13:14.000000000 -0700 +++ ./drivers/net/tg3.h 2005-06-05 14:16:00.000000000 -0700 @@ -2232,6 +2232,11 @@ struct tg3 { #define SST_25VF0X0_PAGE_SIZE 4098 struct ethtool_coalesce coal; + +#define RX_LOG_SIZE (1 << 14) +#define RX_LOG_MASK (RX_LOG_SIZE - 1) + unsigned int rx_log_ent; + u32 rx_log[RX_LOG_SIZE]; }; #endif /* !(_T3_H) */ From davem@davemloft.net Sun Jun 5 23:02:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 05 Jun 2005 23:02:39 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5662ZXq017763 for ; Sun, 5 Jun 2005 23:02:35 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfAfz-0000qY-5z; Sun, 05 Jun 2005 23:01:31 -0700 Date: Sun, 05 Jun 2005 23:01:31 -0700 (PDT) Message-Id: <20050605.230131.78711491.davem@davemloft.net> To: jgarzik@pobox.com Cc: netdev@oss.sgi.com, mchan@broadcom.com Subject: Re: [PATCH]: Tigon3 new NAPI locking v2 From: "David S. 
Miller" In-Reply-To: <42A0BC2B.4020409@pobox.com> References: <20050603.122558.88474819.davem@davemloft.net> <42A0BC2B.4020409@pobox.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2116 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1525 Lines: 45 From: Jeff Garzik Date: Fri, 03 Jun 2005 16:23:07 -0400 > overall, pretty spiffy :) Thanks. > As further work, I would like to see how much (alot? all?) of the timer > code could be moved into a workqueue, where we could kill the last of > the horrible-udelay loops in the driver. Particularly awful is > > while (++tick < 195000) { > status = tg3_fiber_aneg_smachine(tp, &aninfo); > if (status == ANEG_DONE || status == ANEG_FAILED) > break; > > udelay(1); > } I know :). > * This loop makes me nervous... If there's a fault on the PCI bus or > the hardware is unplugged, val will equal 0xffffffff. I agree, if the chip wedges for whatever reason and stops receiving interrupts, we will totally lock up here. I'll add a timeout to the final version. Remind me if I don't :) > * A few comments for normal humans like "force an interrupt" and "wait > for interrupt handler to complete" might be nice. Ok. > * a BUG_ON(if-interrupts-are-disabled) line might be nice Which interrupts? Local cpu interrupts? Tigon3 chip interrupts? > Rather than an 'irq_sync' arg, my instinct would have been to create > tg3_full_lock() and tg3_full_lock_sync(). This makes the action -much- > more obvious to the reader, and since its inline doesn't cost anything > (compiler's optimizer even does a tiny bit less work my way). This doesn't sound like a bad idea either. Thanks for the feedback Jeff. 
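A minimal userspace sketch of the timeout idea discussed above, for illustration only. The names `poll_reg_bounded`, `read_reg_fn` and `fake_read` are hypothetical and are not taken from tg3; the register accessor stands in for a real MMIO read, and a real driver would delay or sleep between reads. The loop gives up after a fixed number of reads and treats an all-ones value as "device gone", since that is what a faulted or unplugged PCI device would return:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Possible outcomes of the bounded poll. */
enum poll_result { POLL_DONE, POLL_TIMEOUT, POLL_DEVICE_GONE };

/* Hypothetical register accessor; a stand-in for a real MMIO read. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Read the register until (value & done_mask) is non-zero, giving up
 * after max_tries reads.  An all-ones read is reported separately,
 * because a dead or unplugged device reads back as 0xffffffff.
 */
static enum poll_result poll_reg_bounded(read_reg_fn read_reg, void *ctx,
                                         uint32_t done_mask,
                                         unsigned int max_tries)
{
    unsigned int i;

    assert(read_reg != NULL);
    for (i = 0; i < max_tries; i++) {
        uint32_t val = read_reg(ctx);

        if (val == 0xffffffffu)
            return POLL_DEVICE_GONE;
        if (val & done_mask)
            return POLL_DONE;
        /* A real driver would udelay() or sleep here between reads. */
    }
    return POLL_TIMEOUT;
}

/* Fake register for experiments: returns 0 until the ready_at-th read. */
struct fake_reg {
    unsigned int reads;     /* number of reads so far */
    unsigned int ready_at;  /* read count at which final_val appears */
    uint32_t final_val;     /* value returned from then on */
};

static uint32_t fake_read(void *ctx)
{
    struct fake_reg *r = ctx;

    r->reads++;
    return (r->reads >= r->ready_at) ? r->final_val : 0;
}
```

Calling `poll_reg_bounded(fake_read, &reg, 0x1, 10)` with a `struct fake_reg` that never becomes ready returns `POLL_TIMEOUT` instead of spinning forever, which is the property the unbounded loop lacks.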
From yi.zhu@intel.com Sun Jun 5 23:33:50 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 05 Jun 2005 23:33:53 -0700 (PDT) Received: from fmsfmr002.fm.intel.com (fmr14.intel.com [192.55.52.68]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j566XoXq020272 for ; Sun, 5 Jun 2005 23:33:50 -0700 Received: from fmsfmr100.fm.intel.com (fmsfmr100.fm.intel.com [10.1.192.58]) by fmsfmr002.fm.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j566WmVj029556; Mon, 6 Jun 2005 06:32:48 GMT Received: from fmsmsxvs043.fm.intel.com (fmsmsxvs043.fm.intel.com [132.233.42.129]) by fmsfmr100.fm.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j566WSbt023446; Mon, 6 Jun 2005 06:32:48 GMT Received: from debian.sh.intel.com ([172.16.219.38]) by fmsmsxvs043.fm.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005060523324620993 ; Sun, 05 Jun 2005 23:32:47 -0700 Subject: Re: [3/9] ieee80211: fix ipw 64bit compilation warnings From: Zhu Yi To: Jiri Benc Cc: NetDev , Jeff Garzik , Jirka Bohac In-Reply-To: <20050603183048.7786f98b@griffin.suse.cz> References: <20050603182625.64d33be3@griffin.suse.cz> <20050603183048.7786f98b@griffin.suse.cz> Content-Type: text/plain Organization: Intel Corp. 
Date: Mon, 06 Jun 2005 14:29:52 +0800 Message-Id: <1118039392.5702.30.camel@debian.sh.intel.com> Mime-Version: 1.0 X-Mailer: Evolution 2.2.2 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2118 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yi.zhu@intel.com Precedence: bulk X-list: netdev Content-Length: 354 Lines: 13 On Fri, 2005-06-03 at 18:30 +0200, Jiri Benc wrote: > @@ -508,7 +508,7 @@ > /* verify we have enough room to store the value */ > if (*len < sizeof(u32)) { > IPW_DEBUG_ORD("ordinal buffer length too small, " > - "need %d\n", sizeof(u32)); > + "need %d\n", (int)sizeof(u32)); ("%zd", sizeof()) should be better. Thanks, -yi From davem@davemloft.net Sun Jun 5 23:44:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 05 Jun 2005 23:44:32 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j566iMXq021171 for ; Sun, 5 Jun 2005 23:44:22 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfBKG-0001OG-0R; Sun, 05 Jun 2005 23:43:08 -0700 Date: Sun, 05 Jun 2005 23:43:07 -0700 (PDT) Message-Id: <20050605.234307.92584592.davem@davemloft.net> To: mchan@broadcom.com Cc: hadi@cyberus.ca, buytenh@wantstofly.org, mitch.a.williams@intel.com, john.ronciak@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. 
Miller" In-Reply-To: <20050605.143653.75191476.davem@davemloft.net> References: <1117837798.6266.25.camel@localhost.localdomain> <1117844736.4430.51.camel@rh4> <20050605.143653.75191476.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2119 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 5307 Lines: 190

From: "David S. Miller"
Date: Sun, 05 Jun 2005 14:36:53 -0700 (PDT)

> BTW, here is the patch implementing this stuff.

A new patch and some more data.

When we go to gigabit, and NAPI kicks in, the first RX packet costs a lot (cache misses etc.) but the rest are very efficient to process. I suspect this only holds for the single socket case, and on a real system processing many connections the cost drop might not be so clean.

The log output format is:

(TX_TICKS:RX_TICKS[ RX_TICK1 RX_TICK2 RX_TICK3 ... ])

Here is an example trace from a single socket TCP stream send over gigabit:

(9:112[ 26 8 7 8 7 ])
(6:110[ 23 8 8 8 7 ])
(7:57[ 26 8 ])
(6:117[ 25 8 9 7 7 ])
(5:37[ 26 ])
(6:113[ 28 8 7 8 7 ])
(0:20[ 9 ])
(8:111[ 27 7 7 8 7 ])
(5:109[ 25 8 8 8 7 ])
(8:113[ 25 7 8 9 7 ])
(6:108[ 25 8 7 7 7 ])
(8:88[ 26 8 8 7 ])
(6:109[ 25 7 7 7 7 ])
(6:111[ 25 9 8 7 7 ])
(0:48[ 9 5 ])

This kind of trace reiterates some things we already know. For example, mitigation (HW, SW, or a combination of both) helps because processing multiple packets lets us "reuse" the CPU cache priming that handling the first packet achieves for us.

It would be great to stick something like this into the e1000 driver, and get some output from it with Intel's single NIC performance degradation test case.
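To compare the first-packet cost with the rest, entries in the log format above can be parsed mechanically. What follows is only a sketch, not part of the patch: `parse_poll_sample`, `mean_after_first`, `struct poll_sample` and `MAX_RX` are hypothetical names, and `MAX_RX` is assumed to match the patch's POLL_RX_SIZE of 8.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_RX 8 /* assumption: mirrors POLL_RX_SIZE from the patch */

struct poll_sample {
    unsigned int tx_ticks;    /* ticks spent in TX completion */
    unsigned int rx_ticks;    /* ticks spent in the whole RX pass */
    unsigned int rx[MAX_RX];  /* per-packet netif_receive_skb() ticks */
    int nrx;                  /* number of valid entries in rx[] */
};

/*
 * Parse one entry such as "(9:112[ 26 8 7 8 7 ])".
 * Returns 0 on success, -1 if the entry is malformed.
 */
static int parse_poll_sample(const char *s, struct poll_sample *out)
{
    int used = 0;

    assert(s != NULL && out != NULL);
    /* %n is only stored if the literal '[' was matched as well. */
    if (sscanf(s, " (%u:%u[%n", &out->tx_ticks, &out->rx_ticks, &used) != 2 ||
        used == 0)
        return -1;
    s += used;

    out->nrx = 0;
    while (out->nrx < MAX_RX) {
        int n = 0;

        if (sscanf(s, "%u%n", &out->rx[out->nrx], &n) != 1)
            break;
        out->nrx++;
        s += n;
    }

    /* Require the closing "])"; used stays 0 if it is missing. */
    used = 0;
    (void)sscanf(s, " ])%n", &used);
    return used > 0 ? 0 : -1;
}

/* Mean cost of the packets after the first one; 0.0 if there are none. */
static double mean_after_first(const struct poll_sample *p)
{
    unsigned int sum = 0;
    int i;

    if (p->nrx < 2)
        return 0.0;
    for (i = 1; i < p->nrx; i++)
        sum += p->rx[i];
    return (double)sum / (p->nrx - 1);
}
```

For the first entry in the trace, `rx[0]` is 26 ticks while `mean_after_first()` gives 7.5 ticks, which is the cache-priming effect described above.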
It is also necessary for the Intel folks to say whether the NIC is running out of RX descriptors in the single NIC case with dev->weight set to the default of 64. If so, does increasing the RX ring size to a larger value via ethtool help? If not, then why in the world are things running more slowly? I've got a crappy 1.5GHZ sparc64 box in my tg3 tests here, and it can handle gigabit line rate with much CPU to spare. So either Intel is doing something other than TCP stream tests, or something else is out of whack. I even tried to do things like having a memory touching program run in parallel with the TCP stream test, and this did not make the timing numbers in the logs increase much at all. --- ./drivers/net/tg3.c.~1~ 2005-06-03 11:13:14.000000000 -0700 +++ ./drivers/net/tg3.c 2005-06-05 23:21:11.000000000 -0700 @@ -2836,7 +2836,22 @@ static int tg3_rx(struct tg3 *tp, int bu desc->err_vlan & RXD_VLAN_MASK); } else #endif + { + unsigned long t = get_cycles(); + struct tg3_poll_log_ent *lp; + unsigned int ent; + netif_receive_skb(skb); + t = get_cycles() - t; + + ent = tp->poll_log_ent; + lp = &tp->poll_log[ent]; + ent = lp->rx_cur_ent; + if (ent < POLL_RX_SIZE) { + lp->rx_ents[ent] = (u16) t; + lp->rx_cur_ent = ent + 1; + } + } tp->dev->last_rx = jiffies; received++; @@ -2897,9 +2912,15 @@ static int tg3_poll(struct net_device *n /* run TX completion thread */ if (sblk->idx[0].tx_consumer != tp->tx_cons) { + unsigned long t; + spin_lock(&tp->tx_lock); + t = get_cycles(); tg3_tx(tp); + t = get_cycles() - t; spin_unlock(&tp->tx_lock); + + tp->poll_log[tp->poll_log_ent].tx_ticks = (u16) t; } spin_unlock_irqrestore(&tp->lock, flags); @@ -2911,16 +2932,28 @@ static int tg3_poll(struct net_device *n if (sblk->idx[0].rx_producer != tp->rx_rcb_ptr) { int orig_budget = *budget; int work_done; + unsigned long t; + unsigned int ent; if (orig_budget > netdev->quota) orig_budget = netdev->quota; + t = get_cycles(); work_done = tg3_rx(tp, orig_budget); + t = get_cycles() - t; + + 
ent = tp->poll_log_ent; + tp->poll_log[ent].rx_ticks = (u16) t; *budget -= work_done; netdev->quota -= work_done; } + tp->poll_log_ent = (tp->poll_log_ent + 1) & POLL_LOG_MASK; + tp->poll_log[tp->poll_log_ent].tx_ticks = 0; + tp->poll_log[tp->poll_log_ent].rx_ticks = 0; + tp->poll_log[tp->poll_log_ent].rx_cur_ent = 0; + if (tp->tg3_flags & TG3_FLAG_TAGGED_STATUS) tp->last_tag = sblk->status_tag; rmb(); @@ -6609,6 +6642,27 @@ static struct net_device_stats *tg3_get_ stats->rx_crc_errors = old_stats->rx_crc_errors + calc_crc_errors(tp); + /* XXX Yes, I know, do this right. :-) */ + { + unsigned int ent; + + printk("TG3: POLL LOG, current ent[%d]\n", tp->poll_log_ent); + ent = tp->poll_log_ent - (POLL_LOG_SIZE - 1); + ent &= POLL_LOG_MASK; + while (ent != tp->poll_log_ent) { + struct tg3_poll_log_ent *lp = &tp->poll_log[ent]; + int i; + + printk("(%u:%u[ ", + lp->tx_ticks, lp->rx_ticks); + for (i = 0; i < lp->rx_cur_ent; i++) + printk("%d ", lp->rx_ents[i]); + printk("])\n"); + + ent = (ent + 1) & POLL_LOG_MASK; + } + } + return stats; } --- ./drivers/net/tg3.h.~1~ 2005-06-03 11:13:14.000000000 -0700 +++ ./drivers/net/tg3.h 2005-06-05 23:21:05.000000000 -0700 @@ -2003,6 +2003,15 @@ struct tg3_ethtool_stats { u64 nic_tx_threshold_hit; }; +struct tg3_poll_log_ent { + u16 tx_ticks; + u16 rx_ticks; +#define POLL_RX_SIZE 8 +#define POLL_RX_MASK (POLL_RX_SIZE - 1) + u16 rx_cur_ent; + u16 rx_ents[POLL_RX_SIZE]; +}; + struct tg3 { /* begin "general, frequently-used members" cacheline section */ @@ -2232,6 +2241,11 @@ struct tg3 { #define SST_25VF0X0_PAGE_SIZE 4098 struct ethtool_coalesce coal; + +#define POLL_LOG_SIZE (1 << 7) +#define POLL_LOG_MASK (POLL_LOG_SIZE - 1) + unsigned int poll_log_ent; + struct tg3_poll_log_ent poll_log[POLL_LOG_SIZE]; }; #endif /* !(_T3_H) */ From hhh@imada.sdu.dk Mon Jun 6 02:36:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 02:36:14 -0700 (PDT) Received: from berlioz.imada.sdu.dk (berlioz.imada.sdu.dk [130.225.128.12]) 
by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j569a9Xq001968 for ; Mon, 6 Jun 2005 02:36:11 -0700 Received: from localhost (localhost [127.0.0.1]) by localhost.imada.sdu.dk (Postfix) with ESMTP id 4C3F262728 for ; Mon, 6 Jun 2005 11:35:07 +0200 (CEST) Received: from berlioz.imada.sdu.dk ([127.0.0.1]) by localhost (berlioz.imada.sdu.dk [127.0.0.1]) (amavisd-new, port 10025) with ESMTP id 28588-07 for ; Mon, 6 Jun 2005 09:35:06 +0000 (UTC) Received: from [139.91.76.186] (unknown [139.91.76.186]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by berlioz.imada.sdu.dk (Postfix) with ESMTP id 183A462745 for ; Mon, 6 Jun 2005 11:35:06 +0200 (CEST) From: Hans Henrik Happe Subject: PROBLEM: High TCP latency User-Agent: KMail/1.7.2 MIME-Version: 1.0 To: netdev@oss.sgi.com Date: Mon, 6 Jun 2005 11:35:09 +0200 Content-Type: Multipart/Mixed; boundary="Boundary-00=_NjBpCAIVJaMD5eg" Message-Id: <200506061135.09869.hhh@imada.sdu.dk> X-archive-position: 2120 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hhh@imada.sdu.dk Precedence: bulk X-list: netdev Content-Length: 21992 Lines: 1011 --Boundary-00=_NjBpCAIVJaMD5eg Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline

Short: TCP puts the system into the idle state even though there are data in transit.

While coding a distributed application I discovered a TCP latency issue. The application does a lot of request forwarding, like P2P protocols. I have tried to track down the problem and have written a small program (random-tcp.c) that shows the long latencies. In this program one message is passed around among a number of processes. Each time a process receives the message it randomly chooses a process to forward to next. This I have compared to a program that doesn't give long latencies (ring-tcp.c).
In this program each process always forwards to the same process (ring topology). I have also made the same programs using SCTP, and this protocol has no issue in the random case.

The following is a test with 16 processes forwarding the message 100000 times. The avg. forwarding time from process to process is measured.

$ ./random-tcp 16 100000
avg forwarding time: 0.000326
$ ./ring-tcp 16 100000
avg forwarding time: 0.000044
$ ./random-sctp 16 100000
avg forwarding time: 0.000068
$ ./ring-sctp 16 100000
avg forwarding time: 0.000067

Using 'top' I have observed that the system spends time in the idle state when running 'random-tcp'. This I have observed with just 3 processes. With 16 processes the CPU is only 20% loaded on my Mobile Intel(R) Celeron(R) CPU 1.60GHz.

I have also tried with socketpair()s, which didn't have the problem. Therefore my conclusion is that it must be a TCP issue.

Now this local use of TCP is not that useful. Therefore, I tried an MPI version and tested this in a 16 node cluster. Here the random case is 5 times slower than the ring.

I have tested on many kernel versions from 2.4.25 up until 2.6.12-rc5 and all had this issue. A few people on lkml also confirmed it, but I have not got any reply from someone with a greater knowledge of the inner workings of Linux TCP (at least they didn't tell me that they had this knowledge :-).

I hope this is helpful.
Regards Hans Henrik Happe --Boundary-00=_NjBpCAIVJaMD5eg Content-Type: text/x-csrc; charset="us-ascii"; name="random-sctp.c" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="random-sctp.c" /* By Hans Henrik Happe * * compile: gcc -o random-sctp random-sctp.c -lsctp * * usage: random-sctp <# processes> <# forwards> */ #include #include #include #include #include #include #include #include #include #include double second() { struct timeval tv; struct timezone tz; double t; gettimeofday(&tv,&tz); t= (double)(tv.tv_sec)+(double)(tv.tv_usec/1.0e6); return t; } typedef struct { struct sockaddr sockadr; int len; } adr_t; int get_adr(adr_t *adr, int port) { int n; struct addrinfo hints, *res; char str[6]; memset(&hints, 0, sizeof(struct addrinfo)); hints.ai_flags = AI_PASSIVE; hints.ai_family = PF_UNSPEC; hints.ai_socktype = SOCK_STREAM; sprintf(str, "%d", port); n = getaddrinfo("localhost", str, &hints, &res); if (n != 0) { fprintf(stderr, "getaddrinfo error: [%s]\n", gai_strerror(n)); return -1; } memcpy(&adr->sockadr, res->ai_addr, sizeof(*res->ai_addr)); adr->len = sizeof(*res->ai_addr); freeaddrinfo(res); return 0; } int init_listen(int port) { int n, on=1; int sock; struct sockaddr_in name; sock = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP); if (sock == -1) { perror("socket"); return -1; } name.sin_family = PF_INET; name.sin_port = htons (port); name.sin_addr.s_addr = htonl (INADDR_ANY); if (bind (sock, (struct sockaddr *) &name, sizeof (name)) == -1) { perror("bind"); return -1; } if (listen(sock, 10) == -1) { perror("listen"); return -1; } return sock; } int do_recv(int sock, void *buf, int n) { struct sockaddr sa; struct sctp_sndrcvinfo info; int slen, flags, res; slen = sizeof(sa); res = sctp_recvmsg(sock, buf, n, &sa, &slen, &info, &flags); if (res == -1) { perror("recv"); } if (res != n) { fprintf(stderr, "recv incomplete\n"); } return res; } int do_send(int sock, adr_t *adr, void *buf, int n) { int res; res = sctp_sendmsg(sock, buf, 
n, &adr->sockadr, adr->len, 666, MSG_ADDR_OVER, 0, 0, 444); if (res == -1) { perror("send"); } if (res != n) { fprintf(stderr, "send incomplete\n"); } return res; } int main(int argc, char *argv[]) { int i, cnt, pid, src, dest, its; int lsock; char id, rank, data; int port = 11100; double t0, t1; /* # processes */ cnt = atoi(argv[1]); /* # forwards */ its = atoi(argv[2]); { adr_t dests[cnt]; /* Create processes */ rank = 0; for (i=1; i <# forwards> */ #include #include #include #include #include #include #include #include #include #include double second() { struct timeval tv; struct timezone tz; double t; gettimeofday(&tv,&tz); t= (double)(tv.tv_sec)+(double)(tv.tv_usec/1.0e6); return t; } int do_connect(int port) { int n, sock, on=1; struct addrinfo hints, *res; char str[6]; void *adr; memset(&hints, 0, sizeof(struct addrinfo)); hints.ai_flags = AI_PASSIVE; hints.ai_family = PF_UNSPEC; hints.ai_socktype = SOCK_STREAM; sprintf(str, "%d", port); n = getaddrinfo("localhost", str, &hints, &res); if (n != 0) { fprintf(stderr, "getaddrinfo error: [%s]\n", gai_strerror(n)); return -1; } sock = socket(AF_INET, SOCK_STREAM, 0); if (sock == -1) { perror("socket"); return -1; } if (setsockopt(sock, SOL_TCP, TCP_NODELAY, &on, sizeof(on)) == -1) { perror("setsockopt"); return -1; } if (connect(sock, (struct sockaddr *)res->ai_addr, sizeof(*res->ai_addr)) == -1) { perror("connect"); return -1; } freeaddrinfo(res); return sock; } int start_listen(int port) { int n, on=1; int sock; struct sockaddr_in name; sock = socket(AF_INET, SOCK_STREAM, 0); if (sock == -1) { perror("socket"); return -1; } if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) { perror("setsockopt"); return -1; } name.sin_family = AF_INET; name.sin_port = htons (port); name.sin_addr.s_addr = htonl (INADDR_ANY); if (bind (sock, (struct sockaddr *) &name, sizeof (name)) == -1) { perror("bind"); return -1; } if (listen(sock, 10) == -1) { perror("listen"); return -1; } return sock; } int 
do_accept(int lsock) { struct sockaddr addr; socklen_t len = sizeof(addr); int sock, on=1; if ((sock = accept(lsock, &addr, &len)) == -1) { perror("accept"); return -1; } if (setsockopt(sock, SOL_TCP, TCP_NODELAY, &on, sizeof(on)) == -1) { perror("setsockopt"); return -1; } return sock; } int do_read(int fd, void *buf, int n) { int res; res = read(fd, buf, n); if (res == -1) { perror("read"); } if (res != n) { fprintf(stderr, "read incomplete\n"); } return res; } int do_write(int fd, void *buf, int n) { int res; res = write(fd, buf, n); if (res == -1) { perror("write"); } if (res != n) { fprintf(stderr, "write incomplete\n"); } return res; } int main(int argc, char *argv[]) { int i, cnt, pid, dest, src, its; int lsock, sock; char id, rank, data; int port = 11100; double t0, t1; /* # processes */ cnt = atoi(argv[1]); /* # forwards */ its = atoi(argv[2]); { int socks[cnt]; /* Create processes */ rank = 0; for (i=1; i <# forwards> */ #include #include #include #include #include #include #include #include #include #include double second() { struct timeval tv; struct timezone tz; double t; gettimeofday(&tv,&tz); t= (double)(tv.tv_sec)+(double)(tv.tv_usec/1.0e6); return t; } typedef struct { struct sockaddr sockadr; int len; } adr_t; int get_adr(adr_t *adr, int port) { int n; struct addrinfo hints, *res; char str[6]; memset(&hints, 0, sizeof(struct addrinfo)); hints.ai_flags = AI_PASSIVE; hints.ai_family = PF_UNSPEC; hints.ai_socktype = SOCK_STREAM; sprintf(str, "%d", port); n = getaddrinfo("localhost", str, &hints, &res); if (n != 0) { fprintf(stderr, "getaddrinfo error: [%s]\n", gai_strerror(n)); return -1; } memcpy(&adr->sockadr, res->ai_addr, sizeof(*res->ai_addr)); adr->len = sizeof(*res->ai_addr); freeaddrinfo(res); return 0; } int init_listen(int port) { int n, on=1; int sock; struct sockaddr_in name; sock = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP); if (sock == -1) { perror("socket"); return -1; } name.sin_family = PF_INET; name.sin_port = htons (port); 
name.sin_addr.s_addr = htonl (INADDR_ANY); if (bind (sock, (struct sockaddr *) &name, sizeof (name)) == -1) { perror("bind"); return -1; } if (listen(sock, 10) == -1) { perror("listen"); return -1; } return sock; } int do_recv(int sock, void *buf, int n) { struct sockaddr sa; struct sctp_sndrcvinfo info; int slen, flags, res; slen = sizeof(sa); res = sctp_recvmsg(sock, buf, n, &sa, &slen, &info, &flags); if (res == -1) { perror("recv"); } if (res != n) { fprintf(stderr, "recv incomplete\n"); } return res; } int do_send(int sock, adr_t *adr, void *buf, int n) { int res; res = sctp_sendmsg(sock, buf, n, &adr->sockadr, adr->len, 666, MSG_ADDR_OVER, 0, 0, 444); if (res == -1) { perror("send"); } if (res != n) { fprintf(stderr, "send incomplete\n"); } return res; } int main(int argc, char *argv[]) { int i, cnt, pid, src, dest, its; int lsock; char id, rank, data; int port = 11100; double t0, t1; /* # processes */ cnt = atoi(argv[1]); /* # forwards */ its = atoi(argv[2]); { adr_t dests[cnt]; /* Create processes */ rank = 0; for (i=1; i <# forwards> */ #include #include #include #include #include #include #include #include #include #include double second() { struct timeval tv; struct timezone tz; double t; gettimeofday(&tv,&tz); t= (double)(tv.tv_sec)+(double)(tv.tv_usec/1.0e6); return t; } int do_connect(int port) { int n, sock, on=1; struct addrinfo hints, *res; char str[6]; void *adr; memset(&hints, 0, sizeof(struct addrinfo)); hints.ai_flags = AI_PASSIVE; hints.ai_family = PF_UNSPEC; hints.ai_socktype = SOCK_STREAM; sprintf(str, "%d", port); n = getaddrinfo("localhost", str, &hints, &res); if (n != 0) { fprintf(stderr, "getaddrinfo error: [%s]\n", gai_strerror(n)); return -1; } sock = socket(AF_INET, SOCK_STREAM, 0); if (sock == -1) { perror("socket"); return -1; } if (setsockopt(sock, SOL_TCP, TCP_NODELAY, &on, sizeof(on)) == -1) { perror("setsockopt"); return -1; } if (connect(sock, (struct sockaddr *)res->ai_addr, sizeof(*res->ai_addr)) == -1) { perror("connect"); 
return -1; } freeaddrinfo(res); return sock; } int start_listen(int port) { int n, on=1; int sock; struct sockaddr_in name; sock = socket(AF_INET, SOCK_STREAM, 0); if (sock == -1) { perror("socket"); return -1; } if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) { perror("setsockopt"); return -1; } name.sin_family = AF_INET; name.sin_port = htons (port); name.sin_addr.s_addr = htonl (INADDR_ANY); if (bind (sock, (struct sockaddr *) &name, sizeof (name)) == -1) { perror("bind"); return -1; } if (listen(sock, 10) == -1) { perror("listen"); return -1; } return sock; } int do_accept(int lsock) { struct sockaddr addr; socklen_t len = sizeof(addr); int sock, on=1; if ((sock = accept(lsock, &addr, &len)) == -1) { perror("accept"); return -1; } if (setsockopt(sock, SOL_TCP, TCP_NODELAY, &on, sizeof(on)) == -1) { perror("setsockopt"); return -1; } return sock; } int do_read(int fd, void *buf, int n) { int res; res = read(fd, buf, n); if (res == -1) { perror("read"); } if (res != n) { fprintf(stderr, "read incomplete\n"); } return res; } int do_write(int fd, void *buf, int n) { int res; res = write(fd, buf, n); if (res == -1) { perror("write"); } if (res != n) { fprintf(stderr, "write incomplete\n"); } return res; } int main(int argc, char *argv[]) { int i, cnt, pid, dest, src, its; int lsock, sock; char id, rank, data; int port = 11100; double t0, t1; /* # processes */ cnt = atoi(argv[1]); /* # forwards */ its = atoi(argv[2]); { int socks[cnt]; /* Create processes */ rank = 0; for (i=1; i; Mon, 6 Jun 2005 03:40:04 -0700 Received: from mail pickup service by hotmail.com with Microsoft SMTPSVC; Mon, 6 Jun 2005 03:38:56 -0700 Message-ID: Received: from 144.16.64.4 by by24fd.bay24.hotmail.msn.com with HTTP; Mon, 06 Jun 2005 10:38:56 GMT X-Originating-IP: [144.16.64.4] X-Originating-Email: [rahulhsaxena@hotmail.com] X-Sender: rahulhsaxena@hotmail.com In-Reply-To: <20050605221106.GB15391@postel.suug.ch> From: "rahul hari" To: tgraf@suug.ch Cc: 
diffserv-general@lists.sourceforge.net, netdev@oss.sgi.com Subject: Re: [Linux Diffserv] GRED queueing discipline and the file sch_gred.c Date: Mon, 06 Jun 2005 16:08:56 +0530 Mime-Version: 1.0 Content-Type: text/plain; format=flowed X-OriginalArrivalTime: 06 Jun 2005 10:38:56.0611 (UTC) FILETIME=[F0FA1B30:01C56A83] X-archive-position: 2121 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rahulhsaxena@hotmail.com Precedence: bulk X-list: netdev Content-Length: 1388 Lines: 34 Dear Thomas, Thanks for the reply. Actually in my experiment, I am implementing 2 queues, in one of the queues, I use the prio scheme of tc and in another I define 3 virtual queues, out of which I want to provide absolute priority to one of the queue over the others (ie, if there is any packet in this queue, it should be dispatched immediately regardless of whatever happens to the other two virtual queues). For the other two virtual queues, I want to apply individual REDs (with different parameters but the average queue length should be equal to the total qave of these two virtual queues) on each but the dequeuing priority should be equal (the dequeuing takes place alternately). Can the current implementations somehow help me with this , or I would have to design this from scratch. Regards, Rahul ------- "The fear you let build up in your mind is worse than the situation that actually exists" taken from "who moved my cheese" ----------------------------------------------------------------------------- Rahul Hari Senior Undergraduate Student, Department of CSE, ITBHU, Varanasi. Ph: +91-9845347020 ----------------------------------------------------------------------------- _________________________________________________________________ Don’t just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/ From aharon.abramson@intel.com Mon Jun 6 04:15:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 04:15:42 -0700 (PDT) Received: from hermes.iil.intel.com (hermes.iil.intel.com [192.198.152.99]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56BFaXq007585 for ; Mon, 6 Jun 2005 04:15:39 -0700 Received: from petasus.iil.intel.com (petasus.iil.intel.com [143.185.77.3]) by hermes.iil.intel.com (8.12.9-20030918-01/8.12.10/d: large-outer.mc,v 1.2 2004/09/17 18:04:59 root Exp $) with ESMTP id j56BMm0W003358 for ; Mon, 6 Jun 2005 11:22:48 GMT Received: from hasmsxvs01.iil.intel.com (hasmsxvs01.iil.intel.com [143.185.63.58]) by petasus.iil.intel.com (8.12.9-20030918-01/8.12.10/d: large-inner.mc,v 1.2 2004/09/17 18:04:31 root Exp $) with SMTP id j56BQ6vH031616 for ; Mon, 6 Jun 2005 11:26:12 GMT Received: from hasmsx331.ger.corp.intel.com ([143.185.63.144]) by hasmsxvs01.iil.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005060614143131764 for ; Mon, 06 Jun 2005 14:14:31 +0300 Received: from hasmsx402.ger.corp.intel.com ([143.185.63.156]) by hasmsx331.ger.corp.intel.com with Microsoft SMTPSVC(6.0.3790.211); Mon, 6 Jun 2005 14:14:32 +0300 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C56A88.E972F0D2" Subject: constructing struct sk_buff objects from a pre-allocated buffer Date: Mon, 6 Jun 2005 14:14:31 +0300 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: constructing struct sk_buff objects from a pre-allocated buffer thread-index: AcVqiOkzUm0R5PdXRKi+rrqFWxuSMg== From: "Abramson, Aharon" To: X-OriginalArrivalTime: 06 Jun 2005 11:14:32.0167 (UTC) FILETIME=[E9DE1770:01C56A88] X-Scanned-By: MIMEDefang 2.31 (www . roaringpenguin . 
com / mimedefang) X-archive-position: 2122 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: aharon.abramson@intel.com Precedence: bulk X-list: netdev Content-Length: 1603 Lines: 49

This is a multi-part message in MIME format.

------_=_NextPart_001_01C56A88.E972F0D2
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hello, all.

I'm developing a network device driver. This device may deliver multiple
frames in a single pre-allocated receive buffer. How do I construct
struct sk_buff objects for these frames, since alloc_skb allocates the
object's data by itself?

Thanks,
Aharon Abramson

------_=_NextPart_001_01C56A88.E972F0D2
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
------_=_NextPart_001_01C56A88.E972F0D2--

From tgraf@suug.ch Mon Jun 6 04:39:55 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 04:39:58 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56BdqXq013578 for ; Mon, 6 Jun 2005 04:39:55 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 7FF5F1C0EE; Mon, 6 Jun 2005 13:39:07 +0200 (CEST) Date: Mon, 6 Jun 2005 13:39:07 +0200 From: Thomas Graf To: rahul hari Cc: diffserv-general@lists.sourceforge.net, netdev@oss.sgi.com Subject: Re: [Linux Diffserv] GRED queueing discipline and the file sch_gred.c Message-ID: <20050606113907.GC15391@postel.suug.ch> References: <20050605221106.GB15391@postel.suug.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-archive-position: 2123 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 1639 Lines: 36

Rahul,

* rahul hari 2005-06-06 16:08
> Thanks for the reply. Actually in my experiment, I am implementing 2
> queues, in one of the queues, I use the prio scheme of tc and in another I
> define 3 virtual queues, out of which I want to provide absolute priority
> to one of the queue over the others (ie, if there is any packet in this
> queue, it should be dispatched immediately regardless of whatever happens
> to the other two virtual queues).

Use a prio qdisc with RED leaf qdiscs. The purpose of RED and GRED is to
calculate a marking probability, not to provide any prioritizing
scheme. RIO mode is a small exception to this, but the priority it uses
only describes the weight of the VQ and has no influence on the actual
queue position later on.
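
For anyone following along who is less familiar with RED, the marking
probability mentioned above can be pictured with the textbook curve:
nothing is marked while the average queue length is below the minimum
threshold, the probability rises linearly up to max_p at the maximum
threshold, and everything is marked or dropped beyond that. Below is a
minimal userspace sketch in C under that simplified model. The function
name and parameters are made up for illustration; this is not the code in
sch_red.c or sch_gred.c, and it leaves out both the averaging of the queue
length and the packet-count correction that real RED applies.

```c
/*
 * Illustrative only: simplified RED marking probability.
 *
 *   qavg   - average queue length (assumed to be computed elsewhere)
 *   min_th - below this, nothing is marked
 *   max_th - at or above this, everything is marked/dropped
 *   max_p  - probability reached just below max_th
 *
 * Assumes min_th < max_th.
 */
static double toy_red_mark_prob(double qavg, double min_th,
				double max_th, double max_p)
{
	if (qavg < min_th)
		return 0.0;
	if (qavg >= max_th)
		return 1.0;
	/* Linear ramp between the two thresholds. */
	return max_p * (qavg - min_th) / (max_th - min_th);
}
```

Note that the result only says how likely a packet is to be marked or
dropped on enqueue. Nothing in it reorders packets that are already
queued, which is why it cannot give one virtual queue absolute priority.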
> For the other two virtual queues, I want to apply individual REDs (with > different parameters but the average queue length should be equal to the > total qave of these two virtual queues) on each but the dequeuing priority > should be equal (the dequeuing takes place alternately). Use a GRED qdisc, give both VQs the same prio (so they go into equalize mode) and enable RIO mode. The VQ you select as default will be used to store qavg and the idle time. CBQ cbq:queue_1 cbq:queue_2 | | prio GRED (rio mode) | | | | | RED_1 RED_2 RED_3 VQ1(prio=1) VQ2(prio=1) You did not talk about how to separate the two initial queues so I assumed CBQ but it doesn't really matter as long its a classful qdisc. From hadi@cyberus.ca Mon Jun 6 04:47:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 04:47:32 -0700 (PDT) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56BlTXq014374 for ; Mon, 6 Jun 2005 04:47:29 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DfG3p-000700-Cv for netdev@oss.sgi.com; Mon, 06 Jun 2005 07:46:29 -0400 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DfG3n-0003ge-HV; Mon, 06 Jun 2005 07:46:27 -0400 Subject: Re: [Linux Diffserv] GRED queueing discipline and the file sch_gred.c From: jamal Reply-To: hadi@cyberus.ca To: rahul hari Cc: tgraf@suug.ch, diffserv-general@lists.sourceforge.net, netdev@oss.sgi.com In-Reply-To: References: Content-Type: text/plain Organization: unknown Date: Mon, 06 Jun 2005 07:45:51 -0400 Message-Id: <1118058351.6266.119.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2124 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: 
hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 1505 Lines: 37

On Mon, 2005-06-06 at 16:08 +0530, rahul hari wrote:
> Dear Thomas,
> Thanks for the reply. Actually in my experiment, I am implementing 2 queues,
> in one of the queues, I use the prio scheme of tc and in another I define 3
> virtual queues, out of which I want to provide absolute priority to one of
> the queue over the others (ie, if there is any packet in this queue, it
> should be dispatched immediately regardless of whatever happens to the other
> two virtual queues).
> For the other two virtual queues, I want to apply individual REDs (with
> different parameters but the average queue length should be equal to the
> total qave of these two virtual queues) on each but the dequeuing priority
> should be equal (the dequeuing takes place alternately).
> Can the current implementations somehow help me with this , or I would have
> to design this from scratch.
>

It is not clear what your requirements are. You are stating what your
solution is ;->
Assuming that you require the first queue to have the utmost priority,
followed by the first RED queue as the next most important and then the
last two, you need a prio qdisc with three bands:

+---- pfifo
|
+---- RED
|
+---- GRED

The pfifo will starve the lower two. The RED will starve the GRED if it can,
and the GRED virtual queues will need to be set in (Cisco) WRED mode, i.e.
select GRIO but give them equal priority. Make sure those two VQs have
exactly the same drop priorities and queue parameters.
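
To make the starvation behaviour concrete, here is a toy userspace model
in C of strict-priority dequeue across three bands. It is only a sketch:
the names (toy_band, toy_prio_dequeue) are invented for illustration, each
band is reduced to a packet counter, and none of this is the kernel's
sch_prio code.

```c
#include <stddef.h>

#define TOY_BANDS 3

/* Hypothetical per-band state: only the backlog matters for this model. */
struct toy_band {
	unsigned int backlog;	/* packets currently queued in this band */
};

/*
 * Strict-priority dequeue: scan the bands from 0 (highest priority)
 * upward and take one packet from the first band that has a backlog.
 * Returns the index of the band served, or -1 if all bands are empty.
 */
static int toy_prio_dequeue(struct toy_band bands[TOY_BANDS])
{
	size_t i;

	for (i = 0; i < TOY_BANDS; i++) {
		if (bands[i].backlog > 0) {
			bands[i].backlog--;
			return (int)i;
		}
	}
	return -1;
}
```

As long as band 0 (the pfifo in the layout above) has anything queued, the
loop never reaches bands 1 and 2, so a busy high-priority band starves the
RED and GRED bands below it.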
cheers, jamal From hadi@cyberus.ca Mon Jun 6 04:55:56 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 04:55:58 -0700 (PDT) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56BtuXq015367 for ; Mon, 6 Jun 2005 04:55:56 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DfGBw-0000uc-J9 for netdev@oss.sgi.com; Mon, 06 Jun 2005 07:54:52 -0400 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DfGBv-0004uq-VZ; Mon, 06 Jun 2005 07:54:52 -0400 Subject: Re: [Linux Diffserv] GRED queueing discipline and the file sch_gred.c From: jamal Reply-To: hadi@cyberus.ca To: Thomas Graf Cc: rahul hari , diffserv-general@lists.sourceforge.net, netdev@oss.sgi.com In-Reply-To: <20050606113907.GC15391@postel.suug.ch> References: <20050605221106.GB15391@postel.suug.ch> <20050606113907.GC15391@postel.suug.ch> Content-Type: text/plain Organization: unknown Date: Mon, 06 Jun 2005 07:54:18 -0400 Message-Id: <1118058859.6266.126.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2125 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 606 Lines: 18 On Mon, 2005-06-06 at 13:39 +0200, Thomas Graf wrote: > Use a prio qdisc with RED leaf qdiscs. RED and GREDs purpose is to > calculate a marking probability and not to provide any prioritizing > schemes. Prioritization is still implicitly provided if you vary the queue lengths or the drop probabilities. 
For example, if you set everything to be exactly the same, and varied only the drop probability - the VQ with the highest drop probability will be less important (i.e relatively more of its packets will be dropped; recall: the drop decision is made before the packet is queued). cheers, jamal From herbert@gondor.apana.org.au Mon Jun 6 05:01:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 05:01:39 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56C1VXq016397 for ; Mon, 6 Jun 2005 05:01:32 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DfGGm-0002Mz-00; Mon, 06 Jun 2005 21:59:52 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DfGGZ-00007C-00; Mon, 06 Jun 2005 21:59:39 +1000 Date: Mon, 6 Jun 2005 21:59:39 +1000 To: Christoph Hellwig Cc: "David S. 
Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050606115939.GA399@gondor.apana.org.au> References: <20050603234623.GA20088@gondor.apana.org.au> <20050604112314.GA19819@infradead.org> <20050604112606.GA1799@gondor.apana.org.au> <20050604115853.GA20335@infradead.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050604115853.GA20335@infradead.org> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2126 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 1179 Lines: 37 On Sat, Jun 04, 2005 at 12:58:53PM +0100, Christoph Hellwig wrote: > > the usage of 16bit counters in bio_vec doesn't make sense, and if did > all others would have to move to 32bit aswell (in case we started > supporting page sizes that aren't addressable by 16bits) You know what? The more I think about this the more I think that your idea is brilliant. The reason is that the two main users of crypto API happen to be in possession of bio_vec and skb_frag_t respectively. Had we merged the three structures, they would not have to copy the structures as they do now or even worse, process the buffers one-by-one as dmcrypt is doing. Back to the topic of 16-bit vs. 32-bit counters. Could we do something like this? 
#if (PAGE_SHIFT > 16) || (BITS_PER_LONG > 32)
typedef unsigned int page_offset_t;
#else
typedef unsigned short page_offset_t;
#endif

And then define

struct foovec {
	struct page *page;
	page_offset_t offset;
	page_offset_t length;
};

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~}
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

From SRS0+26fa12ab9fa0d64ac01b+652+infradead.org+hch@pentafluge.srs.infradead.org Mon Jun 6 05:10:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 05:10:26 -0700 (PDT) Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56CALXq017189 for ; Mon, 6 Jun 2005 05:10:23 -0700 Received: from hch by pentafluge.infradead.org with local (Exim 4.43 #1 (Red Hat Linux)) id 1DfGPq-0002Az-Mv; Mon, 06 Jun 2005 13:09:14 +0100 Date: Mon, 6 Jun 2005 13:09:14 +0100 From: Christoph Hellwig To: Herbert Xu Cc: Christoph Hellwig , "David S. 
Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050606120914.GA8317@infradead.org> References: <20050603234623.GA20088@gondor.apana.org.au> <20050604112314.GA19819@infradead.org> <20050604112606.GA1799@gondor.apana.org.au> <20050604115853.GA20335@infradead.org> <20050606115939.GA399@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050606115939.GA399@gondor.apana.org.au> User-Agent: Mutt/1.4.1i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-archive-position: 2127 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: netdev Content-Length: 1235 Lines: 33 On Mon, Jun 06, 2005 at 09:59:39PM +1000, Herbert Xu wrote: > On Sat, Jun 04, 2005 at 12:58:53PM +0100, Christoph Hellwig wrote: > > > > the usage of 16bit counters in bio_vec doesn't make sense, and if did > > all others would have to move to 32bit aswell (in case we started > > supporting page sizes that aren't addressable by 16bits) > > You know what? The more I think about this the more I think that your > idea is brilliant. The reason is that the two main users of crypto API > happen to be in possession of bio_vec and skb_frag_t respectively. > > Had we merged the three structures, they would not have to copy the > structures as they do now or even worse, process the buffers one-by-one > as dmcrypt is doing. > > Back to the topic of 16-bit vs. 32-bit counters. Could we do something > like this? > > #if (PAGE_SHIFT > 16) || (BITS_PER_LONG > 32) what is the BITS_PER_LONG check for? > typedef unsigned int page_offset_t > #else > typedef unsigned short page_offset_t > #endif the name is a) a little long and b) easy to confuse with pgoff_t as used in the pagecache. 
I'm not sure what a better name would be.

We probably shouldn't care about this as the networking code didn't handle
larger offsets either.

From tgraf@suug.ch Mon Jun 6 05:16:07 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 05:16:12 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56CG7Xq021094 for ; Mon, 6 Jun 2005 05:16:07 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 4DE881C0EE; Mon, 6 Jun 2005 14:15:27 +0200 (CEST) Date: Mon, 6 Jun 2005 14:15:27 +0200 From: Thomas Graf To: jamal Cc: rahul hari , diffserv-general@lists.sourceforge.net, netdev@oss.sgi.com Subject: Re: [Linux Diffserv] GRED queueing discipline and the file sch_gred.c Message-ID: <20050606121527.GE15391@postel.suug.ch> References: <20050605221106.GB15391@postel.suug.ch> <20050606113907.GC15391@postel.suug.ch> <1118058859.6266.126.camel@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1118058859.6266.126.camel@localhost.localdomain> X-archive-position: 2128 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 1083 Lines: 20

* jamal <1118058859.6266.126.camel@localhost.localdomain> 2005-06-06 07:54
> On Mon, 2005-06-06 at 13:39 +0200, Thomas Graf wrote:
>
> > Use a prio qdisc with RED leaf qdiscs. RED and GREDs purpose is to
> > calculate a marking probability and not to provide any prioritizing
> > schemes.
>
> Prioritization is still implicitly provided if you vary the queue
> lengths or the drop probabilities.
> For example, if you set everything to be exactly the same, and varied
> only the drop probability - the VQ with the highest drop probability
> will be less important (i.e relatively more of its packets will be
> dropped; recall: the drop decision is made before the packet is queued).

Absolutely. What I meant is that GRED has no influence on the actual
ordering of the packets that are not dropped. The priority, together with
the qavg parameters and the thresholds, only influences the probability
that a packet gets marked or dropped. Sure, this is prioritization as
well, but Rahul wanted to have one VQ starve out another VQ completely.
My point is that this is not possible with GRED.

From herbert@gondor.apana.org.au Mon Jun 6 05:42:10 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 05:42:16 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56Cg8Xq022566 for ; Mon, 6 Jun 2005 05:42:09 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DfGuM-0002cN-00; Mon, 06 Jun 2005 22:40:46 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DfGuJ-0000BQ-00; Mon, 06 Jun 2005 22:40:43 +1000 Date: Mon, 6 Jun 2005 22:40:43 +1000 To: Christoph Hellwig Cc: "David S. 
Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050606124043.GA625@gondor.apana.org.au> References: <20050603234623.GA20088@gondor.apana.org.au> <20050604112314.GA19819@infradead.org> <20050604112606.GA1799@gondor.apana.org.au> <20050604115853.GA20335@infradead.org> <20050606115939.GA399@gondor.apana.org.au> <20050606120914.GA8317@infradead.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050606120914.GA8317@infradead.org> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2129 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 1223 Lines: 34 On Mon, Jun 06, 2005 at 01:09:14PM +0100, Christoph Hellwig wrote: > > > #if (PAGE_SHIFT > 16) || (BITS_PER_LONG > 32) > > what is the BITS_PER_LONG check for? These structures are normally used in arrays. On a 64-bit machine the alignment requirement means that the 16-bit version will be padded to have the same length as the 32-bit version. Since 32-bit access is usually faster we might as well get it for free. > > typedef unsigned int page_offset_t > > the name is a) a little long and b) easy to confuse with pgoff_t as used in > the pagecache. I'm not sure what a better name would be. Alternatively we can put the ifdef around (or inside) the struct definition. > We probably shouldn't care about this as the networking code didn't handle > larger offsets either. I'm not sure what you mean here. However, for skb_frag_t at least going to the 32-bit version on i386 means at least 72 bytes extra for every skb->data allocation. Dave, what are your views on making skb_frag_t bigger? 
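
To put rough numbers on the padding argument, here is a small userspace
sketch in C. The two structs are hypothetical stand-ins for the 16-bit and
32-bit variants of the proposed vector, with a void pointer in place of
struct page * so that it builds outside the kernel. On a typical LP64 ABI
both come out at 16 bytes, because the 16-bit variant is padded up to
pointer alignment; on a typical ILP32 ABI they are 8 and 12 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit variant: pointer plus two 16-bit fields. */
struct foovec16 {
	void *page;
	uint16_t offset;
	uint16_t length;
};

/* Hypothetical 32-bit variant: pointer plus two 32-bit fields. */
struct foovec32 {
	void *page;
	uint32_t offset;
	uint32_t length;
};

/*
 * Extra bytes needed by an array of nfrags entries when switching from
 * the 16-bit variant to the 32-bit one on the ABI this is compiled for.
 */
static size_t foovec_growth(size_t nfrags)
{
	return nfrags * (sizeof(struct foovec32) - sizeof(struct foovec16));
}
```

With 4 extra bytes per entry on a 32-bit build, foovec_growth(18) gives
72, which would match the figure above if an skb carries 18 fragment
slots. That slot count is an assumption made for this example, not
something stated in this thread. On a 64-bit build the growth is zero,
which is the "for free" case.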
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From SRS0+26fa12ab9fa0d64ac01b+652+infradead.org+hch@pentafluge.srs.infradead.org Mon Jun 6 06:31:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 06:31:33 -0700 (PDT) Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56DVTXq026084 for ; Mon, 6 Jun 2005 06:31:29 -0700 Received: from hch by pentafluge.infradead.org with local (Exim 4.43 #1 (Red Hat Linux)) id 1DfHgM-0002kw-Av; Mon, 06 Jun 2005 14:30:22 +0100 Date: Mon, 6 Jun 2005 14:30:22 +0100 From: Christoph Hellwig To: Herbert Xu Cc: Christoph Hellwig , "David S. Miller" , James Morris , Linux Crypto Mailing List , netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050606133022.GA10566@infradead.org> References: <20050603234623.GA20088@gondor.apana.org.au> <20050604112314.GA19819@infradead.org> <20050604112606.GA1799@gondor.apana.org.au> <20050604115853.GA20335@infradead.org> <20050606115939.GA399@gondor.apana.org.au> <20050606120914.GA8317@infradead.org> <20050606124043.GA625@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050606124043.GA625@gondor.apana.org.au> User-Agent: Mutt/1.4.1i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-archive-position: 2130 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@infradead.org Precedence: bulk X-list: netdev Content-Length: 602 Lines: 15 On Mon, Jun 06, 2005 at 10:40:43PM +1000, Herbert Xu wrote: > On Mon, Jun 06, 2005 at 01:09:14PM +0100, Christoph Hellwig wrote: > > > > > #if (PAGE_SHIFT > 16) || (BITS_PER_LONG 
> 32) > > > > what is the BITS_PER_LONG check for? > > These structures are normally used in arrays. On a 64-bit machine > the alignment requirement means that the 16-bit version will be > padded to have the same length as the 32-bit version. Since 32-bit > access is usually faster we might as well get it for free. At this point it might be easiest to just say the architecture must declare the type in asm/types.h From john.ronciak@intel.com Mon Jun 6 08:37:55 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 08:38:08 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56FbsXq004055 for ; Mon, 6 Jun 2005 08:37:55 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j56FZYGO028451; Mon, 6 Jun 2005 15:35:34 GMT Received: from orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j56FXTRl025229; Mon, 6 Jun 2005 15:35:29 GMT Received: from orsmsx331.amr.corp.intel.com ([192.168.65.56]) by orsmsxvs040.jf.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005060608352714440 ; Mon, 06 Jun 2005 08:35:27 -0700 Received: from orsmsx408.amr.corp.intel.com ([192.168.65.52]) by orsmsx331.amr.corp.intel.com with Microsoft SMTPSVC(6.0.3790.211); Mon, 6 Jun 2005 08:35:27 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: RE: RFC: NAPI packet weighting patch Date: Mon, 6 Jun 2005 08:35:26 -0700 Message-ID: <468F3FDA28AA87429AD807992E22D07E0450C002@orsmsx408> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: RFC: NAPI packet weighting patch Thread-Index: 
AcVqYwuQH8C8q1ZUSvOxuU2JvA44ugASN1rQ From: "Ronciak, John" To: "David S. Miller" , Cc: , , "Williams, Mitch A" , , , , , "Venkatesan, Ganesh" , "Brandeburg, Jesse" X-OriginalArrivalTime: 06 Jun 2005 15:35:27.0765 (UTC) FILETIME=[5D54C450:01C56AAD] X-Scanned-By: MIMEDefang 2.44 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j56FbsXq004055 X-archive-position: 2131 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: john.ronciak@intel.com Precedence: bulk X-list: netdev Content-Length: 6820 Lines: 222

We are dropping packets at the HW level (FIFO errors) with 256
descriptors and the default weight of 64. As we said, reducing the
weight eliminates this, which is understandable since the driver is
being serviced more frequently. We also hacked the driver to do a buffer
allocation per packet sent up the stack. This reduced the number of
dropped packets by about 80%, but it was still a significant number of
drops (190K down to 39K). So I don't think this is where the problem
is. This is also confirmed by the tg3 driver doing the buffer update to
the HW every 25 descriptors.

We did not increase the descriptor ring size with the default weight,
but we will try this today and report back.

Cheers,
John

> -----Original Message-----
> From: David S. Miller [mailto:davem@davemloft.net]
> Sent: Sunday, June 05, 2005 11:43 PM
> To: mchan@broadcom.com
> Cc: hadi@cyberus.ca; buytenh@wantstofly.org; Williams, Mitch
> A; Ronciak, John; jdmason@us.ibm.com; shemminger@osdl.org;
> netdev@oss.sgi.com; Robert.Olsson@data.slu.se; Venkatesan,
> Ganesh; Brandeburg, Jesse
> Subject: Re: RFC: NAPI packet weighting patch
>
>
> From: "David S. Miller"
> Date: Sun, 05 Jun 2005 14:36:53 -0700 (PDT)
>
> > BTW, here is the patch implementing this stuff.
>
> A new patch and some more data.
> > When we go to gigabit, and NAPI kicks in, the first RX > packet costs a lot (cache misses etc.) but the rest are > very efficient to process. I suspect this only holds > for the single socket case, and on a real system processing > many connections the cost drop might not be so clean. > > The log output format is: > > (TX_TICKS:RX_TICKS[ RX_TICK1 RX_TICK2 RX_TICK3 ... ]) > > Here is an example trace from a single socket TCP stream > send over gigabit: > > (9:112[ 26 8 7 8 7 ]) > (6:110[ 23 8 8 8 7 ]) > (7:57[ 26 8 ]) > (6:117[ 25 8 9 7 7 ]) > (5:37[ 26 ]) > (6:113[ 28 8 7 8 7 ]) > (0:20[ 9 ]) > (8:111[ 27 7 7 8 7 ]) > (5:109[ 25 8 8 8 7 ]) > (8:113[ 25 7 8 9 7 ]) > (6:108[ 25 8 7 7 7 ]) > (8:88[ 26 8 8 7 ]) > (6:109[ 25 7 7 7 7 ]) > (6:111[ 25 9 8 7 7 ]) > (0:48[ 9 5 ]) > > This kind of trace reiterates some things we already know. > For example, mitigation (HW, SW, or a combination of both) > helps because processing multiple packets let's us "reuse" > the cpu cache priming the handling of the first packet > achieves for us. > > It would be great to stick something like this into the e1000 > driver, and get some output from it with Intel's single NIC > performance degradation test case. > > It is also necessary for the Intel folks to say whether the > NIC is running out of RX descriptors in the single NIC > case with dev->weight set to the default of 64. If so, does > increasing the RX ring size to a larger value via ethtool > help? If not, then why in the world are things running more > slowly? > > I've got a crappy 1.5GHZ sparc64 box in my tg3 tests here, and it can > handle gigabit line rate with much CPU to spare. So either Intel is > doing something other than TCP stream tests, or something else is out > of whack. > > I even tried to do things like having a memory touching program > run in parallel with the TCP stream test, and this did not make > the timing numbers in the logs increase much at all. 
>
> --- ./drivers/net/tg3.c.~1~	2005-06-03 11:13:14.000000000 -0700
> +++ ./drivers/net/tg3.c	2005-06-05 23:21:11.000000000 -0700
> @@ -2836,7 +2836,22 @@ static int tg3_rx(struct tg3 *tp, int bu
>  				    desc->err_vlan & RXD_VLAN_MASK);
>  		} else
>  #endif
> +		{
> +			unsigned long t = get_cycles();
> +			struct tg3_poll_log_ent *lp;
> +			unsigned int ent;
> +
>  			netif_receive_skb(skb);
> +			t = get_cycles() - t;
> +
> +			ent = tp->poll_log_ent;
> +			lp = &tp->poll_log[ent];
> +			ent = lp->rx_cur_ent;
> +			if (ent < POLL_RX_SIZE) {
> +				lp->rx_ents[ent] = (u16) t;
> +				lp->rx_cur_ent = ent + 1;
> +			}
> +		}
>
>  		tp->dev->last_rx = jiffies;
>  		received++;
> @@ -2897,9 +2912,15 @@ static int tg3_poll(struct net_device *n
>
>  	/* run TX completion thread */
>  	if (sblk->idx[0].tx_consumer != tp->tx_cons) {
> +		unsigned long t;
> +
>  		spin_lock(&tp->tx_lock);
> +		t = get_cycles();
>  		tg3_tx(tp);
> +		t = get_cycles() - t;
>  		spin_unlock(&tp->tx_lock);
> +
> +		tp->poll_log[tp->poll_log_ent].tx_ticks = (u16) t;
>  	}
>
>  	spin_unlock_irqrestore(&tp->lock, flags);
> @@ -2911,16 +2932,28 @@ static int tg3_poll(struct net_device *n
>  	if (sblk->idx[0].rx_producer != tp->rx_rcb_ptr) {
>  		int orig_budget = *budget;
>  		int work_done;
> +		unsigned long t;
> +		unsigned int ent;
>
>  		if (orig_budget > netdev->quota)
>  			orig_budget = netdev->quota;
>
> +		t = get_cycles();
>  		work_done = tg3_rx(tp, orig_budget);
> +		t = get_cycles() - t;
> +
> +		ent = tp->poll_log_ent;
> +		tp->poll_log[ent].rx_ticks = (u16) t;
>
>  		*budget -= work_done;
>  		netdev->quota -= work_done;
>  	}
>
> +	tp->poll_log_ent = (tp->poll_log_ent + 1) & POLL_LOG_MASK;
> +	tp->poll_log[tp->poll_log_ent].tx_ticks = 0;
> +	tp->poll_log[tp->poll_log_ent].rx_ticks = 0;
> +	tp->poll_log[tp->poll_log_ent].rx_cur_ent = 0;
> +
>  	if (tp->tg3_flags & TG3_FLAG_TAGGED_STATUS)
>  		tp->last_tag = sblk->status_tag;
>  	rmb();
> @@ -6609,6 +6642,27 @@ static struct net_device_stats *tg3_get_
>  	stats->rx_crc_errors = old_stats->rx_crc_errors +
>  		calc_crc_errors(tp);
>
> +	/* XXX Yes, I know, do this right. :-) */
> +	{
> +		unsigned int ent;
> +
> +		printk("TG3: POLL LOG, current ent[%d]\n", tp->poll_log_ent);
> +		ent = tp->poll_log_ent - (POLL_LOG_SIZE - 1);
> +		ent &= POLL_LOG_MASK;
> +		while (ent != tp->poll_log_ent) {
> +			struct tg3_poll_log_ent *lp = &tp->poll_log[ent];
> +			int i;
> +
> +			printk("(%u:%u[ ",
> +			       lp->tx_ticks, lp->rx_ticks);
> +			for (i = 0; i < lp->rx_cur_ent; i++)
> +				printk("%d ", lp->rx_ents[i]);
> +			printk("])\n");
> +
> +			ent = (ent + 1) & POLL_LOG_MASK;
> +		}
> +	}
> +
>  	return stats;
>  }
>
> --- ./drivers/net/tg3.h.~1~	2005-06-03 11:13:14.000000000 -0700
> +++ ./drivers/net/tg3.h	2005-06-05 23:21:05.000000000 -0700
> @@ -2003,6 +2003,15 @@ struct tg3_ethtool_stats {
>  	u64 nic_tx_threshold_hit;
>  };
>
> +struct tg3_poll_log_ent {
> +	u16 tx_ticks;
> +	u16 rx_ticks;
> +#define POLL_RX_SIZE 8
> +#define POLL_RX_MASK (POLL_RX_SIZE - 1)
> +	u16 rx_cur_ent;
> +	u16 rx_ents[POLL_RX_SIZE];
> +};
> +
>  struct tg3 {
>  	/* begin "general, frequently-used members" cacheline section */
>
> @@ -2232,6 +2241,11 @@ struct tg3 {
>  #define SST_25VF0X0_PAGE_SIZE 4098
>
>  	struct ethtool_coalesce coal;
> +
> +#define POLL_LOG_SIZE (1 << 7)
> +#define POLL_LOG_MASK (POLL_LOG_SIZE - 1)
> +	unsigned int poll_log_ent;
> +	struct tg3_poll_log_ent poll_log[POLL_LOG_SIZE];
>  };
>
>  #endif /* !(_T3_H) */
>

From rahulhsaxena@gmail.com Mon Jun 6 10:49:40 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 10:49:48 -0700 (PDT)
Received: from zproxy.gmail.com (zproxy.gmail.com [64.233.162.202]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56HndXq011644 for ; Mon, 6 Jun 2005 10:49:40 -0700
Received: by zproxy.gmail.com with SMTP id 34so1127690nzf for ; Mon, 06 Jun 2005 10:48:37 -0700 (PDT)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:mime-version:content-type:content-transfer-encoding:content-disposition;
b=HZuJJgd/+Kw9B+FY4TxM2zsPry53N0HoCIkATJkrNTAWFJd6gpnhq5RDQcRkVaKSqj1gMznfgf7WWhcfrbUaATqYdTCg+1moFFTYDtO/bcWFfMLTT7qmZv7W1+X1Ag1BN/hJxze0y3gIF77nY4tSUUyOp83DhlaT8P4QBo/BKe4= Received: by 10.36.220.9 with SMTP id s9mr730301nzg; Mon, 06 Jun 2005 10:48:37 -0700 (PDT) Received: by 10.36.4.6 with HTTP; Mon, 6 Jun 2005 10:48:37 -0700 (PDT) Message-ID: <4532f31705060610486ef106a1@mail.gmail.com> Date: Mon, 6 Jun 2005 23:18:37 +0530 From: Rahul Hari Reply-To: rahul.hari@cse06.itbhu.org To: hadi@cyberus.ca, tgraf@suug.ch Subject: Re: [Linux Diffserv] GRED queueing discipline and the filesch_gred.c Cc: diffserv-general@lists.sourceforge.net, netdev@oss.sgi.com Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j56HndXq011644 X-archive-position: 2132 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rahulhsaxena@gmail.com Precedence: bulk X-list: netdev Content-Length: 2970 Lines: 72 Thanks for all the suggestions Jamal and Thomas. From what you people have been suggesting, i feel that i should be giving a brief explaination of the problem I am currently working on. I have divided all the traffic on a network into 5 categories : Real time video (UDP1),Real time audio (UDP2), TCP not requiring any QoS (TCP1), TCP requiring QoS but with the size of the entire transaction very low(TCP2), and TCP requiring QoS with the size of the transaction in several MBs (TCP3). Now I am putting UDP1 and TCP1 in one particular queue (say q1) and giving priority to UDP1 (for dequeuing not caring if TCP1 is getting starved). I am putting UDP2 ,TCP2 and TCP3 in a different queue (thus keeping the average queue length almost constant) (say q2)and applying RED on each of TCP2 and TCP3 (the application of the two REDs being independent of each other). 
Here also I am providing priority to UDP2 (without caring if TCP2 or TCP3 is getting starved ). To schedule between q1 and q2, I am using WRR and to schedule between UDP1 and TCP1, I am using prio. For implementing q2, I am currently putting UDP2,TCP2 and TCP3 in 3 different virtual queues and applying GRED with grio. I am providing UDP2 the highest priority and providing TCP2 and TCP3 equal priorities. To ensure that RED does not apply on the UDP2, I have set Tmax=Tmin so that Pbmax=1. But the results I am getting with this configuration do not match with the results that I have got from the simulations. So I want to implement this stuff such that the UDP2 gets highest priority among the three, is not included while calculating the total average queue length and the qave used for the application of REDs on TCP2 and TCP3 should be equal to the qave of tcp2+ qave of tcp3. To schedule between TCP2 and TCP3, I want to use WRR or something that gives equal priority and prevents the starvation of any of these. Regards, Rahul -- ---------------------- "The fear you let build up in your mind is worse than the situation that actually exists" from "who moved my cheese" --------------------------------------------------------------------------------- Rahul Hari Senior Under Grad. Student, Department of CSE, ITBHU, Varanasi. Ph: +91-9845347020 rahul.hari@cse06.itbhu.org ------------------------------------------------------------------------------------------ > >On Mon, 2005-06-06 at 13:39 +0200, Thomas Graf wrote: > > > Use a prio qdisc with RED leaf qdiscs. RED and GREDs purpose is to > > calculate a marking probability and not to provide any prioritizing > > schemes. > >Prioritization is still implicitly provided if you vary the queue >lengths or the drop probabilities. 
>For example, if you set everything to be exactly the same, and varied >only the drop probability - the VQ with the highest drop probability >will be less important (i.e relatively more of its packets will be >dropped; recall: the drop decision is made before the packet is queued). > >cheers, >jamal > > > From romieu@fr.zoreil.com Mon Jun 6 11:12:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 11:12:40 -0700 (PDT) Received: from fr.zoreil.com (electric-eye.fr.zoreil.com [213.41.134.224]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56ICXXq013378 for ; Mon, 6 Jun 2005 11:12:34 -0700 Received: from electric-eye.fr.zoreil.com (localhost.localdomain [127.0.0.1]) by fr.zoreil.com (8.13.1/8.12.1) with ESMTP id j56I8JdY029814; Mon, 6 Jun 2005 20:08:19 +0200 Received: (from romieu@localhost) by electric-eye.fr.zoreil.com (8.13.1/8.13.1/Submit) id j56I8EKp029813; Mon, 6 Jun 2005 20:08:14 +0200 Date: Mon, 6 Jun 2005 20:08:13 +0200 From: Francois Romieu To: Wolfgang Empacher Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: Kernel 2.4.31 - netdriver r8169 Message-ID: <20050606180813.GA29537@electric-eye.fr.zoreil.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i X-Organisation: Land of Sunshine Inc. X-archive-position: 2133 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: romieu@fr.zoreil.com Precedence: bulk X-list: netdev Content-Length: 376 Lines: 12 Wolfgang Empacher : [...] > in kernel 2.4.31 there is version 1.2 of r8169 driver in use. this version > doesn't work well (RESETS of the device many and all the times). using > version 1.6 of this driver performs smooth and well. Where did you get your 1.6 version from ? 
(netdev added to Cc: as per the r8169 entry in the MAINTAINERS file)

--
Ueimor

From tgraf@suug.ch Mon Jun 6 11:28:59 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 11:29:02 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56ISwXq014464 for ; Mon, 6 Jun 2005 11:28:58 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id 84B771C0EE; Mon, 6 Jun 2005 20:28:14 +0200 (CEST)
Date: Mon, 6 Jun 2005 20:28:14 +0200
From: Thomas Graf
To: rahul.hari@cse06.itbhu.org
Cc: hadi@cyberus.ca, diffserv-general@lists.sourceforge.net, netdev@oss.sgi.com
Subject: Re: [Linux Diffserv] GRED queueing discipline and the filesch_gred.c
Message-ID: <20050606182814.GI15391@postel.suug.ch>
References: <4532f31705060610486ef106a1@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4532f31705060610486ef106a1@mail.gmail.com>
X-archive-position: 2134
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev
Content-Length: 1015
Lines: 17

* Rahul Hari <4532f31705060610486ef106a1@mail.gmail.com> 2005-06-06 23:18
> UDP1 and TCP1, I am using prio. For implementing q2, I am currently
> putting UDP2,TCP2 and TCP3 in 3 different virtual queues and applying
> GRED with grio. I am providing UDP2 the highest priority and providing
> TCP2 and TCP3 equal priorities. To ensure that RED does not apply on
> the UDP2, I have set Tmax=Tmin so that Pbmax=1. But the results I am
> getting with this configuration do not match with the results that I
> have got from the simulations.

I assume Tmax is qth_max, so you basically disable probabilistic drops, which are the main point of RED. What you do is about equivalent to a simple FIFO with a hard queue limit that is compared against an EWMA-based queue length.
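
To make the Tmax=Tmin point concrete, here is a minimal C sketch of the textbook RED decision. It is a simplification written for illustration only and is not the kernel's sch_red/sch_gred code, which works in fixed point and also handles idle periods and drop spacing. The names red_params, red_update_avg and red_decide are hypothetical, and the caller is assumed to supply a uniform random number u in [0, 1).

```c
/* Possible outcomes for an arriving packet. */
enum red_action { RED_ENQUEUE, RED_PROB_DROP, RED_HARD_DROP };

/* Hypothetical parameter block for this sketch. */
struct red_params {
	double qth_min;	/* below this average: never drop */
	double qth_max;	/* at or above this average: always drop */
	double max_p;	/* drop probability just below qth_max */
	double w;	/* EWMA weight for the average queue length */
};

/* Exponentially weighted moving average of the queue length. */
static double red_update_avg(const struct red_params *p, double avg, double qlen)
{
	return avg + p->w * (qlen - avg);
}

/* Decide what to do with a packet, given the averaged queue length. */
static enum red_action red_decide(const struct red_params *p, double avg, double u)
{
	double prob;

	if (avg < p->qth_min)
		return RED_ENQUEUE;
	if (avg >= p->qth_max)
		return RED_HARD_DROP;

	/* Only reachable when qth_min < qth_max. */
	prob = p->max_p * (avg - p->qth_min) / (p->qth_max - p->qth_min);
	return (u < prob) ? RED_PROB_DROP : RED_ENQUEUE;
}
```

With qth_min equal to qth_max the probabilistic branch can never be reached, so the only thing left is a hard threshold on the averaged queue length, which is the FIFO-like behaviour described above.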
Depending on whether you want UDP2 to starve out the others use either prio or cbq/htb and a GRED in rio mode with equal vq prios for TCP2 and TCP3. The drops should be roughly proportional to their bandwidth share but I'm not sure if this is fair enough for you. From niv@us.ibm.com Mon Jun 6 11:32:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 11:32:16 -0700 (PDT) Received: from e32.co.us.ibm.com (e32.co.us.ibm.com [32.97.110.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56IWBXq015155 for ; Mon, 6 Jun 2005 11:32:11 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e32.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j56IUr9q707826 for ; Mon, 6 Jun 2005 14:30:57 -0400 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j56IUrXR238564 for ; Mon, 6 Jun 2005 12:30:53 -0600 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j56IUn4O005941 for ; Mon, 6 Jun 2005 12:30:49 -0600 Received: from [9.47.22.158] (dyn9047022158.beaverton.ibm.com [9.47.22.158]) by d03av02.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j56IUnMK005880; Mon, 6 Jun 2005 12:30:49 -0600 Message-ID: <42A49658.7060608@us.ibm.com> Date: Mon, 06 Jun 2005 11:30:48 -0700 From: Nivedita Singhvi User-Agent: Mozilla Thunderbird 0.8 (X11/20041020) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Jonathan Day CC: netdev@oss.sgi.com Subject: Re: Automated linux kernel testing results References: <20050604050123.9897.qmail@web31504.mail.mud.yahoo.com> In-Reply-To: <20050604050123.9897.qmail@web31504.mail.mud.yahoo.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2135 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: 
niv@us.ibm.com
Precedence: bulk
X-list: netdev
Content-Length: 1152
Lines: 35

Jonathan Day wrote:

> What I have not (yet) seen is any work on relating the
> results. Is a bug in the design? The implementation?
> Some combination thereof? Is something correctly
> written but not functioning because something it
> depends on isn't working correctly?

Currently, you can get some idea (kernel didn't build, machine couldn't reboot, or, if the system crashes during the tests, crash info, etc.). Looking into whether the cause is a design bug or an implementation bug is likely beyond automation.

> It would even be useful if we could cross-reference
> some of the benchmarks with the Linux graphing
> project, so that we could see how the complexity of

I believe they do (ping Martin for details) have some plans to graph stuff, and possibly info could be sucked out of the data/results provided to feed other people's needs.

> Test suites are necessary. Test suites are great.
> Anyone working on a test suite deserves many kudos and
> much praise. Test suites that are relatable enough
> that you can see the same problem from different
> angles -- those are worth their printout weight in
> gold.

Yeah. :).
thanks,
Nivedita

From tkoponen@iki.fi Mon Jun 6 12:00:48 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 12:00:50 -0700 (PDT)
Received: from twilight.cs.hut.fi (twilight.cs.hut.fi [130.233.40.5]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56J0kXq017956 for ; Mon, 6 Jun 2005 12:00:47 -0700
Received: by twilight.cs.hut.fi (Postfix, from userid 60001) id 245452DD3; Mon, 6 Jun 2005 21:59:44 +0300 (EEST)
Received: from [127.0.0.1] (kekkonen.cs.hut.fi [130.233.41.50]) by twilight.cs.hut.fi (Postfix) with ESMTP id 2CB182DBF for ; Mon, 6 Jun 2005 21:59:42 +0300 (EEST)
Mime-Version: 1.0 (Apple Message framework v622)
Content-Transfer-Encoding: 7bit
Message-Id:
Content-Type: text/plain; charset=US-ASCII; format=flowed
To: netdev@oss.sgi.com
From: Teemu Koponen
Subject: New address announcements in RTMGRP_IPV4_IFADDR netlink group
Date: Mon, 6 Jun 2005 11:59:38 -0700
X-Mailer: Apple Mail (2.622)
X-archive-position: 2136
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tkoponen@iki.fi
Precedence: bulk
X-list: netdev
Content-Length: 1061
Lines: 29

Netlink developers and gurus,

While fine-tuning the handover speed for a certain L3 mobility daemon under Linux 2.6.11.10, I stumbled into the following behavior, which intuitively does not follow the semantics of the RTMGRP_IPV4_IFADDR group:

0) A userspace daemon process is running and listening to the broadcast group.
1) Address is inserted to an interface (ip addr add ... at shell).
2) The daemon receives a NEWADDR message, just as it should, but the daemon is unable to bind to the address *immediately* (actually in the function that processes the netlink message). The result is "cannot assign an address" from the bind call.

However, if I do insert a single nanosleep, even with an arbitrarily low sleep value, before the bind call, the bind then succeeds.

So, what are the semantics of NEWADDR?
Should the address be bindable right after receiving the message? Or is there a race-condition between userspace and kernel that the inserted sleep helps to overcome by letting the kernel to run again before the bind call? TIA, Teemu -- From kamenzky@inf.fu-berlin.de Mon Jun 6 12:09:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 12:09:35 -0700 (PDT) Received: from math.fu-berlin.de (leibniz.math.fu-berlin.de [160.45.40.10]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56J9TXq018797 for ; Mon, 6 Jun 2005 12:09:30 -0700 Received: (qmail 12495 invoked from network); 6 Jun 2005 21:08:26 +0200 Received: from lusin.mi.fu-berlin.de (HELO mi.fu-berlin.de) (160.45.113.91) by leibniz.math.fu-berlin.de with SMTP; 6 Jun 2005 21:08:26 +0200 Received: (qmail 10674 invoked by uid 9804); 6 Jun 2005 21:08:25 +0200 Received: from localhost (HELO mi.fu-berlin.de) (127.0.0.1) by localhost with SMTP; 6 Jun 2005 21:08:23 +0200 Received: (qmail 10575 invoked by uid 9804); 6 Jun 2005 21:08:23 +0200 Received: from leibniz.math.fu-berlin.de (HELO math.fu-berlin.de) (160.45.40.10) by lusin.mi.fu-berlin.de with SMTP; 6 Jun 2005 21:08:23 +0200 Received: (Qmail 12464 invoked from network); 6 Jun 2005 21:08:23 +0200 Received: From rosine141.inf.fu-berlin.de (HELO ?160.45.116.141?) 
(160.45.116.141) by leibniz.math.fu-berlin.de with SMTP; 6 Jun 2005 19:08:23 -0000 X-Envelope-Sender: kamenzky@inf.fu-berlin.de X-Remote-IP: 160.45.116.141 Mime-Version: 1.0 (Apple Message framework v622) Content-Type: text/plain; charset=US-ASCII; format=flowed Message-Id: Content-Transfer-Encoding: 7bit From: Nico Subject: OT: Survey facing design patterns and communication Date: Mon, 6 Jun 2005 21:10:41 +0200 To: netdev@oss.sgi.com X-Mailer: Apple Mail (2.622) X-archive-position: 2137 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kamenzky@inf.fu-berlin.de Precedence: bulk X-list: netdev Content-Length: 631 Lines: 22 Hello everybody! We are a group of students at "Freie Universitaet Berlin". As part of our computer science studies we are going to do a survey facing the use of design patterns in communication. Examples of design patterns are "Abstract Factory", "Singleton", "Composite", "Iterator" and "Listener". If you know what we are talking about, you are welcome to take part in our survey. It takes about 5 minutes to fill out the form. Just jump to: http://study.beatdepot.de If you agree, we will send you the results of our survey. Thanks in advance for your participation! And sorry for the interruption of your discussion. 
From rahulhsaxena@hotmail.com Mon Jun 6 12:13:11 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 12:13:15 -0700 (PDT)
Received: from hotmail.com (bay24-f18.bay24.hotmail.com [64.4.18.68]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56JDBXq020281 for ; Mon, 6 Jun 2005 12:13:11 -0700
Received: from mail pickup service by hotmail.com with Microsoft SMTPSVC; Mon, 6 Jun 2005 12:12:09 -0700
Message-ID:
Received: from 144.16.64.4 by by24fd.bay24.hotmail.msn.com with HTTP; Mon, 06 Jun 2005 19:12:09 GMT
X-Originating-IP: [144.16.64.4]
X-Originating-Email: [rahulhsaxena@hotmail.com]
X-Sender: rahulhsaxena@hotmail.com
In-Reply-To: <20050606121527.GE15391@postel.suug.ch>
From: "rahul hari"
To: tgraf@suug.ch, hadi@cyberus.ca
Cc: diffserv-general@lists.sourceforge.net, netdev@oss.sgi.com
Subject: Re: [Linux Diffserv] GRED queueing discipline and the file sch_gred.c
Date: Tue, 07 Jun 2005 00:42:09 +0530
Mime-Version: 1.0
Content-Type: text/plain; format=flowed
X-OriginalArrivalTime: 06 Jun 2005 19:12:09.0587 (UTC) FILETIME=[A3055C30:01C56ACB]
X-archive-position: 2138
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: rahulhsaxena@hotmail.com
Precedence: bulk
X-list: netdev
Content-Length: 3656
Lines: 80

Thanks for all the suggestions, Jamal and Thomas. From what you have been suggesting, I feel that I should give a detailed explanation of the problem I am currently working on.

I have divided all the traffic on a network into 5 categories: real-time video (UDP1), real-time audio (UDP2), TCP not requiring any QoS (TCP1), TCP requiring QoS but with a very small total transaction size (TCP2), and TCP requiring QoS with a transaction size of several MB (TCP3).

Now I am putting UDP1 and TCP1 in one particular queue (say q1) and giving priority to UDP1 (for dequeuing, not caring if TCP1 is getting starved).
I am putting UDP2 ,TCP2 and TCP3 in a different queue (thus keeping the average queue length almost constant) (say q2)and applying RED on each of TCP2 and TCP3 (the application of the two REDs being independent of each other). Here also I am providing priority to UDP2 (without caring if TCP2 or TCP3 is getting starved ). To schedule between q1 and q2, I am using WRR and to schedule between UDP1 and TCP1, I am using prio. For implementing q2, I am currently putting UDP2,TCP2 and TCP3 in 3 different virtual queues and applying GRED with grio. I am providing UDP2 the highest priority and providing TCP2 and TCP3 equal priorities. To ensure that RED does not apply on the UDP2, I have set Tmax=Tmin so that Pbmax=1. But the results I am getting with this configuration do not match with the results that I have got from the simulations. So I want to implement this stuff such that the UDP2 gets highest priority among the three, is not included while calculating the total average queue length and the qave used for the application of REDs on TCP2 and TCP3 should be equal to the qave of tcp2+ qave of tcp3. To schedule between TCP2 and TCP3, I want to use WRR or something that gives equal priority and prevents the starvation of any of these. PS: please send any further replies to rahul.hari@cse06.itbhu.org instead of this account Regards, Rahul ------- "The fear you let build up in your mind is worse than the situation that actually exists" taken from "who moved my cheese" ----------------------------------------------------------------------------- Rahul Hari Senior Undergraduate Student, Department of CSE, ITBHU, Varanasi. Ph: +91-9845347020 ----------------------------------------------------------------------------- > >* jamal <1118058859.6266.126.camel@localhost.localdomain> 2005-06-06 07:54 > > On Mon, 2005-06-06 at 13:39 +0200, Thomas Graf wrote: > > > > > Use a prio qdisc with RED leaf qdiscs. 
RED and GRED's purpose is to
> > > calculate a marking probability and not to provide any prioritizing
> > > schemes.
> >
> > Prioritization is still implicitly provided if you vary the queue
> > lengths or the drop probabilities.
> > For example, if you set everything to be exactly the same, and varied
> > only the drop probability - the VQ with the highest drop probability
> > will be less important (i.e. relatively more of its packets will be
> > dropped; recall: the drop decision is made before the packet is queued).
>
>Absolutely, what I meant is that GRED has no influence on the
>actual ordering of packets not dropped. The priority together with
>the qavg parameters and the thresholds only influence the
>probability that a packet gets marked/dropped. Sure, this is prioritization
>as well, but Rahul wanted to have one VQ starve out another VQ
>completely. My point is that this is not possible with GRED.
From davem@davemloft.net Mon Jun 6 12:48:48 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 12:48:58 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56JmmXq025972 for ; Mon, 6 Jun 2005 12:48:48 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfNZF-0003gi-LX; Mon, 06 Jun 2005 12:47:25 -0700
Date: Mon, 06 Jun 2005 12:47:25 -0700 (PDT)
Message-Id: <20050606.124725.85409439.davem@davemloft.net>
To: john.ronciak@intel.com
Cc: mchan@broadcom.com, hadi@cyberus.ca, buytenh@wantstofly.org, mitch.a.williams@intel.com, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com
Subject: Re: RFC: NAPI packet weighting patch
From: "David S. Miller"
In-Reply-To: <468F3FDA28AA87429AD807992E22D07E0450C002@orsmsx408>
References: <468F3FDA28AA87429AD807992E22D07E0450C002@orsmsx408>
X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2139
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev
Content-Length: 1304
Lines: 31

From: "Ronciak, John"
Date: Mon, 6 Jun 2005 08:35:26 -0700

> We are dropping packets at the HW level (FIFO errors) with 256
> descriptors and the default weight of 64. As we said reducing the
> weight eliminates this which is understandable since the driver is being
> serviced more frequently. We also hacked the driver to do a buffer
> allocation per packet sent up the stack. This reduced the number of
> dropped packets by about 80% but it was still a significant number of
> drops (190K to 39K dropped).
So I don't think this is where the problem
> is. This is also confirmed with the tg3 driver doing the buffer update
> to the HW every 25 descriptors.

I reach a different conclusion, sorry. :-)

Here is the invariant: If you force the e1000 driver to do RX replenishment every N packets, it should reduce the packet drops the same (in the single NIC case) as if you reduced the dev->weight to that same value N.

You have two test cases, single NIC and multi-NIC, so you should be very clear about which case your drop numbers apply to. They are two totally different problems.

> We did not up the descriptor ring size with the default weight but will
> try this today and report back.

Thanks for all of your test data and hard work so far. It's very valuable.

From dlstevens@us.ibm.com Mon Jun 6 12:49:46 2005
Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 12:49:50 -0700 (PDT)
Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56JnjXq026248 for ; Mon, 6 Jun 2005 12:49:46 -0700
Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e31.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j56Jmgua512976 for ; Mon, 6 Jun 2005 15:48:42 -0400
Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j56JmgXR189852 for ; Mon, 6 Jun 2005 13:48:42 -0600
Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j56JmSqr017129 for ; Mon, 6 Jun 2005 13:48:29 -0600
Received: from d03nm121.boulder.ibm.com (d03nm121.boulder.ibm.com [9.17.195.147]) by d03av04.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j56JmSxk017110; Mon, 6 Jun 2005 13:48:28 -0600
To: davem@davemloft.net, yoshfuji@linux-ipv6.org
Cc: netdev@oss.sgi.com
MIME-Version: 1.0
Subject: IPV6 RFC3542 compliance [PATCH]
X-Mailer: Lotus Notes Release 6.0.2CF1 June 9,
2003
Message-ID:
From: David Stevens
Date: Mon, 6 Jun 2005 13:48:26 -0600
X-MIMETrack: Serialize by Router on D03NM121/03/M/IBM(Release 6.53HF546 | May 23, 2005) at 06/06/2005 13:48:27
Content-Type: multipart/mixed; boundary="=_mixed 006CCE2C88257018_="
X-archive-position: 2140
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: dlstevens@us.ibm.com
Precedence: bulk
X-list: netdev
Content-Length: 38257
Lines: 787

--=_mixed 006CCE2C88257018_=
Content-Type: text/plain; charset="US-ASCII"

I've been looking at RFC 3542 (Advanced Sockets API) compliance, and found the following:
("x" is one of {PKTINFO, HOPLIMIT, RTHDR, DSTOPTS, TCLASS})

What RFC 3542 says:
1) IPV6_x as socket options specify "sticky" option values;
   getsockopt() returns the current values of the sticky options
   setsockopt() sets the values for future sends
2) IPV6_RECVx are boolean socket options indicating whether the particular field will be returned in ancillary data on a recvmsg()
   getsockopt() gets the current value (1 or 0)
   setsockopt() sets or clears the boolean value
3) Ancillary data (send and receive) use IPV6_x for the corresponding data item

What current kernel does:
1) IPV6_x are boolean options
2) the sticky versions are not implemented
3) TCLASS is not implemented

The patch below adds sending and receiving of traffic class, the definitions for IPV6_RECVx and changes the boolean socket options to their RFC 3542 names. The original names are still there for use with sticky options in the future (not included here), and as the ancillary data message types.

The bad news: This patch changes the argument lists of ip6_append_data() and datagram_send_ctl(). This is because traffic class is not an extension header, but part of the IPv6 header. This is analogous to the hop limit, which is an explicit argument to these functions.
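
As a small illustration of why the traffic class belongs to the fixed header: the first 32 bits of an IPv6 header hold a 4-bit version, the 8-bit traffic class and the 20-bit flow label, so the traffic class straddles the first two bytes. The sketch below works on a plain byte array rather than the kernel's struct ipv6hdr. The helper names ipv6_set_tclass and ipv6_get_tclass are made up for this example and do not appear in the patch, which reaches the same two nibbles through hdr->priority and the top half of hdr->flow_lbl[0].

```c
#include <stdint.h>

/*
 * hdr points at the first 4 bytes of an IPv6 header in network order:
 *   byte 0: version (high nibble) | traffic class bits 7..4 (low nibble)
 *   byte 1: traffic class bits 3..0 (high nibble) | flow label bits 19..16
 *   bytes 2..3: flow label bits 15..0
 */
static void ipv6_set_tclass(uint8_t hdr[4], uint8_t tclass)
{
	/* Keep the version nibble, replace the upper half of the class. */
	hdr[0] = (uint8_t)((hdr[0] & 0xf0) | (tclass >> 4));
	/* Keep the flow label nibble, replace the lower half of the class. */
	hdr[1] = (uint8_t)((hdr[1] & 0x0f) | ((tclass & 0x0f) << 4));
}

static uint8_t ipv6_get_tclass(const uint8_t hdr[4])
{
	return (uint8_t)(((hdr[0] & 0x0f) << 4) | (hdr[1] >> 4));
}
```

Masking before OR-ing means an existing flow label and the version nibble survive a traffic class update, and setting the class twice does not leave stale bits behind.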
I've tested these pieces, but I have a couple open questions which may be relevant (will continue looking myself...): 1) In ipv6_pinfo, there is a "hop_limit" field at the top level and another "cork.hop_limit". Why aren't these the same? 2) The (old name) IPV6_RTHDR socket option allows a value of "2", used by TCP. Still need to see what that's about for relevance to other options (but this code leaves that unchanged, except the name). +-DLS in-line for view, attached for applying Signed-off-by: David L Stevens diff -ruNp linux-2.6.11.10/include/linux/in6.h linux-2.6.11.10T2/include/linux/in6.h --- linux-2.6.11.10/include/linux/in6.h 2005-05-16 10:51:43.000000000 -0700 +++ linux-2.6.11.10T2/include/linux/in6.h 2005-05-23 14:12:59.000000000 -0700 @@ -172,6 +172,7 @@ struct in6_flowlabel_req #define IPV6_V6ONLY 26 #define IPV6_JOIN_ANYCAST 27 #define IPV6_LEAVE_ANYCAST 28 +#define IPV6_TCLASS 30 /* IPV6_MTU_DISCOVER values */ #define IPV6_PMTUDISC_DONT 0 @@ -184,6 +185,12 @@ struct in6_flowlabel_req #define IPV6_IPSEC_POLICY 34 #define IPV6_XFRM_POLICY 35 +#define IPV6_RTHDRDSTOPTS 36 +#define IPV6_RECVPKTINFO 37 +#define IPV6_RECVHOPLIMIT 38 +#define IPV6_RECVRTHDR 39 +#define IPV6_RECVHOPOPTS 40 +#define IPV6_RECVDSTOPTS 41 /* * Multicast: @@ -198,4 +205,6 @@ struct in6_flowlabel_req * MCAST_MSFILTER 48 */ +#define IPV6_RECVTCLASS 49 + #endif diff -ruNp linux-2.6.11.10/include/linux/ipv6.h linux-2.6.11.10T2/include/linux/ipv6.h --- linux-2.6.11.10/include/linux/ipv6.h 2005-05-16 10:51:43.000000000 -0700 +++ linux-2.6.11.10T2/include/linux/ipv6.h 2005-05-24 13:18:27.000000000 -0700 @@ -221,7 +221,8 @@ struct ipv6_pinfo { rxhlim:1, hopopts:1, dstopts:1, - rxflow:1; + rxflow:1, + rxtclass:1; } bits; __u8 all; } rxopt; @@ -244,6 +245,7 @@ struct ipv6_pinfo { struct ipv6_txoptions *opt; struct rt6_info *rt; int hop_limit; + int tclass; } cork; }; diff -ruNp linux-2.6.11.10/include/net/ipv6.h linux-2.6.11.10T2/include/net/ipv6.h --- linux-2.6.11.10/include/net/ipv6.h 
2005-05-16 10:51:49.000000000 -0700 +++ linux-2.6.11.10T2/include/net/ipv6.h 2005-05-24 14:57:23.000000000 -0700 @@ -347,6 +347,7 @@ extern int ip6_append_data(struct sock int length, int transhdrlen, int hlimit, + int tclass, struct ipv6_txoptions *opt, struct flowi *fl, struct rt6_info *rt, diff -ruNp linux-2.6.11.10/include/net/transp_v6.h linux-2.6.11.10T2/include/net/transp_v6.h --- linux-2.6.11.10/include/net/transp_v6.h 2005-05-16 10:51:51.000000000 -0700 +++ linux-2.6.11.10T2/include/net/transp_v6.h 2005-05-24 14:04:11.000000000 -0700 @@ -37,7 +37,7 @@ extern int datagram_recv_ctl(struct so extern int datagram_send_ctl(struct msghdr *msg, struct flowi *fl, struct ipv6_txoptions *opt, - int *hlimit); + int *hlimit, int *tclass); #define LOOPBACK4_IPV6 __constant_htonl(0x7f000006) diff -ruNp linux-2.6.11.10/net/ipv6/datagram.c linux-2.6.11.10T2/net/ipv6/datagram.c --- linux-2.6.11.10/net/ipv6/datagram.c 2005-05-16 10:52:00.000000000 -0700 +++ linux-2.6.11.10T2/net/ipv6/datagram.c 2005-05-24 14:03:56.000000000 -0700 @@ -388,6 +388,11 @@ int datagram_recv_ctl(struct sock *sk, s int hlim = skb->nh.ipv6h->hop_limit; put_cmsg(msg, SOL_IPV6, IPV6_HOPLIMIT, sizeof(hlim), &hlim); } + if (np->rxopt.bits.rxtclass) { + u8 tclass = (skb->nh.ipv6h->priority << 4) | + ((skb->nh.ipv6h->flow_lbl[0]>>4) & 0xf); + put_cmsg(msg, SOL_IPV6, IPV6_TCLASS, sizeof(tclass), &tclass); + } if (np->rxopt.bits.rxflow && (*(u32*)skb->nh.raw & IPV6_FLOWINFO_MASK)) { u32 flowinfo = *(u32*)skb->nh.raw & IPV6_FLOWINFO_MASK; @@ -414,7 +419,7 @@ int datagram_recv_ctl(struct sock *sk, s int datagram_send_ctl(struct msghdr *msg, struct flowi *fl, struct ipv6_txoptions *opt, - int *hlimit) + int *hlimit, int *tclass) { struct in6_pktinfo *src_info; struct cmsghdr *cmsg; @@ -587,6 +592,15 @@ int datagram_send_ctl(struct msghdr *msg *hlimit = *(int *)CMSG_DATA(cmsg); break; + case IPV6_TCLASS: + if (cmsg->cmsg_len != CMSG_LEN(sizeof(int))) { + err = -EINVAL; + goto exit_f; + } + + *tclass = *(int 
*)CMSG_DATA(cmsg); + break; + default: LIMIT_NETDEBUG( printk(KERN_DEBUG "invalid cmsg type: %d\n", cmsg->cmsg_type)); diff -ruNp linux-2.6.11.10/net/ipv6/icmp.c linux-2.6.11.10T2/net/ipv6/icmp.c --- linux-2.6.11.10/net/ipv6/icmp.c 2005-05-16 10:52:00.000000000 -0700 +++ linux-2.6.11.10T2/net/ipv6/icmp.c 2005-05-24 15:05:14.000000000 -0700 @@ -287,7 +287,7 @@ void icmpv6_send(struct sk_buff *skb, in int iif = 0; int addr_type = 0; int len; - int hlimit; + int hlimit, tclass; int err = 0; if ((u8*)hdr < skb->head || (u8*)(hdr+1) > skb->tail) @@ -381,6 +381,9 @@ void icmpv6_send(struct sk_buff *skb, in hlimit = np->hop_limit; if (hlimit < 0) hlimit = dst_metric(dst, RTAX_HOPLIMIT); + tclass = np->cork.tclass; + if (tclass < 0) + tclass = 0; msg.skb = skb; msg.offset = skb->nh.raw - skb->data; @@ -398,7 +401,7 @@ void icmpv6_send(struct sk_buff *skb, in err = ip6_append_data(sk, icmpv6_getfrag, &msg, len + sizeof(struct icmp6hdr), sizeof(struct icmp6hdr), - hlimit, NULL, &fl, (struct rt6_info*)dst, + hlimit, tclass, NULL, &fl, (struct rt6_info*)dst, MSG_DONTWAIT); if (err) { ip6_flush_pending_frames(sk); @@ -432,6 +435,7 @@ static void icmpv6_echo_reply(struct sk_ struct dst_entry *dst; int err = 0; int hlimit; + int tclass; saddr = &skb->nh.ipv6h->daddr; @@ -467,15 +471,18 @@ static void icmpv6_echo_reply(struct sk_ hlimit = np->hop_limit; if (hlimit < 0) hlimit = dst_metric(dst, RTAX_HOPLIMIT); + tclass = np->cork.tclass; + if (tclass < 0) + tclass = 0; idev = in6_dev_get(skb->dev); msg.skb = skb; msg.offset = 0; - err = ip6_append_data(sk, icmpv6_getfrag, &msg, skb->len + sizeof(struct icmp6hdr), - sizeof(struct icmp6hdr), hlimit, NULL, &fl, - (struct rt6_info*)dst, MSG_DONTWAIT); + err = ip6_append_data(sk, icmpv6_getfrag, &msg, skb->len + + sizeof(struct icmp6hdr), sizeof(struct icmp6hdr), hlimit, + tclass, NULL, &fl, (struct rt6_info*)dst, MSG_DONTWAIT); if (err) { ip6_flush_pending_frames(sk); diff -ruNp linux-2.6.11.10/net/ipv6/ip6_flowlabel.c 
linux-2.6.11.10T2/net/ipv6/ip6_flowlabel.c --- linux-2.6.11.10/net/ipv6/ip6_flowlabel.c 2005-05-16 10:52:00.000000000 -0700 +++ linux-2.6.11.10T2/net/ipv6/ip6_flowlabel.c 2005-05-24 14:04:28.000000000 -0700 @@ -311,7 +311,7 @@ fl_create(struct in6_flowlabel_req *freq msg.msg_control = (void*)(fl->opt+1); flowi.oif = 0; - err = datagram_send_ctl(&msg, &flowi, fl->opt, &junk); + err = datagram_send_ctl(&msg, &flowi, fl->opt, &junk, &junk); if (err) goto done; err = -EINVAL; diff -ruNp linux-2.6.11.10/net/ipv6/ip6_output.c linux-2.6.11.10T2/net/ipv6/ip6_output.c --- linux-2.6.11.10/net/ipv6/ip6_output.c 2005-05-16 10:52:00.000000000 -0700 +++ linux-2.6.11.10T2/net/ipv6/ip6_output.c 2005-05-24 14:58:51.000000000 -0700 @@ -211,7 +211,7 @@ int ip6_xmit(struct sock *sk, struct sk_ struct ipv6hdr *hdr; u8 proto = fl->proto; int seg_len = skb->len; - int hlimit; + int hlimit, tclass; u32 mtu; if (opt) { @@ -253,6 +253,13 @@ int ip6_xmit(struct sock *sk, struct sk_ hlimit = np->hop_limit; if (hlimit < 0) hlimit = dst_metric(dst, RTAX_HOPLIMIT); + tclass = -1; + if (np) + tclass = np->cork.tclass; + if (tclass < 0) + tclass = 0; + hdr->priority = (np->cork.tclass>>4) &0xf; + hdr->flow_lbl[0] |= (np->cork.tclass & 0xf)<<4; hdr->payload_len = htons(seg_len); hdr->nexthdr = proto; @@ -806,10 +813,11 @@ out_err_release: return err; } -int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), - void *from, int length, int transhdrlen, - int hlimit, struct ipv6_txoptions *opt, struct flowi *fl, struct rt6_info *rt, - unsigned int flags) +int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, + int offset, int len, int odd, struct sk_buff *skb), + void *from, int length, int transhdrlen, + int hlimit, int tclass, struct ipv6_txoptions *opt, struct flowi *fl, + struct rt6_info *rt, unsigned int flags) { struct inet_sock *inet = inet_sk(sk); struct ipv6_pinfo *np = inet6_sk(sk); @@ -847,6 +855,7 @@ int 
ip6_append_data(struct sock *sk, int np->cork.rt = rt; inet->cork.fl = *fl; np->cork.hop_limit = hlimit; + np->cork.tclass = tclass; inet->cork.fragsize = mtu = dst_pmtu(&rt->u.dst); inet->cork.length = 0; sk->sk_sndmsg_page = NULL; @@ -1130,6 +1139,10 @@ int ip6_push_pending_frames(struct sock *(u32*)hdr = fl->fl6_flowlabel | htonl(0x60000000); + /* traffic class */ + hdr->priority = (np->cork.tclass>>4) & 0xf; + hdr->flow_lbl[0] |= (np->cork.tclass & 0xf)<<4; + if (skb->len <= sizeof(struct ipv6hdr) + IPV6_MAXPLEN) hdr->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); else diff -ruNp linux-2.6.11.10/net/ipv6/ipv6_sockglue.c linux-2.6.11.10T2/net/ipv6/ipv6_sockglue.c --- linux-2.6.11.10/net/ipv6/ipv6_sockglue.c 2005-05-16 10:52:00.000000000 -0700 +++ linux-2.6.11.10T2/net/ipv6/ipv6_sockglue.c 2005-06-06 11:52:15.000000000 -0700 @@ -208,33 +208,38 @@ int ipv6_setsockopt(struct sock *sk, int retv = 0; break; - case IPV6_PKTINFO: + case IPV6_RECVPKTINFO: np->rxopt.bits.rxinfo = valbool; retv = 0; break; - case IPV6_HOPLIMIT: + case IPV6_RECVHOPLIMIT: np->rxopt.bits.rxhlim = valbool; retv = 0; break; - case IPV6_RTHDR: + case IPV6_RECVRTHDR: if (val < 0 || val > 2) goto e_inval; np->rxopt.bits.srcrt = val; retv = 0; break; - case IPV6_HOPOPTS: + case IPV6_RECVHOPOPTS: np->rxopt.bits.hopopts = valbool; retv = 0; break; - case IPV6_DSTOPTS: + case IPV6_RECVDSTOPTS: np->rxopt.bits.dstopts = valbool; retv = 0; break; + case IPV6_RECVTCLASS: + np->rxopt.bits.rxtclass = valbool; + retv = 0; + break; + case IPV6_FLOWINFO: np->rxopt.bits.rxflow = valbool; retv = 0; @@ -274,7 +279,7 @@ int ipv6_setsockopt(struct sock *sk, int msg.msg_controllen = optlen; msg.msg_control = (void*)(opt+1); - retv = datagram_send_ctl(&msg, &fl, opt, &junk); + retv = datagram_send_ctl(&msg, &fl, opt, &junk, &junk); if (retv) goto done; update: @@ -620,26 +625,30 @@ int ipv6_getsockopt(struct sock *sk, int val = np->ipv6only; break; - case IPV6_PKTINFO: + case IPV6_RECVPKTINFO: val = 
np->rxopt.bits.rxinfo; break; - case IPV6_HOPLIMIT: + case IPV6_RECVHOPLIMIT: val = np->rxopt.bits.rxhlim; break; - case IPV6_RTHDR: + case IPV6_RECVRTHDR: val = np->rxopt.bits.srcrt; break; - case IPV6_HOPOPTS: + case IPV6_RECVHOPOPTS: val = np->rxopt.bits.hopopts; break; - case IPV6_DSTOPTS: + case IPV6_RECVDSTOPTS: val = np->rxopt.bits.dstopts; break; + case IPV6_RECVTCLASS: + val = np->rxopt.bits.rxtclass; + break; + case IPV6_FLOWINFO: val = np->rxopt.bits.rxflow; break; diff -ruNp linux-2.6.11.10/net/ipv6/raw.c linux-2.6.11.10T2/net/ipv6/raw.c --- linux-2.6.11.10/net/ipv6/raw.c 2005-05-16 10:52:00.000000000 -0700 +++ linux-2.6.11.10T2/net/ipv6/raw.c 2005-05-24 15:09:42.000000000 -0700 @@ -617,6 +617,7 @@ static int rawv6_sendmsg(struct kiocb *i struct flowi fl; int addr_len = msg->msg_namelen; int hlimit = -1; + int tclass = -1; u16 proto; int err; @@ -702,7 +703,7 @@ static int rawv6_sendmsg(struct kiocb *i memset(opt, 0, sizeof(struct ipv6_txoptions)); opt->tot_len = sizeof(struct ipv6_txoptions); - err = datagram_send_ctl(msg, &fl, opt, &hlimit); + err = datagram_send_ctl(msg, &fl, opt, &hlimit, &tclass); if (err < 0) { fl6_sock_release(flowlabel); return err; @@ -758,6 +759,12 @@ static int rawv6_sendmsg(struct kiocb *i hlimit = dst_metric(dst, RTAX_HOPLIMIT); } + if (tclass < 0) { + tclass = np->cork.tclass; + if (tclass < 0) + tclass = 0; + } + if (msg->msg_flags&MSG_CONFIRM) goto do_confirm; @@ -766,8 +773,9 @@ back_from_confirm: err = rawv6_send_hdrinc(sk, msg->msg_iov, len, &fl, (struct rt6_info*)dst, msg->msg_flags); } else { lock_sock(sk); - err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov, len, 0, - hlimit, opt, &fl, (struct rt6_info*)dst, msg->msg_flags); + err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov, + len, 0, hlimit, tclass, opt, &fl, (struct rt6_info*)dst, + msg->msg_flags); if (err) ip6_flush_pending_frames(sk); diff -ruNp linux-2.6.11.10/net/ipv6/udp.c linux-2.6.11.10T2/net/ipv6/udp.c --- 
linux-2.6.11.10/net/ipv6/udp.c 2005-05-16 10:52:00.000000000 -0700 +++ linux-2.6.11.10T2/net/ipv6/udp.c 2005-05-24 15:11:58.000000000 -0700 @@ -637,6 +637,7 @@ static int udpv6_sendmsg(struct kiocb *i int addr_len = msg->msg_namelen; int ulen = len; int hlimit = -1; + int tclass = -1; int corkreq = up->corkflag || msg->msg_flags&MSG_MORE; int err; @@ -758,7 +759,7 @@ do_udp_sendmsg: memset(opt, 0, sizeof(struct ipv6_txoptions)); opt->tot_len = sizeof(*opt); - err = datagram_send_ctl(msg, fl, opt, &hlimit); + err = datagram_send_ctl(msg, fl, opt, &hlimit, &tclass); if (err < 0) { fl6_sock_release(flowlabel); return err; @@ -812,6 +813,11 @@ do_udp_sendmsg: if (hlimit < 0) hlimit = dst_metric(dst, RTAX_HOPLIMIT); } + if (tclass < 0) { + tclass = np->cork.tclass; + if (tclass < 0) + tclass = 0; + } if (msg->msg_flags&MSG_CONFIRM) goto do_confirm; @@ -832,9 +838,10 @@ back_from_confirm: do_append_data: up->len += ulen; - err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen, sizeof(struct udphdr), - hlimit, opt, fl, (struct rt6_info*)dst, - corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags); + err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen, + sizeof(struct udphdr), hlimit, tclass, opt, fl, + (struct rt6_info*)dst, + corkreq ? 
msg->msg_flags|MSG_MORE : msg->msg_flags); if (err) udp_v6_flush_pending_frames(sk); else if (!corkreq) --=_mixed 006CCE2C88257018_= Content-Type: application/octet-stream; name="rfc3542.patch" Content-Disposition: attachment; filename="rfc3542.patch" Content-Transfer-Encoding: base64 [base64 payload omitted: it encodes the same patch shown inline above] --=_mixed 006CCE2C88257018_=--
From tgraf@suug.ch Mon Jun 6 12:55:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 12:55:08 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56Jt3Xq027261 for ; Mon, 6 Jun 2005 12:55:04 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 41DBE1C0EF; Mon, 6 Jun 2005 21:54:22 +0200 (CEST) Date: Mon, 6 Jun 2005 21:54:22 +0200 From: Thomas Graf To: Teemu Koponen Cc: netdev@oss.sgi.com Subject: Re: New address announcements in RTMGRP_IPV4_IFADDR netlink group
Message-ID: <20050606195422.GJ15391@postel.suug.ch> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-archive-position: 2141 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 1179 Lines: 23

* Teemu Koponen 2005-06-06 11:59
> 0) A userspace daemon process is running and listening to the broadcast
> group.
>
> 1) Address is inserted to an interface (ip addr add ... at shell).
>
> 2) The daemon receives a NEWADDR message, just as is should, but the
> daemon is unable to bind to the address *immediately* (actually in the
> function that processes the netlink message). The result is "cannot
> assign an address" from the bind call. However, if I do insert a single
> nanosleep, even with an arbitrary low sleep value, before the bind
> call, the bind then succeeds.
>
> So, what is the semantics of NEWADDR? Should the address be bindable
> right after receiving the message?

The bind() call doesn't fail because the address is non-existent; it fails because the route for it has not been created yet. The netlink message is generated before we notify the other subsystems about the addition of a new address, so you try to bind to an address for which no route has been generated yet.

The best solution is probably to wait until the route addition notification has been received and then bind to that address.
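
The suggestion above (wait for the route notification, then bind) could be sketched roughly as follows. This is only a sketch under assumptions: it assumes the daemon also listens on RTMGRP_IPV4_ROUTE, and that the route for the new address is announced as an RTM_NEWROUTE message of type RTN_LOCAL carrying the address in RTA_DST. The helper name is_local_route_for is hypothetical and not taken from any existing code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/*
 * Sketch only (hypothetical helper): returns 1 if nh is an RTM_NEWROUTE
 * message announcing a local (RTN_LOCAL) IPv4 route whose destination
 * equals addr, and 0 otherwise. addr is in network byte order.
 */
static int is_local_route_for(struct nlmsghdr *nh, uint32_t addr)
{
	struct rtmsg *rtm;
	struct rtattr *rta;
	int attrlen;

	assert(nh != NULL);

	/* Ignore anything that is not a complete route addition message. */
	if (nh->nlmsg_type != RTM_NEWROUTE ||
	    nh->nlmsg_len < NLMSG_LENGTH(sizeof(*rtm)))
		return 0;

	rtm = NLMSG_DATA(nh);
	if (rtm->rtm_family != AF_INET || rtm->rtm_type != RTN_LOCAL)
		return 0;

	/* Walk the route attributes looking for the destination address. */
	attrlen = RTM_PAYLOAD(nh);
	for (rta = RTM_RTA(rtm); RTA_OK(rta, attrlen);
	     rta = RTA_NEXT(rta, attrlen)) {
		if (rta->rta_type == RTA_DST &&
		    (size_t)RTA_PAYLOAD(rta) == sizeof(addr) &&
		    memcmp(RTA_DATA(rta), &addr, sizeof(addr)) == 0)
			return 1;
	}
	return 0;
}
```

A daemon using this would remember the address taken from the RTM_NEWADDR message, run every message it subsequently reads from the netlink socket through the helper, and call bind() only once the helper returns 1.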
From john.ronciak@intel.com Mon Jun 6 13:32:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 13:32:38 -0700 (PDT) Received: from orsfmr002.jf.intel.com (fmr17.intel.com [134.134.136.16]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56KWRXq030063 for ; Mon, 6 Jun 2005 13:32:27 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr002.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j56KTtBi012037; Mon, 6 Jun 2005 20:29:55 GMT Received: from orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j56KTLUB028765; Mon, 6 Jun 2005 20:29:52 GMT Received: from orsmsx331.amr.corp.intel.com ([192.168.65.56]) by orsmsxvs040.jf.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005060613295112784 ; Mon, 06 Jun 2005 13:29:51 -0700 Received: from orsmsx408.amr.corp.intel.com ([192.168.65.52]) by orsmsx331.amr.corp.intel.com with Microsoft SMTPSVC(6.0.3790.211); Mon, 6 Jun 2005 13:29:51 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: RE: RFC: NAPI packet weighting patch Date: Mon, 6 Jun 2005 13:29:50 -0700 Message-ID: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: RFC: NAPI packet weighting patch Thread-Index: AcVq0KInJf0GW6tXQUu/BgSh9BGIAQABPK1A From: "Ronciak, John" To: "David S. 
Miller" Cc: , , , "Williams, Mitch A" , , , , , "Venkatesan, Ganesh" , "Brandeburg, Jesse" X-OriginalArrivalTime: 06 Jun 2005 20:29:51.0695 (UTC) FILETIME=[7DDA95F0:01C56AD6] X-Scanned-By: MIMEDefang 2.44 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j56KWRXq030063 X-archive-position: 2142 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: john.ronciak@intel.com Precedence: bulk X-list: netdev Content-Length: 757 Lines: 22

> If you force the e1000 driver to do RX replenishment every N
> packets it should reduce the packet drops the same (in the
> single NIC case) as if you reduced the dev->weight to that
> same value N.

But this isn't what we are seeing. Even if we just reduce the weight value from 64 to 32, all of the drops go away. So there seem to be other things affecting this. We are just talking about single-NIC testing at this point.

I agree that single-NIC and multi-NIC setups raise different issues, and we will need to test this as well with whatever we come up with.

I also like your idea of adjusting the weight value based on real work done, using some measurable metric. This seems like a good path to explore as well.
Cheers, John From mchan@broadcom.com Mon Jun 6 13:41:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 13:41:24 -0700 (PDT) Received: from MMS1.broadcom.com (mms1.broadcom.com [216.31.210.17]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56KfJXq030916 for ; Mon, 6 Jun 2005 13:41:20 -0700 Received: from 10.10.64.121 by MMS1.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Mon, 06 Jun 2005 13:39:58 -0700 X-Server-Uuid: 146C3151-C1DE-4F71-9D02-C3BE503878DD Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Mon, 6 Jun 2005 13:39:56 -0700 Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BCM96590; Mon, 6 Jun 2005 13:39:53 -0700 (PDT) Received: from nt-irva-0741.brcm.ad.broadcom.com ( nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with ESMTP id NAA27549; Mon, 6 Jun 2005 13:39:53 -0700 (PDT) Received: from 10.7.18.143 ([10.7.18.143]) by NT-IRVA-0741.brcm.ad.broadcom.com ([10.8.194.54]) with Microsoft Exchange Server HTTP-DAV ; Mon, 6 Jun 2005 20:39:52 +0000 Received: from rh4 by nt-irva-0741; 06 Jun 2005 12:42:22 -0700 Subject: [PATCH] tg3: Fix link failure in 5701 From: "Michael Chan" To: davem@davemloft.net cc: iod00d@hp.com, peterc@gelato.unsw.edu.au, netdev@oss.sgi.com Date: Mon, 06 Jun 2005 12:42:22 -0700 Message-ID: <1118086942.5008.14.camel@rh4> MIME-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-3) X-WSS-ID: 6EBA6B172U46020352-01-01 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-archive-position: 2143 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mchan@broadcom.com Precedence: bulk X-list: netdev Content-Length: 1309 Lines: 41 On some 5701 devices with older bootcode, the 
LED configuration bits in SRAM may be invalid with value zero. The fix is to check for invalid bits (0) and default to PHY 1 mode. Incorrect LED mode will lead to error in programming the PHY. Thanks to Grant Grundler for debugging the problem. >From Grant: | In May, 2004, tg3 v3.4 changed how MAC_LED_CTRL (0x40c) was getting | programmed and how to determine what to program into LED_CTRL. The new | code trusted NIC_SRAM_DATA_CFG (0x00000b58) to indicate what to write | to LED_CTRL and MII EXT_CTRL registers. On "IOX Core Lan", SRAM was | saying MODE_MAC (0x0) and that doesn't work. Signed-off-by: Michael Chan diff -Nru led1/drivers/net/tg3.c led2/drivers/net/tg3.c --- led1/drivers/net/tg3.c 2005-06-06 10:19:56.692541944 -0700 +++ led2/drivers/net/tg3.c 2005-06-06 10:34:49.251852304 -0700 @@ -8555,6 +8555,16 @@ case NIC_SRAM_DATA_CFG_LED_MODE_MAC: tp->led_ctrl = LED_CTRL_MODE_MAC; + + /* Default to PHY_1_MODE if 0 (MAC_MODE) is + * read on some older 5700/5701 bootcode. + */ + if (GET_ASIC_REV(tp->pci_chip_rev_id) == + ASIC_REV_5700 || + GET_ASIC_REV(tp->pci_chip_rev_id) == + ASIC_REV_5701) + tp->led_ctrl = LED_CTRL_MODE_PHY_1; + break; case SHASTA_EXT_LED_SHARED: From rmk+netdev=oss.sgi.com@arm.linux.org.uk Mon Jun 6 14:48:49 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 14:48:55 -0700 (PDT) Received: from caramon.arm.linux.org.uk (caramon.arm.linux.org.uk [212.18.232.186]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56LmjXq001817 for ; Mon, 6 Jun 2005 14:48:48 -0700 Received: from flint.arm.linux.org.uk ([2002:d412:e8ba:1:201:2ff:fe14:8fad]) by caramon.arm.linux.org.uk with asmtp (TLSv1:DES-CBC3-SHA:168) (Exim 4.41) id 1DfPRS-0000AV-TP for netdev@oss.sgi.com; Mon, 06 Jun 2005 22:47:31 +0100 Received: from rmk by flint.arm.linux.org.uk with local (Exim 4.41) id 1DfPRR-0000Zt-S2 for netdev@oss.sgi.com; Mon, 06 Jun 2005 22:47:29 +0100 Date: Mon, 6 Jun 2005 22:47:29 +0100 From: Russell King To: netdev@oss.sgi.com Subject: Fwd: 
[Bug 4615] Modem connection stalls out. Message-ID: <20050606224729.B12034@flint.arm.linux.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i X-archive-position: 2144 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rmk@arm.linux.org.uk Precedence: bulk X-list: netdev Content-Length: 3648 Lines: 116 Anyone have any ideas on this bug? The "No buffer space available" looks like the system is running low on memory. Would networking folk concur with that? ----- Forwarded message from bugme-daemon@kernel-bugs.osdl.org ----- Date: Mon, 6 Jun 2005 09:19:47 -0700 From: bugme-daemon@kernel-bugs.osdl.org To: rmk@arm.linux.org.uk Subject: [Bug 4615] Modem connection stalls out. http://bugzilla.kernel.org/show_bug.cgi?id=4615 ------- Additional Comments From alangrimes@starpower.net 2005-06-06 09:19 ------- The only reliable feedback I get from the bug, asside from its obvious symptoms, is through ping... Here is a typical output: 64 bytes from 10.65.28.26: icmp_seq=296 ttl=255 time=2552 ms 64 bytes from 10.65.28.26: icmp_seq=297 ttl=255 time=1561 ms 64 bytes from 10.65.28.26: icmp_seq=298 ttl=255 time=567 ms 64 bytes from 10.65.28.26: icmp_seq=299 ttl=255 time=137 ms 64 bytes from 10.65.28.26: icmp_seq=300 ttl=255 time=484 ms # Hmm, exactly 5 ## minutes, though I've seen it quit after only 10 seconds...) ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available ping: sendmsg: No buffer space available ## It would have continued repeating this message indefinitely... ## Note: many more iterations have been removed from this report!!! =P ## Below is what happens when I manually disconnect the modem by sending ## the break signal to the dialer. 
ping: sendmsg: Network is unreachable ping: sendmsg: Network is unreachable ping: sendmsg: Network is unreachable ping: sendmsg: Network is unreachable --- 10.65.28.26 ping statistics --- 337 packets transmitted, 300 received, 10% packet loss, time 35529 5ms rtt min/avg/max/mdev = 122.973/687.037/6298.163/1093.624 ms, pipe 7 ################################### After power cycling the modem here's what the dialer does: leenooks ~ # wvdial --> WvDial: Internet dialer version 1.54.0 --> Initializing modem. --> Sending: ATZ --> Sending: ATQ0 --> Re-Sending: ATZ ### the dialer is hung and will report that the modem is not responding in a ### few seconds... ### At this point I could ctrl-break the dialer and try again, ### However, this would be entirely unproductive as I'd get the same mesage ### each and every time. ### Only by allowing it to complete its cycle, will it return the modem to ### functionality. I suspect that the dialer sends an IOCTL or something to the ### driver which clears the fault... --> Modem not responding. leenooks ~ # wvdial --> WvDial: Internet dialer version 1.54.0 --> Initializing modem. --> Sending: ATZ ATZ OK --> Sending: AT&F&D2&C1X4V1Q0S7=70W2\N3&K3S11=60 AT&F&D2&C1X4V1Q0S7=70W2\N3&K3S11=60 OK --> Modem initialized. --> Sending: ATDT7038298111 --> Waiting for carrier. ATDT7038298111 CONNECT 49333 --> Carrier detected. Waiting for prompt. ** Ascend TNT2.LNHVA.MD.RCN.NET Terminal Server ** Login: --> Looks like a login prompt. --> Sending: alangrimes alangrimes Password: --> Looks like a password prompt. --> Sending: (password) Entering PPP Session. IP address is 66.44.56.212 MTU is 1006. --> Looks like a welcome message. 
--> Starting pppd at Tue Jun 7 04:58:30 2005 --> pid of pppd: 19733 --> Using interface ppp0 --> local IP address 66.44.56.212 --> remote IP address 10.65.28.27 --> primary DNS address 207.172.3.10 --> secondary DNS address 207.172.3.11 ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. ----- End forwarded message ----- -- Russell King From davem@davemloft.net Mon Jun 6 15:18:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 15:18:04 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56MHuXq004137 for ; Mon, 6 Jun 2005 15:18:01 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfPth-0007N4-Rl; Mon, 06 Jun 2005 15:16:41 -0700 Date: Mon, 06 Jun 2005 15:16:41 -0700 (PDT) Message-Id: <20050606.151641.95895557.davem@davemloft.net> To: mchan@broadcom.com Cc: iod00d@hp.com, peterc@gelato.unsw.edu.au, netdev@oss.sgi.com Subject: Re: [PATCH] tg3: Fix link failure in 5701 From: "David S. Miller" In-Reply-To: <1118086942.5008.14.camel@rh4> References: <1118086942.5008.14.camel@rh4> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2145 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 409 Lines: 11 From: "Michael Chan" Date: Mon, 06 Jun 2005 12:42:22 -0700 > On some 5701 devices with older bootcode, the LED configuration bits in > SRAM may be invalid with value zero. The fix is to check for invalid > bits (0) and default to PHY 1 mode. Incorrect LED mode will lead to > error in programming the PHY. > > Thanks to Grant Grundler for debugging the problem. 
Applied, thanks a lot. From davem@davemloft.net Mon Jun 6 15:28:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 15:28:40 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56MSaXq005215 for ; Mon, 6 Jun 2005 15:28:36 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfQ42-0007Qq-SG; Mon, 06 Jun 2005 15:27:22 -0700 Date: Mon, 06 Jun 2005 15:27:22 -0700 (PDT) Message-Id: <20050606.152722.31644561.davem@davemloft.net> To: iod00d@hp.com Cc: mchan@broadcom.com, peterc@gelato.unsw.edu.au, netdev@oss.sgi.com Subject: Re: [PATCH] tg3: Fix link failure in 5701 From: "David S. Miller" In-Reply-To: <20050606222631.GE12068@esmail.cup.hp.com> References: <1118086942.5008.14.camel@rh4> <20050606.151641.95895557.davem@davemloft.net> <20050606222631.GE12068@esmail.cup.hp.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2146 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 479 Lines: 15 From: Grant Grundler Date: Mon, 6 Jun 2005 15:26:31 -0700 > Btw, where can I see which version of tg3 will get this fix? > > I'm certain I'll be asked the question "which tg3 version > is required" more than a few times. 
It will be version "3.30" with release date "June 6, 2005". I will push it to Linus as soon as the kernel.org mirror system picks it up from my GIT tree at: rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/tg3-2.6.git/ From davem@davemloft.net Mon Jun 6 15:30:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 15:30:30 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56MUOXq005664 for ; Mon, 6 Jun 2005 15:30:24 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfQ5p-0007RL-B8; Mon, 06 Jun 2005 15:29:13 -0700 Date: Mon, 06 Jun 2005 15:29:13 -0700 (PDT) Message-Id: <20050606.152913.88479223.davem@davemloft.net> To: tgraf@suug.ch Cc: netdev@oss.sgi.com Subject: Re: [PATCHSET] PKT_SCHED related fixes and a meta ematch completion From: "David S. Miller" In-Reply-To: <20050603211241.593114000@axs> References: <20050603211241.593114000@axs> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2147 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 509 Lines: 13 From: Thomas Graf Date: Fri, 03 Jun 2005 23:12:41 +0200 > The following patchset fixes some serious bugs that prevent > the basic classifier and the meta ematch from working properly. > Patch 2 adds a few new meta collectors for socket attributes which > I'd like to have in 2.6.12 as well. If you think this is too > intrusive (it isn't ;->) I'll resend patch 4 with offsets fixed. I'll try to get these 4 patches into 2.6.12, they all look straightforward and sane to me. Thanks Thomas. 
From davem@davemloft.net Mon Jun 6 15:37:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 15:37:38 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56MbUXq006468 for ; Mon, 6 Jun 2005 15:37:30 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfQCQ-0007Rs-D4; Mon, 06 Jun 2005 15:36:02 -0700 Date: Mon, 06 Jun 2005 15:36:02 -0700 (PDT) Message-Id: <20050606.153602.23015220.davem@davemloft.net> To: herbert@gondor.apana.org.au Cc: johnpol@2ka.mipt.ru, jmorris@redhat.com, linux-crypto@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag From: "David S. Miller" In-Reply-To: <20050604103249.GA1378@gondor.apana.org.au> References: <20050604102204.GA1214@gondor.apana.org.au> <20050604142939.4e2efc55@zanzibar.2ka.mipt.ru> <20050604103249.GA1378@gondor.apana.org.au> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2148 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 597 Lines: 17 From: Herbert Xu Date: Sat, 4 Jun 2005 20:32:49 +1000 > On Sat, Jun 04, 2005 at 02:29:39PM +0400, Evgeniy Polyakov wrote: > > > > But without sg we still save 4*sizeof(dma addr) - is it really a price? > > We're also reducing the offset/length to 16 bits from 32 bits so we're > shaving off half the size. Note, it is still going to be a 16 byte structure on 64-bit machines. This is mainly due to the 8-byte alignment needed by the page pointer. I'm not objecting to your ideas, just mentioning this fact... 
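The alignment point above can be checked with a small stand-alone sketch. The struct names frag16 and frag32 and both helper functions are hypothetical stand-ins invented for this illustration. They are not the kernel's skb_frag_t or the proposed crypto_frag, only a pointer followed by two 16-bit or two 32-bit fields. On a typical LP64 target one would expect both structs to come out at 16 bytes, because the 12 bytes of payload in the 16-bit variant are padded up to the 8-byte alignment of the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout: page pointer plus 16-bit offset/size. */
struct frag16 {
	void     *page;
	uint16_t  offset;
	uint16_t  size;
};

/* Hypothetical layout: page pointer plus 32-bit offset/size. */
struct frag32 {
	void     *page;
	uint32_t  offset;
	uint32_t  size;
};

/* Bytes of tail padding the compiler adds after the last member. */
size_t frag16_tail_padding(void)
{
	return sizeof(struct frag16) -
	       (offsetof(struct frag16, size) + sizeof(uint16_t));
}

/* Print the sizes so the two layouts can be compared on this target. */
void print_frag_sizes(void)
{
	/* Tail padding can never reach a full pointer width. */
	assert(frag16_tail_padding() < sizeof(void *));

	printf("pointer: %zu bytes\n", sizeof(void *));
	printf("frag16:  %zu bytes (%zu padding)\n",
	       sizeof(struct frag16), frag16_tail_padding());
	printf("frag32:  %zu bytes\n", sizeof(struct frag32));
}
```

Calling print_frag_sizes() from a main() on a 32-bit build should show the 16-bit variant saving space, while on a 64-bit build the saving is expected to disappear into padding.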
I am also puzzled as to where the "4" came from :-) From mporter@kernel.crashing.org Mon Jun 6 15:41:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 15:41:30 -0700 (PDT) Received: from zipcode.az.mvista.com (rav-az.mvista.com [65.200.49.157]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56MfMXq007169 for ; Mon, 6 Jun 2005 15:41:23 -0700 Received: from beef.az.mvista.com (root@beef.az.mvista.com [10.50.1.96]) by zipcode.az.mvista.com (8.9.3/8.9.3) with ESMTP id QAA27950; Mon, 6 Jun 2005 16:13:11 -0700 Received: from beef (mporter@localhost [127.0.0.1]) by beef.az.mvista.com (8.12.11/8.12.11/Debian-1) with SMTP id j56MeFF3004693; Mon, 6 Jun 2005 15:40:15 -0700 Cc: shemminger@osdl.org, linuxppc-embedded@ozlabs.org, netdev@oss.sgi.com Subject: [PATCH][5/5] RapidIO support: net driver In-Reply-To: <11180976151080@foobar.com> X-Mailer: gregkh_patchbomb Date: Mon, 6 Jun 2005 15:40:15 -0700 Message-Id: <1118097615222@foobar.com> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Reply-To: Matt Porter To: linux-kernel@vger.kernel.org From: Matt Porter Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j56MfMXq007169 X-archive-position: 2149 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mporter@kernel.crashing.org Precedence: bulk X-list: netdev Content-Length: 16772 Lines: 645 Adds an "Ethernet" driver which sends Ethernet packets over the standard RapidIO messaging. This depends on the core RIO patch for mailbox/doorbell access. 
Signed-off-by: Matt Porter diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -2185,6 +2185,20 @@ config ISERIES_VETH tristate "iSeries Virtual Ethernet driver support" depends on NETDEVICES && PPC_ISERIES +config RIONET + tristate "RapidIO Ethernet over messaging driver support" + depends on NETDEVICES && RAPIDIO + +config RIONET_TX_SIZE + int "Number of outbound queue entries" + depends on RIONET + default "128" + +config RIONET_RX_SIZE + int "Number of inbound queue entries" + depends on RIONET + default "128" + config FDDI bool "FDDI driver support" depends on NETDEVICES && (PCI || EISA) diff --git a/drivers/net/Makefile b/drivers/net/Makefile --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -58,6 +58,7 @@ obj-$(CONFIG_SKFP) += skfp/ obj-$(CONFIG_VIA_RHINE) += via-rhine.o obj-$(CONFIG_VIA_VELOCITY) += via-velocity.o obj-$(CONFIG_ADAPTEC_STARFIRE) += starfire.o +obj-$(CONFIG_RIONET) += rionet.o # # end link order section diff --git a/drivers/net/rionet.c b/drivers/net/rionet.c new file mode 100644 --- /dev/null +++ b/drivers/net/rionet.c @@ -0,0 +1,597 @@ +/* + * rionet - Ethernet driver over RapidIO messaging services + * + * Copyright 2005 MontaVista Software, Inc. + * Matt Porter + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#define DRV_NAME "rionet" +#define DRV_VERSION "0.1" +#define DRV_AUTHOR "Matt Porter " +#define DRV_DESC "Ethernet over RapidIO" + +MODULE_AUTHOR(DRV_AUTHOR); +MODULE_DESCRIPTION(DRV_DESC); +MODULE_LICENSE("GPL"); + +#define RIONET_DEFAULT_MSGLEVEL 0 +#define RIONET_DOORBELL_JOIN 0x1000 +#define RIONET_DOORBELL_LEAVE 0x1001 + +#define RIONET_MAILBOX 0 + +#define RIONET_TX_RING_SIZE CONFIG_RIONET_TX_SIZE +#define RIONET_RX_RING_SIZE CONFIG_RIONET_RX_SIZE + +static LIST_HEAD(rionet_peers); + +struct rionet_private { + struct rio_mport *mport; + struct sk_buff *rx_skb[RIONET_RX_RING_SIZE]; + struct sk_buff *tx_skb[RIONET_TX_RING_SIZE]; + struct net_device_stats stats; + int rx_slot; + int tx_slot; + int tx_cnt; + int ack_slot; + spinlock_t lock; + spinlock_t tx_lock; + u32 msg_enable; +}; + +struct rionet_peer { + struct list_head node; + struct rio_dev *rdev; + struct resource *res; +}; + +static int rionet_check = 0; +static int rionet_capable = 1; +static struct net_device *sndev = NULL; + +/* + * This is a fast lookup table for for translating TX + * Ethernet packets into a destination RIO device. It + * could be made into a hash table to save memory depending + * on system trade-offs. 
+ */ +static struct rio_dev *rionet_active[RIO_MAX_ROUTE_ENTRIES]; + +#define is_rionet_capable(pef, src_ops, dst_ops) \ + ((pef & RIO_PEF_INB_MBOX) && \ + (pef & RIO_PEF_INB_DOORBELL) && \ + (src_ops & RIO_SRC_OPS_DOORBELL) && \ + (dst_ops & RIO_DST_OPS_DOORBELL)) +#define dev_rionet_capable(dev) \ + is_rionet_capable(dev->pef, dev->src_ops, dev->dst_ops) + +#define RIONET_MAC_MATCH(x) (*(u32 *)x == 0x00010001) +#define RIONET_GET_DESTID(x) (*(u16 *)(x + 4)) + +static struct net_device_stats *rionet_stats(struct net_device *ndev) +{ + struct rionet_private *rnet = ndev->priv; + return &rnet->stats; +} + +static int rionet_rx_clean(struct net_device *ndev) +{ + int i; + int error = 0; + struct rionet_private *rnet = ndev->priv; + void *data; + + i = rnet->rx_slot; + + do { + if (!rnet->rx_skb[i]) { + rnet->stats.rx_dropped++; + continue; + } + + if (!(data = rio_get_inb_message(rnet->mport, RIONET_MAILBOX))) + break; + + rnet->rx_skb[i]->data = data; + skb_put(rnet->rx_skb[i], RIO_MAX_MSG_SIZE); + rnet->rx_skb[i]->dev = ndev; + rnet->rx_skb[i]->protocol = + eth_type_trans(rnet->rx_skb[i], ndev); + error = netif_rx(rnet->rx_skb[i]); + + if (error == NET_RX_DROP) { + rnet->stats.rx_dropped++; + } else if (error == NET_RX_BAD) { + if (netif_msg_rx_err(rnet)) + printk(KERN_WARNING "%s: bad rx packet\n", + DRV_NAME); + rnet->stats.rx_errors++; + } else { + rnet->stats.rx_packets++; + rnet->stats.rx_bytes += RIO_MAX_MSG_SIZE; + } + + } while ((i = (i + 1) % RIONET_RX_RING_SIZE) != rnet->rx_slot); + + return i; +} + +static void rionet_rx_fill(struct net_device *ndev, int end) +{ + int i; + struct rionet_private *rnet = ndev->priv; + + i = rnet->rx_slot; + do { + rnet->rx_skb[i] = dev_alloc_skb(RIO_MAX_MSG_SIZE); + + if (!rnet->rx_skb[i]) + break; + + rio_add_inb_buffer(rnet->mport, RIONET_MAILBOX, + rnet->rx_skb[i]->data); + } while ((i = (i + 1) % RIONET_RX_RING_SIZE) != end); + + rnet->rx_slot = i; +} + +static int rionet_queue_tx_msg(struct sk_buff *skb, struct 
net_device *ndev, + struct rio_dev *rdev) +{ + struct rionet_private *rnet = ndev->priv; + + rio_add_outb_message(rnet->mport, rdev, 0, skb->data, skb->len); + rnet->tx_skb[rnet->tx_slot] = skb; + + rnet->stats.tx_packets++; + rnet->stats.tx_bytes += skb->len; + + if (++rnet->tx_cnt == RIONET_TX_RING_SIZE) + netif_stop_queue(ndev); + + if (++rnet->tx_slot == RIONET_TX_RING_SIZE) + rnet->tx_slot = 0; + + if (netif_msg_tx_queued(rnet)) + printk(KERN_INFO "%s: queued skb %8.8x len %8.8x\n", DRV_NAME, + (u32) skb, skb->len); + + return 0; +} + +static int rionet_start_xmit(struct sk_buff *skb, struct net_device *ndev) +{ + int i; + struct rionet_private *rnet = ndev->priv; + struct ethhdr *eth = (struct ethhdr *)skb->data; + u16 destid; + unsigned long flags; + + local_irq_save(flags); + if (!spin_trylock(&rnet->tx_lock)) { + local_irq_restore(flags); + return NETDEV_TX_LOCKED; + } + + if ((rnet->tx_cnt + 1) > RIONET_TX_RING_SIZE) { + netif_stop_queue(ndev); + spin_unlock_irqrestore(&rnet->tx_lock, flags); + printk(KERN_ERR "%s: BUG! Tx Ring full when queue awake!\n", + ndev->name); + return NETDEV_TX_BUSY; + } + + if (eth->h_dest[0] & 0x01) { + /* + * XXX Need to delay queuing if ring max is reached, + * flush additional packets in tx_event() before + * awakening the queue. We can easily exceed ring + * size with a large number of nodes or even a + * small number where the ring is relatively full + * on entrance to hard_start_xmit. 
+ */ + for (i = 0; i < RIO_MAX_ROUTE_ENTRIES; i++) + if (rionet_active[i]) + rionet_queue_tx_msg(skb, ndev, + rionet_active[i]); + } else if (RIONET_MAC_MATCH(eth->h_dest)) { + destid = RIONET_GET_DESTID(eth->h_dest); + if (rionet_active[destid]) + rionet_queue_tx_msg(skb, ndev, rionet_active[destid]); + } + + spin_unlock_irqrestore(&rnet->tx_lock, flags); + + return 0; +} + +static int rionet_set_mac_address(struct net_device *ndev, void *p) +{ + struct sockaddr *addr = p; + + if (!is_valid_ether_addr(addr->sa_data)) + return -EADDRNOTAVAIL; + + memcpy(ndev->dev_addr, addr->sa_data, ndev->addr_len); + + return 0; +} + +static void rionet_dbell_event(struct rio_mport *mport, u16 sid, u16 tid, + u16 info) +{ + struct net_device *ndev = sndev; + struct rionet_private *rnet = ndev->priv; + struct rionet_peer *peer; + + if (netif_msg_intr(rnet)) + printk(KERN_INFO "%s: doorbell sid %4.4x tid %4.4x info %4.4x", + DRV_NAME, sid, tid, info); + if (info == RIONET_DOORBELL_JOIN) { + if (!rionet_active[sid]) { + list_for_each_entry(peer, &rionet_peers, node) { + if (peer->rdev->destid == sid) + rionet_active[sid] = peer->rdev; + } + rio_mport_send_doorbell(mport, sid, + RIONET_DOORBELL_JOIN); + } + } else if (info == RIONET_DOORBELL_LEAVE) { + rionet_active[sid] = NULL; + } else { + if (netif_msg_intr(rnet)) + printk(KERN_WARNING "%s: unhandled doorbell\n", + DRV_NAME); + } +} + +static void rionet_inb_msg_event(struct rio_mport *mport, int mbox, int slot) +{ + int n; + struct net_device *ndev = sndev; + struct rionet_private *rnet = (struct rionet_private *)ndev->priv; + + if (netif_msg_intr(rnet)) + printk(KERN_INFO "%s: inbound message event, mbox %d slot %d\n", + DRV_NAME, mbox, slot); + + spin_lock(&rnet->lock); + if ((n = rionet_rx_clean(ndev)) != rnet->rx_slot) + rionet_rx_fill(ndev, n); + spin_unlock(&rnet->lock); +} + +static void rionet_outb_msg_event(struct rio_mport *mport, int mbox, int slot) +{ + struct net_device *ndev = sndev; + struct rionet_private *rnet = 
ndev->priv; + + spin_lock(&rnet->lock); + + if (netif_msg_intr(rnet)) + printk(KERN_INFO + "%s: outbound message event, mbox %d slot %d\n", + DRV_NAME, mbox, slot); + + while (rnet->tx_cnt && (rnet->ack_slot != slot)) { + /* dma unmap single */ + dev_kfree_skb_irq(rnet->tx_skb[rnet->ack_slot]); + rnet->tx_skb[rnet->ack_slot] = NULL; + if (++rnet->ack_slot == RIONET_TX_RING_SIZE) + rnet->ack_slot = 0; + rnet->tx_cnt--; + } + + if (rnet->tx_cnt < RIONET_TX_RING_SIZE) + netif_wake_queue(ndev); + + spin_unlock(&rnet->lock); +} + +static int rionet_open(struct net_device *ndev) +{ + int i, rc = 0; + struct rionet_peer *peer, *tmp; + u32 pwdcsr; + struct rionet_private *rnet = ndev->priv; + + if (netif_msg_ifup(rnet)) + printk(KERN_INFO "%s: open\n", DRV_NAME); + + if ((rc = rio_request_inb_dbell(rnet->mport, + RIONET_DOORBELL_JOIN, + RIONET_DOORBELL_LEAVE, + rionet_dbell_event)) < 0) + goto out; + + if ((rc = rio_request_inb_mbox(rnet->mport, + RIONET_MAILBOX, + RIONET_RX_RING_SIZE, + rionet_inb_msg_event)) < 0) + goto out; + + if ((rc = rio_request_outb_mbox(rnet->mport, + RIONET_MAILBOX, + RIONET_TX_RING_SIZE, + rionet_outb_msg_event)) < 0) + goto out; + + /* Initialize inbound message ring */ + for (i = 0; i < RIONET_RX_RING_SIZE; i++) + rnet->rx_skb[i] = NULL; + rnet->rx_slot = 0; + rionet_rx_fill(ndev, 0); + + rnet->tx_slot = 0; + rnet->tx_cnt = 0; + rnet->ack_slot = 0; + + netif_carrier_on(ndev); + netif_start_queue(ndev); + + list_for_each_entry_safe(peer, tmp, &rionet_peers, node) { + if (!(peer->res = rio_request_outb_dbell(peer->rdev, + RIONET_DOORBELL_JOIN, + RIONET_DOORBELL_LEAVE))) + { + printk(KERN_ERR "%s: error requesting doorbells\n", + DRV_NAME); + continue; + } + + /* + * If device has initialized inbound doorbells, + * send a join message + */ + rio_read_config_32(peer->rdev, RIO_WRITE_PORT_CSR, &pwdcsr); + if (pwdcsr & RIO_DOORBELL_AVAIL) + rio_send_doorbell(peer->rdev, RIONET_DOORBELL_JOIN); + } + + out: + return rc; +} + +static int 
rionet_close(struct net_device *ndev) +{ + struct rionet_private *rnet = (struct rionet_private *)ndev->priv; + struct rionet_peer *peer, *tmp; + int i; + + if (netif_msg_ifup(rnet)) + printk(KERN_INFO "%s: close\n", DRV_NAME); + + netif_stop_queue(ndev); + netif_carrier_off(ndev); + + for (i = 0; i < RIONET_RX_RING_SIZE; i++) + if (rnet->rx_skb[i]) + kfree_skb(rnet->rx_skb[i]); + + list_for_each_entry_safe(peer, tmp, &rionet_peers, node) { + if (rionet_active[peer->rdev->destid]) { + rio_send_doorbell(peer->rdev, RIONET_DOORBELL_LEAVE); + rionet_active[peer->rdev->destid] = NULL; + } + rio_release_outb_dbell(peer->rdev, peer->res); + } + + rio_release_inb_dbell(rnet->mport, RIONET_DOORBELL_JOIN, + RIONET_DOORBELL_LEAVE); + rio_release_inb_mbox(rnet->mport, RIONET_MAILBOX); + rio_release_outb_mbox(rnet->mport, RIONET_MAILBOX); + + return 0; +} + +static void rionet_remove(struct rio_dev *rdev) +{ + struct net_device *ndev = NULL; + struct rionet_peer *peer, *tmp; + + unregister_netdev(ndev); + kfree(ndev); + + list_for_each_entry_safe(peer, tmp, &rionet_peers, node) { + list_del(&peer->node); + kfree(peer); + } +} + +static void rionet_get_drvinfo(struct net_device *ndev, + struct ethtool_drvinfo *info) +{ + struct rionet_private *rnet = ndev->priv; + + strcpy(info->driver, DRV_NAME); + strcpy(info->version, DRV_VERSION); + strcpy(info->fw_version, "n/a"); + sprintf(info->bus_info, "RIO master port %d", rnet->mport->id); +} + +static u32 rionet_get_msglevel(struct net_device *ndev) +{ + struct rionet_private *rnet = ndev->priv; + + return rnet->msg_enable; +} + +static void rionet_set_msglevel(struct net_device *ndev, u32 value) +{ + struct rionet_private *rnet = ndev->priv; + + rnet->msg_enable = value; +} + +static struct ethtool_ops rionet_ethtool_ops = { + .get_drvinfo = rionet_get_drvinfo, + .get_msglevel = rionet_get_msglevel, + .set_msglevel = rionet_set_msglevel, + .get_link = ethtool_op_get_link, +}; + +static int rionet_setup_netdev(struct rio_mport 
*mport) +{ + int rc = 0; + struct net_device *ndev = NULL; + struct rionet_private *rnet; + u16 device_id; + + /* Allocate our net_device structure */ + ndev = alloc_etherdev(sizeof(struct rionet_private)); + if (ndev == NULL) { + printk(KERN_INFO "%s: could not allocate ethernet device.\n", + DRV_NAME); + rc = -ENOMEM; + goto out; + } + + /* + * XXX hack, store point a static at ndev so we can get it... + * Perhaps need an array of these that the handler can + * index via the mbox number. + */ + sndev = ndev; + + /* Set up private area */ + rnet = (struct rionet_private *)ndev->priv; + rnet->mport = mport; + + /* Set the default MAC address */ + device_id = rio_local_get_device_id(mport); + ndev->dev_addr[0] = 0x00; + ndev->dev_addr[1] = 0x01; + ndev->dev_addr[2] = 0x00; + ndev->dev_addr[3] = 0x01; + ndev->dev_addr[4] = device_id >> 8; + ndev->dev_addr[5] = device_id & 0xff; + + /* Fill in the driver function table */ + ndev->open = &rionet_open; + ndev->hard_start_xmit = &rionet_start_xmit; + ndev->stop = &rionet_close; + ndev->get_stats = &rionet_stats; + ndev->set_mac_address = &rionet_set_mac_address; + ndev->mtu = RIO_MAX_MSG_SIZE - 14; + ndev->features = NETIF_F_LLTX; + SET_ETHTOOL_OPS(ndev, &rionet_ethtool_ops); + + SET_MODULE_OWNER(ndev); + + spin_lock_init(&rnet->lock); + spin_lock_init(&rnet->tx_lock); + + rnet->msg_enable = RIONET_DEFAULT_MSGLEVEL; + + rc = register_netdev(ndev); + if (rc != 0) + goto out; + + printk("%s: %s %s Version %s, MAC %02x:%02x:%02x:%02x:%02x:%02x\n", + ndev->name, + DRV_NAME, + DRV_DESC, + DRV_VERSION, + ndev->dev_addr[0], ndev->dev_addr[1], ndev->dev_addr[2], + ndev->dev_addr[3], ndev->dev_addr[4], ndev->dev_addr[5]); + + out: + return rc; +} + +/* + * XXX Make multi-net safe + */ +static int rionet_probe(struct rio_dev *rdev, const struct rio_device_id *id) +{ + int rc = -ENODEV; + u32 lpef, lsrc_ops, ldst_ops; + struct rionet_peer *peer; + + /* If local device is not rionet capable, give up quickly */ + if (!rionet_capable) 
+ goto out; + + /* + * First time through, make sure local device is rionet + * capable, setup netdev, and set flags so this is skipped + * on later probes + */ + if (!rionet_check) { + rio_local_read_config_32(rdev->net->hport, RIO_PEF_CAR, &lpef); + rio_local_read_config_32(rdev->net->hport, RIO_SRC_OPS_CAR, + &lsrc_ops); + rio_local_read_config_32(rdev->net->hport, RIO_DST_OPS_CAR, + &ldst_ops); + if (!is_rionet_capable(lpef, lsrc_ops, ldst_ops)) { + printk(KERN_ERR + "%s: local device is not network capable\n", + DRV_NAME); + rionet_check = 1; + rionet_capable = 0; + goto out; + } + + rc = rionet_setup_netdev(rdev->net->hport); + rionet_check = 1; + } + + /* + * If the remote device has mailbox/doorbell capabilities, + * add it to the peer list. + */ + if (dev_rionet_capable(rdev)) { + if (!(peer = kmalloc(sizeof(struct rionet_peer), GFP_KERNEL))) { + rc = -ENOMEM; + goto out; + } + peer->rdev = rdev; + list_add_tail(&peer->node, &rionet_peers); + } + + out: + return rc; +} + +static struct rio_device_id rionet_id_table[] = { + {RIO_DEVICE(RIO_ANY_ID, RIO_ANY_ID)} +}; + +static struct rio_driver rionet_driver = { + .name = "rionet", + .id_table = rionet_id_table, + .probe = rionet_probe, + .remove = rionet_remove, +}; + +static int __init rionet_init(void) +{ + return rio_register_driver(&rionet_driver); +} + +static void __exit rionet_exit(void) +{ + rio_unregister_driver(&rionet_driver); +} + +module_init(rionet_init); +module_exit(rionet_exit); From davem@davemloft.net Mon Jun 6 15:48:17 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 15:48:19 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56MmEXq007919 for ; Mon, 6 Jun 2005 15:48:16 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfQMv-0007U1-7g; Mon, 06 Jun 2005 15:46:53 -0700 Date: Mon, 06 Jun 2005 15:46:53 
-0700 (PDT) Message-Id: <20050606.154653.64001264.davem@davemloft.net> To: herbert@gondor.apana.org.au Cc: hch@infradead.org, jmorris@redhat.com, linux-crypto@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag From: "David S. Miller" In-Reply-To: <20050606124043.GA625@gondor.apana.org.au> References: <20050606115939.GA399@gondor.apana.org.au> <20050606120914.GA8317@infradead.org> <20050606124043.GA625@gondor.apana.org.au> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2150 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1877 Lines: 46 From: Herbert Xu Date: Mon, 6 Jun 2005 22:40:43 +1000 > However, for skb_frag_t at least going to the 32-bit version on i386 > means at least 72 bytes extra for every skb->data allocation. > > Dave, what are your views on making skb_frag_t bigger? Good question. There is an ancillary issue that I'd like to address at some point, and what you do here is tied into that. Currently, NETIF_F_SG drivers do one DMA mapping call for each fragment of the packet. That totally stinks performance wise, and the PPC64 and SPARC64 folks feel this the most. So I wanted to create a set of interfaces ala: int dma_map_skb(struct sk_buff *skb, ...); void dma_unmap_skb(struct sk_buff *skb, ...); void dma_sync_skb_for_cpu(struct sk_buff *skb, ...); void dma_sync_skb_for_device(struct sk_buff *skb, ...); The question is where to put the DMA mapping cookies :-) On i386 and alike, using something like the DECLARE_PCI_UNMAP_*() macros would allow us to NOP out the DMA addresses entirely. Since they are computable from the page struct and offset. Note that the above interface, on IOMMU platforms, would allow DMA coalescing to be performed. 
This would hit heavily with TSO, for example. Most packets would go out with a maximum of 2 DMA descriptors, 1 for the mapping of skb->data and 1 for all of the paged SKB data afterwards combined. Note that, due to this coalescing, the "size" member must be larger than a __u16. So I guess I'm taking you a step backwards, I want to make skb_frag_struct a little bigger :-) I.e. put the DMA mapping cookies into the skb_frag_struct, then a set of accessor macros like we have for scatterlist. Well, in fact, it would become a scatterlist and therefore the only thing special about dma_map_skb() is that it maps a linear buffer via skb->data then the scatterlist in skb_shared_info(skb). From grundler@cup.hp.com Mon Jun 6 15:48:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 15:48:56 -0700 (PDT) Received: from palrel10.hp.com (palrel10.hp.com [156.153.255.245]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56MmrXq008005 for ; Mon, 6 Jun 2005 15:48:53 -0700 Received: from esmail.cup.hp.com (esmail.cup.hp.com [15.0.65.164]) by palrel10.hp.com (Postfix) with ESMTP id 6EE181379; Mon, 6 Jun 2005 15:24:00 -0700 (PDT) Received: from localhost.localdomain (debian.cup.hp.com [15.244.57.47]) by esmail.cup.hp.com (8.9.3 (PHNE_29774)/8.8.6) with ESMTP id PAA29395; Mon, 6 Jun 2005 15:18:08 -0700 (PDT) Received: by localhost.localdomain (Postfix, from userid 1000) id DD1758FC56; Mon, 6 Jun 2005 15:26:31 -0700 (PDT) Date: Mon, 6 Jun 2005 15:26:31 -0700 From: Grant Grundler To: "David S. 
Miller" Cc: mchan@broadcom.com, iod00d@hp.com, peterc@gelato.unsw.edu.au, netdev@oss.sgi.com Subject: Re: [PATCH] tg3: Fix link failure in 5701 Message-ID: <20050606222631.GE12068@esmail.cup.hp.com> References: <1118086942.5008.14.camel@rh4> <20050606.151641.95895557.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050606.151641.95895557.davem@davemloft.net> User-Agent: Mutt/1.5.9i X-archive-position: 2151 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: iod00d@hp.com Precedence: bulk X-list: netdev Content-Length: 293 Lines: 13 On Mon, Jun 06, 2005 at 03:16:41PM -0700, David S. Miller wrote: > Applied, thanks a lot. Dave, Btw, where can I see which version of tg3 will get this fix? I'm certain I'll be asked the question "which tg3 version is required" more than a few times. thanks, grant ps. Thanks Michael! From herbert@gondor.apana.org.au Mon Jun 6 16:06:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 16:06:24 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56N6HXq009765 for ; Mon, 6 Jun 2005 16:06:18 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DfQeC-0007FB-00; Tue, 07 Jun 2005 09:04:44 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DfQe5-0001s3-00; Tue, 07 Jun 2005 09:04:37 +1000 Date: Tue, 7 Jun 2005 09:04:37 +1000 To: "David S. 
Miller" Cc: hch@infradead.org, jmorris@redhat.com, linux-crypto@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050606230437.GA7172@gondor.apana.org.au> References: <20050606115939.GA399@gondor.apana.org.au> <20050606120914.GA8317@infradead.org> <20050606124043.GA625@gondor.apana.org.au> <20050606.154653.64001264.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050606.154653.64001264.davem@davemloft.net> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2152 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 907 Lines: 21 On Mon, Jun 06, 2005 at 03:46:53PM -0700, David S. Miller wrote: > > So I guess I'm taking you a step backwards, I want to make > skb_frag_struct a little bigger :-) I.e. put the DMA mapping > cookies into the skb_frag_struct, then a set of accessor > macros like we have for scatterlist. Well, in fact, it would > become a scatterlist and therefore the only thing special > about dma_map_skb() is that it maps a linear buffer via > skb->data then the scatterlist in skb_shared_info(skb). Bigger is better actually :) I'm now thinking of using the memory occupied by the frags array for IPsec crypto operations. So if it's bigger then it simply means that we can store more fragments. 
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From jgarzik@pobox.com Mon Jun 6 16:07:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 16:07:09 -0700 (PDT) Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56N75Xq009864 for ; Mon, 6 Jun 2005 16:07:06 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1DfQf5-00037R-F2; Mon, 06 Jun 2005 23:05:40 +0000 Message-ID: <42A4D6BF.9020908@pobox.com> Date: Mon, 06 Jun 2005 19:05:35 -0400 From: Jeff Garzik User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: herbert@gondor.apana.org.au, hch@infradead.org, jmorris@redhat.com, linux-crypto@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag References: <20050606115939.GA399@gondor.apana.org.au> <20050606120914.GA8317@infradead.org> <20050606124043.GA625@gondor.apana.org.au> <20050606.154653.64001264.davem@davemloft.net> In-Reply-To: <20050606.154653.64001264.davem@davemloft.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2153 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 100 Lines: 8 David S. Miller wrote: > The question is where to put the DMA mapping cookies :-) Indeed. 
Jeff From davem@davemloft.net Mon Jun 6 16:11:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 16:11:09 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56NB6Xq011156 for ; Mon, 6 Jun 2005 16:11:06 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfQj6-0000ak-1R; Mon, 06 Jun 2005 16:09:48 -0700 Date: Mon, 06 Jun 2005 16:09:47 -0700 (PDT) Message-Id: <20050606.160947.75190168.davem@davemloft.net> To: herbert@gondor.apana.org.au Cc: hch@infradead.org, jmorris@redhat.com, linux-crypto@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag From: "David S. Miller" In-Reply-To: <20050606230437.GA7172@gondor.apana.org.au> References: <20050606124043.GA625@gondor.apana.org.au> <20050606.154653.64001264.davem@davemloft.net> <20050606230437.GA7172@gondor.apana.org.au> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2154 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 433 Lines: 11 From: Herbert Xu Date: Tue, 7 Jun 2005 09:04:37 +1000 > Bigger is better actually :) I'm now thinking of using the > memory occupied by the frags array for IPsec crypto operations. > So if it's bigger then it simply means that we can store more > fragments. So you want to use this area as a sort-of temporary scratch pad for something other than scatterlist information? That's interesting if so... 
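For a sense of scale, the size of the frags array under discussion can be worked out in userspace. The sketch below is only an illustration, not kernel code. It assumes 4 KiB pages, takes MAX_SKB_FRAGS as 65536/PAGE_SIZE + 2 (believed to be the definition in kernels of this era), and models the i386 fragment layout with fixed-width integers standing in for the real struct page pointer. The names frag_narrow, frag_wide, frags_array_bytes() and frags_growth_bytes() are hypothetical. Under those assumptions, widening the two 16-bit fields to 32 bits costs the 72 bytes per skb->data allocation quoted earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so the fragment count is 65536 / 4096 + 2 = 18. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_FRAGS (65536u / SKETCH_PAGE_SIZE + 2u)

/* Models the existing fragment on i386: 32-bit page pointer, 16-bit fields. */
struct frag_narrow {
    uint32_t page;        /* stands in for a 32-bit struct page pointer */
    uint16_t page_offset;
    uint16_t size;
};

/* Models the widened fragment: offset and size grow to 32 bits. */
struct frag_wide {
    uint32_t page;        /* stands in for a 32-bit struct page pointer */
    uint32_t page_offset;
    uint32_t size;
};

/* Bytes occupied by a full frags array whose elements are frag_size bytes. */
static size_t frags_array_bytes(size_t frag_size)
{
    return (size_t)SKETCH_MAX_FRAGS * frag_size;
}

/* Extra bytes per skb->data allocation caused by widening the fragment. */
static size_t frags_growth_bytes(void)
{
    assert(sizeof(struct frag_wide) >= sizeof(struct frag_narrow));
    return frags_array_bytes(sizeof(struct frag_wide)) -
           frags_array_bytes(sizeof(struct frag_narrow));
}
```

The same two helpers also give the size of the area that would be available as a scratch pad in each case: frags_array_bytes(sizeof(struct frag_narrow)) for the existing layout and frags_array_bytes(sizeof(struct frag_wide)) for the widened one. On a 64-bit target the real structure carries an 8-byte pointer and padding, so the difference would not be the same there.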
From herbert@gondor.apana.org.au Mon Jun 6 16:15:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 16:15:36 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56NFSXq012040 for ; Mon, 6 Jun 2005 16:15:29 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DfQnG-0007Qj-00; Tue, 07 Jun 2005 09:14:06 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DfQnF-0001vS-00; Tue, 07 Jun 2005 09:14:05 +1000 Date: Tue, 7 Jun 2005 09:14:05 +1000 To: "David S. Miller" Cc: hch@infradead.org, jmorris@redhat.com, linux-crypto@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050606231405.GA7385@gondor.apana.org.au> References: <20050606124043.GA625@gondor.apana.org.au> <20050606.154653.64001264.davem@davemloft.net> <20050606230437.GA7172@gondor.apana.org.au> <20050606.160947.75190168.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050606.160947.75190168.davem@davemloft.net> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2155 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 824 Lines: 24 On Mon, Jun 06, 2005 at 04:09:47PM -0700, David S. Miller wrote: > > So you want to use this area as a sort-of temporary > scratch pad for something other than scatterlist > information? That's interesting if so... There are two possibilities: 1) We use it directly as a scratch buffer for now since the frags are always linearised currently. 2) We have a list of lists of which this is simply a member. 
The meta-list itself can then be stored on the stack since each member is only 4 bytes. We can go with 1) for now. When it becomes possible for us to not flatten the frags, we can switch to 2). Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From davem@davemloft.net Mon Jun 6 16:19:29 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 16:19:32 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56NJTXq016236 for ; Mon, 6 Jun 2005 16:19:29 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfQrG-0000zi-92; Mon, 06 Jun 2005 16:18:14 -0700 Date: Mon, 06 Jun 2005 16:18:14 -0700 (PDT) Message-Id: <20050606.161814.130845728.davem@davemloft.net> To: herbert@gondor.apana.org.au Cc: hch@infradead.org, jmorris@redhat.com, linux-crypto@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag From: "David S. Miller" In-Reply-To: <20050606231405.GA7385@gondor.apana.org.au> References: <20050606230437.GA7172@gondor.apana.org.au> <20050606.160947.75190168.davem@davemloft.net> <20050606231405.GA7385@gondor.apana.org.au> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2156 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 330 Lines: 10 From: Herbert Xu Date: Tue, 7 Jun 2005 09:14:05 +1000 > 1) We use it directly as a scratch buffer for now since the > frags are always linearised currently. 
And since skb_shinfo(skb)->nr_frags will be zero, nobody will mistakenly look at the contents and interpret it as some valid frags. Right? From herbert@gondor.apana.org.au Mon Jun 6 16:21:46 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 16:21:49 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56NLjXq017009 for ; Mon, 6 Jun 2005 16:21:45 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DfQt9-0007Sp-00; Tue, 07 Jun 2005 09:20:11 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DfQt7-0001xI-00; Tue, 07 Jun 2005 09:20:09 +1000 Date: Tue, 7 Jun 2005 09:20:09 +1000 To: "David S. Miller" Cc: hch@infradead.org, jmorris@redhat.com, linux-crypto@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [RFC] Replace scatterlist with crypto_frag Message-ID: <20050606232009.GA7475@gondor.apana.org.au> References: <20050606230437.GA7172@gondor.apana.org.au> <20050606.160947.75190168.davem@davemloft.net> <20050606231405.GA7385@gondor.apana.org.au> <20050606.161814.130845728.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050606.161814.130845728.davem@davemloft.net> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2158 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 628 Lines: 17 On Mon, Jun 06, 2005 at 04:18:14PM -0700, David S. Miller wrote: > From: Herbert Xu > Date: Tue, 7 Jun 2005 09:14:05 +1000 > > > 1) We use it directly as a scratch buffer for now since the > > frags are always linearised currently. 
> > And since skb_shinfo(skb)->nr_frags will be zero, nobody > will mistakedly look at the contents and interpret it as > some valid frags. Right? Yep. -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Mon Jun 6 16:21:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 16:21:44 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56NLbXq016977 for ; Mon, 6 Jun 2005 16:21:37 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DfQsu-0007SP-00; Tue, 07 Jun 2005 09:20:19 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DfQsr-0001wK-00; Tue, 07 Jun 2005 09:19:53 +1000 From: Herbert Xu To: rmk@arm.linux.org.uk (Russell King) Subject: Re: Fwd: [Bug 4615] Modem connection stalls out. Cc: netdev@oss.sgi.com Organization: Core In-Reply-To: <20050606224729.B12034@flint.arm.linux.org.uk> X-Newsgroups: apana.lists.os.linux.netdev User-Agent: tin/1.7.4-20040225 ("Benbecula") (UNIX) (Linux/2.4.27-hx-1-686-smp (i686)) Message-Id: Date: Tue, 07 Jun 2005 09:19:53 +1000 X-archive-position: 2157 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 580 Lines: 16 Russell King wrote: > > The "No buffer space available" looks like the system is running low on > memory. Would networking folk concur with that? It's probably not running low of system memory. However, it might be running out of things such as routing cache entries. Check the dmesg output, it might have a clue on what went wrong. 
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From mitch.a.williams@intel.com Mon Jun 6 16:57:44 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 16:58:02 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j56NviXq019949 for ; Mon, 6 Jun 2005 16:57:44 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j56NtK0S002940; Mon, 6 Jun 2005 23:55:20 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j56NtKOJ019762; Mon, 6 Jun 2005 23:55:20 GMT Received: from mawilli1-desk2.amr.corp.intel.com (mawilli1-desk2.amr.corp.intel.com [134.134.3.124]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j56NtJSL007934; Mon, 6 Jun 2005 16:55:19 -0700 Date: Mon, 6 Jun 2005 16:55:19 -0700 From: Mitch Williams X-X-Sender: mawilli1@mawilli1-desk2.amr.corp.intel.com To: "Ronciak, John" cc: "David S. 
Miller" , mchan@broadcom.com, hadi@cyberus.ca, buytenh@wantstofly.org, "Williams, Mitch A" , jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: RE: RFC: NAPI packet weighting patch In-Reply-To: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> Message-ID: References: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> ReplyTo: "Mitch Williams" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2159 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mitch.a.williams@intel.com Precedence: bulk X-list: netdev Content-Length: 1543 Lines: 40 On Mon, 6 Jun 2005, Ronciak, John wrote: > > If you force the e1000 driver to do RX replenishment every N > > packets it should reduce the packet drops the same (in the > > single NIC case) as if you reduced the dev->weight to that > > same value N. > > But this isn't what we are seeing. Even if we just reduce the weight > value to 32 from 64, all of the drops go away. So there seems to be > other things affecting this. Some quickie results for everybody -- I've been working on other stuff this morning and haven't had much time in the lab. Increasing the RX ring to 512 descriptors eliminates dropped packets, but performance goes down. When I mentioned this, John and Jesse both nodded and said, "Yep. That's what happens when the descriptor ring grows past one page." Reducing the weight to 32 got rid of almost all of the dropped packets (down to < 1 per second); reducing it to 20 eliminated all of them. In both cases performance rose as compared to the default weight of 64. Tests were run on 2.6.12rc5 on a dual Xeon 2.8GHz PCI-X system. We run Chariot for performance testing, using TCP/IP large file transfers with 10 Windows 2000 clients. 
We're still looking at some methods of returning RX resources to the hardware more often, but we don't have results on that yet. > I also like your idea about the weight value being adjusted based on > real work done using some measurable metric. This seems like a good > path to explore as well. Agreed. I think NAPI can be a lot smarter than it is today. -Mitch From greearb@candelatech.com Mon Jun 6 17:10:21 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 17:10:32 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j570ALXq020856 for ; Mon, 6 Jun 2005 17:10:21 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j570go5I000307; Mon, 6 Jun 2005 17:42:51 -0700 Message-ID: <42A4E599.2090604@candelatech.com> Date: Mon, 06 Jun 2005 17:08:57 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Mitch Williams CC: "Ronciak, John" , "David S. 
Miller" , mchan@broadcom.com, hadi@cyberus.ca, buytenh@wantstofly.org, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: Re: RFC: NAPI packet weighting patch References: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2160 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 1511 Lines: 44 Mitch Williams wrote: > > On Mon, 6 Jun 2005, Ronciak, John wrote: > > >>> If you force the e1000 driver to do RX replenishment every N >>> packets it should reduce the packet drops the same (in the >>> single NIC case) as if you reduced the dev->weight to that >>> same value N. >> >>But this isn't what we are seeing. Even if we just reduce the weight >>value to 32 from 64, all of the drops go away. So there seems to be >>other things affecting this. > > > Some quickie results for everybody -- I've been working on other stuff this > morning and haven't had much time in the lab. > > Increasing the RX ring to 512 descriptors eliminates dropped packets, but > performance goes down. When I mentioned this, John and Jesse both nodded > and said, "Yep. That's what happens when the descriptor ring grows past > one page." > > Reducing the weight to 32 got rid of almost all of the dropped packets > (down to < 1 per second); reducing it to 20 eliminated all of them. In > both cases performance rose as compared to the default weight of 64. > > Tests were run on 2.6.12rc5 on a dual Xeon 2.8GHz PCI-X system. We run > Chariot for performance testing, using TCP/IP large file transfers with 10 > Windows 2000 clients. So is the Linux server reading/writing these large files to/from the disk? 
Can you tell us how much performance went down when you increased the descriptors to 512? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From davem@davemloft.net Mon Jun 6 21:09:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 21:09:59 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5749uXq005901 for ; Mon, 6 Jun 2005 21:09:57 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfVOR-0000e6-2U; Mon, 06 Jun 2005 21:08:47 -0700 Date: Mon, 06 Jun 2005 21:08:46 -0700 (PDT) Message-Id: <20050606.210846.07641049.davem@davemloft.net> To: netdev@oss.sgi.com CC: herbert@gondor.apana.org.au, jheffner@psc.edu Subject: [PATCH 0/9]: TCP: The Road to Super TSO From: "David S. Miller" X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2161 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1557 Lines: 33 Some folks, notably the S2IO guys, get performance degradation from the Super TSO v2 patch (they get it from the first version as well). It's a real pain to spot what causes such things in such a huge patch... so I started splitting things up in a very fine-grained manner so we can catch regressions more precisely. There are several bugs spotted by this first set of 9 patches, and I'd really appreciate good high-quality testing reports. Please do not mail such reports privately to me, as some have done; always include netdev@oss.sgi.com, thanks a lot. Herbert, I'm CC:'ing you because one of the bugs fixed here has to do with the TSO header COW'ing stuff you did. 
You missed one case where a skb_header_release() call was needed, namely tcp_fragment() where it does its __skb_append(). John, I'm CC:'ing you because there are several cwnd handling related cures in here. I did _not_ fix the TSO cwnd growth bug yet in these patches, but it is at the very top of my TODO list for my next batch of work on this stuff. The most notable fix here is the bogus extra cwnd validation done by __tcp_push_pending_frames(). That validation should only occur if we _do_ send some packets, and tcp_write_xmit() takes care of that just fine. The other one is that the 'nonagle' argument to __tcp_push_pending_frames() is clobbered by its tcp_skb_is_last() logic, causing TCP_NAGLE_PUSH to be used for all packets processed by tcp_write_xmit(), whoops... Please help me review this stuff, thanks. The patches will show up as followups to this email. From davem@davemloft.net Mon Jun 6 21:17:56 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 21:17:59 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j574HuXq006623 for ; Mon, 6 Jun 2005 21:17:56 -0700 Received: from localhost ([127.0.0.1]) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfVWB-0000h6-FB; Mon, 06 Jun 2005 21:16:47 -0700 Date: Mon, 06 Jun 2005 21:16:17 -0700 (PDT) Message-Id: <20050606.211617.92588086.davem@davemloft.net> To: netdev@oss.sgi.com Cc: herbert@gondor.apana.org.au, jheffner@psc.edu Subject: [PATCH 1/9]: TCP: The Road to Super TSO From: "David S. 
Miller" In-Reply-To: <20050606.210846.07641049.davem@davemloft.net> References: <20050606.210846.07641049.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2162 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 2330 Lines: 75 [TCP]: Simplify SKB data portion allocation with NETIF_F_SG. The ideal and most optimal layout for an SKB when doing scatter-gather is to put all the headers at skb->data, and all the user data in the page array. This makes SKB splitting and combining extremely simple, especially before a packet goes onto the wire the first time. So, when sk_stream_alloc_pskb() is given a zero size, make sure there is no skb_tailroom(). This is achieved by applying SKB_DATA_ALIGN() to the header length used here. Next, make select_size() in TCP output segmentation use a length of zero when NETIF_F_SG is true on the outgoing interface. Signed-off-by: David S. 
Miller 28f78ef8dcc90a2a26499dab76678bd6813d7793 (from 3f5948fa2cbbda1261eec9a39ef3004b3caf73fb) diff --git a/include/net/sock.h b/include/net/sock.h --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1130,13 +1130,16 @@ static inline void sk_stream_moderate_sn static inline struct sk_buff *sk_stream_alloc_pskb(struct sock *sk, int size, int mem, int gfp) { - struct sk_buff *skb = alloc_skb(size + sk->sk_prot->max_header, gfp); + struct sk_buff *skb; + int hdr_len; + hdr_len = SKB_DATA_ALIGN(sk->sk_prot->max_header); + skb = alloc_skb(size + hdr_len, gfp); if (skb) { skb->truesize += mem; if (sk->sk_forward_alloc >= (int)skb->truesize || sk_stream_mem_schedule(sk, skb->truesize, 0)) { - skb_reserve(skb, sk->sk_prot->max_header); + skb_reserve(skb, hdr_len); return skb; } __kfree_skb(skb); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -775,13 +775,9 @@ static inline int select_size(struct soc { int tmp = tp->mss_cache_std; - if (sk->sk_route_caps & NETIF_F_SG) { - int pgbreak = SKB_MAX_HEAD(MAX_TCP_HEADER); + if (sk->sk_route_caps & NETIF_F_SG) + tmp = 0; - if (tmp >= pgbreak && - tmp <= pgbreak + (MAX_SKB_FRAGS - 1) * PAGE_SIZE) - tmp = pgbreak; - } return tmp; } @@ -891,11 +887,6 @@ new_segment: tcp_mark_push(tp, skb); goto new_segment; } else if (page) { - /* If page is cached, align - * offset to L1 cache boundary - */ - off = (off + L1_CACHE_BYTES - 1) & - ~(L1_CACHE_BYTES - 1); if (off == PAGE_SIZE) { put_page(page); TCP_PAGE(sk) = page = NULL; From davem@davemloft.net Mon Jun 6 21:18:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 21:18:45 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j574IfXq006820 for ; Mon, 6 Jun 2005 21:18:41 -0700 Received: from localhost ([127.0.0.1]) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfVX0-0000hN-6H; Mon, 06 Jun 2005 21:17:38 -0700 
Date: Mon, 06 Jun 2005 21:17:08 -0700 (PDT) Message-Id: <20050606.211708.63132555.davem@davemloft.net> To: netdev@oss.sgi.com Cc: herbert@gondor.apana.org.au, jheffner@psc.edu Subject: [PATCH 2/9]: TCP: The Road to Super TSO From: "David S. Miller" In-Reply-To: <20050606.210846.07641049.davem@davemloft.net> References: <20050606.210846.07641049.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2163 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 2161 Lines: 69 [TCP]: Fix quick-ack decrementing with TSO. On each packet output, we call tcp_dec_quickack_mode() if the ACK flag is set. It drops tp->ack.quick until it hits zero, at which time we deflate the ATO value. When doing TSO, we are emitting multiple packets with ACK set, so we should decrement tp->ack.quick that many segments. Note that, unlike this case, tcp_enter_cwr() should not take the tcp_skb_pcount(skb) into consideration. That function, one time, readjusts tp->snd_cwnd and moves into TCP_CA_CWR state. Signed-off-by: David S. Miller 00cb08b2ec091f4b461210026392edeaccf31d9c (from 28f78ef8dcc90a2a26499dab76678bd6813d7793) diff --git a/include/net/tcp.h b/include/net/tcp.h --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -817,11 +817,16 @@ static inline int tcp_ack_scheduled(stru return tp->ack.pending&TCP_ACK_SCHED; } -static __inline__ void tcp_dec_quickack_mode(struct tcp_sock *tp) +static __inline__ void tcp_dec_quickack_mode(struct tcp_sock *tp, unsigned int pkts) { - if (tp->ack.quick && --tp->ack.quick == 0) { - /* Leaving quickack mode we deflate ATO. */ - tp->ack.ato = TCP_ATO_MIN; + if (tp->ack.quick) { + if (pkts >= tp->ack.quick) { + tp->ack.quick = 0; + + /* Leaving quickack mode we deflate ATO. 
*/ + tp->ack.ato = TCP_ATO_MIN; + } else + tp->ack.quick -= pkts; } } diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -141,11 +141,11 @@ static inline void tcp_event_data_sent(s tp->ack.pingpong = 1; } -static __inline__ void tcp_event_ack_sent(struct sock *sk) +static __inline__ void tcp_event_ack_sent(struct sock *sk, unsigned int pkts) { struct tcp_sock *tp = tcp_sk(sk); - tcp_dec_quickack_mode(tp); + tcp_dec_quickack_mode(tp, pkts); tcp_clear_xmit_timer(sk, TCP_TIME_DACK); } @@ -361,7 +361,7 @@ static int tcp_transmit_skb(struct sock tp->af_specific->send_check(sk, th, skb->len, skb); if (tcb->flags & TCPCB_FLAG_ACK) - tcp_event_ack_sent(sk); + tcp_event_ack_sent(sk, tcp_skb_pcount(skb)); if (skb->len != tcp_header_size) tcp_event_data_sent(tp, skb, sk); From davem@davemloft.net Mon Jun 6 21:19:31 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 21:19:35 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j574JTXq007264 for ; Mon, 6 Jun 2005 21:19:31 -0700 Received: from localhost ([127.0.0.1]) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfVXm-0000hX-AA; Mon, 06 Jun 2005 21:18:26 -0700 Date: Mon, 06 Jun 2005 21:17:56 -0700 (PDT) Message-Id: <20050606.211756.30188342.davem@davemloft.net> To: netdev@oss.sgi.com Cc: herbert@gondor.apana.org.au, jheffner@psc.edu Subject: [PATCH 3/9]: TCP: The Road to Super TSO From: "David S. 
Miller" In-Reply-To: <20050606.210846.07641049.davem@davemloft.net> References: <20050606.210846.07641049.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2164 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 10264 Lines: 330 [TCP]: Move send test logic out of net/tcp.h This just moves the code into tcp_output.c, no code logic changes are made by this patch. Using this as a baseline, we can begin to untangle the mess of comparisons for the Nagle test et al. We will also be able to reduce all of the redundant computation that occurs when outputting data packets. Signed-off-by: David S. Miller cba5d690f46699d37df7dc087247d1f7c7155692 (from 00cb08b2ec091f4b461210026392edeaccf31d9c) diff --git a/include/net/tcp.h b/include/net/tcp.h --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -945,6 +945,9 @@ extern __u32 cookie_v4_init_sequence(str /* tcp_output.c */ extern int tcp_write_xmit(struct sock *, int nonagle); +extern void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp, + unsigned cur_mss, int nonagle); +extern int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp); extern int tcp_retransmit_skb(struct sock *, struct sk_buff *); extern void tcp_xmit_retransmit_queue(struct sock *); extern void tcp_simple_retransmit(struct sock *); @@ -1389,12 +1392,6 @@ static __inline__ __u32 tcp_max_burst(co return 3; } -static __inline__ int tcp_minshall_check(const struct tcp_sock *tp) -{ - return after(tp->snd_sml,tp->snd_una) && - !after(tp->snd_sml, tp->snd_nxt); -} - static __inline__ void tcp_minshall_update(struct tcp_sock *tp, int mss, const struct sk_buff *skb) { @@ -1402,122 +1399,18 @@ static __inline__ void tcp_minshall_upda tp->snd_sml = TCP_SKB_CB(skb)->end_seq; } 
-/* Return 0, if packet can be sent now without violation Nagle's rules: - 1. It is full sized. - 2. Or it contains FIN. - 3. Or TCP_NODELAY was set. - 4. Or TCP_CORK is not set, and all sent packets are ACKed. - With Minshall's modification: all sent small packets are ACKed. - */ - -static __inline__ int -tcp_nagle_check(const struct tcp_sock *tp, const struct sk_buff *skb, - unsigned mss_now, int nonagle) -{ - return (skb->len < mss_now && - !(TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN) && - ((nonagle&TCP_NAGLE_CORK) || - (!nonagle && - tp->packets_out && - tcp_minshall_check(tp)))); -} - -extern void tcp_set_skb_tso_segs(struct sock *, struct sk_buff *); - -/* This checks if the data bearing packet SKB (usually sk->sk_send_head) - * should be put on the wire right now. - */ -static __inline__ int tcp_snd_test(struct sock *sk, - struct sk_buff *skb, - unsigned cur_mss, int nonagle) -{ - struct tcp_sock *tp = tcp_sk(sk); - int pkts = tcp_skb_pcount(skb); - - if (!pkts) { - tcp_set_skb_tso_segs(sk, skb); - pkts = tcp_skb_pcount(skb); - } - - /* RFC 1122 - section 4.2.3.4 - * - * We must queue if - * - * a) The right edge of this frame exceeds the window - * b) There are packets in flight and we have a small segment - * [SWS avoidance and Nagle algorithm] - * (part of SWS is done on packetization) - * Minshall version sounds: there are no _small_ - * segments in flight. (tcp_nagle_check) - * c) We have too many packets 'in flight' - * - * Don't use the nagle rule for urgent data (or - * for the final FIN -DaveM). - * - * Also, Nagle rule does not apply to frames, which - * sit in the middle of queue (they have no chances - * to get new data) and if room at tail of skb is - * not enough to save something seriously (<32 for now). - */ - - /* Don't be strict about the congestion window for the - * final FIN frame. 
-DaveM - */ - return (((nonagle&TCP_NAGLE_PUSH) || tp->urg_mode - || !tcp_nagle_check(tp, skb, cur_mss, nonagle)) && - (((tcp_packets_in_flight(tp) + (pkts-1)) < tp->snd_cwnd) || - (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN)) && - !after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd)); -} - static __inline__ void tcp_check_probe_timer(struct sock *sk, struct tcp_sock *tp) { if (!tp->packets_out && !tp->pending) tcp_reset_xmit_timer(sk, TCP_TIME_PROBE0, tp->rto); } -static __inline__ int tcp_skb_is_last(const struct sock *sk, - const struct sk_buff *skb) -{ - return skb->next == (struct sk_buff *)&sk->sk_write_queue; -} - -/* Push out any pending frames which were held back due to - * TCP_CORK or attempt at coalescing tiny packets. - * The socket must be locked by the caller. - */ -static __inline__ void __tcp_push_pending_frames(struct sock *sk, - struct tcp_sock *tp, - unsigned cur_mss, - int nonagle) -{ - struct sk_buff *skb = sk->sk_send_head; - - if (skb) { - if (!tcp_skb_is_last(sk, skb)) - nonagle = TCP_NAGLE_PUSH; - if (!tcp_snd_test(sk, skb, cur_mss, nonagle) || - tcp_write_xmit(sk, nonagle)) - tcp_check_probe_timer(sk, tp); - } - tcp_cwnd_validate(sk, tp); -} - static __inline__ void tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp) { __tcp_push_pending_frames(sk, tp, tcp_current_mss(sk, 1), tp->nonagle); } -static __inline__ int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp) -{ - struct sk_buff *skb = sk->sk_send_head; - - return (skb && - tcp_snd_test(sk, skb, tcp_current_mss(sk, 1), - tcp_skb_is_last(sk, skb) ? 
TCP_NAGLE_PUSH : tp->nonagle)); -} - static __inline__ void tcp_init_wl(struct tcp_sock *tp, u32 ack, u32 seq) { tp->snd_wl1 = seq; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -419,6 +419,135 @@ static inline void tcp_tso_set_push(stru TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_PSH; } +static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb) +{ + struct tcp_sock *tp = tcp_sk(sk); + + if (skb->len <= tp->mss_cache_std || + !(sk->sk_route_caps & NETIF_F_TSO)) { + /* Avoid the costly divide in the normal + * non-TSO case. + */ + skb_shinfo(skb)->tso_segs = 1; + skb_shinfo(skb)->tso_size = 0; + } else { + unsigned int factor; + + factor = skb->len + (tp->mss_cache_std - 1); + factor /= tp->mss_cache_std; + skb_shinfo(skb)->tso_segs = factor; + skb_shinfo(skb)->tso_size = tp->mss_cache_std; + } +} + +static inline int tcp_minshall_check(const struct tcp_sock *tp) +{ + return after(tp->snd_sml,tp->snd_una) && + !after(tp->snd_sml, tp->snd_nxt); +} + +/* Return 0, if packet can be sent now without violation Nagle's rules: + * 1. It is full sized. + * 2. Or it contains FIN. + * 3. Or TCP_NODELAY was set. + * 4. Or TCP_CORK is not set, and all sent packets are ACKed. + * With Minshall's modification: all sent small packets are ACKed. + */ + +static inline int tcp_nagle_check(const struct tcp_sock *tp, + const struct sk_buff *skb, + unsigned mss_now, int nonagle) +{ + return (skb->len < mss_now && + !(TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN) && + ((nonagle&TCP_NAGLE_CORK) || + (!nonagle && + tp->packets_out && + tcp_minshall_check(tp)))); +} + +/* This checks if the data bearing packet SKB (usually sk->sk_send_head) + * should be put on the wire right now. 
+ */ +static int tcp_snd_test(struct sock *sk, struct sk_buff *skb, + unsigned cur_mss, int nonagle) +{ + struct tcp_sock *tp = tcp_sk(sk); + int pkts = tcp_skb_pcount(skb); + + if (!pkts) { + tcp_set_skb_tso_segs(sk, skb); + pkts = tcp_skb_pcount(skb); + } + + /* RFC 1122 - section 4.2.3.4 + * + * We must queue if + * + * a) The right edge of this frame exceeds the window + * b) There are packets in flight and we have a small segment + * [SWS avoidance and Nagle algorithm] + * (part of SWS is done on packetization) + * Minshall version sounds: there are no _small_ + * segments in flight. (tcp_nagle_check) + * c) We have too many packets 'in flight' + * + * Don't use the nagle rule for urgent data (or + * for the final FIN -DaveM). + * + * Also, Nagle rule does not apply to frames, which + * sit in the middle of queue (they have no chances + * to get new data) and if room at tail of skb is + * not enough to save something seriously (<32 for now). + */ + + /* Don't be strict about the congestion window for the + * final FIN frame. -DaveM + */ + return (((nonagle&TCP_NAGLE_PUSH) || tp->urg_mode + || !tcp_nagle_check(tp, skb, cur_mss, nonagle)) && + (((tcp_packets_in_flight(tp) + (pkts-1)) < tp->snd_cwnd) || + (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN)) && + !after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd)); +} + +static inline int tcp_skb_is_last(const struct sock *sk, + const struct sk_buff *skb) +{ + return skb->next == (struct sk_buff *)&sk->sk_write_queue; +} + +/* Push out any pending frames which were held back due to + * TCP_CORK or attempt at coalescing tiny packets. + * The socket must be locked by the caller. 
+ */ +void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp, + unsigned cur_mss, int nonagle) +{ + struct sk_buff *skb = sk->sk_send_head; + + if (skb) { + if (!tcp_skb_is_last(sk, skb)) + nonagle = TCP_NAGLE_PUSH; + if (!tcp_snd_test(sk, skb, cur_mss, nonagle) || + tcp_write_xmit(sk, nonagle)) + tcp_check_probe_timer(sk, tp); + } + tcp_cwnd_validate(sk, tp); +} + +int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp) +{ + struct sk_buff *skb = sk->sk_send_head; + + return (skb && + tcp_snd_test(sk, skb, tcp_current_mss(sk, 1), + (tcp_skb_is_last(sk, skb) ? + TCP_NAGLE_PUSH : + tp->nonagle))); +} + + /* Send _single_ skb sitting at the send head. This function requires * true push pending frames to setup probe timer etc. */ @@ -440,27 +569,6 @@ void tcp_push_one(struct sock *sk, unsig } } -void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb) -{ - struct tcp_sock *tp = tcp_sk(sk); - - if (skb->len <= tp->mss_cache_std || - !(sk->sk_route_caps & NETIF_F_TSO)) { - /* Avoid the costly divide in the normal - * non-TSO case. - */ - skb_shinfo(skb)->tso_segs = 1; - skb_shinfo(skb)->tso_size = 0; - } else { - unsigned int factor; - - factor = skb->len + (tp->mss_cache_std - 1); - factor /= tp->mss_cache_std; - skb_shinfo(skb)->tso_segs = factor; - skb_shinfo(skb)->tso_size = tp->mss_cache_std; - } -} - /* Function to create two new TCP segments. Shrinks the given segment * to the specified size and appends a new segment with the rest of the * packet to the list. This won't be called frequently, I hope. 
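A note on the segment-count arithmetic in tcp_set_skb_tso_segs(), which patch 3/9 above moves into tcp_output.c: it is a ceiling division of the skb length by the MSS, skipped whenever the frame fits in one MSS or the route lacks NETIF_F_TSO. The following is a minimal userspace sketch of that arithmetic only, not the kernel code. The struct tso_info type, the tso_segs_for() name, and the tso_capable flag are hypothetical stand-ins for skb_shinfo(skb), the kernel function, and the sk_route_caps test; it assumes mss is non-zero.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the tso_segs/tso_size pair in skb_shinfo(skb). */
struct tso_info {
    unsigned int segs;  /* number of MSS-sized segments the frame covers */
    unsigned int size;  /* per-segment size, 0 when TSO is not used */
};

/*
 * Mirror of the branch structure shown in the patch:
 *  - a frame of at most one MSS, or a non-TSO route, counts as one
 *    segment and avoids the divide;
 *  - otherwise the count is ceil(len / mss).
 * Assumption: mss > 0 (a zero MSS would divide by zero below).
 */
static struct tso_info tso_segs_for(unsigned int len, unsigned int mss,
                                    bool tso_capable)
{
    struct tso_info info;

    if (len <= mss || !tso_capable) {
        info.segs = 1;
        info.size = 0;
    } else {
        /* Round up: add (mss - 1) before the integer divide. */
        info.segs = (len + (mss - 1)) / mss;
        info.size = mss;
    }
    return info;
}
```

With an MSS of 1460, a 4000-byte frame on a TSO-capable route comes out as 3 segments, while the same frame on a non-TSO route is counted as 1. That count is what tcp_skb_pcount(skb) feeds into the congestion-window test in tcp_snd_test().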
From davem@davemloft.net Mon Jun 6 21:20:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 21:20:29 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j574KQXq007956 for ; Mon, 6 Jun 2005 21:20:26 -0700 Received: from localhost ([127.0.0.1]) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfVYg-0000hr-Ep; Mon, 06 Jun 2005 21:19:22 -0700 Date: Mon, 06 Jun 2005 21:18:52 -0700 (PDT) Message-Id: <20050606.211852.55512928.davem@davemloft.net> To: netdev@oss.sgi.com Cc: herbert@gondor.apana.org.au, jheffner@psc.edu Subject: [PATCH 4/9]: TCP: The Road to Super TSO From: "David S. Miller" In-Reply-To: <20050606.210846.07641049.davem@davemloft.net> References: <20050606.210846.07641049.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2165 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 2053 Lines: 61 [TCP]: Move __tcp_data_snd_check into tcp_output.c It reimplements portions of tcp_snd_check(), so if we move it to tcp_output.c we can consolidate its logic much more easily in a later change. Signed-off-by: David S.
Miller bdbf09522de5be3ada129dceaa3ad9da9be078bc (from cba5d690f46699d37df7dc087247d1f7c7155692) diff --git a/include/net/tcp.h b/include/net/tcp.h --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -945,6 +945,7 @@ extern __u32 cookie_v4_init_sequence(str /* tcp_output.c */ extern int tcp_write_xmit(struct sock *, int nonagle); +extern void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb); extern void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp, unsigned cur_mss, int nonagle); extern int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3975,16 +3975,6 @@ static inline void tcp_check_space(struc } } -static void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb) -{ - struct tcp_sock *tp = tcp_sk(sk); - - if (after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd) || - tcp_packets_in_flight(tp) >= tp->snd_cwnd || - tcp_write_xmit(sk, tp->nonagle)) - tcp_check_probe_timer(sk, tp); -} - static __inline__ void tcp_data_snd_check(struct sock *sk) { struct sk_buff *skb = sk->sk_send_head; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -536,6 +536,16 @@ void __tcp_push_pending_frames(struct so tcp_cwnd_validate(sk, tp); } +void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb) +{ + struct tcp_sock *tp = tcp_sk(sk); + + if (after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd) || + tcp_packets_in_flight(tp) >= tp->snd_cwnd || + tcp_write_xmit(sk, tp->nonagle)) + tcp_check_probe_timer(sk, tp); +} + int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp) { struct sk_buff *skb = sk->sk_send_head; From davem@davemloft.net Mon Jun 6 21:21:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 21:21:27 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by 
oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j574LNXq008492 for ; Mon, 6 Jun 2005 21:21:23 -0700 Received: from localhost ([127.0.0.1]) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfVZY-0000i1-I9; Mon, 06 Jun 2005 21:20:16 -0700 Date: Mon, 06 Jun 2005 21:19:46 -0700 (PDT) Message-Id: <20050606.211946.21590576.davem@davemloft.net> To: netdev@oss.sgi.com Cc: herbert@gondor.apana.org.au, jheffner@psc.edu Subject: [PATCH 5/9]: TCP: The Road to Super TSO From: "David S. Miller" In-Reply-To: <20050606.210846.07641049.davem@davemloft.net> References: <20050606.210846.07641049.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2166 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 658 Lines: 21 [TCP]: Add missing skb_header_release() call to tcp_fragment(). When we add any new packet to the TCP socket write queue, we must call skb_header_release() on it in order for the TSO sharing checks in the drivers to work. Signed-off-by: David S. Miller 79eb6b25499ed5470cb7b20428c435288fcb3502 (from bdbf09522de5be3ada129dceaa3ad9da9be078bc) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -660,6 +660,7 @@ static int tcp_fragment(struct sock *sk, } /* Link BUFF into the send queue. 
*/ + skb_header_release(buff); __skb_append(skb, buff); return 0; From davem@davemloft.net Mon Jun 6 21:22:13 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 21:22:17 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j574MDXq009056 for ; Mon, 6 Jun 2005 21:22:13 -0700 Received: from localhost ([127.0.0.1]) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfVaQ-0000iW-2Q; Mon, 06 Jun 2005 21:21:10 -0700 Date: Mon, 06 Jun 2005 21:20:39 -0700 (PDT) Message-Id: <20050606.212039.48802169.davem@davemloft.net> To: netdev@oss.sgi.com Cc: herbert@gondor.apana.org.au, jheffner@psc.edu Subject: [PATCH 6/9]: TCP: The Road to Super TSO From: "David S. Miller" In-Reply-To: <20050606.210846.07641049.davem@davemloft.net> References: <20050606.210846.07641049.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2167 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 5503 Lines: 173 [TCP]: Kill extra cwnd validate in __tcp_push_pending_frames(). The tcp_cwnd_validate() function should only be invoked if we actually send some frames, yet __tcp_push_pending_frames() will always invoke it. tcp_write_xmit() does the call for us, so the call here can simply be removed. Also, tcp_write_xmit() can be marked static. Signed-off-by: David S. 
Miller ae083bd3447865cbaf0996a69ba03807fd9fce01 (from 79eb6b25499ed5470cb7b20428c435288fcb3502) diff --git a/include/net/tcp.h b/include/net/tcp.h --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -944,7 +944,6 @@ extern __u32 cookie_v4_init_sequence(str /* tcp_output.c */ -extern int tcp_write_xmit(struct sock *, int nonagle); extern void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb); extern void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp, unsigned cur_mss, int nonagle); @@ -964,6 +963,9 @@ extern void tcp_push_one(struct sock *, extern void tcp_send_ack(struct sock *sk); extern void tcp_send_delayed_ack(struct sock *sk); +/* tcp_input.c */ +extern void tcp_cwnd_application_limited(struct sock *sk); + /* tcp_timer.c */ extern void tcp_init_xmit_timers(struct sock *); extern void tcp_clear_xmit_timers(struct sock *); @@ -1339,28 +1341,6 @@ static inline void tcp_sync_left_out(str tp->left_out = tp->sacked_out + tp->lost_out; } -extern void tcp_cwnd_application_limited(struct sock *sk); - -/* Congestion window validation. (RFC2861) */ - -static inline void tcp_cwnd_validate(struct sock *sk, struct tcp_sock *tp) -{ - __u32 packets_out = tp->packets_out; - - if (packets_out >= tp->snd_cwnd) { - /* Network is feed fully. */ - tp->snd_cwnd_used = 0; - tp->snd_cwnd_stamp = tcp_time_stamp; - } else { - /* Network starves. 
*/ - if (tp->packets_out > tp->snd_cwnd_used) - tp->snd_cwnd_used = tp->packets_out; - - if ((s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= tp->rto) - tcp_cwnd_application_limited(sk); - } -} - /* Set slow start threshould and cwnd not falling to slow start */ static inline void __tcp_enter_cwr(struct tcp_sock *tp) { diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -517,35 +517,6 @@ static inline int tcp_skb_is_last(const return skb->next == (struct sk_buff *)&sk->sk_write_queue; } -/* Push out any pending frames which were held back due to - * TCP_CORK or attempt at coalescing tiny packets. - * The socket must be locked by the caller. - */ -void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp, - unsigned cur_mss, int nonagle) -{ - struct sk_buff *skb = sk->sk_send_head; - - if (skb) { - if (!tcp_skb_is_last(sk, skb)) - nonagle = TCP_NAGLE_PUSH; - if (!tcp_snd_test(sk, skb, cur_mss, nonagle) || - tcp_write_xmit(sk, nonagle)) - tcp_check_probe_timer(sk, tp); - } - tcp_cwnd_validate(sk, tp); -} - -void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb) -{ - struct tcp_sock *tp = tcp_sk(sk); - - if (after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd) || - tcp_packets_in_flight(tp) >= tp->snd_cwnd || - tcp_write_xmit(sk, tp->nonagle)) - tcp_check_probe_timer(sk, tp); -} - int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp) { struct sk_buff *skb = sk->sk_send_head; @@ -846,6 +817,26 @@ unsigned int tcp_current_mss(struct sock return mss_now; } +/* Congestion window validation. (RFC2861) */ + +static inline void tcp_cwnd_validate(struct sock *sk, struct tcp_sock *tp) +{ + __u32 packets_out = tp->packets_out; + + if (packets_out >= tp->snd_cwnd) { + /* Network is feed fully. */ + tp->snd_cwnd_used = 0; + tp->snd_cwnd_stamp = tcp_time_stamp; + } else { + /* Network starves. 
*/ + if (tp->packets_out > tp->snd_cwnd_used) + tp->snd_cwnd_used = tp->packets_out; + + if ((s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= tp->rto) + tcp_cwnd_application_limited(sk); + } +} + /* This routine writes packets to the network. It advances the * send_head. This happens as incoming acks open up the remote * window for us. @@ -853,7 +844,7 @@ unsigned int tcp_current_mss(struct sock * Returns 1, if no segments are in flight and we have queued segments, but * cannot send anything now because of SWS or another problem. */ -int tcp_write_xmit(struct sock *sk, int nonagle) +static int tcp_write_xmit(struct sock *sk, int nonagle) { struct tcp_sock *tp = tcp_sk(sk); unsigned int mss_now; @@ -906,6 +897,34 @@ int tcp_write_xmit(struct sock *sk, int return 0; } +/* Push out any pending frames which were held back due to + * TCP_CORK or attempt at coalescing tiny packets. + * The socket must be locked by the caller. + */ +void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp, + unsigned cur_mss, int nonagle) +{ + struct sk_buff *skb = sk->sk_send_head; + + if (skb) { + if (!tcp_skb_is_last(sk, skb)) + nonagle = TCP_NAGLE_PUSH; + if (!tcp_snd_test(sk, skb, cur_mss, nonagle) || + tcp_write_xmit(sk, nonagle)) + tcp_check_probe_timer(sk, tp); + } +} + +void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb) +{ + struct tcp_sock *tp = tcp_sk(sk); + + if (after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd) || + tcp_packets_in_flight(tp) >= tp->snd_cwnd || + tcp_write_xmit(sk, tp->nonagle)) + tcp_check_probe_timer(sk, tp); +} + /* This function returns the amount that we can raise the * usable window based on the following constraints * From davem@davemloft.net Mon Jun 6 21:23:09 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 21:23:12 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j574N9Xq009621 for ; 
Mon, 6 Jun 2005 21:23:09 -0700 Received: from localhost ([127.0.0.1]) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfVbJ-0000ih-Lc; Mon, 06 Jun 2005 21:22:05 -0700 Date: Mon, 06 Jun 2005 21:21:35 -0700 (PDT) Message-Id: <20050606.212135.27954595.davem@davemloft.net> To: netdev@oss.sgi.com Cc: herbert@gondor.apana.org.au, jheffner@psc.edu Subject: [PATCH 7/9]: TCP: The Road to Super TSO From: "David S. Miller" In-Reply-To: <20050606.210846.07641049.davem@davemloft.net> References: <20050606.210846.07641049.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2168 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 3001 Lines: 101 [TCP]: tcp_write_xmit() tabbing cleanup Put the main basic block of work at the top-level of tabbing, and mark the TCP_CLOSE test with unlikely(). Signed-off-by: David S. Miller b8d892e4dc753d796e80da6e17f2a88aede0695e (from ae083bd3447865cbaf0996a69ba03807fd9fce01) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -847,54 +847,54 @@ static inline void tcp_cwnd_validate(str static int tcp_write_xmit(struct sock *sk, int nonagle) { struct tcp_sock *tp = tcp_sk(sk); + struct sk_buff *skb; unsigned int mss_now; + int sent_pkts; /* If we are closed, the bytes will have to remain here. * In time closedown will finish, we empty the write queue and all * will be happy. */ - if (sk->sk_state != TCP_CLOSE) { - struct sk_buff *skb; - int sent_pkts = 0; + if (unlikely(sk->sk_state == TCP_CLOSE)) + return 0; - /* Account for SACKS, we may need to fragment due to this. - * It is just like the real MSS changing on us midstream. 
- * We also handle things correctly when the user adds some - * IP options mid-stream. Silly to do, but cover it. - */ - mss_now = tcp_current_mss(sk, 1); - while ((skb = sk->sk_send_head) && - tcp_snd_test(sk, skb, mss_now, - tcp_skb_is_last(sk, skb) ? nonagle : - TCP_NAGLE_PUSH)) { - if (skb->len > mss_now) { - if (tcp_fragment(sk, skb, mss_now)) - break; - } - - TCP_SKB_CB(skb)->when = tcp_time_stamp; - tcp_tso_set_push(skb); - if (tcp_transmit_skb(sk, skb_clone(skb, GFP_ATOMIC))) + /* Account for SACKS, we may need to fragment due to this. + * It is just like the real MSS changing on us midstream. + * We also handle things correctly when the user adds some + * IP options mid-stream. Silly to do, but cover it. + */ + mss_now = tcp_current_mss(sk, 1); + sent_pkts = 0; + while ((skb = sk->sk_send_head) && + tcp_snd_test(sk, skb, mss_now, + tcp_skb_is_last(sk, skb) ? nonagle : + TCP_NAGLE_PUSH)) { + if (skb->len > mss_now) { + if (tcp_fragment(sk, skb, mss_now)) break; + } - /* Advance the send_head. This one is sent out. - * This call will increment packets_out. - */ - update_send_head(sk, tp, skb); + TCP_SKB_CB(skb)->when = tcp_time_stamp; + tcp_tso_set_push(skb); + if (tcp_transmit_skb(sk, skb_clone(skb, GFP_ATOMIC))) + break; - tcp_minshall_update(tp, mss_now, skb); - sent_pkts = 1; - } + /* Advance the send_head. This one is sent out. + * This call will increment packets_out. 
+ */ + update_send_head(sk, tp, skb); - if (sent_pkts) { - tcp_cwnd_validate(sk, tp); - return 0; - } + tcp_minshall_update(tp, mss_now, skb); + sent_pkts = 1; + } - return !tp->packets_out && sk->sk_send_head; + if (sent_pkts) { + tcp_cwnd_validate(sk, tp); + return 0; } - return 0; + + return !tp->packets_out && sk->sk_send_head; } /* Push out any pending frames which were held back due to From davem@davemloft.net Mon Jun 6 21:23:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 21:24:00 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j574NvXq010147 for ; Mon, 6 Jun 2005 21:23:57 -0700 Received: from localhost ([127.0.0.1]) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfVc5-0000it-8E; Mon, 06 Jun 2005 21:22:53 -0700 Date: Mon, 06 Jun 2005 21:22:23 -0700 (PDT) Message-Id: <20050606.212223.74566928.davem@davemloft.net> To: netdev@oss.sgi.com Cc: herbert@gondor.apana.org.au, jheffner@psc.edu Subject: [PATCH 8/9]: TCP: The Road to Super TSO From: "David S. Miller" In-Reply-To: <20050606.210846.07641049.davem@davemloft.net> References: <20050606.210846.07641049.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2169 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 3090 Lines: 82 [TCP]: Fix redundant calculations of tcp_current_mss() tcp_write_xmit() uses tcp_current_mss(), but some of its callers, namely __tcp_push_pending_frames(), already have this value available. While we're here, fix the "cur_mss" argument to be "unsigned int" instead of plain "unsigned". Signed-off-by: David S.
Miller f22c7890049ef8c51b0cdcc5d7e0cd06333de6b0 (from b8d892e4dc753d796e80da6e17f2a88aede0695e) diff --git a/include/net/tcp.h b/include/net/tcp.h --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -946,7 +946,7 @@ extern __u32 cookie_v4_init_sequence(str extern void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb); extern void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp, - unsigned cur_mss, int nonagle); + unsigned int cur_mss, int nonagle); extern int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp); extern int tcp_retransmit_skb(struct sock *, struct sk_buff *); extern void tcp_xmit_retransmit_queue(struct sock *); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -844,11 +844,10 @@ static inline void tcp_cwnd_validate(str * Returns 1, if no segments are in flight and we have queued segments, but * cannot send anything now because of SWS or another problem. */ -static int tcp_write_xmit(struct sock *sk, int nonagle) +static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle) { struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb; - unsigned int mss_now; int sent_pkts; /* If we are closed, the bytes will have to remain here. @@ -858,13 +857,6 @@ static int tcp_write_xmit(struct sock *s if (unlikely(sk->sk_state == TCP_CLOSE)) return 0; - - /* Account for SACKS, we may need to fragment due to this. - * It is just like the real MSS changing on us midstream. - * We also handle things correctly when the user adds some - * IP options mid-stream. Silly to do, but cover it. - */ - mss_now = tcp_current_mss(sk, 1); sent_pkts = 0; while ((skb = sk->sk_send_head) && tcp_snd_test(sk, skb, mss_now, @@ -902,7 +894,7 @@ static int tcp_write_xmit(struct sock *s * The socket must be locked by the caller. 
*/ void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp, - unsigned cur_mss, int nonagle) + unsigned int cur_mss, int nonagle) { struct sk_buff *skb = sk->sk_send_head; @@ -910,7 +902,7 @@ void __tcp_push_pending_frames(struct so if (!tcp_skb_is_last(sk, skb)) nonagle = TCP_NAGLE_PUSH; if (!tcp_snd_test(sk, skb, cur_mss, nonagle) || - tcp_write_xmit(sk, nonagle)) + tcp_write_xmit(sk, cur_mss, nonagle)) tcp_check_probe_timer(sk, tp); } } @@ -921,7 +913,7 @@ void __tcp_data_snd_check(struct sock *s if (after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd) || tcp_packets_in_flight(tp) >= tp->snd_cwnd || - tcp_write_xmit(sk, tp->nonagle)) + tcp_write_xmit(sk, tcp_current_mss(sk, 1), tp->nonagle)) tcp_check_probe_timer(sk, tp); } From davem@davemloft.net Mon Jun 6 21:25:07 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 21:25:12 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j574P5Xq010860 for ; Mon, 6 Jun 2005 21:25:07 -0700 Received: from localhost ([127.0.0.1]) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfVd7-0000jA-7T; Mon, 06 Jun 2005 21:23:57 -0700 Date: Mon, 06 Jun 2005 21:23:27 -0700 (PDT) Message-Id: <20050606.212327.18313062.davem@davemloft.net> To: netdev@oss.sgi.com Cc: herbert@gondor.apana.org.au, jheffner@psc.edu Subject: [PATCH 9/9]: TCP: The Road to Super TSO From: "David S. 
Miller" In-Reply-To: <20050606.210846.07641049.davem@davemloft.net> References: <20050606.210846.07641049.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2170 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 3948 Lines: 126 [TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling. 'nonagle' should be passed to the tcp_snd_test() function as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the tail of the write_queue. This is because Nagle does not apply to such frames since we cannot possibly tack more data onto them. However, while doing this __tcp_push_pending_frames() makes all of the packets in the write_queue use this modified 'nonagle' value. Fix the bug and simplify this function by just calling tcp_write_xmit() directly if sk_send_head is non-NULL. As a result, we can now make tcp_data_snd_check() just call tcp_push_pending_frames() instead of the specialized __tcp_data_snd_check(). Signed-off-by: David S. 
Miller 45d0377c7d18e1a036b0a1f96788a998dccf73cf (from f22c7890049ef8c51b0cdcc5d7e0cd06333de6b0) diff --git a/include/net/tcp.h b/include/net/tcp.h --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -944,7 +944,6 @@ extern __u32 cookie_v4_init_sequence(str /* tcp_output.c */ -extern void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb); extern void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp, unsigned int cur_mss, int nonagle); extern int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3975,12 +3975,9 @@ static inline void tcp_check_space(struc } } -static __inline__ void tcp_data_snd_check(struct sock *sk) +static __inline__ void tcp_data_snd_check(struct sock *sk, struct tcp_sock *tp) { - struct sk_buff *skb = sk->sk_send_head; - - if (skb != NULL) - __tcp_data_snd_check(sk, skb); + tcp_push_pending_frames(sk, tp); tcp_check_space(sk); } @@ -4274,7 +4271,7 @@ int tcp_rcv_established(struct sock *sk, */ tcp_ack(sk, skb, 0); __kfree_skb(skb); - tcp_data_snd_check(sk); + tcp_data_snd_check(sk, tp); return 0; } else { /* Header too small */ TCP_INC_STATS_BH(TCP_MIB_INERRS); @@ -4340,7 +4337,7 @@ int tcp_rcv_established(struct sock *sk, if (TCP_SKB_CB(skb)->ack_seq != tp->snd_una) { /* Well, only one small jumplet in fast path... */ tcp_ack(sk, skb, FLAG_DATA); - tcp_data_snd_check(sk); + tcp_data_snd_check(sk, tp); if (!tcp_ack_scheduled(tp)) goto no_ack; } @@ -4418,7 +4415,7 @@ step5: /* step 7: process the segment text */ tcp_data_queue(sk, skb); - tcp_data_snd_check(sk); + tcp_data_snd_check(sk, tp); tcp_ack_snd_check(sk); return 0; @@ -4732,7 +4729,7 @@ int tcp_rcv_state_process(struct sock *s /* Do step6 onward by hand. 
*/ tcp_urg(sk, skb, th); __kfree_skb(skb); - tcp_data_snd_check(sk); + tcp_data_snd_check(sk, tp); return 0; } @@ -4921,7 +4918,7 @@ int tcp_rcv_state_process(struct sock *s /* tcp_data could move socket to TIME-WAIT */ if (sk->sk_state != TCP_CLOSE) { - tcp_data_snd_check(sk); + tcp_data_snd_check(sk, tp); tcp_ack_snd_check(sk); } diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -899,24 +899,11 @@ void __tcp_push_pending_frames(struct so struct sk_buff *skb = sk->sk_send_head; if (skb) { - if (!tcp_skb_is_last(sk, skb)) - nonagle = TCP_NAGLE_PUSH; - if (!tcp_snd_test(sk, skb, cur_mss, nonagle) || - tcp_write_xmit(sk, cur_mss, nonagle)) + if (tcp_write_xmit(sk, cur_mss, nonagle)) tcp_check_probe_timer(sk, tp); } } -void __tcp_data_snd_check(struct sock *sk, struct sk_buff *skb) -{ - struct tcp_sock *tp = tcp_sk(sk); - - if (after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd) || - tcp_packets_in_flight(tp) >= tp->snd_cwnd || - tcp_write_xmit(sk, tcp_current_mss(sk, 1), tp->nonagle)) - tcp_check_probe_timer(sk, tp); -} - /* This function returns the amount that we can raise the * usable window based on the following constraints * From shemminger@osdl.org Mon Jun 6 21:54:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 21:54:45 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j574sPXq012420 for ; Mon, 6 Jun 2005 21:54:26 -0700 Received: from [192.168.0.106] (063-170-215-071.dslnorthwest.net [63.170.215.71]) (authenticated bits=0) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j574qdjA017561 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Mon, 6 Jun 2005 21:52:42 -0700 Message-ID: <42A5284C.3060808@osdl.org> Date: Mon, 06 Jun 2005 21:53:32 -0700 From: Stephen Hemminger User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Mitch 
Williams CC: "Ronciak, John" , "David S. Miller" , mchan@broadcom.com, hadi@cyberus.ca, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: Re: RFC: NAPI packet weighting patch References: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 2171 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 177 Lines: 3 I noticed that the tg3 driver copies packets less than a certain threshold to a new buffer, but e1000 always passes the big buffer up the stack. Could this be having an impact? From shemminger@osdl.org Mon Jun 6 21:56:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 21:56:40 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j574ubXq012885 for ; Mon, 6 Jun 2005 21:56:37 -0700 Received: from [192.168.0.106] (063-170-215-071.dslnorthwest.net [63.170.215.71]) (authenticated bits=0) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j574tNjA017827 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Mon, 6 Jun 2005 21:55:24 -0700 Message-ID: <42A528F0.2090208@osdl.org> Date: Mon, 06 Jun 2005 21:56:16 -0700 From: Stephen Hemminger User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. 
Miller" CC: netdev@oss.sgi.com, herbert@gondor.apana.org.au, jheffner@psc.edu Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO References: <20050606.210846.07641049.davem@davemloft.net> In-Reply-To: <20050606.210846.07641049.davem@davemloft.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 2172 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 169 Lines: 3

I'll merge these with the TCP infrastructure stuff and send it off to Andrew. Actually, it is more of a fix to the TCP infrastructure to match TSO + rc6 but you get the idea.

From yoshfuji@linux-ipv6.org Mon Jun 6 22:20:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 22:20:37 -0700 (PDT) Received: from yue.st-paulia.net ([203.178.140.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j575KRXq014617 for ; Mon, 6 Jun 2005 22:20:28 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.st-paulia.net (Postfix) with ESMTP id C32F533CC2; Tue, 7 Jun 2005 14:19:22 +0900 (JST) Date: Tue, 07 Jun 2005 14:19:22 +0900 (JST) Message-Id: <20050607.141922.65612976.yoshfuji@linux-ipv6.org> To: dlstevens@us.ibm.com Cc: davem@davemloft.net, netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: Re: IPV6 RFC3542 compliance [PATCH] From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: References: Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI)
Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2173 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Content-Length: 1573 Lines: 37

In article (at Mon, 6 Jun 2005 13:48:26 -0600), David Stevens says:

> I've been looking at RFC 3542 (Advanced Sockets API) compliance,
> and found the following:
>
> ("x" is one of {PKTINFO, HOPLIMIT, RTHDR, DSTOPTS, TCLASS })

Well, this breaks the API. Please rename the old options, say:

	IPV6_PKTINFO  => IPV6_2292PKTINFO
	IPV6_HOPLIMIT => IPV6_2292HOPLIMIT
	IPV6_RTHDR    => IPV6_2292RTHDR
	IPV6_DSTOPTS  => IPV6_2292DSTOPTS

And allocate new values for the 2292bis options, like:

#define IPV6_RECVPKTINFO	48	/* RFC2292bis */
#define IPV6_PKTINFO		49	/* RFC2292bis */
#define IPV6_RECVHOPLIMIT	50	/* RFC2292bis */
#define IPV6_HOPLIMIT		51	/* RFC2292bis */
#define IPV6_RECVRTHDR		52	/* RFC2292bis */
#define IPV6_RTHDR		53	/* RFC2292bis */
#define IPV6_RECVHOPOPTS	54	/* RFC2292bis */
#define IPV6_HOPOPTS		55	/* RFC2292bis */
#define IPV6_RECVDSTOPTS	56	/* RFC2292bis */
#define IPV6_DSTOPTS		57	/* RFC2292bis */
#define IPV6_RECVRTHDRDSTOPTS	58	/* RFC2292bis */
#define IPV6_RTHDRDSTOPTS	59	/* RFC2292bis */

(This is what the KAME people did, and I believe that it is the best way to keep backward compatibility.)
-- YOSHIFUJI Hideaki @ USAGI Project GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From rdunlap@xenotime.net Mon Jun 6 22:47:56 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 22:47:58 -0700 (PDT) Received: from titan.genwebhost.com (titan.genwebhost.com [209.9.226.66]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j575ltXq016606 for ; Mon, 6 Jun 2005 22:47:55 -0700 Received: from pool-71-111-140-4.ptldor.dsl-w.verizon.net ([71.111.140.4] helo=midway.verizon.net) by titan.genwebhost.com with esmtpa (Exim 4.51) id 1DfWvJ-0003F1-Dq; Tue, 07 Jun 2005 01:46:49 -0400 Date: Mon, 6 Jun 2005 22:46:46 -0700 From: randy_dunlap To: Phil Oester Cc: herbert@gondor.apana.org.au, netdev@oss.sgi.com, akpm@osdl.org Subject: Re: 2.6.12-rcx networking oops Message-Id: <20050606224646.24af30ff.rdunlap@xenotime.net> In-Reply-To: <20050601170058.GA20112@linuxace.com> References: <20050531224012.GA16789@linuxace.com> <20050601054955.GA2625@gondor.apana.org.au> <20050601170058.GA20112@linuxace.com> Organization: YPO4 X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - titan.genwebhost.com X-AntiAbuse: Original Domain - oss.sgi.com X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - xenotime.net X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2174 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rdunlap@xenotime.net Precedence: bulk X-list: netdev Content-Length: 3237 Lines: 90 On Wed, 1 Jun 2005 10:00:58 -0700 Phil Oester wrote: | On Wed, Jun 01, 2005 at 03:49:55PM +1000, Herbert Xu wrote: | > This looks like stack overflow. %esi is meant to be "res" which is | > a local variable. 
As you can see, it's pointing below %esp and | > threadinfo. Agreed, the stack trace is suspicious. (more below) | Ok, so I enabled DEBUG_STACKOVERFLOW in addition to CONFIG_DEBUG_SLAB | and CONFIG_DEBUG_PAGEALLOC, and got the below today...so maybe it | is a slab issue? | | 0xc0238cdd is in dst_alloc (net/core/dst.c:124). | 119 if (ops->gc && atomic_read(&ops->entries) > ops->gc_thresh) { | 120 if (ops->gc()) | 121 return NULL; | 122 } | 123 dst = kmem_cache_alloc(ops->kmem_cachep, SLAB_ATOMIC); | | 0xc013912b is at mm/slab.c:3077. | 3072 size = kmem_cache_size(c); | 3073 local_irq_restore(flags); | 3074 } | 3075 | 3076 return size; | 3077 } | | | Phil This is with NAPI, right? Would it make sense to try it with that disabled? (I don't recall you saying it's NAPI, but the e1000 functions seem to indicate that.) and how about enabling CONFIG_FRAME_POINTER ? | invalid operand: 0000 [#1] | SMP DEBUG_PAGEALLOC | CPU: 1 | EIP: 0060:[] Not tainted VLI | EFLAGS: 00016292 (2.6.12-rc5-git5) | EIP is at ksize+0x7b/0x100 ksize() isn't that large. In my build this offset and the Code: 8d 05 0c.... (below) point to the lock slow paths in mm/slab.c (fwiw). 
| eax: c0238cdd ebx: f7ba9c20 ecx: f7babf78 edx: dcc59000 | esi: 00000020 edi: 0000e3ba ebp: c0338d98 esp: c0338d88 | ds: 007b es: 007b ss: 0068 | Process swapper (pid: 0, threadinfo=c0338000 task=c1989b00) | Stack: 00000000 04000000 c02d1a00 ffffff97 c0338db0 c0238cdd c0338e58 04000000 | 00000000 ffffff97 c0338eb4 c0245cb7 00000002 f7b01000 c0338dec c0338df0 | f7318ef8 00000000 00000000 00000001 f72dbef8 0000a704 103c243b f27ceec0 | Call Trace: | [] show_stack+0x7a/0x90 | [] show_registers+0x14d/0x1b0 | [] die+0xf9/0x180 | [] do_trap+0xa0/0xb0 | [] do_invalid_op+0xa9/0xc0 | [] error_code+0x4f/0x54 | [] dst_alloc+0x2d/0xa0 | [] ip_route_input_slow+0x4a7/0x840 | [] ip_route_input+0x9a/0x160 | [] ip_rcv+0x3b0/0x4d0 | [] netif_receive_skb+0x13a/0x1a0 | [] e1000_clean_rx_irq+0x180/0x4d0 | [] e1000_clean+0x40/0xe0 | [] net_rx_action+0x90/0x130 | [] __do_softirq+0xd4/0xf0 | [] do_softirq+0x52/0x70 | ======================= | [] irq_exit+0x3a/0x40 | [] do_IRQ+0x68/0xa0 | [] common_interrupt+0x1a/0x20 | [] cpu_idle+0x7b/0x80 | [] start_secondary+0x73/0x90 | [<00000000>] stext+0x3feffd6c/0xc | [] 0xc198afb4 | Code: 8d 05 0c e2 34 c0 e8 e9 25 15 00 e9 96 dd ff ff 8d 05 0c e2 34 c0 e8 a9 25 15 00 e9 00 e2 ff | | ff 8d 05 0c e2 34 c0 e8 c9 25 15 00 23 e2 ff ff 8d 05 0c e2 34 c0 e8 89 25 15 00 e9 84 e2 ff ff | <0>Kernel panic - not syncing: Fatal exception in interrupt --- ~Randy From davem@davemloft.net Mon Jun 6 22:52:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 22:52:47 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j575qgXq017250 for ; Mon, 6 Jun 2005 22:52:42 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfWzt-0000pO-F1; Mon, 06 Jun 2005 22:51:33 -0700 Date: Mon, 06 Jun 2005 22:51:33 -0700 (PDT) Message-Id: <20050606.225133.74747961.davem@davemloft.net> To: 
shemminger@osdl.org Cc: netdev@oss.sgi.com, herbert@gondor.apana.org.au, jheffner@psc.edu Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO From: "David S. Miller" In-Reply-To: <42A528F0.2090208@osdl.org> References: <20050606.210846.07641049.davem@davemloft.net> <42A528F0.2090208@osdl.org> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2175 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 476 Lines: 13

From: Stephen Hemminger
Date: Mon, 06 Jun 2005 21:56:16 -0700

> I'll merge these with the TCP infrastructure stuff and
> send it off to Andrew. Actually, it is more of a fix to the TCP
> infrastructure to match TSO + rc6 but you get the idea.

Probably not a good idea: it's 75% of the implementation of Super TSO and totally conflicts with the Super TSO patch.

Probably best to keep the existing Super TSO stuff in there until I'm done with this stuff.
:) From dlstevens@us.ibm.com Mon Jun 6 23:26:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 23:26:43 -0700 (PDT) Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j576QYXq019318 for ; Mon, 6 Jun 2005 23:26:40 -0700 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e31.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j576PVua209622 for ; Tue, 7 Jun 2005 02:25:31 -0400 Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by d03relay04.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j576PVes089470 for ; Tue, 7 Jun 2005 00:25:31 -0600 Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j576PUmS014564 for ; Tue, 7 Jun 2005 00:25:31 -0600 Received: from d03nm121.boulder.ibm.com (d03nm121.boulder.ibm.com [9.17.195.147]) by d03av04.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j576PUTn014561; Tue, 7 Jun 2005 00:25:30 -0600 In-Reply-To: <20050607.141922.65612976.yoshfuji@linux-ipv6.org> To: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= Cc: davem@davemloft.net, netdev@oss.sgi.com MIME-Version: 1.0 Subject: Re: IPV6 RFC3542 compliance [PATCH] X-Mailer: Lotus Notes Release 6.0.2CF1 June 9, 2003 Message-ID: From: David Stevens Date: Mon, 6 Jun 2005 23:25:28 -0700 X-MIMETrack: Serialize by Router on D03NM121/03/M/IBM(Release 6.53HF546 | May 23, 2005) at 06/07/2005 00:25:30, Serialize complete at 06/07/2005 00:25:30 Content-Type: text/plain; charset="US-ASCII" X-archive-position: 2176 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dlstevens@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 1543 Lines: 35 RFC 3542 broke the API-- they've defined options with the same name, but different semantics. 
Binaries using the old numbers would not work unless we return the old numbers in the control message types, but in the new API those have to be different from the boolean option value (and equal to the sticky option value).

And those same binaries would not work when recompiled, because the option names in the source would match the new numbers but still have the old arguments: an error that can only be detected at run time.

My guess is that existing use of these is pretty limited, so I'm not sure backward compatibility is worth it.

If we wanted to get really ugly, we could use the size of the option value to determine what to do. Only two new ancillary message types are int-sized (TCLASS and HOPLIMIT). HOPLIMIT is not a valid socket option (that is done with IPV6_UNICAST_HOPS instead) and TCLASS was not implemented at all, so neither is a problem. Then, in the receive processing, we'd have to return the old message type for programs using the sticky options as boolean, and the new message type otherwise. It's really ugly, but possible, I believe; it would then break RFC 3542 compliance only in not treating boolean-sized options as an error.

But I think the better way is to fix programs that use these right away. A program that uses "IPV6_RTHDR" with a boolean argument is not portable (and portability is the whole point of having a common API). We shouldn't encourage it by making it continue to work.
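
For illustration only, the kind of source fix argued for here might look something like the sketch below. It is not code from the patch. The helper names rthdr_recv_optname() and enable_rthdr_reception() are made up, and the sketch assumes a libc whose <netinet/in.h> defines IPV6_RECVRTHDR once the RFC 3542 names are available.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: choose, at compile time, the boolean option that
 * asks the stack to deliver received routing headers.  Headers that follow
 * RFC 3542 provide IPV6_RECVRTHDR; RFC 2292-era headers only have
 * IPV6_RTHDR, which was the boolean option in that API.
 */
static int rthdr_recv_optname(void)
{
#ifdef IPV6_RECVRTHDR
	return IPV6_RECVRTHDR;	/* RFC 3542 (2292bis) */
#else
	return IPV6_RTHDR;	/* RFC 2292 */
#endif
}

/*
 * Hypothetical helper: request routing headers as ancillary data on
 * socket s.  Returns the setsockopt() result: 0 on success, -1 with
 * errno set on failure.
 */
static int enable_rthdr_reception(int s)
{
	int on = 1;

	return setsockopt(s, IPPROTO_IPV6, rthdr_recv_optname(),
			  &on, sizeof(on));
}
```

With something like this, a program built against old headers keeps passing the old boolean, and a program rebuilt against new headers passes IPV6_RECVRTHDR and then looks for cmsg_type == IPV6_RTHDR in recvmsg().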
+-DLS From yoshfuji@linux-ipv6.org Mon Jun 6 23:34:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 23:35:00 -0700 (PDT) Received: from yue.st-paulia.net (yue.linux-ipv6.org [203.178.140.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j576YvXq020084 for ; Mon, 6 Jun 2005 23:34:57 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.st-paulia.net (Postfix) with ESMTP id 52F0D33CC2; Tue, 7 Jun 2005 15:34:00 +0900 (JST) Date: Tue, 07 Jun 2005 15:33:59 +0900 (JST) Message-Id: <20050607.153359.82068814.yoshfuji@linux-ipv6.org> To: dlstevens@us.ibm.com Cc: davem@davemloft.net, netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: Re: IPV6 RFC3542 compliance [PATCH] From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: References: <20050607.141922.65612976.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2177 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Content-Length: 551 Lines: 18 In article (at Mon, 6 Jun 2005 23:25:28 -0700), David Stevens says: > And those same binaries would not work when recompiled, > because the option names in the source would match the > new numbers, but still have the old arguments-- an error to > be detected at run-time, only. It is not good at all to break API at this moment (2.6.x). 
Portable applications do like this: #ifdef IPV6_RECVHOPOPTS // RFC2292bis #else // RFC2292 #endif --yoshfuji From dlstevens@us.ibm.com Mon Jun 6 23:36:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 23:36:55 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j576aqXq020526 for ; Mon, 6 Jun 2005 23:36:52 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e35.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j576ZnMK216086 for ; Tue, 7 Jun 2005 02:35:49 -0400 Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j576ZnXR035016 for ; Tue, 7 Jun 2005 00:35:49 -0600 Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j576ZnCF027282 for ; Tue, 7 Jun 2005 00:35:49 -0600 Received: from d03nm121.boulder.ibm.com (d03nm121.boulder.ibm.com [9.17.195.147]) by d03av04.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j576Zn6x027273; Tue, 7 Jun 2005 00:35:49 -0600 In-Reply-To: To: David Stevens Cc: davem@davemloft.net, netdev@oss.sgi.com MIME-Version: 1.0 Subject: Re: IPV6 RFC3542 compliance [PATCH] X-Mailer: Lotus Notes Release 6.0.2CF1 June 9, 2003 Message-ID: From: David Stevens Date: Mon, 6 Jun 2005 23:35:46 -0700 X-MIMETrack: Serialize by Router on D03NM121/03/M/IBM(Release 6.53HF546 | May 23, 2005) at 06/07/2005 00:35:49, Serialize complete at 06/07/2005 00:35:49 Content-Type: text/plain; charset="US-ASCII" X-archive-position: 2178 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dlstevens@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 661 Lines: 20 PS - I should've said clearly; with the patch I submitted as-is, all old binaries should return EINVAL on the socket options that have 
changed. That's because all of those (except the new IPV6_TCLASS, which didn't exist before) have option arguments greater than int-size. Recompiling those programs will still result in the setsockopt() returning EINVAL, until the source is fixed to change the socket options to the IPV6_RECVx. sendmsg() and recvmsg() processing in old binaries should still work, as-is. So, with that patch, programs using the old names will give a strong indication of what needs fixing. +-DLS From dlstevens@us.ibm.com Mon Jun 6 23:51:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 06 Jun 2005 23:51:28 -0700 (PDT) Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j576pMXq031245 for ; Mon, 6 Jun 2005 23:51:25 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e31.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j576oJua258972 for ; Tue, 7 Jun 2005 02:50:19 -0400 Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j576oJXR110864 for ; Tue, 7 Jun 2005 00:50:19 -0600 Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j576oI7N008037 for ; Tue, 7 Jun 2005 00:50:19 -0600 Received: from d03nm121.boulder.ibm.com (d03nm121.boulder.ibm.com [9.17.195.147]) by d03av04.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j576oIfs008034; Tue, 7 Jun 2005 00:50:18 -0600 In-Reply-To: <20050607.153359.82068814.yoshfuji@linux-ipv6.org> To: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= Cc: davem@davemloft.net, netdev@oss.sgi.com MIME-Version: 1.0 Subject: Re: IPV6 RFC3542 compliance [PATCH] X-Mailer: Lotus Notes Release 6.0.2CF1 June 9, 2003 Message-ID: From: David Stevens Date: Mon, 6 Jun 2005 23:50:16 -0700 X-MIMETrack: Serialize by Router on D03NM121/03/M/IBM(Release 6.53HF546 | May 23, 2005) at 
06/07/2005 00:50:17, Serialize complete at 06/07/2005 00:50:17 Content-Type: text/plain; charset="US-ASCII" X-archive-position: 2179 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dlstevens@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 572 Lines: 26 > Portable applications do like this: > #ifdef IPV6_RECVHOPOPTS > // RFC2292bis > #else > // RFC2292 > #endif > --yoshfuji I don't understand. If they do this, they'll work already when recompiled (with the patch I sent), won't they? If they don't do this, old binaries will return EINVAL on the setsockopt() calls that have changed. And if they're going to edit the source, they can do #ifdefs as above and work again. How does it help to renumber? I can renumber, of course-- I just don't see how that does anything. +-DLS From yoshfuji@linux-ipv6.org Tue Jun 7 00:06:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 00:06:28 -0700 (PDT) Received: from yue.st-paulia.net (yue.linux-ipv6.org [203.178.140.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5776HXq001763 for ; Tue, 7 Jun 2005 00:06:18 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.st-paulia.net (Postfix) with ESMTP id AA88E33CC2; Tue, 7 Jun 2005 16:05:22 +0900 (JST) Date: Tue, 07 Jun 2005 16:05:21 +0900 (JST) Message-Id: <20050607.160521.73986501.yoshfuji@linux-ipv6.org> To: dlstevens@us.ibm.com Cc: davem@davemloft.net, netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: Re: IPV6 RFC3542 compliance [PATCH] From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: References: <20050607.153359.82068814.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: 
"5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2180 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Content-Length: 783 Lines: 28 In article (at Mon, 6 Jun 2005 23:50:16 -0700), David Stevens says: > > Portable applications do like this: > > > #ifdef IPV6_RECVHOPOPTS > > // RFC2292bis > > #else > > // RFC2292 > > #endif > > > --yoshfuji > > I don't understand. If they do this, they'll > work already when recompiled (with the patch > I sent), won't they? Yes (or they should do so before your favorite distro start shipping with new constants). > How does it help to renumber? I can renumber, > of course-- I just don't see how that does > anything. We can still keep old binaries if we renumber. This is important point. e.g. people, including myself, can keep using old binaries on new kernels. 
--yoshfuji From dlstevens@us.ibm.com Tue Jun 7 00:41:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 00:42:00 -0700 (PDT) Received: from e32.co.us.ibm.com (e32.co.us.ibm.com [32.97.110.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j577fuXq007425 for ; Tue, 7 Jun 2005 00:41:58 -0700 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e32.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j577er9q523116 for ; Tue, 7 Jun 2005 03:40:53 -0400 Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by d03relay04.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j577eres183168 for ; Tue, 7 Jun 2005 01:40:53 -0600 Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j577eqSR001269 for ; Tue, 7 Jun 2005 01:40:53 -0600 Received: from d03nm121.boulder.ibm.com (d03nm121.boulder.ibm.com [9.17.195.147]) by d03av04.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j577eqtA001265; Tue, 7 Jun 2005 01:40:52 -0600 In-Reply-To: <20050607.160521.73986501.yoshfuji@linux-ipv6.org> To: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= Cc: davem@davemloft.net, netdev@oss.sgi.com MIME-Version: 1.0 Subject: Re: IPV6 RFC3542 compliance [PATCH] X-Mailer: Lotus Notes Release 6.0.2CF1 June 9, 2003 Message-ID: From: David Stevens Date: Tue, 7 Jun 2005 00:40:28 -0700 X-MIMETrack: Serialize by Router on D03NM121/03/M/IBM(Release 6.53HF546 | May 23, 2005) at 06/07/2005 01:40:52, Serialize complete at 06/07/2005 01:40:52 Content-Type: text/plain; charset="US-ASCII" X-archive-position: 2181 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dlstevens@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 1848 Lines: 60 > We can still keep old binaries if we renumber. > This is important point. > e.g. 
people, including myself, can keep using old binaries on new kernels.
> --yoshfuji

But old binaries won't work with just that change (and making them work is independent of changing the numbers).

For example, old binary:

	IPV6_RTHDR is value 5

it does:

	on=1; setsockopt(s, SOL_IPV6, 5, &on);

and later a recvmsg() where it looks for cmsg_type == IPV6_RTHDR (5).

In the new API, the equivalent:

	IPV6_RTHDR	728
	IPV6_RECVRTHDR	729

old binary calls with "5", which you want to work, but returns cmsg_type "728" (app doesn't find a "5").

The boolean socket option in the new API cannot be equal to the cmsg_type, because IPV6_RTHDR and IPV6_RECVRTHDR do different things as socket options (and both are there). So, no old binary can work unless the kernel "knows" it's talking to an old binary, and it returns a different (wrong, under the new API) cmsg_type for that option. But the putcmsg() calls are done in receive processing, so you'd need a flag to tell you which you had, and a map for old and new cmsg_types.

But the number changes don't help here, because an old binary will call with an argument size of int, and a new binary will have a greater argument size (barring bugs). So, you can tell without number changes, but you still have all the ugly code to return old and new data for old and new binaries.

And if the caller doesn't change the source, it'll recompile fine but give incorrect results (EINVAL on the setsockopt call) when s/he gets the new definition of IPV6_RTHDR, but still calls it with a boolean argument value.

I'm suggesting we bypass the ugly binary support and get EINVAL when run now; trivial source fix, and they have a working binary again.
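
To make the "ugly" alternative concrete, here is a rough user-space sketch of the bookkeeping it would need. This is an assumption-laden illustration, not kernel code: the enum, the function names and the OLD_/NEW_ macros are all hypothetical, and the values 5 and 728 are just the example numbers used above.

```c
#include <stddef.h>

/* Hypothetical labels for the two kinds of caller discussed above. */
enum rthdr_caller {
	RTHDR_CALLER_OLD_BOOLEAN,	/* RFC 2292 style: int-sized on/off flag */
	RTHDR_CALLER_NEW_STICKY		/* RFC 3542 style: a routing header (or empty) */
};

/* Example numbers from the message above, not real kernel values. */
#define OLD_IPV6_RTHDR	5
#define NEW_IPV6_RTHDR	728

/*
 * Guess which API the caller was built against from the size of the
 * option value alone.  Assumption: only old binaries pass exactly
 * sizeof(int) bytes; a sticky routing header is always larger, and a
 * zero-length value is the new-API way to clear the option.
 */
static enum rthdr_caller classify_rthdr_caller(size_t optlen)
{
	return optlen == sizeof(int) ? RTHDR_CALLER_OLD_BOOLEAN
				     : RTHDR_CALLER_NEW_STICKY;
}

/*
 * The receive path would then need the remembered flag to decide which
 * cmsg_type to report back to the application.
 */
static int rthdr_cmsg_type_for(enum rthdr_caller who)
{
	return who == RTHDR_CALLER_OLD_BOOLEAN ? OLD_IPV6_RTHDR
					       : NEW_IPV6_RTHDR;
}
```

The per-socket flag and the old/new cmsg_type map are needed whether or not the option numbers change, since the size test alone already separates the two kinds of caller.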
+-DLS From yoshfuji@linux-ipv6.org Tue Jun 7 00:48:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 00:48:49 -0700 (PDT) Received: from yue.st-paulia.net (yue.linux-ipv6.org [203.178.140.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j577mlXq008280 for ; Tue, 7 Jun 2005 00:48:47 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.st-paulia.net (Postfix) with ESMTP id E999033CC2; Tue, 7 Jun 2005 16:47:51 +0900 (JST) Date: Tue, 07 Jun 2005 16:47:49 +0900 (JST) Message-Id: <20050607.164749.62298775.yoshfuji@linux-ipv6.org> To: dlstevens@us.ibm.com Cc: davem@davemloft.net, netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: Re: IPV6 RFC3542 compliance [PATCH] From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: References: <20050607.160521.73986501.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2182 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Content-Length: 918 Lines: 32 In article (at Tue, 7 Jun 2005 00:40:28 -0700), David Stevens says: > > We can still keep old binaries if we renumber. > > This is important point. > > e.g. people, including myself, can keep using old binaries on new > kernels. : > But old binaries won't work with just that change > (and making them work is independent of changing > the numbers). 
> > For example, old binary: > > IPV6_RTHDR is value 5 > > it does: > on=1; setsockopt(s, SOL_IPV6, 5, &on); > and later a recvmsg() where it looks for > cmsg_type == IPV6_RTHDR (5). > > In the new API, the equivalent: > > IPV6_RTHDR 728 > IPV6_RECVRTHDR 729 > > old binary calls with "5", which you want > to work, but returns cmsg_type "728" (app doesn't > find a "5"). No, kernel should send 5, if application use old API, of course. --yoshfuji From yoshfuji@linux-ipv6.org Tue Jun 7 00:56:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 00:56:38 -0700 (PDT) Received: from yue.st-paulia.net (yue.linux-ipv6.org [203.178.140.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j577uWXq013297 for ; Tue, 7 Jun 2005 00:56:32 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.st-paulia.net (Postfix) with ESMTP id 2085733CC2; Tue, 7 Jun 2005 16:55:37 +0900 (JST) Date: Tue, 07 Jun 2005 16:55:36 +0900 (JST) Message-Id: <20050607.165536.75463878.yoshfuji@linux-ipv6.org> To: dlstevens@us.ibm.com Cc: davem@davemloft.net, netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: Re: IPV6 RFC3542 compliance [PATCH] From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20050607.164749.62298775.yoshfuji@linux-ipv6.org> References: <20050607.160521.73986501.yoshfuji@linux-ipv6.org> <20050607.164749.62298775.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-archive-position: 2183 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com 
Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Content-Length: 738 Lines: 18 In article <20050607.164749.62298775.yoshfuji@linux-ipv6.org> (at Tue, 07 Jun 2005 16:47:49 +0900 (JST)), YOSHIFUJI Hideaki / $B5HF#1QL@(B says: > No, kernel should send 5, if application use old API, of course. This can be implemented like this (based on codes from our repository): /* RFC2292bis */ if (np->rxopt.bits.rxhbh && opt->hop) { u8 *ptr = skb->nh.raw + opt->hop; put_cmsg(msg, SOL_IPV6, IPV6_HOPOPTS, (ptr[1]+1)<<3, ptr); } /* RFC2292 */ if (np->rxopt.bits.rxhbh2292 && opt->hop) { u8 *ptr = skb->nh.raw + opt->hop; put_cmsg(msg, SOL_IPV6, IPV6_2292HOPOPTS, (ptr[1]+1)<<3, ptr); } --yoshfuji From dlstevens@us.ibm.com Tue Jun 7 01:02:49 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 01:02:52 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5782nXq014214 for ; Tue, 7 Jun 2005 01:02:49 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e35.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j57814MK083620 for ; Tue, 7 Jun 2005 04:01:13 -0400 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j57814XR153206 for ; Tue, 7 Jun 2005 02:01:04 -0600 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j57813Pw017327 for ; Tue, 7 Jun 2005 02:01:04 -0600 Received: from d03nm121.boulder.ibm.com (d03nm121.boulder.ibm.com [9.17.195.147]) by d03av02.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j57813jP017322; Tue, 7 Jun 2005 02:01:03 -0600 In-Reply-To: <20050607.164749.62298775.yoshfuji@linux-ipv6.org> To: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= Cc: davem@davemloft.net, netdev@oss.sgi.com, 
yoshfuji@linux-ipv6.org MIME-Version: 1.0 Subject: Re: IPV6 RFC3542 compliance [PATCH] X-Mailer: Lotus Notes Release 6.0.2CF1 June 9, 2003 Message-ID: From: David Stevens Date: Tue, 7 Jun 2005 01:01:01 -0700 X-MIMETrack: Serialize by Router on D03NM121/03/M/IBM(Release 6.53HF546 | May 23, 2005) at 06/07/2005 02:01:03, Serialize complete at 06/07/2005 02:01:03 Content-Type: text/plain; charset="ISO-2022-JP" X-archive-position: 2184 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dlstevens@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 1093 Lines: 29 YOSHIFUJI Hideaki / 吉藤英明 wrote on 06/07/2005 12:47:49 AM: > No, kernel should send 5, if application use old API, of course. > --yoshfuji Ok, but this gets back to my point. If the program source doesn't have #ifdefs like you suggested (for example, if it was written before the new API existed), then it'll still have an error, but that error won't show up until the next time it's recompiled. So, an old binary will work fine, but recompiling it will get EINVAL on the setsockopt() calls. The old binary will work, the old source will compile, but the new binary will not work, and may not be found until much later. The two API's are fundamentally incompatible, because they have common names that do different things, and in the first, the boolean socket option and cmsg_type must be the same, in the second, they cannot be. I think it's better to break and fix any use of these right away instead of delaying the error until the next time some old binary is recompiled and run, don't you? 
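For reference, the kind of #ifdef guard mentioned above might look roughly like the sketch below. It assumes the libc headers define IPV6_RECVRTHDR when the RFC 3542 names are available and fall back to the RFC 2292 meaning of IPV6_RTHDR otherwise; rthdr_recv_optname() and enable_rthdr_delivery() are hypothetical helper names, not part of any API.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Pick the boolean "deliver routing headers" option for whichever API
 * the headers we are compiled against provide. */
int rthdr_recv_optname(void)
{
#ifdef IPV6_RECVRTHDR
    return IPV6_RECVRTHDR;  /* RFC 3542 headers: separate boolean option */
#else
    return IPV6_RTHDR;      /* RFC 2292 headers: same name is the boolean */
#endif
}

/* Ask the kernel to deliver routing headers as ancillary data on fd.
 * Returns whatever setsockopt() returns (0 on success, -1 on error). */
int enable_rthdr_delivery(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IPV6, rthdr_recv_optname(),
                      &on, sizeof(on));
}
```

Source written without such a guard is the case in question: it still compiles against new headers but passes a boolean to an option that no longer takes one.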
+-DLS From dlstevens@us.ibm.com Tue Jun 7 01:05:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 01:05:26 -0700 (PDT) Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5785NXq014855 for ; Tue, 7 Jun 2005 01:05:24 -0700 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e31.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j5784Lua530508 for ; Tue, 7 Jun 2005 04:04:21 -0400 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by d03relay04.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5784Kes171200 for ; Tue, 7 Jun 2005 02:04:20 -0600 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j5784K4o024219 for ; Tue, 7 Jun 2005 02:04:20 -0600 Received: from d03nm121.boulder.ibm.com (d03nm121.boulder.ibm.com [9.17.195.147]) by d03av02.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j5784KSk024208; Tue, 7 Jun 2005 02:04:20 -0600 In-Reply-To: <20050607.165536.75463878.yoshfuji@linux-ipv6.org> To: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= Cc: davem@davemloft.net, netdev@oss.sgi.com, yoshfuji@linux-ipv6.org MIME-Version: 1.0 Subject: Re: IPV6 RFC3542 compliance [PATCH] X-Mailer: Lotus Notes Release 6.0.2CF1 June 9, 2003 Message-ID: From: David Stevens Date: Tue, 7 Jun 2005 01:04:17 -0700 X-MIMETrack: Serialize by Router on D03NM121/03/M/IBM(Release 6.53HF546 | May 23, 2005) at 06/07/2005 02:04:20, Serialize complete at 06/07/2005 02:04:20 Content-Type: text/plain; charset="ISO-2022-JP" X-archive-position: 2185 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dlstevens@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 1169 Lines: 36 YOSHIFUJI Hideaki / $B5HF#1QL@(B wrote on 06/07/2005 12:55:36 AM: > In article 
<20050607.164749.62298775.yoshfuji@linux-ipv6.org> (at Tue, 07 Jun > 2005 16:47:49 +0900 (JST)), YOSHIFUJI Hideaki / $B5HF#1QL@(B says: > > No, kernel should send 5, if application use old API, of course. > This can be implemented like this (based on codes from our repository): > /* RFC2292bis */ > if (np->rxopt.bits.rxhbh && opt->hop) { > u8 *ptr = skb->nh.raw + opt->hop; > put_cmsg(msg, SOL_IPV6, IPV6_HOPOPTS, (ptr[1]+1)<<3, ptr); > } > /* RFC2292 */ > if (np->rxopt.bits.rxhbh2292 && opt->hop) { > u8 *ptr = skb->nh.raw + opt->hop; > put_cmsg(msg, SOL_IPV6, IPV6_2292HOPOPTS, (ptr[1]+1)<<3, ptr); > } > --yoshfuji Sure, it's easy to do. But the application that's using it has broken source, and nobody will know until after it's recompiled. I'd just have a single flag for all, on the assumption that they're either using old API exclusively, or new. But, again, it leaves a land mine for the source bug in the application that you're allowing to still work. +-DLS From yoshfuji@linux-ipv6.org Tue Jun 7 01:15:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 01:15:22 -0700 (PDT) Received: from yue.st-paulia.net (yue.linux-ipv6.org [203.178.140.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j578FJXq015721 for ; Tue, 7 Jun 2005 01:15:20 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.st-paulia.net (Postfix) with ESMTP id A547433CC2; Tue, 7 Jun 2005 17:14:24 +0900 (JST) Date: Tue, 07 Jun 2005 17:14:23 +0900 (JST) Message-Id: <20050607.171423.106079530.yoshfuji@linux-ipv6.org> To: dlstevens@us.ibm.com Cc: davem@davemloft.net, netdev@oss.sgi.com, yoshfuji@linux-ipv6.org Subject: Re: IPV6 RFC3542 compliance [PATCH] From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: References: <20050607.164749.62298775.yoshfuji@linux-ipv6.org> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA X-PGP-Key-URL: 
http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2186 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Content-Length: 1162 Lines: 25 In article (at Tue, 7 Jun 2005 01:01:01 -0700), David Stevens says: > Ok, but this gets back to my point. If the program source > doesn't have #ifdefs like you suggested (for example, if > it was written before the new API existed), then it'll still > have an error, but that error won't show up until the next > time it's recompiled. So, an old binary will work fine, but > recompiling it will get EINVAL on the setsockopt() calls. > The old binary will work, the old source will compile, but > the new binary will not work, and may not be found until > much later. It is okay, we can warn that "you use old API; please fix that!" or something like that. (like SO_BSDCOMPAT or, neigh.XXX.base_reachable_time sysctl.) > I think it's better to break and fix any use of these > right away instead of delaying the error until the next > time some old binary is recompiled and run, don't you? No, I don't think so. Please do NOT break binary API in 2.6.x. I think it is okay in 2.7, but I still think that it is good to use new values for new semantics. 
--yoshfuji From jeroen@simonetti.nl Tue Jun 7 02:29:51 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 02:29:55 -0700 (PDT) Received: from services-04.netland.nl (mx1.netland.nl [217.170.32.72]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j579ToXq020363 for ; Tue, 7 Jun 2005 02:29:51 -0700 Received: from n010095.nbs.netland.nl (fw.office.netland.nl [217.170.32.40]) by services-04.netland.nl (Postfix) with ESMTP id 6811254007 for ; Tue, 7 Jun 2005 11:28:41 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by n010095.nbs.netland.nl (Postfix) with ESMTP id 18747A4ED for ; Tue, 7 Jun 2005 11:28:41 +0200 (CEST) Received: from n010095.nbs.netland.nl ([127.0.0.1]) by localhost (n010095.nbs.netland.nl [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 27550-03 for ; Tue, 7 Jun 2005 11:28:40 +0200 (CEST) Received: from jeroens.office.netland.nl (jeroens.office.netland.nl [192.168.170.25]) by n010095.nbs.netland.nl (Postfix) with ESMTP id 43940A4EA for ; Tue, 7 Jun 2005 11:28:40 +0200 (CEST) Subject: [PATCH 1/1] sysctl configurable icmperror sourceaddress From: "J. Simonetti" To: netdev@oss.sgi.com Content-Type: text/plain Date: Tue, 07 Jun 2005 11:26:23 +0200 Message-Id: <1118136384.10479.15.camel@jeroens.office.netland.nl> Mime-Version: 1.0 X-Mailer: Evolution 2.0.4 (2.0.4-4) Content-Transfer-Encoding: 7bit X-archive-position: 2187 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jeroen@simonetti.nl Precedence: bulk X-list: netdev Content-Length: 791 Lines: 21 This patch alows you to change the source address of icmp error messages. It applies cleanly to 2.6.11.11 and retains the default behaviour. In the old (default) behaviour icmp error messages are sent with the ip of the exiting interface. 
The new behaviour (when the sysctl variable is toggled on), it will send the message with the ip of the interface that received the packet that caused the icmp error. This is the behaviour network administrators will expect from a router. It makes debugging complicated network layouts much easier. Also, all 'vendor routers' I know of have the later behaviour. Regards, Jeroen Simonetti -- "Absolutely nothing should be concluded from these figures except that no conclusion can be drawn from them." (By Joseph L. Brothers, Linux/PowerPC Project) From jeroen@simonetti.nl Tue Jun 7 02:44:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 02:44:13 -0700 (PDT) Received: from services-04.netland.nl (mx1.netland.nl [217.170.32.72]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j579iAXq021553 for ; Tue, 7 Jun 2005 02:44:10 -0700 Received: from n010095.nbs.netland.nl (fw.office.netland.nl [217.170.32.40]) by services-04.netland.nl (Postfix) with ESMTP id 74F7854010 for ; Tue, 7 Jun 2005 11:43:07 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by n010095.nbs.netland.nl (Postfix) with ESMTP id 6D34FA4ED for ; Tue, 7 Jun 2005 11:43:07 +0200 (CEST) Received: from n010095.nbs.netland.nl ([127.0.0.1]) by localhost (n010095.nbs.netland.nl [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 28483-04-5 for ; Tue, 7 Jun 2005 11:43:06 +0200 (CEST) Received: from jeroens.office.netland.nl (jeroens.office.netland.nl [192.168.170.25]) by n010095.nbs.netland.nl (Postfix) with ESMTP id 787CCA4EA for ; Tue, 7 Jun 2005 11:43:06 +0200 (CEST) Subject: Re: [PATCH 1/1] sysctl configurable icmperror sourceaddress From: "J. 
Simonetti" To: netdev@oss.sgi.com In-Reply-To: <1118136384.10479.15.camel@jeroens.office.netland.nl> References: <1118136384.10479.15.camel@jeroens.office.netland.nl> Content-Type: multipart/mixed; boundary="=-XV4zojviW1nuiDFTTQao" Date: Tue, 07 Jun 2005 11:40:50 +0200 Message-Id: <1118137250.10479.17.camel@jeroens.office.netland.nl> Mime-Version: 1.0 X-Mailer: Evolution 2.0.4 (2.0.4-4) X-archive-position: 2188 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jeroen@simonetti.nl Precedence: bulk X-list: netdev Content-Length: 2477 Lines: 85 --=-XV4zojviW1nuiDFTTQao Content-Type: text/plain Content-Transfer-Encoding: 7bit On Tue, 2005-06-07 at 11:26 +0200, J. Simonetti wrote: > This patch alows you to change the source address of icmp error > messages. It applies cleanly to 2.6.11.11 and retains the default > behaviour. I swear I had attached it... really... ;) Jeroen Simonetti -- character density, n.: The number of very weird people in the office. --=-XV4zojviW1nuiDFTTQao Content-Disposition: attachment; filename=linux-2.6.11.11-icmperrors.patch Content-Type: text/x-patch; name=linux-2.6.11.11-icmperrors.patch; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit --- include/linux/sysctl.h.orig 2004-12-24 22:34:58.000000000 +0100 +++ include/linux/sysctl.h 2005-06-07 10:16:39.730585288 +0200 @@ -345,6 +345,7 @@ NET_TCP_MODERATE_RCVBUF=106, NET_TCP_TSO_WIN_DIVISOR=107, NET_TCP_BIC_BETA=108, + NET_IPV4_ICMP_ERRORS_USE_INBOUND_IFADDR=109, }; enum { --- net/ipv4/icmp.c.orig 2004-12-24 22:35:28.000000000 +0100 +++ net/ipv4/icmp.c 2005-06-07 10:15:42.645263576 +0200 @@ -207,6 +207,7 @@ int sysctl_icmp_ratelimit = 1 * HZ; int sysctl_icmp_ratemask = 0x1818; +int sysctl_icmp_errors_use_inbound_ifaddr = 0; /* * ICMP control array. This specifies what to do with each ICMP. 
@@ -511,8 +512,12 @@ */ saddr = iph->daddr; - if (!(rt->rt_flags & RTCF_LOCAL)) - saddr = 0; + if (!(rt->rt_flags & RTCF_LOCAL)) { + if(sysctl_icmp_errors_use_inbound_ifaddr) + saddr = inet_select_addr(skb_in->dev, 0, RT_SCOPE_LINK); + else + saddr = 0; + } tos = icmp_pointers[type].error ? ((iph->tos & IPTOS_TOS_MASK) | IPTOS_PREC_INTERNETCONTROL) : --- net/ipv4/sysctl_net_ipv4.c.orig 2004-12-24 22:35:23.000000000 +0100 +++ net/ipv4/sysctl_net_ipv4.c 2005-06-07 10:19:44.538490216 +0200 @@ -23,6 +23,7 @@ extern int sysctl_icmp_echo_ignore_all; extern int sysctl_icmp_echo_ignore_broadcasts; extern int sysctl_icmp_ignore_bogus_error_responses; +extern int sysctl_icmp_errors_use_inbound_ifaddr; /* From ip_fragment.c */ extern int sysctl_ipfrag_low_thresh; @@ -396,6 +397,14 @@ .proc_handler = &proc_dointvec }, { + .ctl_name = NET_IPV4_ICMP_ERRORS_USE_INBOUND_IFADDR, + .procname = "icmp_errors_use_inbound_ifaddr", + .data = &sysctl_icmp_errors_use_inbound_ifaddr, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec + }, + { .ctl_name = NET_IPV4_ROUTE, .procname = "route", .maxlen = 0, --=-XV4zojviW1nuiDFTTQao-- From gnb@melbourne.sgi.com Tue Jun 7 03:12:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 03:12:27 -0700 (PDT) Received: from larry.melbourne.sgi.com (mverd138.asia.info.net [61.14.31.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j57ACNXq026263 for ; Tue, 7 Jun 2005 03:12:24 -0700 Received: from [134.14.55.176] (hole.melbourne.sgi.com [134.14.55.176]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id UAA22815; Tue, 7 Jun 2005 20:11:12 +1000 Subject: Re: [PATCH]: Tigon3 new NAPI locking v2 From: Greg Banks To: "David S. 
Miller" Cc: Linux Network Development list , mchan@broadcom.com In-Reply-To: <20050603.122558.88474819.davem@davemloft.net> References: <20050603.122558.88474819.davem@davemloft.net> Content-Type: text/plain Organization: Silicon Graphics Inc, Australian Software Group. Message-Id: <1118139072.2198.119.camel@hole.melbourne.sgi.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6-1mdk Date: Tue, 07 Jun 2005 20:11:12 +1000 Content-Transfer-Encoding: 7bit X-archive-position: 2189 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gnb@melbourne.sgi.com Precedence: bulk X-list: netdev Content-Length: 694 Lines: 23 On Sat, 2005-06-04 at 05:25, David S. Miller wrote: > This version incorporates two bug fixes from Michael. > > 1) Check the mailbox register for 0x1 while polling on the COMPLETE > state bit. > > 2) Remove the BUG_ON() check in tg3_restart_ints(), it can legally and > harmlessly occur. > > Point #2 may want some refinements, but this patch below is good > enough for testing. This patch seems to run well, so far without the lockup we saw with the first version. It really helps with irq fairness when we have lots of tg3 and Fibre Channel HBA interrupts going to the same CPU. Greg. -- Greg Banks, R&D Software Engineer, SGI Australian Software Group. I don't speak for SGI. 
From jbenc@suse.cz Tue Jun 7 05:59:50 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 05:59:54 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57CxnXq017839 for ; Tue, 7 Jun 2005 05:59:50 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id 0C1376282E2; Tue, 7 Jun 2005 14:58:42 +0200 (CEST) Date: Tue, 7 Jun 2005 14:58:41 +0200 From: Jiri Benc To: NetDev Cc: Zhu Yi , Jeff Garzik , Jirka Bohac Subject: Re: [3/9] ieee80211: fix ipw 64bit compilation warnings Message-ID: <20050607145841.06d8d40f@griffin.suse.cz> In-Reply-To: <1118039392.5702.30.camel@debian.sh.intel.com> References: <20050603182625.64d33be3@griffin.suse.cz> <20050603183048.7786f98b@griffin.suse.cz> <1118039392.5702.30.camel@debian.sh.intel.com> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2192 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 7430 Lines: 246 On Mon, 06 Jun 2005 14:29:52 +0800, Zhu Yi wrote: > ("%zd", sizeof()) should be better. Thanks. This is a corrected version of the patch. This patch fixes warnings when compiling ipw2100 and ipw2200 on x86_64. 
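As a small userspace illustration of the two idioms this patch relies on (a length modifier for size_t arguments, and offsetof() instead of subtracting pointers cast to u32), here is a minimal sketch. The struct and function names are hypothetical. The patch uses %zd, which the kernel's printk accepts; in hosted C99 the matching conversion for size_t is %zu, which is what the sketch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a received packet: header, then payload. */
struct rx_packet_example {
    unsigned int  hdr;
    unsigned char data[8];
};

/* Format a size_t portably. "%d" would be wrong wherever size_t is
 * wider than int, which is the x86_64 warning being fixed. */
int format_needed(char *buf, size_t buflen, size_t needed)
{
    return snprintf(buf, buflen, "need %zu", needed);
}

/* Offset of the payload computed at compile time, with no pointer
 * truncation on 64-bit targets. */
size_t payload_offset(void)
{
    return offsetof(struct rx_packet_example, data);
}
```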
Signed-off-by: Jiri Benc Signed-off-by: Jirka Bohac Index: netdev/drivers/net/wireless/ipw2200.c =================================================================== --- netdev.orig/drivers/net/wireless/ipw2200.c 2005-06-01 11:03:37.000000000 +0200 +++ netdev/drivers/net/wireless/ipw2200.c 2005-06-07 14:23:08.000000000 +0200 @@ -241,8 +241,8 @@ IPW_DEBUG_IO(" reg = 0x%8X : value = 0x%8X\n", reg, value); _ipw_write32(priv, CX2_INDIRECT_ADDR, reg & CX2_INDIRECT_ADDR_MASK); _ipw_write8(priv, CX2_INDIRECT_DATA, value); - IPW_DEBUG_IO(" reg = 0x%8X : value = 0x%8X\n", - (unsigned)(priv->hw_base + CX2_INDIRECT_DATA), + IPW_DEBUG_IO(" reg = 0x%8lX : value = 0x%8X\n", + (unsigned long)(priv->hw_base + CX2_INDIRECT_DATA), value); } @@ -508,7 +508,7 @@ /* verify we have enough room to store the value */ if (*len < sizeof(u32)) { IPW_DEBUG_ORD("ordinal buffer length too small, " - "need %d\n", sizeof(u32)); + "need %zd\n", sizeof(u32)); return -EINVAL; } @@ -541,7 +541,7 @@ /* verify we have enough room to store the value */ if (*len < sizeof(u32)) { IPW_DEBUG_ORD("ordinal buffer length too small, " - "need %d\n", sizeof(u32)); + "need %zd\n", sizeof(u32)); return -EINVAL; } @@ -1740,7 +1740,7 @@ u32 address = CX2_SHARED_SRAM_DMA_CONTROL + (sizeof(struct command_block) * index); IPW_DEBUG_FW(">> :\n"); - ipw_write_indirect(priv, address, (u8*)cb, sizeof(struct command_block)); + ipw_write_indirect(priv, address, (u8*)cb, (int)sizeof(struct command_block)); IPW_DEBUG_FW("<< :\n"); return 0; @@ -2342,7 +2342,7 @@ return -EINVAL; } - IPW_DEBUG_INFO("Loading firmware '%s' file v%d.%d (%d bytes)\n", + IPW_DEBUG_INFO("Loading firmware '%s' file v%d.%d (%zd bytes)\n", name, IPW_FW_MAJOR(header->version), IPW_FW_MINOR(header->version), @@ -2697,7 +2697,7 @@ q->bd = pci_alloc_consistent(dev,sizeof(q->bd[0])*count, &q->q.dma_addr); if (!q->bd) { - IPW_ERROR("pci_alloc_consistent(%d) failed\n", + IPW_ERROR("pci_alloc_consistent(%zd) failed\n", sizeof(q->bd[0]) * count); kfree(q->txb); 
q->txb = NULL; @@ -3466,8 +3466,8 @@ x->channel_num); } else { IPW_DEBUG_SCAN("Scan result of wrong size %d " - "(should be %d)\n", - notif->size,sizeof(*x)); + "(should be %zd)\n", + notif->size, sizeof(*x)); } break; } @@ -3482,8 +3482,8 @@ x->status); } else { IPW_ERROR("Scan completed of wrong size %d " - "(should be %d)\n", - notif->size,sizeof(*x)); + "(should be %zd)\n", + notif->size, sizeof(*x)); } priv->status &= ~(STATUS_SCANNING | STATUS_SCAN_ABORTING); @@ -3515,7 +3515,7 @@ IPW_ERROR("Frag length: %d\n", x->frag_length); } else { IPW_ERROR("Frag length of wrong size %d " - "(should be %d)\n", + "(should be %zd)\n", notif->size, sizeof(*x)); } break; @@ -3532,8 +3532,8 @@ memcpy(&priv->last_link_deterioration, x, sizeof(*x)); } else { IPW_ERROR("Link Deterioration of wrong size %d " - "(should be %d)\n", - notif->size,sizeof(*x)); + "(should be %zd)\n", + notif->size, sizeof(*x)); } break; } @@ -3552,7 +3552,7 @@ struct notif_beacon_state *x = &notif->u.beacon_state; if (notif->size != sizeof(*x)) { IPW_ERROR("Beacon state of wrong size %d (should " - "be %d)\n", notif->size, sizeof(*x)); + "be %zd)\n", notif->size, sizeof(*x)); break; } @@ -3602,8 +3602,8 @@ break; } - IPW_ERROR("TGi Tx Key of wrong size %d (should be %d)\n", - notif->size,sizeof(*x)); + IPW_ERROR("TGi Tx Key of wrong size %d (should be %zd)\n", + notif->size, sizeof(*x)); break; } @@ -3616,8 +3616,8 @@ break; } - IPW_ERROR("Calibration of wrong size %d (should be %d)\n", - notif->size,sizeof(*x)); + IPW_ERROR("Calibration of wrong size %d (should be %zd)\n", + notif->size, sizeof(*x)); break; } @@ -3628,7 +3628,7 @@ break; } - IPW_ERROR("Noise stat is wrong size %d (should be %d)\n", + IPW_ERROR("Noise stat is wrong size %d (should be %zd)\n", notif->size, sizeof(u32)); break; } @@ -4823,7 +4823,7 @@ } /* Advance skb->data to the start of the actual payload */ - skb_reserve(rxb->skb, (u32)&pkt->u.frame.data[0] - (u32)pkt); + skb_reserve(rxb->skb, offsetof(struct ipw_rx_packet, 
u.frame.data)); /* Set the size of the skb to the size of the frame */ skb_put(rxb->skb, pkt->u.frame.length); Index: netdev/drivers/net/wireless/ipw2100.c =================================================================== --- netdev.orig/drivers/net/wireless/ipw2100.c 2005-06-01 11:03:37.000000000 +0200 +++ netdev/drivers/net/wireless/ipw2100.c 2005-06-07 14:29:13.000000000 +0200 @@ -493,7 +493,7 @@ *len = IPW_ORD_TAB_1_ENTRY_SIZE; IPW_DEBUG_WARNING(DRV_NAME - ": ordinal buffer length too small, need %d\n", + ": ordinal buffer length too small, need %zd\n", IPW_ORD_TAB_1_ENTRY_SIZE); return -EINVAL; @@ -2302,7 +2302,7 @@ #endif IPW_DEBUG_INFO(DRV_NAME ": PCI latency error detected at " - "0x%04X.\n", i * sizeof(struct ipw2100_status)); + "0x%04zX.\n", i * sizeof(struct ipw2100_status)); #ifdef ACPI_CSTATE_LIMIT_DEFINED IPW_DEBUG_INFO(DRV_NAME ": Disabling C3 transitions.\n"); @@ -2398,7 +2398,7 @@ /* Make a copy of the frame so we can dump it to the logs if * ieee80211_rx fails */ memcpy(packet_data, packet->skb->data, - min(status->frame_size, IPW_RX_NIC_BUFFER_LENGTH)); + min_t(u32, status->frame_size, IPW_RX_NIC_BUFFER_LENGTH)); #endif if (!ieee80211_rx(priv->ieee, packet->skb, stats)) { @@ -2730,21 +2730,21 @@ { int i = txq->oldest; IPW_DEBUG_TX( - "TX%d V=%p P=%p T=%p L=%d\n", i, + "TX%d V=%p P=%04X T=%04X L=%d\n", i, &txq->drv[i], - (void*)txq->nic + i * sizeof(struct ipw2100_bd), - (void*)txq->drv[i].host_addr, + (u32)(txq->nic + i * sizeof(struct ipw2100_bd)), + txq->drv[i].host_addr, txq->drv[i].buf_length); if (packet->type == DATA) { i = (i + 1) % txq->entries; IPW_DEBUG_TX( - "TX%d V=%p P=%p T=%p L=%d\n", i, + "TX%d V=%p P=%04X T=%04X L=%d\n", i, &txq->drv[i], - (void*)txq->nic + i * - sizeof(struct ipw2100_bd), - (void*)txq->drv[i].host_addr, + (u32)(txq->nic + i * + sizeof(struct ipw2100_bd)), + (u32)txq->drv[i].host_addr, txq->drv[i].buf_length); } } @@ -4212,7 +4212,7 @@ { IPW_DEBUG_INFO("enter\n"); - IPW_DEBUG_INFO("initializing bd queue at 
virt=%p, phys=%08x\n", q->drv, q->nic); + IPW_DEBUG_INFO("initializing bd queue at virt=%p, phys=%08x\n", q->drv, (u32)q->nic); write_register(priv->net_dev, base, q->nic); write_register(priv->net_dev, size, q->entries); @@ -8431,7 +8431,7 @@ priv->net_dev->name, fw_name); return rc; } - IPW_DEBUG_INFO("firmware data %p size %d\n", fw->fw_entry->data, + IPW_DEBUG_INFO("firmware data %p size %zd\n", fw->fw_entry->data, fw->fw_entry->size); ipw2100_mod_firmware_load(fw); -- Jiri Benc SUSE Labs From gandalf@wlug.westbo.se Tue Jun 7 06:06:12 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 06:06:17 -0700 (PDT) Received: from mxfep02.bredband.com (mxfep02.bredband.com [195.54.107.73]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57D6BXq018853 for ; Tue, 7 Jun 2005 06:06:12 -0700 Received: from tux.rsn.bth.se ([85.228.2.43] [85.228.2.43]) by mxfep02.bredband.com with ESMTP id <20050607130507.WFEV25621.mxfep02.bredband.com@tux.rsn.bth.se>; Tue, 7 Jun 2005 15:05:07 +0200 Received: from localhost (localhost [127.0.0.1]) by tux.rsn.bth.se (Postfix) with ESMTP id 38AEA3F55; Tue, 7 Jun 2005 14:06:18 +0200 (CEST) Date: Tue, 7 Jun 2005 14:06:18 +0200 (CEST) From: Martin Josefsson X-X-Sender: gandalf@tux.rsn.bth.se To: jamal Cc: Stephen Hemminger , Mitch Williams , "Ronciak, John" , "David S. 
Miller" , mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: <1118147904.6320.108.camel@localhost.localdomain> Message-ID: References: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> <42A5284C.3060808@osdl.org> <1118147904.6320.108.camel@localhost.localdomain> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2193 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gandalf@wlug.westbo.se Precedence: bulk X-list: netdev Content-Length: 1710 Lines: 34 On Tue, 7 Jun 2005, jamal wrote: > It is possible. Remember also the cost of IO these days is worse than a > cache miss in cycles as well as absolute time. So the e1000 maybe doing > more IO than the tg3. > > I think there is something fishy about the e1000 in general; From what i > just heard mentioned reading the emails is there's improvement if the rx > ring is replenished on a per packet basis instead of a batch at the end. > This somehow is not an issue with tg3. I think doing replenishing in > smaller batches like 5 packets at a time would also help. > That the tg3 doesnt need to have its rx ring sizes adjusted but the > e1000 gets better the lower the rx ring size is strange. > > To the intel folks: shouldnt someone be investigating why this is so? > > Fixing the effect with "lets lower the weight" or "wait, lets adjust it > at runtime" because we know it fixes our problem - sounds like a serious > bandaid to me. Lets find the cause and fix that instead. > Why is this issue happening with e1000? Thats what needs to be resolved. > So far some evidence seems to be suggesting that the tg3 uses less CPU. 
One thing that jumps to mind is that e1000 starts at lastrxdescriptor+1
and loops and checks the status of each descriptor and stops when it finds
a descriptor that isn't finished. Another way to do it is to read out the
current position of the ring and loop from lastrxdescriptor+1 up to the
current position. Scott Feldman implemented this for TX and there it
increased performance somewhat (discussed here on netdev some months ago).
I wonder if it could also decrease RX latency, I mean, we have to get the
cache miss sometime anyway.

I haven't checked how tg3 does it.

/Martin

From hadi@cyberus.ca Tue Jun 7 06:30:44 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 06:30:55 -0700 (PDT)
Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57DUiXq020570 for ; Tue, 7 Jun 2005 06:30:44 -0700
Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1Dfe9H-0008RY-HE for netdev@oss.sgi.com; Tue, 07 Jun 2005 09:29:43 -0400
Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1Dfe9D-0004Pp-Oa; Tue, 07 Jun 2005 09:29:40 -0400
Subject: Re: RFC: NAPI packet weighting patch
From: jamal
Reply-To: hadi@cyberus.ca
To: Martin Josefsson
Cc: Stephen Hemminger , Mitch Williams , "Ronciak, John" , "David S.
Miller" , mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse"
In-Reply-To:
References: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> <42A5284C.3060808@osdl.org> <1118147904.6320.108.camel@localhost.localdomain>
Content-Type: text/plain
Organization: unknown
Date: Tue, 07 Jun 2005 09:29:08 -0400
Message-Id: <1118150948.6320.152.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Evolution 2.2.1.1
Content-Transfer-Encoding: 7bit
X-archive-position: 2194
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: hadi@cyberus.ca
Precedence: bulk
X-list: netdev
Content-Length: 902
Lines: 21

On Tue, 2005-07-06 at 14:06 +0200, Martin Josefsson wrote:

> One thing that jumps to mind is that e1000 starts at lastrxdescriptor+1
> and loops and checks the status of each descriptor and stops when it finds
> a descriptor that isn't finished. Another way to do it is to read out the
> current position of the ring and loop from lastrxdescriptor+1 up to the
> current position. Scott Feldman implemented this for TX and there it
> increased performance somewhat (discussed here on netdev some months ago).
> I wonder if it could also decrease RX latency, I mean, we have to get the
> cache miss sometime anyway.
>

The effect of Scott's patch was to reduce IO by amortizing it on the TX
side. Are we talking about the same thing? This was in the case of TX
descriptor pruning?

So it is possible that the e1000 is doing more than its necessary share of
IO on the receive side as well.
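
As a rough illustration of the two RX cleanup strategies discussed above,
here is a minimal user-space sketch. It is not code from the e1000 or tg3
drivers: the ring layout, RING_SIZE, DESC_DONE, the hw_head field (standing
in for a head register on the NIC) and all function names are hypothetical,
and the sketch assumes the hardware never fills the whole ring, so that
head == next_to_clean always means "nothing to do".

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical ring length */
#define DESC_DONE 0x1u	/* hypothetical "descriptor done" status bit */

struct rx_desc {
	uint32_t status;
};

struct rx_ring {
	struct rx_desc desc[RING_SIZE];
	unsigned int next_to_clean;	/* lastrxdescriptor + 1 */
	unsigned int hw_head;		/* stand-in for the NIC head register */
	unsigned int reg_reads;		/* number of simulated register reads */
};

/* Simulate the NIC completing n descriptors (n must stay below RING_SIZE). */
static void hw_receive(struct rx_ring *r, unsigned int n)
{
	assert(n < RING_SIZE);
	while (n--) {
		r->desc[r->hw_head].status |= DESC_DONE;
		r->hw_head = (r->hw_head + 1) % RING_SIZE;
	}
}

/* Strategy 1: check each descriptor's status, stop at the first unfinished one. */
static unsigned int clean_by_status(struct rx_ring *r, unsigned int budget)
{
	unsigned int cleaned = 0;

	while (cleaned < budget &&
	       (r->desc[r->next_to_clean].status & DESC_DONE)) {
		r->desc[r->next_to_clean].status = 0;
		r->next_to_clean = (r->next_to_clean + 1) % RING_SIZE;
		cleaned++;
	}
	return cleaned;
}

/* Strategy 2: read the ring position once, then walk up to it. */
static unsigned int clean_by_head(struct rx_ring *r, unsigned int budget)
{
	unsigned int head = r->hw_head;	/* the single simulated register read */
	unsigned int cleaned = 0;

	r->reg_reads++;
	while (cleaned < budget && r->next_to_clean != head) {
		r->desc[r->next_to_clean].status = 0;
		r->next_to_clean = (r->next_to_clean + 1) % RING_SIZE;
		cleaned++;
	}
	return cleaned;
}
```

Both functions clean the same descriptors; what differs is where the cost
lands. The first touches one extra descriptor (the unfinished one) per poll,
the second pays for one register read per poll. Which is cheaper on real
hardware is exactly the open question in this thread.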
cheers, jamal From gandalf@wlug.westbo.se Tue Jun 7 06:36:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 06:36:09 -0700 (PDT) Received: from mxfep02.bredband.com (mxfep02.bredband.com [195.54.107.73]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57Da5Xq021190 for ; Tue, 7 Jun 2005 06:36:06 -0700 Received: from tux.rsn.bth.se ([85.228.2.43] [85.228.2.43]) by mxfep02.bredband.com with ESMTP id <20050607133502.WLUM25621.mxfep02.bredband.com@tux.rsn.bth.se>; Tue, 7 Jun 2005 15:35:02 +0200 Received: from localhost (localhost [127.0.0.1]) by tux.rsn.bth.se (Postfix) with ESMTP id 8A45E3F55; Tue, 7 Jun 2005 14:36:12 +0200 (CEST) Date: Tue, 7 Jun 2005 14:36:12 +0200 (CEST) From: Martin Josefsson X-X-Sender: gandalf@tux.rsn.bth.se To: jamal Cc: Stephen Hemminger , Mitch Williams , "Ronciak, John" , "David S. Miller" , mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: <1118150948.6320.152.camel@localhost.localdomain> Message-ID: References: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> <42A5284C.3060808@osdl.org> <1118147904.6320.108.camel@localhost.localdomain> <1118150948.6320.152.camel@localhost.localdomain> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2195 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: gandalf@wlug.westbo.se Precedence: bulk X-list: netdev Content-Length: 1233 Lines: 29 On Tue, 7 Jun 2005, jamal wrote: > On Tue, 2005-07-06 at 14:06 +0200, Martin Josefsson wrote: > > > One thing that jumps to mind is that e1000 starts at lastrxdescriptor+1 > > and loops and checks the status of each descriptor and stops when it finds > > a descriptor that isn't finished. 
Another way to do it is to read out the
> > current position of the ring and loop from lastrxdescriptor+1 up to the
> > current position. Scott Feldman implemented this for TX and there it
> > increased performance somewhat (discussed here on netdev some months ago).
> > I wonder if it could also decrease RX latency, I mean, we have to get the
> > cache miss sometime anyway.
> >
>
> The effect of Scott's patch was to reduce IO by amortizing it on the TX
> side. Are we talking about the same thing? This was in the case of TX
> descriptor pruning?

Yes, that was for TX pruning.

> So it is possible that the e1000 is doing more than necessary share of
> IO on the receive side as well.

Yes, that's what I mean. The same thing but for RX. The question is how
much we would gain from it, since we still need to touch the rx-descriptor
sooner or later. Would be worth a test.
My test setup isn't in a working condition right now, Robert?

/Martin

From tgr@postel.suug.ch Tue Jun 7 07:49:21 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 07:49:26 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57EnKXq003170 for ; Tue, 7 Jun 2005 07:49:21 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id A7DDB1C0EE; Tue, 7 Jun 2005 16:20:58 +0200 (CEST)
Message-Id: <20050607140901.469224000@axs>
References: <20050607140842.778143000@axs>
Date: Tue, 07 Jun 2005 16:08:48 +0200
From: Thomas Graf
To: davem@davemloft.net
Cc: netdev@oss.sgi.com
Subject: [PATCH 6/7] [PKT_SCHED]: Cleanup pfifo_fast qdisc and remove unnecessary code
Content-Disposition: inline; filename=sch_pfifo_fast_cleanup
X-archive-position: 2203
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev
Status: RO
Content-Length: 3144
Lines: 115

Removes the skb trimming code which is not needed since we never
touch the skb upon failure. Removes unnecessary initializers, and
simplifies the code a bit.

Signed-off-by: Thomas Graf

Index: net-2.6.13/net/sched/sch_generic.c
===================================================================
--- net-2.6.13.orig/net/sched/sch_generic.c
+++ net-2.6.13/net/sched/sch_generic.c
@@ -311,6 +311,8 @@ static const u8 prio2band[TC_PRIO_MAX+1]
    generic prio+fifo combination.
  */
 
+#define PFIFO_FAST_BANDS 3
+
 static inline struct sk_buff_head *prio2list(struct sk_buff *skb,
 					     struct Qdisc *qdisc)
 {
@@ -318,8 +320,7 @@ static inline struct sk_buff_head *prio2
 	return list + prio2band[skb->priority & TC_PRIO_MAX];
 }
 
-static int
-pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc* qdisc)
+static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc* qdisc)
 {
 	struct sk_buff_head *list = prio2list(skb, qdisc);
 
@@ -331,36 +332,34 @@ pfifo_fast_enqueue(struct sk_buff *skb,
 	return qdisc_drop(skb, qdisc);
 }
 
-static struct sk_buff *
-pfifo_fast_dequeue(struct Qdisc* qdisc)
+static struct sk_buff *pfifo_fast_dequeue(struct Qdisc* qdisc)
 {
 	int prio;
 	struct sk_buff_head *list = qdisc_priv(qdisc);
 
-	for (prio = 0; prio < 3; prio++, list++) {
+	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++, list++) {
 		struct sk_buff *skb = __qdisc_dequeue_head(qdisc, list);
 		if (skb) {
 			qdisc->q.qlen--;
 			return skb;
 		}
 	}
+
 	return NULL;
 }
 
-static int
-pfifo_fast_requeue(struct sk_buff *skb, struct Qdisc* qdisc)
+static int pfifo_fast_requeue(struct sk_buff *skb, struct Qdisc* qdisc)
 {
 	qdisc->q.qlen++;
 	return __qdisc_requeue(skb, qdisc, prio2list(skb, qdisc));
 }
 
-static void
-pfifo_fast_reset(struct Qdisc* qdisc)
+static void pfifo_fast_reset(struct Qdisc* qdisc)
 {
 	int prio;
 	struct sk_buff_head *list = qdisc_priv(qdisc);
 
-	for (prio=0; prio < 3; prio++)
+	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++)
 		__qdisc_reset_queue(qdisc, list + prio);
 
 	qdisc->qstats.backlog = 0;
@@ -369,35 +368,30 @@ pfifo_fast_reset(struct Qdisc* qdisc)
 
 static int pfifo_fast_dump(struct Qdisc *qdisc,
struct sk_buff *skb)
 {
-	unsigned char *b = skb->tail;
-	struct tc_prio_qopt opt;
+	struct tc_prio_qopt opt = { .bands = PFIFO_FAST_BANDS };
 
-	opt.bands = 3;
 	memcpy(&opt.priomap, prio2band, TC_PRIO_MAX+1);
 	RTA_PUT(skb, TCA_OPTIONS, sizeof(opt), &opt);
 	return skb->len;
 
 rtattr_failure:
-	skb_trim(skb, b - skb->data);
 	return -1;
 }
 
 static int pfifo_fast_init(struct Qdisc *qdisc, struct rtattr *opt)
 {
-	int i;
+	int prio;
 	struct sk_buff_head *list = qdisc_priv(qdisc);
 
-	for (i=0; i<3; i++)
-		skb_queue_head_init(list+i);
+	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++)
+		skb_queue_head_init(list + prio);
 
 	return 0;
 }
 
 static struct Qdisc_ops pfifo_fast_ops = {
-	.next = NULL,
-	.cl_ops = NULL,
 	.id = "pfifo_fast",
-	.priv_size = 3 * sizeof(struct sk_buff_head),
+	.priv_size = PFIFO_FAST_BANDS * sizeof(struct sk_buff_head),
 	.enqueue = pfifo_fast_enqueue,
 	.dequeue = pfifo_fast_dequeue,
 	.requeue = pfifo_fast_requeue,

From tgr@postel.suug.ch Tue Jun 7 07:49:21 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 07:49:26 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57EnKXq003172 for ; Tue, 7 Jun 2005 07:49:20 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id BCCC21C0F0; Tue, 7 Jun 2005 16:21:03 +0200 (CEST)
Message-Id: <20050607140901.632982000@axs>
References: <20050607140842.778143000@axs>
Date: Tue, 07 Jun 2005 16:08:49 +0200
From: Thomas Graf
To: davem@davemloft.net
Cc: netdev@oss.sgi.com
Subject: [PATCH 7/7] [PKT_SCHED]: noop/noqueue qdisc style cleanups
Content-Disposition: inline; filename=sch_generic_cleanups
X-archive-position: 2202
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev
Status: RO
Content-Length: 1356
Lines: 54

Signed-off-by: Thomas Graf

Index: net-2.6.13/net/sched/sch_generic.c
===================================================================
--- net-2.6.13.orig/net/sched/sch_generic.c
+++ net-2.6.13/net/sched/sch_generic.c
@@ -243,31 +243,27 @@ static void dev_watchdog_down(struct net
    cheaper.
  */
 
-static int
-noop_enqueue(struct sk_buff *skb, struct Qdisc * qdisc)
+static int noop_enqueue(struct sk_buff *skb, struct Qdisc * qdisc)
 {
 	kfree_skb(skb);
 	return NET_XMIT_CN;
 }
 
-static struct sk_buff *
-noop_dequeue(struct Qdisc * qdisc)
+static struct sk_buff *noop_dequeue(struct Qdisc * qdisc)
 {
 	return NULL;
 }
 
-static int
-noop_requeue(struct sk_buff *skb, struct Qdisc* qdisc)
+static int noop_requeue(struct sk_buff *skb, struct Qdisc* qdisc)
 {
 	if (net_ratelimit())
-		printk(KERN_DEBUG "%s deferred output. It is buggy.\n", skb->dev->name);
+		printk(KERN_DEBUG "%s deferred output. It is buggy.\n",
+		       skb->dev->name);
 	kfree_skb(skb);
 	return NET_XMIT_CN;
 }
 
 struct Qdisc_ops noop_qdisc_ops = {
-	.next = NULL,
-	.cl_ops = NULL,
 	.id = "noop",
 	.priv_size = 0,
 	.enqueue = noop_enqueue,
@@ -285,8 +281,6 @@ struct Qdisc noop_qdisc = {
 };
 
 static struct Qdisc_ops noqueue_qdisc_ops = {
-	.next = NULL,
-	.cl_ops = NULL,
 	.id = "noqueue",
 	.priv_size = 0,
 	.enqueue = noop_enqueue,

From kernel@linuxace.com Tue Jun 7 08:35:57 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 08:35:59 -0700 (PDT)
Received: from linuxace.com (adsl-67-120-171-161.dsl.lsan03.pacbell.net [67.120.171.161]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j57FZsXq011446 for ; Tue, 7 Jun 2005 08:35:56 -0700
Received: (qmail 28824 invoked by uid 0); 7 Jun 2005 15:34:51 -0000
Date: Tue, 7 Jun 2005 08:34:51 -0700
From: Phil Oester
To: randy_dunlap
Cc: herbert@gondor.apana.org.au, netdev@oss.sgi.com, akpm@osdl.org
Subject: Re: 2.6.12-rcx networking oops
Message-ID: <20050607153451.GA28776@linuxace.com>
References: <20050531224012.GA16789@linuxace.com> <20050601054955.GA2625@gondor.apana.org.au> <20050601170058.GA20112@linuxace.com>
<20050606224646.24af30ff.rdunlap@xenotime.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050606224646.24af30ff.rdunlap@xenotime.net> User-Agent: Mutt/1.4.1i X-archive-position: 2204 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kernel@linuxace.com Precedence: bulk X-list: netdev Content-Length: 713 Lines: 20 On Mon, Jun 06, 2005 at 10:46:46PM -0700, randy_dunlap wrote: > Agreed, the stack trace is suspicious. (more below) Yes, many of the oops i've collected are questionable... > This is with NAPI, right? Would it make sense to try it with that > disabled? (I don't recall you saying it's NAPI, but the e1000 > functions seem to indicate that.) It is NAPI, but it works fine up to 2.6.11-rc1. 2.6.11-rc2 fails, so I'm now testing each individual -bk snapshot between them in hopes of finding the offending changeset. Given that this box is a firewall, it could be the slew of large netfilter changes which went into -rc2, but we'll see. > and how about enabling CONFIG_FRAME_POINTER ? It is enabled. 
Phil From john.ronciak@intel.com Tue Jun 7 09:27:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 09:27:10 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57GR4Xq025161 for ; Tue, 7 Jun 2005 09:27:04 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j57GNcD5005448; Tue, 7 Jun 2005 16:23:38 GMT Received: from orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j57GN8AP019253; Tue, 7 Jun 2005 16:23:33 GMT Received: from orsmsx331.amr.corp.intel.com ([192.168.65.56]) by orsmsxvs040.jf.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005060709233307122 ; Tue, 07 Jun 2005 09:23:33 -0700 Received: from orsmsx408.amr.corp.intel.com ([192.168.65.52]) by orsmsx331.amr.corp.intel.com with Microsoft SMTPSVC(6.0.3790.211); Tue, 7 Jun 2005 09:23:33 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: RE: RFC: NAPI packet weighting patch Date: Tue, 7 Jun 2005 09:23:32 -0700 Message-ID: <468F3FDA28AA87429AD807992E22D07E0450C01F@orsmsx408> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: RFC: NAPI packet weighting patch Thread-Index: AcVrXegOZHm18G6dQvuhFRXgNm9BYQAHZMnQ From: "Ronciak, John" To: , "Stephen Hemminger" Cc: "Williams, Mitch A" , "David S. 
Miller" , , , , , , "Venkatesan, Ganesh" , "Brandeburg, Jesse" X-OriginalArrivalTime: 07 Jun 2005 16:23:33.0265 (UTC) FILETIME=[3FA2E010:01C56B7D] X-Scanned-By: MIMEDefang 2.44 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j57GR4Xq025161 X-archive-position: 2205 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: john.ronciak@intel.com Precedence: bulk X-list: netdev Content-Length: 1178 Lines: 28 >> > To the intel folks: shouldnt someone be investigating why this is so? This is why we started all of this. We have data that is showing this issue where our over all performance is best in class and yet we can make it better by changing things like the weight value. There also seems to be some misconceptions about changing the weight value. It actually improves the performance of other drivers as well. Not as much as it improves the e1000 performance but it does seem to help others as well. We (Intel) have to be careful talking about competitors performance so we just refer to them as competitors in these threads. So it is not just e1000 who benefits from the lower weight values. One thing it is doing for e1000 right now is that it is stopping the e1000 from dropping frames which is part of why it's helping the e1000 more (I think). I agree that we need to bottom out on this and it's why we are dedicating the time and resources to this effort. We also appreciate all the effort to help resolve this as well. This should result in a better performing 2.6 stack and drivers. The new TSO code is a big step in that direction as well. 
Cheers, John From Robert.Olsson@data.slu.se Tue Jun 7 09:35:49 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 09:35:52 -0700 (PDT) Received: from mx1.slu.se (mx1.slu.se [130.238.96.70]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57GZmXq026611 for ; Tue, 7 Jun 2005 09:35:49 -0700 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by mx1.slu.se (8.13.1/8.13.1) with ESMTP id j57GYL89026326; Tue, 7 Jun 2005 18:34:21 +0200 Received: by robur.slu.se (Postfix, from userid 1000) id 5C03AEE3F0; Tue, 7 Jun 2005 18:34:21 +0200 (CEST) From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17061.52365.336303.369135@robur.slu.se> Date: Tue, 7 Jun 2005 18:34:21 +0200 To: Martin Josefsson Cc: jamal , Stephen Hemminger , Mitch Williams , "Ronciak, John" , "David S. Miller" , mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: References: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> <42A5284C.3060808@osdl.org> <1118147904.6320.108.camel@localhost.localdomain> <1118150948.6320.152.camel@localhost.localdomain> X-Mailer: VM 7.19 under Emacs 21.4.1 X-Scanned-By: MIMEDefang 2.48 on 130.238.96.70 X-archive-position: 2206 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Robert.Olsson@data.slu.se Precedence: bulk X-list: netdev Content-Length: 991 Lines: 28 Martin Josefsson writes: > > So it is possible that the e1000 is doing more than necessary share of > > IO on the receive side as well. > > Yes, that's what I mean. Same thing but for RX but the question is how > much we would gain from it, we still need to touch the rx-descriptor > sooner or later. Would be worth a test. 
> My testsetup isn't in a working condition right now, Robert?

Next week possibly... but really no idea what to test or what's going on.

We have a dual TCP server with one NIC. How is it set up now? I don't even
know how it should be set up for maximum TCP performance. How is irq
affinity set up? Are irqs jumping between the CPUs, etc.? Do the ksoftirqd
threads use CPU? If so, that can be tuned too. How are packets processed by
the CPUs (see /proc/net/softnet_stat)? Do we see drops with one CPU too,
etc.?

It might be an intricate question of balance between softirq and userland.

Cheers.
--ro

BTW, Can netperf be used for tests like this? (Rick?)

From davem@davemloft.net Tue Jun 7 13:23:49 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 13:23:54 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57KNmXq022919 for ; Tue, 7 Jun 2005 13:23:49 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfkaF-0002Ve-US; Tue, 07 Jun 2005 13:21:59 -0700
Date: Tue, 07 Jun 2005 13:21:59 -0700 (PDT)
Message-Id: <20050607.132159.35660612.davem@davemloft.net>
To: john.ronciak@intel.com
Cc: hadi@cyberus.ca, shemminger@osdl.org, mitch.a.williams@intel.com, mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com
Subject: Re: RFC: NAPI packet weighting patch
From: "David S.
Miller" In-Reply-To: <468F3FDA28AA87429AD807992E22D07E0450C01F@orsmsx408> References: <468F3FDA28AA87429AD807992E22D07E0450C01F@orsmsx408> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2207 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 491 Lines: 11 From: "Ronciak, John" Date: Tue, 7 Jun 2005 09:23:32 -0700 > There also seems to be some misconceptions about changing the weight > value. It actually improves the performance of other drivers as well. > Not as much as it improves the e1000 performance but it does seem to > help others as well. One reason it helps e1000 more, which Robert Olsson mentioned, could be the HW irq mitigation settings used by the e1000 driver. Lowering these would be a good test. From pmeda@akamai.com Tue Jun 7 13:33:52 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 13:33:56 -0700 (PDT) Received: from smtp3.akamai.com (smtp3.akamai.com [63.116.109.25]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57KXpXq023989 for ; Tue, 7 Jun 2005 13:33:51 -0700 Received: from smtp3.akamai.com (vwall3.sanmateo.corp.akamai.com [172.23.1.73]) by smtp3.akamai.com (8.12.10/8.12.10) with ESMTP id j57KWjRx021896 for ; Tue, 7 Jun 2005 13:32:47 -0700 (PDT) Received: from allur.sanmateo.akamai.com (allur.sanmateo.corp.akamai.com [172.23.11.58]) by smtp3.akamai.com (8.12.10/8.12.10) with ESMTP id j57KWiB6021894; Tue, 7 Jun 2005 13:32:44 -0700 (PDT) From: pmeda@akamai.com Received: (from pmeda@localhost) by allur.sanmateo.akamai.com (8.9.3/8.9.3) id NAA06207; Tue, 7 Jun 2005 13:32:44 -0700 Date: Tue, 7 Jun 2005 13:32:44 -0700 Message-Id: <200506072032.NAA06207@allur.sanmateo.akamai.com> To: davem@davemloft.net, jgarzik@pobox.com Subject: [patch] devinet: cleanup 
if statements Cc: akpm@osdl.org, netdev@oss.sgi.com X-archive-position: 2208 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pmeda@akamai.com Precedence: bulk X-list: netdev Content-Length: 1044 Lines: 33 Cleanup the devinet if statements. - when there is no colon, interface name is same as device. - ifa_label is an array, not a pointer, and so can never be null. Signed-Off-by: Prasanna Meda --- a/net/ipv4/devinet.c Wed Jun 1 23:54:37 2005 +++ b/net/ipv4/devinet.c Wed Jun 1 23:57:16 2005 @@ -636,10 +636,7 @@ ret = -ENOBUFS; if ((ifa = inet_alloc_ifa()) == NULL) break; - if (colon) - memcpy(ifa->ifa_label, ifr.ifr_name, IFNAMSIZ); - else - memcpy(ifa->ifa_label, dev->name, IFNAMSIZ); + memcpy(ifa->ifa_label, ifr.ifr_name, IFNAMSIZ); } else { ret = 0; if (ifa->ifa_local == sin->sin_addr.s_addr) @@ -746,10 +743,7 @@ if (len < (int) sizeof(ifr)) break; memset(&ifr, 0, sizeof(struct ifreq)); - if (ifa->ifa_label) - strcpy(ifr.ifr_name, ifa->ifa_label); - else - strcpy(ifr.ifr_name, dev->name); + strcpy(ifr.ifr_name, ifa->ifa_label); (*(struct sockaddr_in *)&ifr.ifr_addr).sin_family = AF_INET; (*(struct sockaddr_in *)&ifr.ifr_addr).sin_addr.s_addr = From shemminger@osdl.org Tue Jun 7 14:24:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 14:24:43 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57LOaXq027454 for ; Tue, 7 Jun 2005 14:24:37 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j57LNCjA003938 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Tue, 7 Jun 2005 14:23:13 -0700 Received: from unknown-215.office.pdx.osdl.net (unknown-215.office.pdx.osdl.net [10.8.0.215]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j57LNAC8015526; Tue, 7 Jun 2005 14:23:12 -0700 Date: Tue, 7 Jun 2005 14:23:09 
-0700 From: Stephen Hemminger To: "David S. Miller" Cc: netdev@oss.sgi.com, Andrew Morton Subject: 2.6.12-rc6-tcp1 Message-ID: <20050607142309.061a7ced@unknown-215.office.pdx.osdl.net> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 2209 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 867 Lines: 30 http://developer.osdl.org/shemminger/patches/2.6.12-rc6-tcp1 This is the update of the TCP pluggable congestion framework with other network changes (targeted for 2.6.13) Minor tweaks from last time: * move tcpdiag_put to tcp.h to avoid build problems when IP_TCPDIAG is a module. * sysctl tcp_congestion_control now checks for valid values * default tcp_congestion_control is determined LIFO (ie last one registered is the default). 
Sysctl just reorders list
+ added /sys/class/net/ethX/weight interface
+ added scalable TCP
- removed version patch to make integration with -mm easier

fastroute-stats-remove.patch
no-congestion.patch
no-throttle.patch
bigger-backlog.patch
fix-weightp.patch
weight-sysfs.patch
tcp_super_tso_v3.patch
tcp_infra.patch
tcp_bic.patch
tcp_westwood.patch
hstcp.patch
hybla.patch
vegas.patch
h-tcp.patch
scaleable_tcp.patch

From tgraf@suug.ch Tue Jun 7 14:37:05 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 14:37:08 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57Lb4Xq028457 for ; Tue, 7 Jun 2005 14:37:05 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id 304941C0EE; Tue, 7 Jun 2005 23:36:21 +0200 (CEST)
Date: Tue, 7 Jun 2005 23:36:21 +0200
From: Thomas Graf
To: netdev@oss.sgi.com
Subject: netdev munching messages again?
Message-ID: <20050607213621.GG20969@postel.suug.ch>
References: <20050607140842.778143000@axs> <20050607140901.632982000@axs>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050607140901.632982000@axs>
X-archive-position: 2210
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev
Content-Length: 253
Lines: 4

Is netdev not fed regularly so it started munching messages again?
I've not received the introduction message and patches 1-5 back,
only 6-7, which have been sitting in the queue due to refused
connections for a while. Am I the only one having troubles?
From davem@davemloft.net Tue Jun 7 14:43:45 2005
Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 14:43:50 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57LhjXq029311 for ; Tue, 7 Jun 2005 14:43:45 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DflqH-0000mj-5m; Tue, 07 Jun 2005 14:42:37 -0700
Date: Tue, 07 Jun 2005 14:42:37 -0700 (PDT)
Message-Id: <20050607.144237.93024273.davem@davemloft.net>
To: tgraf@suug.ch
Cc: netdev@oss.sgi.com
Subject: Re: netdev munching messages again?
From: "David S. Miller"
In-Reply-To: <20050607213621.GG20969@postel.suug.ch>
References: <20050607140842.778143000@axs> <20050607140901.632982000@axs> <20050607213621.GG20969@postel.suug.ch>
X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2211
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev
Content-Length: 627
Lines: 17

From: Thomas Graf
Date: Tue, 7 Jun 2005 23:36:21 +0200

> Is netdev not fed regularly so it started munching messages again?
> I've not received the introduction message and patches 1-5 back,
> only 6-7, which have been sitting in the queue due to refused
> connections for a while. Am I the only one having troubles?

This is exactly what I saw as well. I did get all of your postings
because you sent them with me on the CC: list, but netdev only sent
out 6 and 7 to me just as you observed.

This has become a regular occurrence; it may be time to finally move
this thing over to vger.kernel.org.

Thoughts?
From shemminger@osdl.org Tue Jun 7 15:44:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 15:44:38 -0700 (PDT) Received: from smtp.osdl.org (fire.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57MiVXq000513 for ; Tue, 7 Jun 2005 15:44:32 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j57MhPjA011445 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Tue, 7 Jun 2005 15:43:25 -0700 Received: from unknown-215.office.pdx.osdl.net (unknown-215.office.pdx.osdl.net [10.8.0.215]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j57MhOXJ019833; Tue, 7 Jun 2005 15:43:24 -0700 Date: Tue, 7 Jun 2005 15:43:24 -0700 From: Stephen Hemminger To: linux-net@vger.kernel.org, lartc@mailman.ds9a.nl, netdev@oss.sgi.com Subject: [ANNOUNCE] iproute2-ss050607 Message-ID: <20050607154324.0b280333@unknown-215.office.pdx.osdl.net> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 2212 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 852 Lines: 22 Small update to iproute2, I have been waiting to get a CVS conversion completed and working on other things so changes are small. http://developer.osdl.org/dev/iproute2/download/iproute2-ss050607.tar.gz Stephen Hemminger * Fix 'ip link' map to handle case where device gets autoloaded by using if_nametoindex as fallback * Device indices are unsigned not int. 
Masahide NAKAMURA * [ip] show timestamp when using '-t' option. * [ip] remove duplicated code for expired message of xfrm. * [ip] add "deleteall" command for xfrm; "flush" uses kernel's flush interface and "deleteall" uses legacy iproute2's flush feature like getting-and-deleting-for-each. This is the first export from the CVS repo, so let me know if there are any quirks. If you have something you want to see in the next release and it isn't there please resend. From rick.jones2@hp.com Tue Jun 7 16:20:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 16:21:03 -0700 (PDT) Received: from palrel11.hp.com (palrel11.hp.com [156.153.255.246]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j57NKsXq006517 for ; Tue, 7 Jun 2005 16:20:54 -0700 Received: from tardy.cup.hp.com (tardy.cup.hp.com [15.244.44.58]) by palrel11.hp.com (Postfix) with ESMTP id 94B30B71D; Tue, 7 Jun 2005 16:19:47 -0700 (PDT) Received: from hp.com (localhost [127.0.0.1]) by tardy.cup.hp.com (8.9.3 (PHNE_28810)/8.9.3 SMKit7.02) with ESMTP id QAA12692; Tue, 7 Jun 2005 16:19:46 -0700 (PDT) Message-ID: <42A62B92.2050701@hp.com> Date: Tue, 07 Jun 2005 16:19:46 -0700 From: Rick Jones User-Agent: Mozilla/5.0 (X11; U; HP-UX 9000/785; en-US; rv:1.6) Gecko/20040304 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Robert Olsson , netdev@oss.sgi.com Cc: Martin Josefsson , jamal , Stephen Hemminger , Mitch Williams , "Ronciak, John" , "David S. 
Miller" , mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: Re: RFC: NAPI packet weighting patch References: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> <42A5284C.3060808@osdl.org> <1118147904.6320.108.camel@localhost.localdomain> <1118150948.6320.152.camel@localhost.localdomain> <17061.52365.336303.369135@robur.slu.se> In-Reply-To: <17061.52365.336303.369135@robur.slu.se> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2213 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rick.jones2@hp.com Precedence: bulk X-list: netdev Content-Length: 1069 Lines: 25 > > BTW, Can netperf be used for tests like this? (Rick?) Assuming I'm translating "test like this" to the right sort of stuff :) If one wants to see the effect of different buffer replenishment strategies, I suppose that some netperf tests could indeed be used. It would be desirable to look at service demand moreso than throughput (assuming the throughput is link-bound). TCP_STREAM and/or TCP_MAERTS. I'm not sure the extent to which it would be visible to a TCP_RR test. Differences in service demand could also be used to measure effects of irq migration, pinning IRQs and/or processes to specific CPUs and the like. The linux processor affinity stuff in netperf could use a little help though - it is easily confused as to when to use a two argument vs three argument sched_setaffinity call. I suspect one may also see differences in TCP_RR transaction rates. I suspect some high number of confidence interval iterations might be required. rick jones i'd trim individual names from the dist list, but am not 100% sure who is on netdev... 
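The two- versus three-argument confusion mentioned above appears to come from sched_setaffinity() prototypes differing across C library versions. A minimal sketch of the three-argument form documented in the current Linux man page (pid, mask size in bytes, mask pointer), assuming glibc with _GNU_SOURCE defined. The helper name pin_to_first_allowed_cpu is hypothetical, and this is not netperf's code:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/*
 * Pin the calling process to the first CPU it is currently allowed
 * to run on. Returns that CPU number, or -1 on failure.
 * (Hypothetical helper, for illustration only.)
 */
static int pin_to_first_allowed_cpu(void)
{
	cpu_set_t allowed, want;
	int cpu;

	/* pid 0 means "the calling thread" */
	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &allowed))
			continue;

		CPU_ZERO(&want);
		CPU_SET(cpu, &want);

		/* three-argument form: pid, mask size in bytes, mask */
		if (sched_setaffinity(0, sizeof(want), &want) != 0)
			return -1;
		return cpu;
	}
	return -1;
}
```

Starting from the current affinity mask, rather than hard-coding CPU 0, keeps the sketch usable on systems where the process is already restricted to a subset of CPUs.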
From jesse.brandeburg@intel.com Tue Jun 7 18:53:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 18:53:31 -0700 (PDT) Received: from orsfmr003.jf.intel.com (fmr18.intel.com [134.134.136.17]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j581rKXq015730 for ; Tue, 7 Jun 2005 18:53:20 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr003.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j581ooa1011867; Wed, 8 Jun 2005 01:50:50 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j581onM6010391; Wed, 8 Jun 2005 01:50:49 GMT Received: from ladlxr.jf.intel.com (ladlxr.jf.intel.com [10.23.35.110]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j581oiSL030884; Tue, 7 Jun 2005 18:50:46 -0700 Date: Tue, 7 Jun 2005 18:50:44 -0700 (PDT) From: Jesse Brandeburg X-X-Sender: jbrandeb@ladlxr To: Ben Greear cc: "Williams, Mitch A" , "Ronciak, John" , "David S. 
Miller" , mchan@broadcom.com, hadi@cyberus.ca, buytenh@wantstofly.org, jdmason@us.ibm.com, shemminger@osdl.org, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: <42A4E599.2090604@candelatech.com> Message-ID: References: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> <42A4E599.2090604@candelatech.com> ReplyTo: "Jesse Brandeburg" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2214 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jesse.brandeburg@intel.com Precedence: bulk X-list: netdev Content-Length: 354 Lines: 12 On Mon, 6 Jun 2005, Ben Greear wrote: > So is the Linux server reading/writing these large files to/from the > disk? no, the test runs completely from memory, and the clients are reading/writing from/to the server > Can you tell us how much performance went down when you increased the > descriptors to 512? sorry don't know the answer to that one. 
From jesse.brandeburg@intel.com Tue Jun 7 19:23:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 19:23:03 -0700 (PDT) Received: from orsfmr004.jf.intel.com (fmr19.intel.com [134.134.136.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j582MxXq018481 for ; Tue, 7 Jun 2005 19:22:59 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr004.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j582KcmM008975; Wed, 8 Jun 2005 02:20:38 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j582KcM6025562; Wed, 8 Jun 2005 02:20:38 GMT Received: from ladlxr.jf.intel.com (ladlxr.jf.intel.com [10.23.35.110]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j582KbSL032195; Tue, 7 Jun 2005 19:20:37 -0700 Date: Tue, 7 Jun 2005 19:20:37 -0700 (PDT) From: Jesse Brandeburg X-X-Sender: jbrandeb@ladlxr To: "David S. Miller" cc: "Ronciak, John" , hadi@cyberus.ca, shemminger@osdl.org, "Williams, Mitch A" , mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" , "Brandeburg, Jesse" Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: <20050607.132159.35660612.davem@davemloft.net> Message-ID: References: <468F3FDA28AA87429AD807992E22D07E0450C01F@orsmsx408> <20050607.132159.35660612.davem@davemloft.net> ReplyTo: "Jesse Brandeburg" MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="8323328-1056581817-1118197237=:31708" X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2215 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jesse.brandeburg@intel.com Precedence: bulk X-list: netdev Content-Length: 3183 Lines: 80 This message is in MIME format. 
The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools.

--8323328-1056581817-1118197237=:31708
Content-Type: TEXT/PLAIN; charset=iso-8859-1; format=flowed
Content-Transfer-Encoding: 8BIT

On Tue, 7 Jun 2005, David S. Miller wrote:
> > There also seems to be some misconceptions about changing the weight
> > value.  It actually improves the performance of other drivers as well.
> > Not as much as it improves the e1000 performance but it does seem to
> > help others as well.
>
> One reason it helps e1000 more, which Robert Olsson mentioned, could
> be the HW irq mitigation settings used by the e1000 driver.  Lowering
> these would be a good test.

Well, first a little more data. The machine in question is a dual xeon
running 2.6.12-rc5 or 2.6.12-rc4-supertso

with the 2.6.12-rc5 kernel (the old) tso promptly shuts down after a SACK,
and after that point the machine is CPU bound at 100%. This is the point
that we start to drop packets at the hardware level.

I tried the experiment today where I replenish buffers to hardware every
16 packets or so. This appears to mitigate all drops at the hardware
level (no drops). We're still at 100% with the rc5 kernel, however.

even with this replenish fix, the addition of dropping the weight to 16
helped increase our throughput, although only about 1%.

On the other hand, taking our driver as is with no changes and running the
supertso (not the split out version, yet) kernel, we show no dropped
packets and 60% cpu use. This combines with a 6% increase in throughput,
and the data pattern on the wire is much more constant (i have tcpdumps,
do you want to see them Dave?)

I'm looking forward to trying the split out patches tomorrow.
here is my (compile tested) patch, for e1000

diff -rup e1000-6.0.60.orig/src/e1000_main.c e1000-6.0.60/src/e1000_main.c
--- e1000-6.0.60.orig/src/e1000_main.c 2005-06-07 19:07:37.000000000 -0700
+++ e1000-6.0.60/src/e1000_main.c 2005-06-07 19:15:05.000000000 -0700
@@ -3074,11 +3074,14 @@ e1000_clean_rx_irq(struct e1000_adapter
 next_desc:
 		rx_desc->status = 0;
 		buffer_info->skb = NULL;
+		if(unlikely((i & ~(E1000_RX_BUFFER_WRITE - 1)) == i))
+			adapter->alloc_rx_buf(adapter);
 		if(unlikely(++i == rx_ring->count)) i = 0;
 
 		rx_desc = E1000_RX_DESC(*rx_ring, i);
 	}
 	rx_ring->next_to_clean = i;
+	/* not sure this is necessary any more, but its safe */
 	adapter->alloc_rx_buf(adapter);
 
 	return cleaned;
@@ -3209,12 +3212,15 @@ e1000_clean_rx_irq_ps(struct e1000_adapt
 next_desc:
 		rx_desc->wb.middle.status_error &= ~0xFF;
 		buffer_info->skb = NULL;
+		if(unlikely((i & ~(E1000_RX_BUFFER_WRITE - 1)) == i))
+			adapter->alloc_rx_buf(adapter);
 		if(unlikely(++i == rx_ring->count)) i = 0;
 
 		rx_desc = E1000_RX_DESC_PS(*rx_ring, i);
 		staterr = le32_to_cpu(rx_desc->wb.middle.status_error);
 	}
 	rx_ring->next_to_clean = i;
+	/* not sure this is necessary any more, but its safe */
 	adapter->alloc_rx_buf(adapter);
 
 	return cleaned;

PS e1000-6.0.60 is posted on sf.net/projects/e1000 now.
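
For readers following the patch: the added test (i & ~(E1000_RX_BUFFER_WRITE - 1)) == i is true exactly when the low bits of i are clear, that is, when i is a multiple of E1000_RX_BUFFER_WRITE, provided that constant is a power of two. A minimal standalone sketch, assuming a value of 16 (taken from the "every 16 packets or so" remark; the real constant is defined in the driver headers) and using hypothetical helper names that are not part of e1000:

```c
/*
 * Assumed value, matching "every 16 packets or so" in the mail.
 * It must be a power of two for the mask test to mean "multiple of".
 */
#define RX_BUFFER_WRITE 16u

/* Same expression as in the patch: true when i is a multiple of 16. */
static int should_replenish(unsigned int i)
{
	return (i & ~(RX_BUFFER_WRITE - 1)) == i;
}

/* Count how many ring indices in [0, ring_size) trigger a replenish. */
static unsigned int count_replenish_points(unsigned int ring_size)
{
	unsigned int i, n = 0;

	for (i = 0; i < ring_size; i++)
		if (should_replenish(i))
			n++;
	return n;
}
```

With a ring of 256 descriptors this gives 16 replenish points per pass over the ring, at indices 0, 16, 32 and so on.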
--8323328-1056581817-1118197237=:31708-- From davem@davemloft.net Tue Jun 7 20:33:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 20:33:37 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j583XFXq028103 for ; Tue, 7 Jun 2005 20:33:19 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfrIF-0001IB-Em; Tue, 07 Jun 2005 20:31:51 -0700 Date: Tue, 07 Jun 2005 20:31:51 -0700 (PDT) Message-Id: <20050607.203151.55506965.davem@davemloft.net> To: jesse.brandeburg@intel.com Cc: john.ronciak@intel.com, hadi@cyberus.ca, shemminger@osdl.org, mitch.a.williams@intel.com, mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: References: <468F3FDA28AA87429AD807992E22D07E0450C01F@orsmsx408> <20050607.132159.35660612.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2216 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 404 Lines: 9 From: Jesse Brandeburg Date: Tue, 7 Jun 2005 19:20:37 -0700 (PDT) > I'm looking forward to trying the split out patches tomorrow. Don't get too excited, those are purely bug fixes and don't actually do the actual "Super TSO" part yet. I'm trying to test the cleanups leading up to the actual TSO segmenting change to make sure any such regressions therein get weeded out. 
From davem@davemloft.net Tue Jun 7 20:44:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 07 Jun 2005 20:45:07 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j583iwXq029331 for ; Tue, 7 Jun 2005 20:44:58 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DfrTf-0001Jh-Tg; Tue, 07 Jun 2005 20:43:39 -0700 Date: Tue, 07 Jun 2005 20:43:39 -0700 (PDT) Message-Id: <20050607.204339.21591152.davem@davemloft.net> To: jesse.brandeburg@intel.com Cc: john.ronciak@intel.com, hadi@cyberus.ca, shemminger@osdl.org, mitch.a.williams@intel.com, mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: References: <468F3FDA28AA87429AD807992E22D07E0450C01F@orsmsx408> <20050607.132159.35660612.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2217 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 2186 Lines: 47 From: Jesse Brandeburg Date: Tue, 7 Jun 2005 19:20:37 -0700 (PDT) > with the 2.6.12-rc5 kernel (the old) tso promptly shuts down after a SACK, > and after that point the machine is CPU bound at 100%. This is the point > that we start to drop packets at the hardware level. You're getting packet loss on the local network where you're running these tests? Or is it simple packet reordering? > I tried the experiment today where I replenish buffers to hardware every > 16 packets or so. This appears to mitigate all drops at the hardware > level (no drops). 
We're still at 100% with the rc5 kernel, however.
>
> even with this replenish fix, the addition of dropping the weight to 16
> helped increase our throughput, although only about 1%.

Any minor timing difference of any kind can make up to a 3% or 4%
difference in TCP performance when the receiver is CPU limited.

> On the other hand, taking our driver as is with no changes and running the
> supertso (not the split out version, yet) kernel, we show no dropped
> packets and 60% cpu use. This combines with a 6% increase in throughput,
> and the data pattern on the wire is much more constant (i have tcpdumps,
> do you want to see them Dave?)

Yes, indeed the tcpdumps tend to look much nicer with supertso.
The 10gbit guys see regressions though. They are helping me test
things gradually in order to track down what change causes the
problems. That's why I've started rewriting super TSO from scratch
in a series of very small patches.

I don't see how supertso can help the receiver, which is where
the RX drops should be occurring. That's a little weird.

I can't believe a 2.5 GHz machine can't keep up with a simple 1 Gbit
TCP stream. Do you have some other computation going on in that
system? As stated yesterday my 1.5 GHz crappy sparc64 box can receive
a 1 Gbit TCP stream with much cpu to spare, my 750 MHz sparc64 box can
nearly do so as well.

Something is up, if a single gigabit TCP stream can fully CPU
load your machine. 10 gigabit, yeah, definitely all current
generation machines are cpu limited over that link speed, but
1 gigabit should be no problem.
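
A rough back-of-envelope check of the cycle budget under discussion, offered as a sketch under stated assumptions rather than a measurement: every frame is a full 1500-byte payload, each frame costs 38 extra bytes on the wire (Ethernet header, FCS, preamble, inter-frame gap), and a single CPU does all the receive work. The function names are illustrative only.

```c
#include <stdint.h>

/*
 * Assumption: full-size frames only. On-wire cost per frame is
 * 1500 (payload) + 14 (header) + 4 (FCS) + 8 (preamble) + 12 (gap).
 */
#define WIRE_BYTES_PER_FRAME (1500u + 14u + 4u + 8u + 12u)

/* Full-size frames per second that fit on a link of the given bit rate. */
static uint64_t frames_per_second(uint64_t link_bits_per_sec)
{
	return link_bits_per_sec / (WIRE_BYTES_PER_FRAME * 8u);
}

/* CPU cycles available per received frame at the given clock rate. */
static uint64_t cycles_per_frame(uint64_t cpu_hz, uint64_t link_bits_per_sec)
{
	return cpu_hz / frames_per_second(link_bits_per_sec);
}
```

Under these assumptions a 1 Gbit link carries about 81,000 full-size frames per second, which leaves a 2.5 GHz CPU roughly 30,000 cycles per frame and a 750 MHz CPU roughly 9,000. The estimate ignores ACK transmission, interrupt overhead and cache misses, so it only bounds the budget; it does not predict actual load.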
From belyshev@depni.sinp.msu.ru Wed Jun 8 06:22:49 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 06:23:01 -0700 (PDT) Received: from depni.sinp.msu.ru (depni.sinp.msu.ru [213.131.7.21]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58DMmXq022761 for ; Wed, 8 Jun 2005 06:22:48 -0700 Received: by depni.sinp.msu.ru (Postfix, from userid 1109) id 90683D6C28; Wed, 8 Jun 2005 17:21:36 +0400 (MSD) To: netdev@oss.sgi.com Subject: oops with hostap and 2.6.12-rc6-mm1: Kernel BUG at "net/ipv4/tcp_output.c":928 From: belyshev@depni.sinp.msu.ru Date: Wed, 08 Jun 2005 17:21:36 +0400 Message-ID: <56hdg93rxb.fsf@depni.sinp.msu.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2220 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: belyshev@depni.sinp.msu.ru Precedence: bulk X-list: netdev Content-Length: 2207 Lines: 41 Seems that this oops happens only if using hostap. 
$ cat /dev/zero | nc host.com discard ^C Segmentation fault ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at "net/ipv4/tcp_output.c":928 invalid operand: 0000 [1] CPU 0 Modules linked in: hostap_cs hostap Pid: 3312, comm: nc Not tainted 2.6.12-rc6-mm1 RIP: 0010:[] {tcp_tso_should_defer+55} RSP: 0018:ffff810015769c10 EFLAGS: 00010246 RAX: 0000000000000017 RBX: ffff81001e751340 RCX: 0000000005a80100 RDX: ffff81001e751340 RSI: ffff81001388aac0 RDI: 0000000000000002 RBP: ffff81001388aac0 R08: 0000000000000000 R09: ffff810015769d58 R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000000 R14: ffff81001388aac0 R15: 0000000000000018 FS: 00002aaaaae00c80(0000) GS:ffffffff8082e840(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000005c5008 CR3: 0000000013eb4000 CR4: 00000000000006e0 Process nc (pid: 3312, threadinfo ffff810015768000, task ffff81001ed2b750) Stack: ffffffff803ecaa4 ffff8100016c7bd8 000005a800000001 0000000000000001 ffff81001388aac0 ffff81001388aac0 0000000000000000 0000000000000000 ffff810019e4e888 ffff81001559f828 Call Trace:{tcp_write_xmit+196} {__tcp_push_pending_frames+41} {tcp_close+593} {inet_release+92} {sock_release+33} {sock_close+53} {__fput+194} {filp_close+104} {put_files_struct+116} {do_exit+522} {do_group_exit+177} {get_signal_to_deliver+1255} {do_signal+157} {autoremove_wake_function+0} {inotify_inode_queue_event+41} {vfs_write+303} {sysret_signal+28} {ptregscall_common+103} Code: 0f 0b de b7 4f 80 ff ff ff ff a0 03 44 8b 8e 14 03 00 00 44 RIP {tcp_tso_should_defer+55} RSP <1>Fixing recursive fault but reboot is needed! 
From pavel@ucw.cz Wed Jun 8 06:21:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 06:21:41 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58DLUXq022650 for ; Wed, 8 Jun 2005 06:21:32 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id D34248B8A7; Wed, 8 Jun 2005 15:20:19 +0200 (CEST) Date: Wed, 8 Jun 2005 15:20:19 +0200 From: Pavel Machek To: Netdev list , "James P. Ketrenos" , Andrew Morton Subject: [-mm] ipw2100 cleanups: no X___ prefixes Message-ID: <20050608132019.GA2620@elf.ucw.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Warning: Reading this can be dangerous to your mental health. User-Agent: Mutt/1.5.9i X-archive-position: 2219 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 2670 Lines: 86 ipw2100 uses strange X__ prefixes even for symbols already prefixed by ipw2100. Fixed. 
Signed-off-by: Pavel Machek --- /data/l/clean-mm/drivers/net/wireless/ipw2100.c 2005-06-08 12:11:29.000000000 +0200 +++ linux-mm/drivers/net/wireless/ipw2100.c 2005-06-08 15:09:26.000000000 +0200 @@ -106,7 +106,7 @@ tx_pend_list : Holds used Tx buffers waiting to go into the TBD ring TAIL modified ipw2100_tx() - HEAD modified by X__ipw2100_tx_send_data() + HEAD modified by ipw2100_tx_send_data() msg_free_list : Holds pre-allocated Msg (Command) buffers TAIL modified in __ipw2100_tx_process() @@ -114,7 +114,7 @@ msg_pend_list : Holds used Msg buffers waiting to go into the TBD ring TAIL modified in ipw2100_hw_send_command() - HEAD modified in X__ipw2100_tx_send_commands() + HEAD modified in ipw2100_tx_send_commands() The flow of data on the TX side is as follows: @@ -287,8 +279,8 @@ /* Pre-decl until we get the code solid and then we can clean it up */ -static void X__ipw2100_tx_send_commands(struct ipw2100_priv *priv); -static void X__ipw2100_tx_send_data(struct ipw2100_priv *priv); +static void ipw2100_tx_send_commands(struct ipw2100_priv *priv); +static void ipw2100_tx_send_data(struct ipw2100_priv *priv); static int ipw2100_adapter_setup(struct ipw2100_priv *priv); static void ipw2100_queues_initialize(struct ipw2100_priv *priv); @@ -2841,14 +2736,14 @@ while (__ipw2100_tx_process(priv) && i < 200) i++; if (i == 200) { - IPW_DEBUG_WARNING( + printk(KERN_WARNING DRV_NAME ": " "%s: Driver is running slow (%d iters).\n", priv->net_dev->name, i); } } -static void X__ipw2100_tx_send_commands(struct ipw2100_priv *priv) +static void ipw2100_tx_send_commands(struct ipw2100_priv *priv) { struct list_head *element; struct ipw2100_tx_packet *packet; @@ -2916,10 +2811,10 @@ /* - * X__ipw2100_tx_send_data + * ipw2100_tx_send_data * */ -static void X__ipw2100_tx_send_data(struct ipw2100_priv *priv) +static void ipw2100_tx_send_data(struct ipw2100_priv *priv) { struct list_head *element; struct ipw2100_tx_packet *packet; @@ -3134,8 +3029,8 @@ IPW2100_INTA_TX_TRANSFER); 
__ipw2100_tx_complete(priv); - X__ipw2100_tx_send_commands(priv); - X__ipw2100_tx_send_data(priv); + ipw2100_tx_send_commands(priv); + ipw2100_tx_send_data(priv); } if (inta & IPW2100_INTA_TX_COMPLETE) { @@ -3286,7 +3179,7 @@ list_add_tail(element, &priv->tx_pend_list); INC_STAT(&priv->tx_pend_stat); - X__ipw2100_tx_send_data(priv); + ipw2100_tx_send_data(priv); spin_unlock_irqrestore(&priv->low_lock, flags); return 0; From pavel@ucw.cz Wed Jun 8 06:32:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 06:32:43 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58DWXXq025031 for ; Wed, 8 Jun 2005 06:32:36 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id 1EC548B8A7; Wed, 8 Jun 2005 15:31:24 +0200 (CEST) Date: Wed, 8 Jun 2005 15:31:24 +0200 From: Pavel Machek To: Netdev list , "James P. Ketrenos" , Andrew Morton Subject: [-mm] ipw2100: assume recent kernel Message-ID: <20050608133123.GA3008@elf.ucw.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Warning: Reading this can be dangerous to your mental health. User-Agent: Mutt/1.5.9i X-archive-position: 2223 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 2060 Lines: 76 ipw2100 still has support for old kernels. That's considered bad for a patch in mainline... this fixes a few instances.
Signed-off-by: Pavel Machek --- /data/l/clean-mm/drivers/net/wireless/ipw2100.c 2005-06-08 12:11:29.000000000 +0200 +++ linux-mm/drivers/net/wireless/ipw2100.c 2005-06-08 15:09:26.000000000 +0200 @@ -6719,17 +5873,9 @@ /* Remove the PRESENT state of the device */ netif_device_detach(dev); -#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,10) - pci_save_state(pci_dev, priv->pm_state); -#else pci_save_state(pci_dev); -#endif pci_disable_device (pci_dev); -#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,11) - pci_set_power_state(pci_dev, state); -#else pci_set_power_state(pci_dev, PCI_D3hot); -#endif up(&priv->action_sem); @@ -6750,17 +5896,9 @@ IPW_DEBUG_INFO("%s: Coming out of suspend...\n", dev->name); -#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,11) - pci_set_power_state(pci_dev, 0); -#else pci_set_power_state(pci_dev, PCI_D0); -#endif pci_enable_device(pci_dev); -#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,10) - pci_restore_state(pci_dev, priv->pm_state); -#else pci_restore_state(pci_dev); -#endif /* * Suspend/Resume resets the PCI configuration space, so we have to --- /data/l/clean-mm/drivers/net/wireless/ipw2100.h 2005-06-08 12:11:29.000000000 +0200 +++ linux-mm/drivers/net/wireless/ipw2100.h 2005-06-08 15:07:31.000000000 +0200 @@ -44,30 +44,6 @@ #include -#ifndef IRQ_NONE -typedef void irqreturn_t; -#define IRQ_NONE -#define IRQ_HANDLED -#define IRQ_RETVAL(x) -#endif - -#if WIRELESS_EXT < 17 -#define IW_QUAL_QUAL_INVALID 0x10 -#define IW_QUAL_LEVEL_INVALID 0x20 -#define IW_QUAL_NOISE_INVALID 0x40 -#endif - -#if ( LINUX_VERSION_CODE < KERNEL_VERSION(2,6,5) ) -#define pci_dma_sync_single_for_cpu pci_dma_sync_single -#define pci_dma_sync_single_for_device pci_dma_sync_single -#endif - -#ifndef HAVE_FREE_NETDEV -#define free_netdev(x) kfree(x) -#endif - - - struct ipw2100_priv; struct ipw2100_tx_packet; struct ipw2100_rx_packet; From tgraf@suug.ch Wed Jun 8 06:30:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 06:30:45 -0700 (PDT) Received: from 
postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58DUaXq024064 for ; Wed, 8 Jun 2005 06:30:37 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id E5EC71C0F2; Wed, 8 Jun 2005 15:29:53 +0200 (CEST)
Date: Wed, 8 Jun 2005 15:29:53 +0200
From: Thomas Graf
To: "David S. Miller"
Cc: netdev@oss.sgi.com
Subject: Re: netdev munching messages again?
Message-ID: <20050608132953.GK20969@postel.suug.ch>
References: <20050607140842.778143000@axs> <20050607140901.632982000@axs> <20050607213621.GG20969@postel.suug.ch> <20050607.144237.93024273.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050607.144237.93024273.davem@davemloft.net>
X-archive-position: 2222
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev
Content-Length: 572
Lines: 14

* David S. Miller <20050607.144237.93024273.davem@davemloft.net> 2005-06-07 14:42
> I did get all of your postings because you sent them
> with me on the CC: list, but netdev only sent out
> 6 and 7 to me just as you observed.

I tried to resend; the messages were accepted by oss.sgi.com but
none of them came back. Maybe dropped due to duplicated message
ids though.

> This has become a regular occurrence, it may be time to finally move
> this thing over to vger.kernel.org. Thoughts?

I have no personal objections, would be valuable to take over
the archives though.
From pavel@ucw.cz Wed Jun 8 06:29:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 06:29:49 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58DTMXq023795 for ; Wed, 8 Jun 2005 06:29:24 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id 66F9E8B8A7; Wed, 8 Jun 2005 15:28:14 +0200 (CEST) Date: Wed, 8 Jun 2005 15:28:14 +0200 From: Pavel Machek To: Netdev list , "James P. Ketrenos" , Andrew Morton Subject: [-mm] ipw2100: cleanup debug prints Message-ID: <20050608132814.GA2634@elf.ucw.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Warning: Reading this can be dangerous to your mental health. User-Agent: Mutt/1.5.9i X-archive-position: 2221 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 12982 Lines: 412 ipw2100 uses custom debug prints that are sometimes longer and always harder to read than normal printk. They also introduced some bugs where prefix is printed twice. 
Signed-off-by: Pavel Machek --- /data/l/clean-mm/drivers/net/wireless/ipw2100.c 2005-06-08 12:11:29.000000000 +0200 +++ linux-mm/drivers/net/wireless/ipw2100.c 2005-06-08 15:09:26.000000000 +0200 @@ -484,7 +476,7 @@ u32 total_length; if (ordinals->table1_addr == 0) { - IPW_DEBUG_WARNING(DRV_NAME ": attempt to use fw ordinals " + printk(KERN_WARNING DRV_NAME ": attempt to use fw ordinals " "before they have been loaded.\n"); return -EINVAL; } @@ -493,8 +485,7 @@ if (*len < IPW_ORD_TAB_1_ENTRY_SIZE) { *len = IPW_ORD_TAB_1_ENTRY_SIZE; - IPW_DEBUG_WARNING(DRV_NAME - ": ordinal buffer length too small, need %d\n", + printk(KERN_WARNING DRV_NAME ": ordinal buffer length too small, need %d\n", IPW_ORD_TAB_1_ENTRY_SIZE); return -EINVAL; @@ -546,7 +537,7 @@ return 0; } - IPW_DEBUG_WARNING(DRV_NAME ": ordinal %d neither in table 1 nor " + printk(KERN_WARNING DRV_NAME ": ordinal %d neither in table 1 nor " "in table 2\n", ord); return -EINVAL; @@ -761,7 +752,7 @@ } if (priv->fatal_error) { - IPW_DEBUG_WARNING("%s: firmware fatal error\n", + printk(KERN_WARNING DRV_NAME ": %s: firmware fatal error\n", priv->net_dev->name); return -EIO; } @@ -1001,7 +975,7 @@ /* load microcode */ err = ipw2100_ucode_download(priv, &ipw2100_firmware); if (err) { - IPW_DEBUG_ERROR("%s: Error loading microcode: %d\n", + printk(KERN_ERR DRV_NAME ": %s: Error loading microcode: %d\n", priv->net_dev->name, err); goto fail; } @@ -1014,7 +988,7 @@ /* s/w reset and clock stabilization (again!!!) */ err = sw_reset_and_clock(priv); if (err) { - IPW_DEBUG_ERROR("%s: sw_reset_and_clock failed: %d\n", + printk(KERN_ERR DRV_NAME ": %s: sw_reset_and_clock failed: %d\n", priv->net_dev->name, err); goto fail; } @@ -1210,7 +1163,7 @@ * fw & dino ucode */ if (ipw2100_download_firmware(priv)) { - IPW_DEBUG_ERROR("%s: Failed to power on the adapter.\n", + printk(KERN_ERR DRV_NAME ": %s: Failed to power on the adapter.\n", priv->net_dev->name); return -EIO; } @@ -1270,7 +1223,7 @@ i ? 
"SUCCESS" : "FAILED"); if (!i) { - IPW_DEBUG_WARNING("%s: Firmware did not initialize.\n", + printk(KERN_WARNING DRV_NAME ": %s: Firmware did not initialize.\n", priv->net_dev->name); return -EIO; } @@ -1466,7 +1416,7 @@ err = ipw2100_hw_phy_off(priv); if (err) - IPW_DEBUG_WARNING("Error disabling radio %d\n", err); + printk(KERN_WARNING DRV_NAME ": Error disabling radio %d\n", err); /* * If in D0-standby mode going directly to D3 may cause a @@ -1492,7 +1442,7 @@ err = ipw2100_hw_send_command(priv, &cmd); if (err) - IPW_DEBUG_WARNING( + printk(KERN_WARNING DRV_NAME ": " "%s: Power down command failed: Error %d\n", priv->net_dev->name, err); else { @@ -1533,7 +1483,7 @@ } if (i == 0) - IPW_DEBUG_WARNING(DRV_NAME + printk(KERN_WARNING DRV_NAME ": %s: Could now power down adapter.\n", priv->net_dev->name); @@ -1573,13 +1523,13 @@ err = ipw2100_hw_send_command(priv, &cmd); if (err) { - IPW_DEBUG_WARNING("exit - failed to send CARD_DISABLE command\n"); + printk(KERN_WARNING DRV_NAME ": exit - failed to send CARD_DISABLE command\n"); goto fail_up; } err = ipw2100_wait_for_card_state(priv, IPW_HW_STATE_DISABLED); if (err) { - IPW_DEBUG_WARNING("exit - card failed to change to DISABLED\n"); + printk(KERN_WARNING DRV_NAME ": exit - card failed to change to DISABLED\n"); goto fail_up; } @@ -1689,7 +1633,7 @@ (priv->status & STATUS_RESET_PENDING)) { /* Power cycle the card ... */ if (ipw2100_power_cycle_adapter(priv)) { - IPW_DEBUG_WARNING("%s: Could not cycle adapter.\n", + printk(KERN_WARNING DRV_NAME ": %s: Could not cycle adapter.\n", priv->net_dev->name); rc = 1; goto exit; @@ -1699,7 +1643,7 @@ /* Load the firmeware, start the clocks, etc. 
*/ if (ipw2100_start_adapter(priv)) { - IPW_DEBUG_ERROR("%s: Failed to start the firmware.\n", + printk(KERN_ERR DRV_NAME ": %s: Failed to start the firmware.\n", priv->net_dev->name); rc = 1; goto exit; @@ -1709,7 +1653,7 @@ /* Determine capabilities of this particular HW configuration */ if (ipw2100_get_hw_features(priv)) { - IPW_DEBUG_ERROR("%s: Failed to determine HW features.\n", + printk(KERN_ERR DRV_NAME ": %s: Failed to determine HW features.\n", priv->net_dev->name); rc = 1; goto exit; @@ -1717,7 +1661,7 @@ lock = LOCK_NONE; if (ipw2100_set_ordinal(priv, IPW_ORD_PERS_DB_LOCK, &lock, &ord_len)) { - IPW_DEBUG_ERROR("%s: Failed to clear ordinal lock.\n", + printk(KERN_ERR DRV_NAME ": %s: Failed to clear ordinal lock.\n", priv->net_dev->name); rc = 1; goto exit; @@ -1743,7 +1687,7 @@ /* Send all of the commands that must be sent prior to * HOST_COMPLETE */ if (ipw2100_adapter_setup(priv)) { - IPW_DEBUG_ERROR("%s: Failed to start the card.\n", + printk(KERN_ERR DRV_NAME ": %s: Failed to start the card.\n", priv->net_dev->name); rc = 1; goto exit; @@ -1752,7 +1696,7 @@ if (!deferred) { /* Enable the adapter - sends HOST_COMPLETE */ if (ipw2100_enable_adapter(priv)) { - IPW_DEBUG_ERROR( + printk(KERN_ERR DRV_NAME ": " "%s: failed in call to enable adapter.\n", priv->net_dev->name); ipw2100_hw_stop_adapter(priv); @@ -1810,7 +1754,7 @@ spin_unlock_irqrestore(&priv->low_lock, flags); if (ipw2100_hw_stop_adapter(priv)) - IPW_DEBUG_ERROR("%s: Error stopping adapter.\n", + printk(KERN_ERR DRV_NAME ": %s: Error stopping adapter.\n", priv->net_dev->name); /* Do not disable the interrupt until _after_ we disable @@ -2417,7 +2312,7 @@ /* We need to allocate a new SKB and attach it to the RDB. 
*/ if (unlikely(ipw2100_alloc_skb(priv, packet))) { - IPW_DEBUG_WARNING( + printk(KERN_WARNING DRV_NAME ": " "%s: Unable to allocate SKB onto RBD ring - disabling " "adapter.\n", priv->net_dev->name); /* TODO: schedule adapter shutdown */ @@ -2679,7 +2574,7 @@ break; default: - IPW_DEBUG_WARNING("%s: Bad fw_pend_list entry!\n", + printk(KERN_WARNING DRV_NAME ": %s: Bad fw_pend_list entry!\n", priv->net_dev->name); return 0; } @@ -2693,7 +2588,7 @@ read_register(priv->net_dev, IPW_MEM_HOST_SHARED_TX_QUEUE_WRITE_INDEX, &w); if (w != txq->next) - IPW_DEBUG_WARNING("%s: write index mismatch\n", + printk(KERN_WARNING DRV_NAME ": %s: write index mismatch\n", priv->net_dev->name); /* @@ -2754,7 +2649,7 @@ switch (packet->type) { case DATA: if (txq->drv[txq->oldest].status.info.fields.txType != 0) - IPW_DEBUG_WARNING("%s: Queue mismatch. " + printk(KERN_WARNING DRV_NAME ": %s: Queue mismatch. " "Expecting DATA TBD but pulled " "something else: ids %d=%d.\n", priv->net_dev->name, txq->oldest, packet->index); @@ -2801,7 +2696,7 @@ case COMMAND: if (txq->drv[txq->oldest].status.info.fields.txType != 1) - IPW_DEBUG_WARNING("%s: Queue mismatch. " + printk(KERN_WARNING DRV_NAME ": %s: Queue mismatch. " "Expecting COMMAND TBD but pulled " "something else: ids %d=%d.\n", priv->net_dev->name, txq->oldest, packet->index); @@ -3085,7 +2980,7 @@ (unsigned long)inta & IPW_INTERRUPT_MASK); if (inta & IPW2100_INTA_FATAL_ERROR) { - IPW_DEBUG_WARNING(DRV_NAME + printk(KERN_WARNING DRV_NAME ": Fatal interrupt. Scheduling firmware restart.\n"); priv->inta_other++; write_register( @@ -3105,7 +3000,7 @@ } if (inta & IPW2100_INTA_PARITY_ERROR) { - IPW_DEBUG_ERROR("***** PARITY ERROR INTERRUPT !!!! \n"); + printk(KERN_ERR DRV_NAME ": ***** PARITY ERROR INTERRUPT !!!! 
\n"); priv->inta_other++; write_register( dev, IPW_REG_INTA, @@ -3223,7 +3116,7 @@ if (inta == 0xFFFFFFFF) { /* Hardware disappeared */ - IPW_DEBUG_WARNING("IRQ INTA == 0xFFFFFFFF\n"); + printk(KERN_WARNING DRV_NAME ": IRQ INTA == 0xFFFFFFFF\n"); goto none; } @@ -3308,7 +3201,7 @@ IPW_COMMAND_POOL_SIZE * sizeof(struct ipw2100_tx_packet), GFP_KERNEL); if (!priv->msg_buffers) { - IPW_DEBUG_ERROR("%s: PCI alloc failed for msg " + printk(KERN_ERR DRV_NAME ": %s: PCI alloc failed for msg " "buffers.\n", priv->net_dev->name); return -ENOMEM; } @@ -3319,7 +3212,7 @@ sizeof(struct ipw2100_cmd_header), &p); if (!v) { - IPW_DEBUG_ERROR( + printk(KERN_ERR DRV_NAME ": " "%s: PCI alloc failed for msg " "buffers.\n", priv->net_dev->name); @@ -3826,7 +3289,7 @@ err = ipw2100_disable_adapter(priv); if (err) { - IPW_DEBUG_ERROR("%s: Could not disable adapter %d\n", + printk(KERN_ERR DRV_NAME ": %s: Could not disable adapter %d\n", priv->net_dev->name, err); return err; } @@ -4272,7 +3557,7 @@ TX_PENDED_QUEUE_LENGTH * sizeof(struct ipw2100_tx_packet), GFP_ATOMIC); if (!priv->tx_buffers) { - IPW_DEBUG_ERROR("%s: alloc failed form tx buffers.\n", + printk(KERN_ERR DRV_NAME ": %s: alloc failed form tx buffers.\n", priv->net_dev->name); bd_queue_free(priv, &priv->tx_queue); return -ENOMEM; @@ -4282,7 +3567,7 @@ v = pci_alloc_consistent( priv->pci_dev, sizeof(struct ipw2100_data_header), &p); if (!v) { - IPW_DEBUG_ERROR("%s: PCI alloc failed for tx " + printk(KERN_ERR DRV_NAME ": %s: PCI alloc failed for tx " "buffers.\n", priv->net_dev->name); err = -ENOMEM; break; @@ -4593,7 +3852,7 @@ if (!batch_mode) { err = ipw2100_disable_adapter(priv); if (err) { - IPW_DEBUG_ERROR("%s: Could not disable adapter %d\n", + printk(KERN_ERR DRV_NAME ": %s: Could not disable adapter %d\n", priv->net_dev->name, err); return err; } @@ -5212,7 +4390,7 @@ if (!batch_mode) { err = ipw2100_disable_adapter(priv); if (err) { - IPW_DEBUG_ERROR("%s: Could not disable adapter %d\n", + printk(KERN_ERR DRV_NAME ": 
%s: Could not disable adapter %d\n", priv->net_dev->name, err); return err; } @@ -5300,7 +4478,7 @@ err = ipw2100_disable_adapter(priv); /* FIXME: IPG: shouldn't this prink be in _disable_adapter()? */ if (err) { - IPW_DEBUG_ERROR("%s: Could not disable adapter %d\n", + printk(KERN_ERR DRV_NAME ": %s: Could not disable adapter %d\n", priv->net_dev->name, err); return err; } @@ -5336,7 +4514,7 @@ if (!batch_mode) { err = ipw2100_disable_adapter(priv); if (err) { - IPW_DEBUG_ERROR("%s: Could not disable adapter %d\n", + printk(KERN_ERR DRV_NAME ": %s: Could not disable adapter %d\n", priv->net_dev->name, err); return err; } @@ -5884,7 +5049,7 @@ break; default: - IPW_DEBUG_ERROR("%s: Unknown WPA param: %d\n", + printk(KERN_ERR DRV_NAME ": %s: Unknown WPA param: %d\n", dev->name, name); ret = -EOPNOTSUPP; } @@ -5907,7 +5072,7 @@ break; default: - IPW_DEBUG_ERROR("%s: Unknown MLME request: %d\n", + printk(KERN_ERR DRV_NAME ": %s: Unknown MLME request: %d\n", dev->name, command); ret = -EOPNOTSUPP; } @@ -6157,7 +5322,7 @@ break; default: - IPW_DEBUG_ERROR("%s: Unknown WPA supplicant request: %d\n", + printk(KERN_ERR DRV_NAME ": %s: Unknown WPA supplicant request: %d\n", dev->name, param->cmd); ret = -EOPNOTSUPP; @@ -8395,7 +7531,7 @@ (struct ipw2100_fw_header *)fw->fw_entry->data; if (IPW2100_FW_MAJOR(h->version) != IPW2100_FW_MAJOR_VERSION) { - IPW_DEBUG_WARNING("Firmware image not compatible " + printk(KERN_WARNING DRV_NAME ": Firmware image not compatible " "(detected version id of %u). 
" "See Documentation/networking/README.ipw2100\n", h->version); @@ -8438,7 +7574,7 @@ rc = request_firmware(&fw->fw_entry, fw_name, &priv->pci_dev->dev); if (rc < 0) { - IPW_DEBUG_ERROR( + printk(KERN_ERR DRV_NAME ": " "%s: Firmware '%s' not available or load failed.\n", priv->net_dev->name, fw_name); return rc; @@ -8520,7 +7656,7 @@ firmware_data_left -= 2; if (len > 32) { - IPW_DEBUG_ERROR( + printk(KERN_ERR DRV_NAME ": " "Invalid firmware run-length of %d bytes\n", len); return -EINVAL; @@ -8630,7 +7766,7 @@ } if (i == 10) { - IPW_DEBUG_ERROR("%s: Error initializing Symbol\n", + printk(KERN_ERR DRV_NAME ": %s: Error initializing Symbol\n", dev->name); return -EIO; } @@ -8651,7 +7787,7 @@ } if (i == 30) { - IPW_DEBUG_ERROR("%s: No response from Symbol - hw not alive\n", + printk(KERN_ERR DRV_NAME ": %s: No response from Symbol - hw not alive\n", dev->name); printk_buf(IPW_DL_ERROR, (u8*)&response, sizeof(response)); return -EIO; From pavel@ucw.cz Wed Jun 8 06:36:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 06:36:32 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58DaEXq026067 for ; Wed, 8 Jun 2005 06:36:17 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id C9B8F8B8A7; Wed, 8 Jun 2005 15:35:06 +0200 (CEST) Date: Wed, 8 Jun 2005 15:35:06 +0200 From: Pavel Machek To: Netdev list , "James P. Ketrenos" , Andrew Morton Subject: [-mm] ipw2100: kill dead macros Message-ID: <20050608133506.GA3028@elf.ucw.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Warning: Reading this can be dangerous to your mental health. 
User-Agent: Mutt/1.5.9i X-archive-position: 2224 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 3849 Lines: 93 There are several never used macros in ipw2100. This removes them. Signed-off-by: Pavel Machek --- /data/l/clean-mm/drivers/net/wireless/ipw2100.c 2005-06-08 12:11:29.000000000 +0200 +++ linux-mm/drivers/net/wireless/ipw2100.c 2005-06-08 15:09:26.000000000 +0200 @@ -1118,7 +1074,6 @@ { #define MAX_RF_KILL_CHECKS 5 #define RF_KILL_CHECK_DELAY 40 -#define RF_KILL_CHECK_THRESHOLD 3 unsigned short value = 0; u32 reg = 0; --- /data/l/clean-mm/drivers/net/wireless/ipw2100.h 2005-06-08 12:11:29.000000000 +0200 +++ linux-mm/drivers/net/wireless/ipw2100.h 2005-06-08 15:07:31.000000000 +0200 @@ -167,15 +141,6 @@ #define IPW_DEBUG_STATE(f, a...) IPW_DEBUG(IPW_DL_STATE | IPW_DL_ASSOC | IPW_DL_INFO, f, ## a) #define IPW_DEBUG_ASSOC(f, a...) IPW_DEBUG(IPW_DL_ASSOC | IPW_DL_INFO, f, ## a) - -#define VERIFY(f) \ -{ \ - int status = 0; \ - status = f; \ - if(status) \ - return status; \ -} - enum { IPW_HW_STATE_DISABLED = 1, IPW_HW_STATE_ENABLED = 0 @@ -210,8 +175,6 @@ } info; } __attribute__ ((packed)); -#define IPW_BUFDESC_LAST_FRAG 0 - struct ipw2100_bd { u32 host_addr; u32 buf_length; @@ -648,9 +606,6 @@ struct semaphore adapter_sem; wait_queue_head_t wait_command_queue; -#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,10) - u32 pm_state[PM_STATE_SIZE]; -#endif }; @@ -761,41 +707,6 @@ #define IPW_MEM_HOST_SHARED_TX_QUEUE_WRITE_INDEX \ (IPW_MEM_SRAM_HOST_INTERRUPT_AREA_LOWER_BOUND) - -#if 0 -#define IPW_MEM_HOST_SHARED_TX_QUEUE_0_BD_BASE (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x00) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_0_BD_SIZE (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x04) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_1_BD_BASE (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x08) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_1_BD_SIZE 
(IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x0c) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_2_BD_BASE (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x10) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_2_BD_SIZE (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x14) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_3_BD_BASE (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x18) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_3_BD_SIZE (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x1c) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_0_READ_INDEX (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x80) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_1_READ_INDEX (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x84) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_2_READ_INDEX (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x88) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_3_READ_INDEX (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x8c) - -#define IPW_MEM_HOST_SHARED_TX_QUEUE_BD_BASE(QueueNum) \ - (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + (QueueNum<<3)) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_BD_SIZE(QueueNum) \ - (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x0004+(QueueNum<<3)) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_READ_INDEX(QueueNum) \ - (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x0080+(QueueNum<<2)) - -#define IPW_MEM_HOST_SHARED_TX_QUEUE_0_WRITE_INDEX \ - (IPW_MEM_SRAM_HOST_INTERRUPT_AREA_LOWER_BOUND + 0x00) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_1_WRITE_INDEX \ - (IPW_MEM_SRAM_HOST_INTERRUPT_AREA_LOWER_BOUND + 0x04) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_2_WRITE_INDEX \ - (IPW_MEM_SRAM_HOST_INTERRUPT_AREA_LOWER_BOUND + 0x08) -#define IPW_MEM_HOST_SHARED_TX_QUEUE_3_WRITE_INDEX \ - (IPW_MEM_SRAM_HOST_INTERRUPT_AREA_LOWER_BOUND + 0x0c) -#define IPW_MEM_HOST_SHARED_SLAVE_MODE_INT_REGISTER \ - (IPW_MEM_SRAM_HOST_INTERRUPT_AREA_LOWER_BOUND + 0x78) - -#endif - #define IPW_MEM_HOST_SHARED_ORDINALS_TABLE_1 (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x180) #define IPW_MEM_HOST_SHARED_ORDINALS_TABLE_2 (IPW_MEM_SRAM_HOST_SHARED_LOWER_BOUND + 0x184) From hadi@cyberus.ca Wed Jun 8 06:37:27 2005 Received: with ECARTIS (v1.0.0; 
list netdev); Wed, 08 Jun 2005 06:37:35 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58DbQXq026369 for ; Wed, 8 Jun 2005 06:37:27 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1Dg0jK-0004Ba-2A for netdev@oss.sgi.com; Wed, 08 Jun 2005 09:36:26 -0400 Received: from [216.209.86.2] (helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1Dg0jE-0007OE-RI; Wed, 08 Jun 2005 09:36:20 -0400 Subject: Re: RFC: NAPI packet weighting patch From: jamal Reply-To: hadi@cyberus.ca To: "David S. Miller" Cc: jesse.brandeburg@intel.com, john.ronciak@intel.com, shemminger@osdl.org, mitch.a.williams@intel.com, mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com In-Reply-To: <20050607.204339.21591152.davem@davemloft.net> References: <468F3FDA28AA87429AD807992E22D07E0450C01F@orsmsx408> <20050607.132159.35660612.davem@davemloft.net> <20050607.204339.21591152.davem@davemloft.net> Content-Type: text/plain Organization: unknown Date: Wed, 08 Jun 2005 09:36:15 -0400 Message-Id: <1118237775.6382.34.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2225 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 2789 Lines: 68 On Tue, 2005-07-06 at 20:43 -0700, David S. Miller wrote: > From: Jesse Brandeburg > Date: Tue, 7 Jun 2005 19:20:37 -0700 (PDT) [..] > > I tried the experiment today where I replenish buffers to hardware every > > 16 packets or so. This appears to mitigate all drops at the hardware > > level (no drops). We're still at 100% with the rc5 kernel, however. 
> > > > even with this replenish fix, the addition of dropping the weight to 16 > > helped increase our throughput, although only about 1%. > > Any minor timing difference of any kind can have up to a %3 or > %4 difference in TCP performance when the receiver is CPU > limited. > Agreed. [..] > I don't see how supertso can help the receiver, which is where > the RX drops should be occurring. That's a little weird. > > I can't believe a 2.5 GHZ machine can't keep up with a simple 1 Gbit > TCP stream. Do you have some other computation going on in that > system? As stated yesterday my 1.5 GHZ crappy sparc64 box can receive > a 1 Gbit TCP stream with much cpu to spare, my 750 MHZ sparc64 box can > nearly do so as well. > > Something is up, if a single gigabit TCP stream can fully CPU > load your machine. 10 gigabit, yeah, definitely all current > generation machines are cpu limited over that link speed, but > 1 gigabit should be no problem. > Yes, sir. BTW, all along i thought the sender and receiver are hooked up directly (there was some mention of chariot a while back). Even if they did have some smart ass thing in the middle that reorders, it is still surprising that such a fast CPU can't handle a mere one Gig of what seems to be MTU=1500 bytes sized packets. I suppose a netstat -s would help for visualization in addition to those dumps. Here's what i am deducing from their data, correct me if i am wrong: ->The evidence is that something is expensive in their code path (duh). -> Whatever that expensive code is, it is not helped by them replenishing the descriptors after all the budget is exhausted since the descriptor departure rate is much slower than packet arrival. ---> This is why they would be seeing that the reduction of weight improves performance since the replenishing happens sooner with a smaller weight.
------> Clearly the driver needs some fixing - if they could do what their competitor's (who shall remain nameless) driver does or replenish more often, then that would go some way to help (Jesse's result with replenish after 16 is proof). This still hasn't resolved what the problem is but we may be getting close. Even if they SACKed for every packet, this still would not make any sense. So i think a profile of where the cycles are spent would also help. I am suspecting the driver at this point but i could be wrong. cheers, jamal From pavel@ucw.cz Wed Jun 8 06:43:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 06:43:42 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58DhYXq027483 for ; Wed, 8 Jun 2005 06:43:36 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id 5A81A8B8A7; Wed, 8 Jun 2005 15:42:26 +0200 (CEST) Date: Wed, 8 Jun 2005 15:42:26 +0200 From: Pavel Machek To: Netdev list , Jeff Garzik , kernel list Subject: Intel, please fix your email system Message-ID: <20050608134226.GA3063@elf.ucw.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Warning: Reading this can be dangerous to your mental health. User-Agent: Mutt/1.5.9i X-archive-position: 2226 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 1475 Lines: 47 Hi! I attempted to mail the ipw2100 maintainer, and got back a message about "inappropriate subject". Subject was "Subject: [-mm] ipw2100: kill dead macros". I do not think that's inappropriate. Pavel This is the Postfix program at host amd.ucw.cz. I'm sorry to have to inform you that your message could not be be delivered to one or more recipients. It's attached below. For further assistance, please send mail to If you do so, please include this problem report.
You can delete your own text from the attached returned message. The Postfix program : host jf-in.intel.com[134.134.136.18] said: 553 5.0.0 Inappropriate subject (in reply to end of DATA command) [-- Attachment #2: Delivery report --] [-- Type: message/delivery-status, Encoding: 7bit, Size: 0.4K --] Reporting-MTA: dns; amd.ucw.cz X-Postfix-Queue-ID: C9B8F8B8A7 X-Postfix-Sender: rfc822; pavel@ucw.cz Arrival-Date: Wed, 8 Jun 2005 15:35:06 +0200 (CEST) Final-Recipient: rfc822; ipw2100-admin@linux.intel.com Action: failed Status: 5.0.0 Diagnostic-Code: X-Postfix; host jf-in.intel.com[134.134.136.18] said: 553 5.0.0 Inappropriate subject (in reply to end of DATA command) [-- Attachment #3: Undelivered Message --] [-- Type: message/rfc822, Encoding: 7bit, Size: 4.3K --] To: Netdev list , "James P. Ketrenos" , Andrew Morton From hadi@cyberus.ca Wed Jun 8 06:45:32 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 06:45:35 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58DjWXq028232 for ; Wed, 8 Jun 2005 06:45:32 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1Dg0r9-0000PY-Ee for netdev@oss.sgi.com; Wed, 08 Jun 2005 09:44:31 -0400 Received: from [216.209.86.2] (helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1Dg0r5-0000bG-2M; Wed, 08 Jun 2005 09:44:27 -0400 Subject: Re: netdev munching messages again? From: jamal Reply-To: hadi@cyberus.ca To: Thomas Graf Cc: Ralf Baechle , "David S. 
Miller" , netdev@oss.sgi.com In-Reply-To: <20050608132953.GK20969@postel.suug.ch> References: <20050607140842.778143000@axs> <20050607140901.632982000@axs> <20050607213621.GG20969@postel.suug.ch> <20050607.144237.93024273.davem@davemloft.net> <20050608132953.GK20969@postel.suug.ch> Content-Type: text/plain Organization: unknown Date: Wed, 08 Jun 2005 09:44:23 -0400 Message-Id: <1118238264.6382.43.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2227 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 1331 Lines: 36 I thought netdev just picks on me ;-> My stoopid ISP as well as oss.sgi.com have some "clever" (read: questionable) ways of delivering email which violate end to end semantics of SMTP. I too noticed some emails were swallowed in the last 1-2 days. I know from past experience in fact they will never be seen again;-> Or someone, who doesn't look at the headers, will flame me for repeating what has already been discussed and agreed on (has happened to me at least 5 times on netdev ;->). It's quite ironic when packets delivered over TCP don't make it to the remote end, even when the app tries to help in reliable delivery;-> CCing El-sido Bacchus. cheers, jamal On Wed, 2005-08-06 at 15:29 +0200, Thomas Graf wrote: > * David S. Miller <20050607.144237.93024273.davem@davemloft.net> 2005-06-07 14:42 > > I did get all of your postings because you sent them > > with me on the CC: list, but netdev only sent out > > 6 and 7 to me just as you observed. > > I tried to resend, the messages were accepted by oss.sgi.com but > none of them came back. Maybe dropped due to duplicated message > ids though. > > > This has become a regular occurrence, it may be time to finally move > > this thing over to vger.kernel.org. Thoughts?
> > I have no personal objections, would be valuable to take over the > archives though. > > From jdmason@gmail.com Wed Jun 8 07:11:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 07:11:37 -0700 (PDT) Received: from rproxy.gmail.com (rproxy.gmail.com [64.233.170.194]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58EBNXq009858 for ; Wed, 8 Jun 2005 07:11:23 -0700 Received: by rproxy.gmail.com with SMTP id r35so109835rna for ; Wed, 08 Jun 2005 07:10:17 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=geUxxG0ljjPL2a4f6MffJ84cdyCsfp+VYwRz0O5IDovAfsQhIHbbaYnGnmDEiWnaVKyLn43rEFVNPSuC01rBhbdtHcPXn88c2095MUldIL7WuAUyNlUKtvvrqZUVlELT7geFZBZ5RXSPl7mh1A9PoGVe1A6XA/vEGaQ2hltaarE= Received: by 10.11.88.30 with SMTP id l30mr91804cwb; Wed, 08 Jun 2005 07:10:16 -0700 (PDT) Received: by 10.11.100.22 with HTTP; Wed, 8 Jun 2005 07:10:16 -0700 (PDT) Message-ID: <8924577505060807103eac03b2@mail.gmail.com> Date: Wed, 8 Jun 2005 09:10:16 -0500 From: Jon Mason Reply-To: Jon Mason To: MOUNIER Emmanuel Subject: dl2k tx timeout problems Cc: netdev@oss.sgi.com In-Reply-To: <883AD1ABBCC79842ACCEDB1BE3E5B78C03B2B5F6@srvexch01siege.Outremer.rfo.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline References: <883AD1ABBCC79842ACCEDB1BE3E5B78C03B2B5F6@srvexch01siege.Outremer.rfo.fr> Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j58EBNXq009858 X-archive-position: 2228 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jdmason@gmail.com Precedence: bulk X-list: netdev Content-Length: 8423 Lines: 261 Bonjour, Please see my comments below. On 6/8/05, MOUNIER Emmanuel wrote: > Hello! 
> I'm not sure, but I think the EMT 64 is for the extended PCI Slot (64bits), right? Actually no, EM64T is Intel's version of 64bit extensions (similar to AMD's athlon64/opteron). This enables you to run a 64bit kernel or a 32bit kernel. If you installed a standard x86 version of Linux, then you are running in 32bit mode. If you installed something for x86_64 (sometimes called amd64), then you are running in 64bit mode. "uname -a" will show you which one you are running. > If it's true, yes, my card work in 64bits mode, and I think it's maybe the problem, because the DGE-550SX card work perfectly on some of our old server in standard PCI slot (32bits). I think this is worth noting. I'll investigate this with my copper adapter. > I've tried many kernel versions without success, but actually I'm running kernel 2.6.8.1-10smp on a Mandrake Linux 10.1. What other kernels have you tried? Have you tried a vanilla kernel from kernel.org? > Now, I will try to explain my problem as clear as I can: > > I've plugged the card, and turned on my Linux box. The card was detected perfectly and the module was loaded at the boot. > > I can assign an IP address to the card, and I'm able to ping my network. After a short time, the network traffic completely hangs and it says: TX timeout, is buffer full? How long (an estimate is fine) before the system experiences the tx timeout? What kind of network traffic is the system doing during this time? Are the systems idle? Are they running NFS? > When I restart the network service, I can see in my logs that Linux simply disable the IRQ of my NIC: Can you send me the output of "lspci -v"? This will help confirm that no other device shares the same interrupt. > /var/log/messages : > > > > FWRFO kernel: eth2: D-Link DGE-550SX Gigabit Ethernet Adapter, 00:0d:88:b5:f3:f5, IRQ 4 > > FWRFO kernel: tx_coalesce:^I16 packets > > FWRFO kernel: rx_coalesce:^I10 packets > > FWRFO kernel: rx_timeout: ^I128000 ns > > FWRFO kernel: Uhhuh. NMI received.
Dazed and confused, but trying to continue > > FWRFO kernel: You probably have a hardware problem with your RAM chips The error above is a memory parity error. That is definitely not good. Are you seeing this error very often? > FWRFO kernel: eth2: Link off > > FWRFO kernel: eth2: Link up > > FWRFO kernel: Auto 1000 Mbps, Full duplex > > FWRFO kernel: Enable Tx Flow Control > > FWRFO kernel: Enable Rx Flow Control > > FWRFO kernel: irq 4: nobody cared! > > FWRFO kernel: [dump_stack+30/32] dump_stack+0x1e/0x20 > > FWRFO kernel: [] dump_stack+0x1e/0x20 > > FWRFO kernel: [__report_bad_irq+43/144] __report_bad_irq+0x2b/0x90 > > FWRFO kernel: [] __report_bad_irq+0x2b/0x90 > > FWRFO kernel: [note_interrupt+144/176] note_interrupt+0x90/0xb0 > > FWRFO kernel: [] note_interrupt+0x90/0xb0 > > FWRFO kernel: [do_IRQ+272/304] do_IRQ+0x110/0x130 > > FWRFO kernel: [] do_IRQ+0x110/0x130 > > FWRFO kernel: [common_interrupt+24/32] common_interrupt+0x18/0x20 > > FWRFO kernel: [] common_interrupt+0x18/0x20 > > FWRFO kernel: [do_softirq+53/64] do_softirq+0x35/0x40 > > FWRFO kernel: [] do_softirq+0x35/0x40 > > FWRFO kernel: [do_IRQ+279/304] do_IRQ+0x117/0x130 > > FWRFO kernel: [] do_IRQ+0x117/0x130 > > FWRFO kernel: [common_interrupt+24/32] common_interrupt+0x18/0x20 > > FWRFO kernel: [] common_interrupt+0x18/0x20 > > FWRFO kernel: [pg0+945814381/1069203456] rio_open+0x5d/0x210 [dl2k] > > FWRFO kernel: [] rio_open+0x5d/0x210 [dl2k] > > FWRFO kernel: [dev_open+232/256] dev_open+0xe8/0x100 > > FWRFO kernel: [] dev_open+0xe8/0x100 > > FWRFO kernel: [dev_change_flags+88/304] dev_change_flags+0x58/0x130 > > FWRFO kernel: [] dev_change_flags+0x58/0x130 > > FWRFO kernel: [devinet_ioctl+1392/1584] devinet_ioctl+0x570/0x630 > > FWRFO kernel: [] devinet_ioctl+0x570/0x630 > > FWRFO kernel: [inet_ioctl+192/208] inet_ioctl+0xc0/0xd0 > > FWRFO kernel: [] inet_ioctl+0xc0/0xd0 > > FWRFO kernel: [sock_ioctl+522/720] sock_ioctl+0x20a/0x2d0 > > FWRFO kernel: [] sock_ioctl+0x20a/0x2d0 > > FWRFO kernel:
[sys_ioctl+586/662] sys_ioctl+0x24a/0x296 > > FWRFO kernel: [] sys_ioctl+0x24a/0x296 > > FWRFO kernel: [sysenter_past_esp+82/113] sysenter_past_esp+0x52/0x71 > > FWRFO kernel: [] sysenter_past_esp+0x52/0x71 > > FWRFO kernel: handlers: > > FWRFO kernel: [pg0+945816320/1069203456] (rio_interrupt+0x0/0xf0 [dl2k]) > > FWRFO kernel: [] (rio_interrupt+0x0/0xf0 [dl2k]) > > FWRFO kernel: Disabling IRQ #4 The bad interrupt is most likely related to the restarting of the network while the adapter is hung. > I went to the BIOS setup, and I set the system to not share the IRQ for my NIC. > > > > I've tried with several DLINK NIC of the same series, and in 4 DL-360 HP servers, so I don't think it's a hardware malfunction. > > > > I also tried to build a new kernel without power management, and with the Dlink drivers include in the kernel (not in a module). > > > > I can try as many debug patch as you want =) Great! I'm sure I'll have something for you to test. I can send you the patch that I sent to Richard. It solves the problem under light load, but the network will still hang under high load. > > > And sure, you can forward our mails to the Linux kernel network mailing list. I have CC'ed them on this e-mail, and changed the subject accordingly. > I have some knowledge in Linux OS, but I'm very poor in software development, so maybe you must explain me in details what I must do for patching, etc... > I'll be happy to explain when the time comes. > Thanks you very much, and sorry for my poor English... Your English is very good (and loads better than my French). > Emmanuel Mounier > > Chargé de projet direction Technique > > RFO ( www.rfo.fr ) > > mail : emmanuel.mounier@rfo.fr > > > ________________________________ > > De: Jon Mason [mailto:jdmason@gmail.com] > Date: mar. 07/06/2005 18:35 > À: MOUNIER Emmanuel > Objet : Re: Help : Big Problem With DLINK Fiber NIC > > > > Bonjour! > > I am happy to help. 
My previous experience has been with the copper > adapters (I have one at home), but the fiber ones should be fairly > similar. > > From "http://h18004.www1.hp.com/products/servers/proliantdl360/", I > see that your systems are EMT64. Are you running them in 64bit or > 32bit? What kernel version are you running? > > When you refer to the same problem, I assume you mean tx timeouts. > How are you causing the error? > > I never fully fixed Richards issue, but I was able to get it working > under light traffic. I got side tracked, and have't looked at the > problem in a little while. Are you willing to try some debug patches? > > With your approval, I would like to CC the netdev mailing list > (netdev@oss.sgi.com) on these e-mails. netdev is the linux kernel > network mailing list (incase you didn't already know). > > Thanks, > Jon > > On 6/7/05, MOUNIER Emmanuel wrote: > > > > > > > > Hello. > > > > I'm a french network manager, and I have a big problem with some Dlink > > Fiber Network cards (DGE-550SX). > > > > I've seen on a website that you helped Mr Richard EMS to try to find a > > solution. > > (http://www.ussg.iu.edu/hypermail/linux/kernel/0412.2/0371.html) > > > > I've contacted him, but he said he have bought another Fiber NIC card. > > > > My problem is that I have 13 DGE-550SX cards for 8 HP Server Proliant > > DL-360 G4, and I have the same problem. > > > > Just want to know if you have any idea now, or maybe, if you can bring me > > some help... > > > > Fiber NIC card is very expensive, and I hope I will find a way to solve the > > problem but, either DLink or HP seem to be able to give me a solution. > > > > If I can do something to help you, just tell me what ! > > > > Thanks per advance. 
> > > > Emmanuel Mounier > > Chargé de projet direction Technique > > RFO ( www.rfo.fr ) > > mail : emmanuel.mounier@rfo.fr > > > > From pavel@ucw.cz Wed Jun 8 07:22:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 07:22:11 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58ELtXq019711 for ; Wed, 8 Jun 2005 07:21:57 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id C96F98B8A7; Wed, 8 Jun 2005 16:20:47 +0200 (CEST) Date: Wed, 8 Jun 2005 16:20:47 +0200 From: Pavel Machek To: Jeff Garzik , Netdev list , Andrew Morton Subject: [-mm] ipw2100: small cleanups Message-ID: <20050608142047.GA2310@elf.ucw.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Warning: Reading this can be dangerous to your mental health. User-Agent: Mutt/1.5.9i X-archive-position: 2229 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 2152 Lines: 61 Fix few typos/thinkos in ipw, remove ugly macro (it is commented around, anyway), and fix types passed to pci_set_power_state. --- middle-mm//drivers/net/wireless/ipw2100.c 2005-06-08 16:15:23.000000000 +0200 +++ linux-mm/drivers/net/wireless/ipw2100.c 2005-06-08 16:16:27.000000000 +0200 @@ -916,7 +916,7 @@ } /********************************************************************* - Procedure : ipw2100_ipw2100_download_firmware + Procedure : ipw2100_download_firmware Purpose : Initiaze adapter after power on. The sequence is: 1. assert s/w reset first! 
@@ -1150,7 +1150,6 @@ */ static int ipw2100_start_adapter(struct ipw2100_priv *priv) { -#define IPW_WAIT_FW_INIT_COMPLETE_DELAY (40 * HZ / 1000) int i; u32 inta, inta_mask, gpio; @@ -1185,7 +1184,7 @@ i = 5000; do { set_current_state(TASK_UNINTERRUPTIBLE); - schedule_timeout(IPW_WAIT_FW_INIT_COMPLETE_DELAY); + schedule_timeout(40 * HZ / 1000); /* Todo... wait for sync command ... */ read_register(priv->net_dev, IPW_REG_INTA, &inta); @@ -1641,7 +1640,7 @@ } else priv->status |= STATUS_POWERED; - /* Load the firmeware, start the clocks, etc. */ + /* Load the firmware, start the clocks, etc. */ if (ipw2100_start_adapter(priv)) { printk(KERN_ERR DRV_NAME ": %s: Failed to start the firmware.\n", priv->net_dev->name); @@ -5679,7 +5678,7 @@ if ((val & 0x0000ff00) != 0) pci_write_config_dword(pci_dev, 0x40, val & 0xffff00ff); - pci_set_power_state(pci_dev, 0); + pci_set_power_state(pci_dev, PCI_D0); if (!ipw2100_hw_is_adapter_in_system(dev)) { printk(KERN_WARNING DRV_NAME @@ -7206,7 +7205,7 @@ ipw2100_wx_set_wap, /* SIOCSIWAP */ ipw2100_wx_get_wap, /* SIOCGIWAP */ NULL, /* -- hole -- */ - NULL, /* SIOCGIWAPLIST -- depricated */ + NULL, /* SIOCGIWAPLIST -- deprecated */ ipw2100_wx_set_scan, /* SIOCSIWSCAN */ ipw2100_wx_get_scan, /* SIOCGIWSCAN */ ipw2100_wx_set_essid, /* SIOCSIWESSID */ From pavel@ucw.cz Wed Jun 8 07:24:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 07:24:34 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58EOIXq020159 for ; Wed, 8 Jun 2005 07:24:20 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id 06E7B8B8A7; Wed, 8 Jun 2005 16:23:10 +0200 (CEST) Date: Wed, 8 Jun 2005 16:23:10 +0200 From: Pavel Machek To: Jeff Garzik , Netdev list , kernel list , "James P. 
Ketrenos" Subject: ipw2100: firmware problem Message-ID: <20050608142310.GA2339@elf.ucw.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Warning: Reading this can be dangerous to your mental health. User-Agent: Mutt/1.5.9i X-archive-position: 2230 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 589 Lines: 15 Hi! I'm fighting with firmware problem: if ipw2100 is compiled into kernel, it is loaded while kernel boots and firmware loader is not yet available. That leads to uninitialized (=> useless) adapter. What's the preferred way to solve this one? Only load firmware when user does ifconfig eth1 up? [It is wifi, it looks like it would be better to start firmware sooner so that it can associate to the AP...]. Last initcall available in kernel is late_initcall; that's not late enough for me. Is adding one more initcall that is started when userland is available a solution? Pavel From vda@ilport.com.ua Wed Jun 8 07:45:46 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 07:45:50 -0700 (PDT) Received: from port.imtp.ilyichevsk.odessa.ua (167.imtp.Ilyichevsk.Odessa.UA [195.66.192.167] (may be forged)) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j58EjhXq022278 for ; Wed, 8 Jun 2005 07:45:45 -0700 Received: (qmail 14350 invoked by alias); 8 Jun 2005 14:44:30 -0000 Received: from unknown (172.17.13.22) by 0 (195.66.192.170) with ESMTP; 08 Jun 2005 14:44:25 -0000 From: Denis Vlasenko To: Pavel Machek , Jeff Garzik , Netdev list , kernel list , "James P.
Ketrenos" Subject: Re: ipw2100: firmware problem Date: Wed, 8 Jun 2005 17:44:20 +0300 User-Agent: KMail/1.5.4 References: <20050608142310.GA2339@elf.ucw.cz> In-Reply-To: <20050608142310.GA2339@elf.ucw.cz> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506081744.20687.vda@ilport.com.ua> X-archive-position: 2231 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vda@ilport.com.ua Precedence: bulk X-list: netdev Content-Length: 1194 Lines: 34 On Wednesday 08 June 2005 17:23, Pavel Machek wrote: > Hi! > > I'm fighting with firmware problem: if ipw2100 is compiled into > kernel, it is loaded while kernel boots and firmware loader is not yet > available. That leads to uninitialized (=> useless) adapter. > > What's the prefered way to solve this one? Only load firmware when > user does ifconfig eth1 up? [It is wifi, it looks like it would be > better to start firmware sooner so that it can associate to the > AP...]. Do you want to associate to an AP when your kernel boots, _before_ any iwconfig had a chance to configure anything? That's strange. My position is that wifi drivers must start up in an "OFF" mode. Do not send anything. Do not join APs or start IBSS. Thus, no need to load fw in early boot. Driver may load firmware and start actively doing something only when iwconfig gets executed and thus driver is instructed what to do. Some drivers currently do not act this way, and thus less advanced users may unknowingly run a wireless STA (or worse, an AP!) on their notebook for years, interfering with neighbors and/or violating local regulations (there are countries where 802.11 use needs licensing).
-- vda From abonilla@linuxwireless.org Wed Jun 8 08:06:51 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 08:07:03 -0700 (PDT) Received: from linuxwireless.org.ve.carpathiahost.net (linuxwireless.org.ve.carpathiahost.net [66.117.45.234]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58F6oXq024016 for ; Wed, 8 Jun 2005 08:06:51 -0700 Received: from WCRSJO2KPAB047 ([200.9.49.66]) by linuxwireless.org.ve.carpathiahost.net (8.12.10/8.12.10) with SMTP id j58F5Vl1024801; Wed, 8 Jun 2005 11:05:31 -0400 Reply-To: From: "Alejandro Bonilla" To: "'Denis Vlasenko'" , "'Pavel Machek'" , "'Jeff Garzik'" , "'Netdev list'" , "'kernel list'" , "'James P. Ketrenos'" Subject: RE: ipw2100: firmware problem Date: Wed, 8 Jun 2005 09:05:27 -0600 Message-ID: <002901c56c3b$8216cdd0$600cc60a@amer.sykes.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook CWS, Build 9.0.6604 (9.0.2911.0) In-Reply-To: <200506081744.20687.vda@ilport.com.ua> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1478 Importance: Normal X-archive-position: 2232 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: abonilla@linuxwireless.org Precedence: bulk X-list: netdev Content-Length: 1656 Lines: 48 > On Wednesday 08 June 2005 17:23, Pavel Machek wrote: > > Hi! > > > > I'm fighting with firmware problem: if ipw2100 is compiled into > > kernel, it is loaded while kernel boots and firmware loader > is not yet > > available. That leads to uninitialized (=> useless) adapter. Pavel, I might be lost here but... How is the firmware loaded when using the ipw2100-1.0.0/patches Kernel patch? That patch normally works fine. It might not be the way you kernel developers would like it, but maybe that could work the same way? > > > > What's the prefered way to solve this one? 
Only load firmware when > > user does ifconfig eth1 up? [It is wifi, it looks like it would be > > better to start firmware sooner so that it can associate to the > > AP...]. > > Do you want to associate to an AP when your kernel boots, > _before_ any iwconfig had a chance to configure anything? > That's strange. Currently, when we install the driver, it associates to any open network on boot. This is good, cause we don't want to be typing the commands all the time just to associate. It works this way now and is pretty nice. > > My position is that wifi drivers must start up in an "OFF" mode. > Do not send anything. Do not join APs or start IBSS. > Thus, no need to load fw in early boot. > So, to scan a network, I would have to do ifconfig eth1 up ; iwlist eth1 scan? When moving from modes with the firmwares, would I have to do ifconfig eth1 up ; iwconfig eth1 mode monitor? or would the firmware be loaded with iwconfig? Does it have that function? I'm not sure, but I guess that you guys should use the same method that the source/patches uses? 
.Alejandro From jb@suse.cz Wed Jun 8 08:11:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 08:11:52 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58FBlXq024849 for ; Wed, 8 Jun 2005 08:11:48 -0700 Received: from dwarf.suse.cz (dwarf.suse.cz [10.20.1.32]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id 24EBE628302; Wed, 8 Jun 2005 17:10:43 +0200 (CEST) Received: by dwarf.suse.cz (Postfix, from userid 10013) id CE63A12F101; Wed, 8 Jun 2005 16:56:53 +0200 (CEST) Date: Wed, 8 Jun 2005 16:56:53 +0200 From: Jirka Bohac To: Denis Vlasenko Cc: Pavel Machek , Jeff Garzik , Netdev list , kernel list Subject: Re: ipw2100: firmware problem Message-ID: <20050608145653.GA8844@dwarf.suse.cz> References: <20050608142310.GA2339@elf.ucw.cz> <200506081744.20687.vda@ilport.com.ua> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200506081744.20687.vda@ilport.com.ua> User-Agent: Mutt/1.5.6i X-archive-position: 2233 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jbohac@suse.cz Precedence: bulk X-list: netdev Content-Length: 1488 Lines: 44 On Wed, Jun 08, 2005 at 05:44:20PM +0300, Denis Vlasenko wrote: > On Wednesday 08 June 2005 17:23, Pavel Machek wrote: > > What's the prefered way to solve this one? Only load firmware when > > user does ifconfig eth1 up? [It is wifi, it looks like it would be > > better to start firmware sooner so that it can associate to the > > AP...]. > > Do you want to associate to an AP when your kernel boots, > _before_ any iwconfig had a chance to configure anything? > That's strange. > > My position is that wifi drivers must start up in an "OFF" mode. > Do not send anything. Do not join APs or start IBSS. Agreed. > Thus, no need to load fw in early boot. I don't think this is true. 
Loading the firmware on the first "ifconfig up" is problematic. Often, people want to rename the device from ethX/wlanX/... to something stable. This is usually based on the adapter's MAC address, which is not visible until the firmware is loaded. Prism54 does it this way and it really sucks. You need to bring the adapter up to load the firmware, then bring it back down, rename it, and bring it up again. Denis: any plans for this to be fixed? I agree that drivers should initialize the adapter in the OFF state, but the firmware needs to be loaded earlier than the first ifconfig up. How about loading the firmware when the first ioctl touches the device? This way, it would get loaded just before the MAC address is retrieved. regards, -- Jirka Bohac SUSE Labs, SUSE CR From jbenc@suse.cz Wed Jun 8 08:24:50 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 08:25:02 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58FOnXq029653 for ; Wed, 8 Jun 2005 08:24:50 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id 5586A628302; Wed, 8 Jun 2005 17:23:45 +0200 (CEST) Date: Wed, 8 Jun 2005 17:23:45 +0200 From: Jiri Benc To: Cc: "'Denis Vlasenko'" , "'Pavel Machek'" , "'Jeff Garzik'" , "'Netdev list'" , "'kernel list'" , "'James P. 
Ketrenos'" Subject: Re: ipw2100: firmware problem Message-ID: <20050608172345.64613254@griffin.suse.cz> In-Reply-To: <002901c56c3b$8216cdd0$600cc60a@amer.sykes.com> References: <200506081744.20687.vda@ilport.com.ua> <002901c56c3b$8216cdd0$600cc60a@amer.sykes.com> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2234 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 1157 Lines: 30 On Wed, 8 Jun 2005 09:05:27 -0600, Alejandro Bonilla wrote: > I might be lost here but... How is the firmware loaded when using the > ipw2100-1.0.0/patches Kernel patch? It is loaded by request_firmware() during initialization of the adapter. That doesn't work, as at that time no hotplug binary can be executed (we are talking about ipw2100 built in the kernel, not built as a module). > Currently, when we install the driver, it associates to any open network on > boot. This is good, cause we don't want to be typing the commands all the > time just to associate. It works this way now and is pretty nice. It sounds very dangerous to me. > So, to scan a network, I would have to do ifconfig eth1 up ; iwlist eth1 > scan? No. Driver should request the firmware when it is told to perform a scan. > When moving from modes with the firmwares, would I have to do ifconfig eth1 > up ; iwconfig eth1 mode monitor? or would the firmware be loaded with > iwconfig? Does it have that function? Firmware can be loaded automatically by the driver when there is some request from userspace and the firmware has not been loaded yet. 
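The load-on-first-request idea described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not ipw2100 code: struct demo_adapter, demo_ensure_firmware(), demo_scan() and the two stub loaders are hypothetical names, and the load_fw callback merely stands in for whatever the driver would do around request_firmware(). A real driver would also need locking and would have to make sure the load runs in a context that may sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a wireless adapter's private state. */
struct demo_adapter {
	bool fw_loaded;       /* set once the firmware has been uploaded */
	int fw_load_calls;    /* how many times the loader actually ran */
	int (*load_fw)(struct demo_adapter *a); /* stub for the real loader */
};

/* Load the firmware only if it has not been loaded yet; 0 on success. */
int demo_ensure_firmware(struct demo_adapter *a)
{
	int err;

	assert(a != NULL && a->load_fw != NULL);
	if (a->fw_loaded)
		return 0;
	err = a->load_fw(a);
	if (err)
		return err;   /* stay unloaded so a later request can retry */
	a->fw_loaded = true;
	return 0;
}

/* Stub loader that succeeds, as when the userspace helper is available. */
int demo_load_ok(struct demo_adapter *a)
{
	a->fw_load_calls++;
	return 0;
}

/* Stub loader that fails, as when no hotplug helper can run yet. */
int demo_load_fail(struct demo_adapter *a)
{
	a->fw_load_calls++;
	return -2;
}

/* Example request handler: any userspace request triggers the lazy load. */
int demo_scan(struct demo_adapter *a)
{
	int err = demo_ensure_firmware(a);

	if (err)
		return err;
	/* ... the scan itself would start here ... */
	return 0;
}
```

With a loader that succeeds, two consecutive demo_scan() calls run the loader once. With a loader that fails, every request retries it, which would suit the built-in case where the helper only becomes available later in boot.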
-- Jiri Benc SUSE Labs From ralf@linux-mips.org Wed Jun 8 09:09:02 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 09:09:07 -0700 (PDT) Received: from bacchus.net.dhis.org (extgw-uk.mips.com [62.254.210.129]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58G91Xq032414 for ; Wed, 8 Jun 2005 09:09:02 -0700 Received: from dea.linux-mips.net (localhost.localdomain [127.0.0.1]) by bacchus.net.dhis.org (8.13.1/8.13.1) with ESMTP id j58G4noG021210; Wed, 8 Jun 2005 17:04:50 +0100 Received: (from ralf@localhost) by dea.linux-mips.net (8.13.1/8.13.1/Submit) id j58G4jgg021193; Wed, 8 Jun 2005 17:04:45 +0100 Date: Wed, 8 Jun 2005 17:04:45 +0100 From: Ralf Baechle To: jamal Cc: Thomas Graf , "David S. Miller" , netdev@oss.sgi.com Subject: Re: netdev munching messages again? Message-ID: <20050608160444.GA17777@linux-mips.org> References: <20050607140842.778143000@axs> <20050607140901.632982000@axs> <20050607213621.GG20969@postel.suug.ch> <20050607.144237.93024273.davem@davemloft.net> <20050608132953.GK20969@postel.suug.ch> <1118238264.6382.43.camel@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1118238264.6382.43.camel@localhost.localdomain> User-Agent: Mutt/1.4.1i X-archive-position: 2235 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralf@linux-mips.org Precedence: bulk X-list: netdev Content-Length: 1383 Lines: 30 On Wed, Jun 08, 2005 at 09:44:23AM -0400, jamal wrote: > I thought netdev just picks on me ;-> My stoopid ISP as well > as oss.sgi.com have some "clever" (read: questionable) ways > of delivering email which violates end to end semantics of SMTP. > I too noticed some emails were swallowed in the last 1-2 days. 
I know > from past experience in fact they will never be seen again;-> > Or someone, who doesnt look at the headers, will flame me for repeating > what has already been discussed and agreed on (has happened to me at > least 5 times on netdev ;->). > > It's quiet ironic when packets delivered over TCP dont make it to the > remote end, even when the app tries to help in reliable delivery;-> > > CCing El-sido Bacchus. Turns out that Thomas Graf's Email was intercepted by the spam filter, so I've tweaked the filter setup a bit - probably at the price of sacrificing some of the filter's effectivity. It unfortunately has become totally impractical to walk through the hundreds of moderator emails every day due to the volume, so I need to rely on people to report about such problem to postmaster@oss.sgi.com or me directly via email or irc. As for resending messages, due to people doing stupid things such as restoring their mail and news spools oss is keeping a non-expiring list of message IDs. However only non-spam message IDs are being recorded. Ralf From tgraf@suug.ch Wed Jun 8 09:14:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 09:14:20 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58GE7Xq000817 for ; Wed, 8 Jun 2005 09:14:16 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id C2EDE1C0F2; Wed, 8 Jun 2005 18:13:14 +0200 (CEST) Date: Wed, 8 Jun 2005 18:13:14 +0200 From: Thomas Graf To: Ralf Baechle Cc: jamal , "David S. Miller" , netdev@oss.sgi.com Subject: Re: netdev munching messages again? 
Message-ID: <20050608161314.GM20969@postel.suug.ch> References: <20050607140842.778143000@axs> <20050607140901.632982000@axs> <20050607213621.GG20969@postel.suug.ch> <20050607.144237.93024273.davem@davemloft.net> <20050608132953.GK20969@postel.suug.ch> <1118238264.6382.43.camel@localhost.localdomain> <20050608160444.GA17777@linux-mips.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050608160444.GA17777@linux-mips.org> X-archive-position: 2236 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 565 Lines: 12 * Ralf Baechle <20050608160444.GA17777@linux-mips.org> 2005-06-08 17:04 > Turns out that Thomas Graf's Email was intercepted by the spam filter, > so I've tweaked the filter setup a bit - probably at the price of > sacrificing some of the filter's effectivity. Can you tell me why it was filtered? It might be a problem with my patch script which I could fix on my side. How much work would it be to whitelist a few people? The weird thing is that patches 6-7 which could not be delivered immediately due to connection refused from oss.sgi.com came through fine. 
From belyshev@depni.sinp.msu.ru Wed Jun 8 09:17:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 09:17:59 -0700 (PDT) Received: from depni.sinp.msu.ru (depni.sinp.msu.ru [213.131.7.21]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58GHiXq001502 for ; Wed, 8 Jun 2005 09:17:47 -0700 Received: by depni.sinp.msu.ru (Postfix, from userid 1109) id 585CAD6C28; Wed, 8 Jun 2005 20:16:40 +0400 (MSD) To: netdev@oss.sgi.com Subject: Kernel BUG at "net/ipv4/tcp_output.c":928 From: belyshev@depni.sinp.msu.ru In-Reply-To: <56hdg93rxb.fsf@depni.sinp.msu.ru> Date: Wed, 08 Jun 2005 20:16:40 +0400 Message-ID: <561x7c963b.fsf@depni.sinp.msu.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2237 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: belyshev@depni.sinp.msu.ru Precedence: bulk X-list: netdev Content-Length: 2348 Lines: 42 >Seems that this oops happens only if using hostap. 
Apparently this has nothing to do with hostap, as I was able to reproduce this without it: ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at "net/ipv4/tcp_output.c":928 invalid operand: 0000 [1] CPU 0 Modules linked in: Pid: 2854, comm: nc Not tainted 2.6.12-rc6-mm1-gcc34 RIP: 0010:[] {tcp_tso_should_defer+55} RSP: 0018:ffff810018b29c08 EFLAGS: 00010246 RAX: 000000000000002c RBX: ffff81001e0cdd40 RCX: 0000000005a80100 RDX: ffff81001e0cdd40 RSI: ffff81001ed44040 RDI: 0000000000000002 RBP: ffff81001ed44040 R08: 0000000000000000 R09: ffff810018b29d60 R10: 0000000000000002 R11: ffffffff8018c200 R12: ffff81001ed44040 R13: ffff81001ed44040 R14: 000000000000002d R15: 00000000000005a8 FS: 00002aaaaae00c80(0000) GS:ffffffff8081c840(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000007bd0d8 CR3: 0000000018b7d000 CR4: 00000000000006e0 Process nc (pid: 2854, threadinfo ffff810018b28000, task ffff81001f704f70) Stack: ffff81001e0cdd40 ffffffff803daea4 ffff81001ed440d8 0000000000000296 0000000100000001 ffff81001ed44040 ffff81001ed44040 0000000000000000 0000000000000000 ffff81001a824088 Call Trace:{tcp_write_xmit+212} {__tcp_push_pending_frames+41} {tcp_close+595} {inet_release+88} {sock_release+33} {sock_close+53} {__fput+194} {filp_close+110} {put_files_struct+115} {do_exit+484} {__dequeue_signal+501} {do_group_exit+159} {get_signal_to_deliver+1239} {do_signal+162} {pipe_readv+823} {cond_resched+56} {inotify_inode_queue_event+49} {autoremove_wake_function+0} {vfs_write+317} {sysret_signal+28} {ptregscall_common+103} Code: 0f 0b 56 a9 4e 80 ff ff ff ff a0 03 44 8b 86 14 03 00 00 44 RIP {tcp_tso_should_defer+55} RSP <1>Fixing recursive fault but reboot is needed! 
From jketreno@linux.intel.com Wed Jun 8 10:00:08 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 10:00:11 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58H08Xq004735 for ; Wed, 8 Jun 2005 10:00:08 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j58Gx3Ob017565; Wed, 8 Jun 2005 16:59:03 GMT Received: from [192.168.1.154] (hdlrvguser-123.hd.intel.com [10.127.52.142]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j58Gx04G027033; Wed, 8 Jun 2005 16:59:01 GMT Message-ID: <42A723D3.3060001@linux.intel.com> Date: Wed, 08 Jun 2005 11:58:59 -0500 From: James Ketrenos User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050519 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Pavel Machek CC: Jeff Garzik , Netdev list , kernel list , "James P. Ketrenos" Subject: Re: ipw2100: firmware problem References: <20050608142310.GA2339@elf.ucw.cz> In-Reply-To: <20050608142310.GA2339@elf.ucw.cz> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2238 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jketreno@linux.intel.com Precedence: bulk X-list: netdev Content-Length: 1099 Lines: 31 Pavel Machek wrote: >Hi! > >I'm fighting with firmware problem: if ipw2100 is compiled into >kernel, it is loaded while kernel boots and firmware loader is not yet >available. That leads to uninitialized (=> useless) adapter. > > We've been looking into whether the initrd can have the firmware affixed to the end w/ some magic bytes to identify it. 
If it works, enhancing the request_firmware to support both hotplug and an initrd approach may be reasonable. >What's the prefered way to solve this one? Only load firmware when >user does ifconfig eth1 up? [It is wifi, it looks like it would be >better to start firmware sooner so that it can associate to the >AP...]. > > The debate goes back and forth on whether devices should come up only after they are told, or initialize and start looking for a network as soon as the module is loaded. I lean more toward having the driver just do what it is told, defaulting to trying to scan and associate so link is ready as soon as possible. We've added module parameters to change that behavior (disable and associate for the ipw2100). James From jketreno@linux.intel.com Wed Jun 8 10:12:02 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 10:12:05 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58HC2Xq005934 for ; Wed, 8 Jun 2005 10:12:02 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j58HAhOb020794; Wed, 8 Jun 2005 17:10:43 GMT Received: from [192.168.1.154] (hdlrvguser-123.hd.intel.com [10.127.52.142]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j58HAb4G008081; Wed, 8 Jun 2005 17:10:39 GMT Message-ID: <42A7268D.9020402@linux.intel.com> Date: Wed, 08 Jun 2005 12:10:37 -0500 From: James Ketrenos User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050519 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Denis Vlasenko CC: Pavel Machek , Jeff Garzik , Netdev list , kernel list , "James P. 
Ketrenos" Subject: Re: ipw2100: firmware problem References: <20050608142310.GA2339@elf.ucw.cz> <200506081744.20687.vda@ilport.com.ua> In-Reply-To: <200506081744.20687.vda@ilport.com.ua> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2239 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jketreno@linux.intel.com Precedence: bulk X-list: netdev Content-Length: 968 Lines: 24 Denis Vlasenko wrote: >My position is that wifi drivers must start up in an "OFF" mode. >Do not send anything. Do not join APs or start IBSS. >Thus, no need to load fw in early boot. > > This should be an option for the user if that is the desired behavior. We support that with the ipw2100 and ipw2200 projects via module parameters to disable the radio during module load. Having a standard module parameter for wireless drivers would be nice. My approach is to make the driver so it supports as many usage models as possible, leaving policy to other components of the system. If the user wants it to scan and associate immediately, that should be supported. Likewise if they want the module to be loaded w/ the radio off, they can do that as well. Since most (if not all) laptops support an RF kill switch, I tend to defer to the physical switch as the user's point of control and set the driver defaults to try and use the radio if it is enabled. 
James From ralf@linux-mips.org Wed Jun 8 10:32:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 10:32:35 -0700 (PDT) Received: from bacchus.net.dhis.org (extgw-uk.mips.com [62.254.210.129]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58HWRXq007570 for ; Wed, 8 Jun 2005 10:32:28 -0700 Received: from dea.linux-mips.net (localhost.localdomain [127.0.0.1]) by bacchus.net.dhis.org (8.13.1/8.13.1) with ESMTP id j58HSA8w024789; Wed, 8 Jun 2005 18:28:10 +0100 Received: (from ralf@localhost) by dea.linux-mips.net (8.13.1/8.13.1/Submit) id j58HS9BU024788; Wed, 8 Jun 2005 18:28:09 +0100 Date: Wed, 8 Jun 2005 18:28:09 +0100 From: Ralf Baechle To: Thomas Graf Cc: jamal , "David S. Miller" , netdev@oss.sgi.com Subject: Re: netdev munching messages again? Message-ID: <20050608172809.GF5520@linux-mips.org> References: <20050607140842.778143000@axs> <20050607140901.632982000@axs> <20050607213621.GG20969@postel.suug.ch> <20050607.144237.93024273.davem@davemloft.net> <20050608132953.GK20969@postel.suug.ch> <1118238264.6382.43.camel@localhost.localdomain> <20050608160444.GA17777@linux-mips.org> <20050608161314.GM20969@postel.suug.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050608161314.GM20969@postel.suug.ch> User-Agent: Mutt/1.4.1i X-archive-position: 2240 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralf@linux-mips.org Precedence: bulk X-list: netdev Content-Length: 1398 Lines: 30 On Wed, Jun 08, 2005 at 06:13:14PM +0200, Thomas Graf wrote: > * Ralf Baechle <20050608160444.GA17777@linux-mips.org> 2005-06-08 17:04 > > Turns out that Thomas Graf's Email was intercepted by the spam filter, > > so I've tweaked the filter setup a bit - probably at the price of > > sacrificing some of the filter's effectivity. > > Can you tell me why it was filtered? 
It might be a problem with my > patch script which I could fix on my side. How much work would it > be to whitelist a few people? Whitelists tend to be problematic due to the enormous amounts of spam and malware emails that come with forged email addresses. I'll send you the rules in question and the original spamyness scores they did compute in separate email, it's somewhat bigish. > The weird thing is that patches 6-7 which could not be delivered > immediately due to connection refused from oss.sgi.com came through > fine. Whatever it was, it was probably a separate issue from this spamfilter faux-pas. Note there is a firewall in front of oss.sgi.com which will accept the SMTP TCP connection only to drop the connection shortly after if it can't build a connection to the "real" oss. So if you only get a connection refused message when telneting to oss it really means the firewall had some issues. Unfortunately I don't have any control over it, if I had I'd replace it with a nice patchcable ;-) Ralf From dlstevens@us.ibm.com Wed Jun 8 11:51:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 11:51:49 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58IpXXq012800 for ; Wed, 8 Jun 2005 11:51:40 -0700 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e35.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j58IoHPX393976 for ; Wed, 8 Jun 2005 14:50:19 -0400 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by d03relay04.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j58IoGAA184008 for ; Wed, 8 Jun 2005 12:50:17 -0600 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j58IoG1B004009 for ; Wed, 8 Jun 2005 12:50:16 -0600 Received: from d03nm121.boulder.ibm.com (d03nm121.boulder.ibm.com [9.17.195.147]) by d03av02.boulder.ibm.com 
(8.12.11/8.12.11) with ESMTP id j58Io56i003150; Wed, 8 Jun 2005 12:50:16 -0600 In-Reply-To: <20050607.171423.106079530.yoshfuji@linux-ipv6.org> To: YOSHIFUJI Hideaki / =?ISO-2022-JP?B?GyRCNUhGIzFRTEAbKEI=?= Cc: davem@davemloft.net, netdev@oss.sgi.com MIME-Version: 1.0 Subject: Re: IPV6 RFC3542 compliance [PATCH] X-Mailer: Lotus Notes Release 6.0.2CF1 June 9, 2003 Message-ID: From: David Stevens Date: Wed, 8 Jun 2005 12:49:36 -0600 X-MIMETrack: Serialize by Router on D03NM121/03/M/IBM(Release 6.53HF546 | May 23, 2005) at 06/08/2005 12:50:15 Content-Type: multipart/mixed; boundary="=_mixed 006767FF8825701A_=" X-archive-position: 2250 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dlstevens@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 48419 Lines: 971 --=_mixed 006767FF8825701A_= Content-Type: text/plain; charset="US-ASCII" Below is a patch that adds a warning message for rfc2292-style use of socket options, and uses a different numeric value for options that have a different meaning in the rfc3542 API. It also, of course, adds support for sending and receiving traffic class, and the new IPV6_RECVx socket options from rfc3542. 
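For readers unfamiliar with where the traffic class lives, here is a small userspace sketch, separate from the patch itself. It assumes the standard IPv6 header layout (4 bits version, 8 bits traffic class, 20 bits flow label), so the traffic class straddles the first two bytes of the header. The function names demo_ipv6_get_tclass() and demo_ipv6_set_tclass() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * hdr points at the start of an IPv6 header.
 * Byte 0: version (high nibble) | traffic class bits 7..4 (low nibble)
 * Byte 1: traffic class bits 3..0 (high nibble) | top of the flow label
 */
uint8_t demo_ipv6_get_tclass(const uint8_t *hdr)
{
	assert(hdr != NULL);
	return (uint8_t)(((hdr[0] & 0x0f) << 4) | (hdr[1] >> 4));
}

/* Write a traffic class without touching the version or the flow label. */
void demo_ipv6_set_tclass(uint8_t *hdr, uint8_t tclass)
{
	assert(hdr != NULL);
	hdr[0] = (uint8_t)((hdr[0] & 0xf0) | (tclass >> 4));
	hdr[1] = (uint8_t)((hdr[1] & 0x0f) | ((tclass & 0x0f) << 4));
}
```

The read side mirrors the receive path in the patch, which combines the header's priority nibble with the top nibble of the first flow label byte.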
+-DLS Signed-off-by: David L Stevens diff -ruNp linux-2.6.11.10/include/linux/in6.h linux-2.6.11.10T3/include/linux/in6.h --- linux-2.6.11.10/include/linux/in6.h 2005-05-16 10:51:43.000000000 -0700 +++ linux-2.6.11.10T3/include/linux/in6.h 2005-06-08 10:45:25.000000000 -0700 @@ -148,10 +148,10 @@ struct in6_flowlabel_req */ #define IPV6_ADDRFORM 1 -#define IPV6_PKTINFO 2 -#define IPV6_HOPOPTS 3 -#define IPV6_DSTOPTS 4 -#define IPV6_RTHDR 5 +#define IPV6_2292PKTINFO 2 +#define IPV6_2292HOPOPTS 3 +#define IPV6_2292DSTOPTS 4 +#define IPV6_2292RTHDR 5 #define IPV6_PKTOPTIONS 6 #define IPV6_CHECKSUM 7 #define IPV6_HOPLIMIT 8 @@ -184,6 +184,12 @@ struct in6_flowlabel_req #define IPV6_IPSEC_POLICY 34 #define IPV6_XFRM_POLICY 35 +#define IPV6_RECVPKTINFO 36 +#define IPV6_RECVHOPLIMIT 37 +#define IPV6_RECVRTHDR 38 +#define IPV6_RECVHOPOPTS 39 +#define IPV6_RECVDSTOPTS 40 +#define IPV6_RECVTCLASS 41 /* * Multicast: @@ -198,4 +204,11 @@ struct in6_flowlabel_req * MCAST_MSFILTER 48 */ +#define IPV6_PKTINFO 49 +#define IPV6_RTHDR 50 +#define IPV6_HOPOPTS 51 +#define IPV6_DSTOPTS 52 +#define IPV6_TCLASS 53 +#define IPV6_RTHDRDSTOPTS 54 + #endif diff -ruNp linux-2.6.11.10/include/linux/ipv6.h linux-2.6.11.10T3/include/linux/ipv6.h --- linux-2.6.11.10/include/linux/ipv6.h 2005-05-16 10:51:43.000000000 -0700 +++ linux-2.6.11.10T3/include/linux/ipv6.h 2005-06-07 15:10:28.000000000 -0700 @@ -221,7 +221,8 @@ struct ipv6_pinfo { rxhlim:1, hopopts:1, dstopts:1, - rxflow:1; + rxflow:1, + rxtclass:1; } bits; __u8 all; } rxopt; @@ -231,7 +232,8 @@ struct ipv6_pinfo { recverr:1, sndflow:1, pmtudisc:2, - ipv6only:1; + ipv6only:1, + rfc2292:1; struct ipv6_mc_socklist *ipv6_mc_list; struct ipv6_ac_socklist *ipv6_ac_list; @@ -244,6 +246,7 @@ struct ipv6_pinfo { struct ipv6_txoptions *opt; struct rt6_info *rt; int hop_limit; + int tclass; } cork; }; diff -ruNp linux-2.6.11.10/include/net/ipv6.h linux-2.6.11.10T3/include/net/ipv6.h --- linux-2.6.11.10/include/net/ipv6.h 2005-05-16 
10:51:49.000000000 -0700 +++ linux-2.6.11.10T3/include/net/ipv6.h 2005-05-24 14:57:23.000000000 -0700 @@ -347,6 +347,7 @@ extern int ip6_append_data(struct sock int length, int transhdrlen, int hlimit, + int tclass, struct ipv6_txoptions *opt, struct flowi *fl, struct rt6_info *rt, diff -ruNp linux-2.6.11.10/include/net/transp_v6.h linux-2.6.11.10T3/include/net/transp_v6.h --- linux-2.6.11.10/include/net/transp_v6.h 2005-05-16 10:51:51.000000000 -0700 +++ linux-2.6.11.10T3/include/net/transp_v6.h 2005-05-24 14:04:11.000000000 -0700 @@ -37,7 +37,7 @@ extern int datagram_recv_ctl(struct so extern int datagram_send_ctl(struct msghdr *msg, struct flowi *fl, struct ipv6_txoptions *opt, - int *hlimit); + int *hlimit, int *tclass); #define LOOPBACK4_IPV6 __constant_htonl(0x7f000006) diff -ruNp linux-2.6.11.10/net/ipv6/datagram.c linux-2.6.11.10T3/net/ipv6/datagram.c --- linux-2.6.11.10/net/ipv6/datagram.c 2005-05-16 10:52:00.000000000 -0700 +++ linux-2.6.11.10T3/net/ipv6/datagram.c 2005-06-08 11:29:31.000000000 -0700 @@ -381,13 +381,19 @@ int datagram_recv_ctl(struct sock *sk, s src_info.ipi6_ifindex = opt->iif; ipv6_addr_copy(&src_info.ipi6_addr, &skb->nh.ipv6h->daddr); - put_cmsg(msg, SOL_IPV6, IPV6_PKTINFO, sizeof(src_info), &src_info); + put_cmsg(msg, SOL_IPV6, np->rfc2292 ? 
IPV6_2292PKTINFO : + IPV6_PKTINFO, sizeof(src_info), &src_info); } if (np->rxopt.bits.rxhlim) { int hlim = skb->nh.ipv6h->hop_limit; put_cmsg(msg, SOL_IPV6, IPV6_HOPLIMIT, sizeof(hlim), &hlim); } + if (np->rxopt.bits.rxtclass) { + u8 tclass = (skb->nh.ipv6h->priority << 4) | + ((skb->nh.ipv6h->flow_lbl[0]>>4) & 0xf); + put_cmsg(msg, SOL_IPV6, IPV6_TCLASS, sizeof(tclass), &tclass); + } if (np->rxopt.bits.rxflow && (*(u32*)skb->nh.raw & IPV6_FLOWINFO_MASK)) { u32 flowinfo = *(u32*)skb->nh.raw & IPV6_FLOWINFO_MASK; @@ -395,26 +401,30 @@ int datagram_recv_ctl(struct sock *sk, s } if (np->rxopt.bits.hopopts && opt->hop) { u8 *ptr = skb->nh.raw + opt->hop; - put_cmsg(msg, SOL_IPV6, IPV6_HOPOPTS, (ptr[1]+1)<<3, ptr); + put_cmsg(msg, SOL_IPV6, np->rfc2292 ? IPV6_2292HOPOPTS : + IPV6_HOPOPTS, (ptr[1]+1)<<3, ptr); } if (np->rxopt.bits.dstopts && opt->dst0) { u8 *ptr = skb->nh.raw + opt->dst0; - put_cmsg(msg, SOL_IPV6, IPV6_DSTOPTS, (ptr[1]+1)<<3, ptr); + put_cmsg(msg, SOL_IPV6, np->rfc2292 ? IPV6_2292DSTOPTS : + IPV6_DSTOPTS, (ptr[1]+1)<<3, ptr); } if (np->rxopt.bits.srcrt && opt->srcrt) { struct ipv6_rt_hdr *rthdr = (struct ipv6_rt_hdr *)(skb->nh.raw + opt->srcrt); - put_cmsg(msg, SOL_IPV6, IPV6_RTHDR, (rthdr->hdrlen+1) << 3, rthdr); + put_cmsg(msg, SOL_IPV6, np->rfc2292 ? IPV6_2292RTHDR : + IPV6_RTHDR, (rthdr->hdrlen+1) << 3, rthdr); } if (np->rxopt.bits.dstopts && opt->dst1) { u8 *ptr = skb->nh.raw + opt->dst1; - put_cmsg(msg, SOL_IPV6, IPV6_DSTOPTS, (ptr[1]+1)<<3, ptr); + put_cmsg(msg, SOL_IPV6, np->rfc2292 ? 
IPV6_2292DSTOPTS : + IPV6_DSTOPTS, (ptr[1]+1)<<3, ptr); } return 0; } int datagram_send_ctl(struct msghdr *msg, struct flowi *fl, struct ipv6_txoptions *opt, - int *hlimit) + int *hlimit, int *tclass) { struct in6_pktinfo *src_info; struct cmsghdr *cmsg; @@ -436,6 +446,7 @@ int datagram_send_ctl(struct msghdr *msg continue; switch (cmsg->cmsg_type) { + case IPV6_2292PKTINFO: case IPV6_PKTINFO: if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct in6_pktinfo))) { err = -EINVAL; @@ -491,6 +502,7 @@ int datagram_send_ctl(struct msghdr *msg fl->fl6_flowlabel = IPV6_FLOWINFO_MASK & *(u32 *)CMSG_DATA(cmsg); break; + case IPV6_2292HOPOPTS: case IPV6_HOPOPTS: if (opt->hopopt || cmsg->cmsg_len < CMSG_LEN(sizeof(struct ipv6_opt_hdr))) { err = -EINVAL; @@ -511,6 +523,7 @@ int datagram_send_ctl(struct msghdr *msg opt->hopopt = hdr; break; + case IPV6_2292DSTOPTS: case IPV6_DSTOPTS: if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct ipv6_opt_hdr))) { err = -EINVAL; @@ -535,6 +548,7 @@ int datagram_send_ctl(struct msghdr *msg opt->dst1opt = hdr; break; + case IPV6_2292RTHDR: case IPV6_RTHDR: if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct ipv6_rt_hdr))) { err = -EINVAL; @@ -587,6 +601,15 @@ int datagram_send_ctl(struct msghdr *msg *hlimit = *(int *)CMSG_DATA(cmsg); break; + case IPV6_TCLASS: + if (cmsg->cmsg_len != CMSG_LEN(sizeof(int))) { + err = -EINVAL; + goto exit_f; + } + + *tclass = *(int *)CMSG_DATA(cmsg); + break; + default: LIMIT_NETDEBUG( printk(KERN_DEBUG "invalid cmsg type: %d\n", cmsg->cmsg_type)); diff -ruNp linux-2.6.11.10/net/ipv6/icmp.c linux-2.6.11.10T3/net/ipv6/icmp.c --- linux-2.6.11.10/net/ipv6/icmp.c 2005-05-16 10:52:00.000000000 -0700 +++ linux-2.6.11.10T3/net/ipv6/icmp.c 2005-05-24 15:05:14.000000000 -0700 @@ -287,7 +287,7 @@ void icmpv6_send(struct sk_buff *skb, in int iif = 0; int addr_type = 0; int len; - int hlimit; + int hlimit, tclass; int err = 0; if ((u8*)hdr < skb->head || (u8*)(hdr+1) > skb->tail) @@ -381,6 +381,9 @@ void icmpv6_send(struct sk_buff *skb, in hlimit 
= np->hop_limit; if (hlimit < 0) hlimit = dst_metric(dst, RTAX_HOPLIMIT); + tclass = np->cork.tclass; + if (tclass < 0) + tclass = 0; msg.skb = skb; msg.offset = skb->nh.raw - skb->data; @@ -398,7 +401,7 @@ void icmpv6_send(struct sk_buff *skb, in err = ip6_append_data(sk, icmpv6_getfrag, &msg, len + sizeof(struct icmp6hdr), sizeof(struct icmp6hdr), - hlimit, NULL, &fl, (struct rt6_info*)dst, + hlimit, tclass, NULL, &fl, (struct rt6_info*)dst, MSG_DONTWAIT); if (err) { ip6_flush_pending_frames(sk); @@ -432,6 +435,7 @@ static void icmpv6_echo_reply(struct sk_ struct dst_entry *dst; int err = 0; int hlimit; + int tclass; saddr = &skb->nh.ipv6h->daddr; @@ -467,15 +471,18 @@ static void icmpv6_echo_reply(struct sk_ hlimit = np->hop_limit; if (hlimit < 0) hlimit = dst_metric(dst, RTAX_HOPLIMIT); + tclass = np->cork.tclass; + if (tclass < 0) + tclass = 0; idev = in6_dev_get(skb->dev); msg.skb = skb; msg.offset = 0; - err = ip6_append_data(sk, icmpv6_getfrag, &msg, skb->len + sizeof(struct icmp6hdr), - sizeof(struct icmp6hdr), hlimit, NULL, &fl, - (struct rt6_info*)dst, MSG_DONTWAIT); + err = ip6_append_data(sk, icmpv6_getfrag, &msg, skb->len + + sizeof(struct icmp6hdr), sizeof(struct icmp6hdr), hlimit, + tclass, NULL, &fl, (struct rt6_info*)dst, MSG_DONTWAIT); if (err) { ip6_flush_pending_frames(sk); diff -ruNp linux-2.6.11.10/net/ipv6/ip6_flowlabel.c linux-2.6.11.10T3/net/ipv6/ip6_flowlabel.c --- linux-2.6.11.10/net/ipv6/ip6_flowlabel.c 2005-05-16 10:52:00.000000000 -0700 +++ linux-2.6.11.10T3/net/ipv6/ip6_flowlabel.c 2005-05-24 14:04:28.000000000 -0700 @@ -311,7 +311,7 @@ fl_create(struct in6_flowlabel_req *freq msg.msg_control = (void*)(fl->opt+1); flowi.oif = 0; - err = datagram_send_ctl(&msg, &flowi, fl->opt, &junk); + err = datagram_send_ctl(&msg, &flowi, fl->opt, &junk, &junk); if (err) goto done; err = -EINVAL; diff -ruNp linux-2.6.11.10/net/ipv6/ip6_output.c linux-2.6.11.10T3/net/ipv6/ip6_output.c --- linux-2.6.11.10/net/ipv6/ip6_output.c 2005-05-16 
10:52:00.000000000 -0700 +++ linux-2.6.11.10T3/net/ipv6/ip6_output.c 2005-05-24 14:58:51.000000000 -0700 @@ -211,7 +211,7 @@ int ip6_xmit(struct sock *sk, struct sk_ struct ipv6hdr *hdr; u8 proto = fl->proto; int seg_len = skb->len; - int hlimit; + int hlimit, tclass; u32 mtu; if (opt) { @@ -253,6 +253,13 @@ int ip6_xmit(struct sock *sk, struct sk_ hlimit = np->hop_limit; if (hlimit < 0) hlimit = dst_metric(dst, RTAX_HOPLIMIT); + tclass = -1; + if (np) + tclass = np->cork.tclass; + if (tclass < 0) + tclass = 0; + hdr->priority = (tclass>>4) & 0xf; + hdr->flow_lbl[0] |= (tclass & 0xf)<<4; hdr->payload_len = htons(seg_len); hdr->nexthdr = proto; @@ -806,10 +813,11 @@ out_err_release: return err; } -int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), - void *from, int length, int transhdrlen, - int hlimit, struct ipv6_txoptions *opt, struct flowi *fl, struct rt6_info *rt, - unsigned int flags) +int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, + int offset, int len, int odd, struct sk_buff *skb), + void *from, int length, int transhdrlen, + int hlimit, int tclass, struct ipv6_txoptions *opt, struct flowi *fl, + struct rt6_info *rt, unsigned int flags) { struct inet_sock *inet = inet_sk(sk); struct ipv6_pinfo *np = inet6_sk(sk); @@ -847,6 +855,7 @@ int ip6_append_data(struct sock *sk, int np->cork.rt = rt; inet->cork.fl = *fl; np->cork.hop_limit = hlimit; + np->cork.tclass = tclass; inet->cork.fragsize = mtu = dst_pmtu(&rt->u.dst); inet->cork.length = 0; sk->sk_sndmsg_page = NULL; @@ -1130,6 +1139,10 @@ int ip6_push_pending_frames(struct sock *(u32*)hdr = fl->fl6_flowlabel | htonl(0x60000000); + /* traffic class */ + hdr->priority = (np->cork.tclass>>4) & 0xf; + hdr->flow_lbl[0] |= (np->cork.tclass & 0xf)<<4; + if (skb->len <= sizeof(struct ipv6hdr) + IPV6_MAXPLEN) hdr->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); else diff -ruNp
linux-2.6.11.10/net/ipv6/ipv6_sockglue.c linux-2.6.11.10T3/net/ipv6/ipv6_sockglue.c --- linux-2.6.11.10/net/ipv6/ipv6_sockglue.c 2005-05-16 10:52:00.000000000 -0700 +++ linux-2.6.11.10T3/net/ipv6/ipv6_sockglue.c 2005-06-08 11:06:47.000000000 -0700 @@ -115,6 +115,15 @@ extern int ip6_mc_msfilter(struct sock * extern int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf, struct group_filter __user *optval, int __user *optlen); +/* + * warn of obsolete RFC 2292 socket API use + */ +static void warn2292(char *optname) +{ + printk(KERN_WARNING "process '%s' is using obsolete %s socket option\n", + current->comm, optname); +} + int ipv6_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) @@ -208,33 +217,53 @@ int ipv6_setsockopt(struct sock *sk, int retv = 0; break; - case IPV6_PKTINFO: + case IPV6_2292PKTINFO: + warn2292("IPV6_PKTINFO"); + case IPV6_RECVPKTINFO: + np->rfc2292 = optname == IPV6_2292PKTINFO; np->rxopt.bits.rxinfo = valbool; retv = 0; break; case IPV6_HOPLIMIT: + warn2292("IPV6_HOPLIMIT"); + case IPV6_RECVHOPLIMIT: + np->rfc2292 = optname == IPV6_HOPLIMIT; np->rxopt.bits.rxhlim = valbool; retv = 0; break; - case IPV6_RTHDR: + case IPV6_2292RTHDR: + warn2292("IPV6_RTHDR"); + case IPV6_RECVRTHDR: if (val < 0 || val > 2) goto e_inval; + np->rfc2292 = optname == IPV6_2292RTHDR; np->rxopt.bits.srcrt = val; retv = 0; break; - case IPV6_HOPOPTS: + case IPV6_2292HOPOPTS: + warn2292("IPV6_HOPOPTS"); + case IPV6_RECVHOPOPTS: + np->rfc2292 = optname == IPV6_2292HOPOPTS; np->rxopt.bits.hopopts = valbool; retv = 0; break; - case IPV6_DSTOPTS: + case IPV6_2292DSTOPTS: + warn2292("IPV6_DSTOPTS"); + case IPV6_RECVDSTOPTS: + np->rfc2292 = optname == IPV6_2292DSTOPTS; np->rxopt.bits.dstopts = valbool; retv = 0; break; + case IPV6_RECVTCLASS: + np->rxopt.bits.rxtclass = valbool; + retv = 0; + break; + case IPV6_FLOWINFO: np->rxopt.bits.rxflow = valbool; retv = 0; @@ -274,7 +303,7 @@ int ipv6_setsockopt(struct sock *sk, int 
msg.msg_controllen = optlen; msg.msg_control = (void*)(opt+1); - retv = datagram_send_ctl(&msg, &fl, opt, &junk); + retv = datagram_send_ctl(&msg, &fl, opt, &junk, &junk); if (retv) goto done; update: @@ -620,26 +649,45 @@ int ipv6_getsockopt(struct sock *sk, int val = np->ipv6only; break; - case IPV6_PKTINFO: + case IPV6_2292PKTINFO: + warn2292("IPV6_PKTINFO"); + case IPV6_RECVPKTINFO: + np->rfc2292 = optname == IPV6_2292PKTINFO; val = np->rxopt.bits.rxinfo; break; case IPV6_HOPLIMIT: + warn2292("IPV6_HOPLIMIT"); + case IPV6_RECVHOPLIMIT: + np->rfc2292 = optname == IPV6_HOPLIMIT; val = np->rxopt.bits.rxhlim; break; - case IPV6_RTHDR: + case IPV6_2292RTHDR: + warn2292("IPV6_RTHDR"); + case IPV6_RECVRTHDR: + np->rfc2292 = optname == IPV6_2292RTHDR; val = np->rxopt.bits.srcrt; break; - case IPV6_HOPOPTS: + case IPV6_2292HOPOPTS: + warn2292("IPV6_HOPOPTS"); + case IPV6_RECVHOPOPTS: + np->rfc2292 = optname == IPV6_2292HOPOPTS; val = np->rxopt.bits.hopopts; break; - case IPV6_DSTOPTS: + case IPV6_2292DSTOPTS: + warn2292("IPV6_DSTOPTS"); + case IPV6_RECVDSTOPTS: + np->rfc2292 = optname == IPV6_2292DSTOPTS; val = np->rxopt.bits.dstopts; break; + case IPV6_RECVTCLASS: + val = np->rxopt.bits.rxtclass; + break; + case IPV6_FLOWINFO: val = np->rxopt.bits.rxflow; break; diff -ruNp linux-2.6.11.10/net/ipv6/raw.c linux-2.6.11.10T3/net/ipv6/raw.c --- linux-2.6.11.10/net/ipv6/raw.c 2005-05-16 10:52:00.000000000 -0700 +++ linux-2.6.11.10T3/net/ipv6/raw.c 2005-05-24 15:09:42.000000000 -0700 @@ -617,6 +617,7 @@ static int rawv6_sendmsg(struct kiocb *i struct flowi fl; int addr_len = msg->msg_namelen; int hlimit = -1; + int tclass = -1; u16 proto; int err; @@ -702,7 +703,7 @@ static int rawv6_sendmsg(struct kiocb *i memset(opt, 0, sizeof(struct ipv6_txoptions)); opt->tot_len = sizeof(struct ipv6_txoptions); - err = datagram_send_ctl(msg, &fl, opt, &hlimit); + err = datagram_send_ctl(msg, &fl, opt, &hlimit, &tclass); if (err < 0) { fl6_sock_release(flowlabel); return err; @@ -758,6 
+759,12 @@ static int rawv6_sendmsg(struct kiocb *i hlimit = dst_metric(dst, RTAX_HOPLIMIT); } + if (tclass < 0) { + tclass = np->cork.tclass; + if (tclass < 0) + tclass = 0; + } + if (msg->msg_flags&MSG_CONFIRM) goto do_confirm; @@ -766,8 +773,9 @@ back_from_confirm: err = rawv6_send_hdrinc(sk, msg->msg_iov, len, &fl, (struct rt6_info*)dst, msg->msg_flags); } else { lock_sock(sk); - err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov, len, 0, - hlimit, opt, &fl, (struct rt6_info*)dst, msg->msg_flags); + err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov, + len, 0, hlimit, tclass, opt, &fl, (struct rt6_info*)dst, + msg->msg_flags); if (err) ip6_flush_pending_frames(sk); diff -ruNp linux-2.6.11.10/net/ipv6/udp.c linux-2.6.11.10T3/net/ipv6/udp.c --- linux-2.6.11.10/net/ipv6/udp.c 2005-05-16 10:52:00.000000000 -0700 +++ linux-2.6.11.10T3/net/ipv6/udp.c 2005-05-24 15:11:58.000000000 -0700 @@ -637,6 +637,7 @@ static int udpv6_sendmsg(struct kiocb *i int addr_len = msg->msg_namelen; int ulen = len; int hlimit = -1; + int tclass = -1; int corkreq = up->corkflag || msg->msg_flags&MSG_MORE; int err; @@ -758,7 +759,7 @@ do_udp_sendmsg: memset(opt, 0, sizeof(struct ipv6_txoptions)); opt->tot_len = sizeof(*opt); - err = datagram_send_ctl(msg, fl, opt, &hlimit); + err = datagram_send_ctl(msg, fl, opt, &hlimit, &tclass); if (err < 0) { fl6_sock_release(flowlabel); return err; @@ -812,6 +813,11 @@ do_udp_sendmsg: if (hlimit < 0) hlimit = dst_metric(dst, RTAX_HOPLIMIT); } + if (tclass < 0) { + tclass = np->cork.tclass; + if (tclass < 0) + tclass = 0; + } if (msg->msg_flags&MSG_CONFIRM) goto do_confirm; @@ -832,9 +838,10 @@ back_from_confirm: do_append_data: up->len += ulen; - err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen, sizeof(struct udphdr), - hlimit, opt, fl, (struct rt6_info*)dst, - corkreq ? 
msg->msg_flags|MSG_MORE : msg->msg_flags); + err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen, + sizeof(struct udphdr), hlimit, tclass, opt, fl, + (struct rt6_info*)dst, + corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags); if (err) udp_v6_flush_pending_frames(sk); else if (!corkreq) --=_mixed 006767FF8825701A_= Content-Type: application/octet-stream; name="rfc3542-2.patch" Content-Disposition: attachment; filename="rfc3542-2.patch" Content-Transfer-Encoding: base64
--=_mixed 006767FF8825701A_=-- From linville@bilbo.tuxdriver.com Wed Jun 8 12:13:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 12:13:10 -0700 (PDT) Received: from
apollo.tuxdriver.com (apollo.tuxdriver.com [24.172.12.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58JD5Xq014443 for ; Wed, 8 Jun 2005 12:13:06 -0700 Received: from bilbo.tuxdriver.com (azure.tuxdriver.com [24.172.12.5]) by apollo.tuxdriver.com (8.12.11/8.12.11) with ESMTP id j58IAfYu012841; Wed, 8 Jun 2005 14:10:41 -0400 Received: from bilbo.tuxdriver.com (localhost.localdomain [127.0.0.1]) by bilbo.tuxdriver.com (8.13.1/8.13.1) with ESMTP id j58JBx0L030992; Wed, 8 Jun 2005 15:12:00 -0400 Received: (from linville@localhost) by bilbo.tuxdriver.com (8.13.1/8.13.1/Submit) id j58JBvWO030991; Wed, 8 Jun 2005 15:11:57 -0400 Date: Wed, 8 Jun 2005 15:11:57 -0400 From: "John W. Linville" To: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Cc: jgarzik@pobox.com Subject: [patch 2.6.12-rc6] b44: check link state during open Message-ID: <20050608191156.GA28376@tuxdriver.com> Mail-Followup-To: linux-kernel@vger.kernel.org, netdev@oss.sgi.com, jgarzik@pobox.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.1i X-archive-position: 2251 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: linville@tuxdriver.com Precedence: bulk X-list: netdev Content-Length: 791 Lines: 26 Check the link state during b44_open. This closes a 1 HZ window that existed after b44_open ran but before the b44_timer handler ran, during which ethtool would report "Link detected: yes" no matter what the link state actually was. Signed-off-by: John W. 
Linville
---

 drivers/net/b44.c |    3 +++
 1 files changed, 3 insertions(+)

--- linux-2.6.12-rc6/drivers/net/b44.c.orig	2005-06-08 14:52:35.000000000 -0400
+++ linux-2.6.12-rc6/drivers/net/b44.c	2005-06-08 14:52:43.000000000 -0400
@@ -1285,6 +1285,9 @@ static int b44_open(struct net_device *d
 	b44_init_hw(bp);
 	bp->flags |= B44_FLAG_INIT_COMPLETE;
 
+	netif_carrier_off(dev);
+	b44_check_phy(bp);
+
 	spin_unlock_irq(&bp->lock);
 
 	init_timer(&bp->timer);
-- 
John W. Linville
linville@tuxdriver.com

From davem@davemloft.net Wed Jun 8 12:44:51 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 12:44:54 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58JioXq019600 for ; Wed, 8 Jun 2005 12:44:51 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dg6Sa-0002Ie-QJ; Wed, 08 Jun 2005 12:43:32 -0700 Date: Wed, 08 Jun 2005 12:43:32 -0700 (PDT) Message-Id: <20050608.124332.85408883.davem@davemloft.net> To: jketreno@linux.intel.com Cc: vda@ilport.com.ua, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem From: "David S. 
Miller" In-Reply-To: <42A7268D.9020402@linux.intel.com> References: <20050608142310.GA2339@elf.ucw.cz> <200506081744.20687.vda@ilport.com.ua> <42A7268D.9020402@linux.intel.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2252 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 930 Lines: 21 From: James Ketrenos Date: Wed, 08 Jun 2005 12:10:37 -0500 > My approach is to make the driver so it supports as many usage models as > possible, leaving policy to other components of the system. I don't see how this kind of firmware load setup handles something like an NFS root over such a device that requires firmware. And let's not mention that I have to set up an initrd to make that work, that's ridiculous. This is the kind of crap that happens when drivers in the kernel are not self contained, and need "external stuff" to work properly. It means that simple things like NFS root over the device do not work in a straightforward, simple, and elegant manner. I am likely to always take the position that device firmware belongs in the kernel proper, not via these userland and filesystem loading mechanism, none of which may be even _available_ when we first need to get the device going. 
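The ordering problem raised in this message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not kernel code and not the ipw2100 driver: every name in it (fw_blob, bring_up, loader_early_boot, loader_userland_up, "example.fw") is hypothetical, and the byte arrays stand in for real firmware images. It shows a device that can only come up early if the image is linked into the driver, because a loader that depends on userland or a mounted filesystem fails at that point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from the kernel. */

struct fw_blob {
	const unsigned char *data;
	size_t len;
};

/* A loader that depends on userland (hotplug agent, mounted filesystem).
 * Returns 0 on success, -1 if the mechanism is not available yet. */
typedef int (*external_loader_fn)(const char *name, struct fw_blob *out);

/* Stand-in for a firmware image linked into the driver itself. */
static const unsigned char builtin_image[] = { 0xde, 0xad, 0xbe, 0xef };

enum fw_source { FW_NONE, FW_BUILTIN, FW_EXTERNAL };

/* Early boot: no root filesystem and no hotplug agent, so loading fails. */
static int loader_early_boot(const char *name, struct fw_blob *out)
{
	(void)name;
	(void)out;
	return -1;
}

/* Userland is up: pretend the image was read from disk. */
static int loader_userland_up(const char *name, struct fw_blob *out)
{
	static const unsigned char disk_image[] = { 0x01, 0x02 };

	(void)name;
	out->data = disk_image;
	out->len = sizeof(disk_image);
	return 0;
}

/* Pick a firmware source: the built-in image always works, the external
 * loader only works once the machinery behind it exists. */
static enum fw_source bring_up(int have_builtin, external_loader_fn load,
			       struct fw_blob *fw)
{
	assert(fw != NULL);
	fw->data = NULL;
	fw->len = 0;

	if (have_builtin) {
		fw->data = builtin_image;
		fw->len = sizeof(builtin_image);
		return FW_BUILTIN;
	}
	if (load != NULL && load("example.fw", fw) == 0)
		return FW_EXTERNAL;
	return FW_NONE;
}
```

With no built-in image, bring_up() with loader_early_boot leaves the device without firmware, which is the NFS-root case described above; the same call with loader_userland_up succeeds, but only after the system is already running.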
From davem@davemloft.net Wed Jun 8 12:48:39 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 12:48:42 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58JmdXq020252 for ; Wed, 8 Jun 2005 12:48:39 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dg6WO-0002JU-Qj; Wed, 08 Jun 2005 12:47:28 -0700 Date: Wed, 08 Jun 2005 12:47:28 -0700 (PDT) Message-Id: <20050608.124728.74558309.davem@davemloft.net> To: belyshev@depni.sinp.msu.ru Cc: netdev@oss.sgi.com Subject: Re: oops with hostap and 2.6.12-rc6-mm1: Kernel BUG at "net/ipv4/tcp_output.c":928 From: "David S. Miller" In-Reply-To: <56hdg93rxb.fsf@depni.sinp.msu.ru> References: <56hdg93rxb.fsf@depni.sinp.msu.ru> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2253 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 126 Lines: 3 Just remove the BUG_ON() assertion in tcp_tso_should_defer(), it's simply not a correct check when FIN is set in the packet. 
From davej@redhat.com Wed Jun 8 12:50:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 12:50:23 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58JoJXq020848 for ; Wed, 8 Jun 2005 12:50:20 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j58Jn29I019557; Wed, 8 Jun 2005 15:49:02 -0400 Received: from devserv.devel.redhat.com (devserv.devel.redhat.com [172.16.58.1]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j58Jn2O27726; Wed, 8 Jun 2005 15:49:02 -0400 Received: from devserv.devel.redhat.com (localhost.localdomain [127.0.0.1]) by devserv.devel.redhat.com (8.12.11/8.12.11) with ESMTP id j58Jn2dp029198; Wed, 8 Jun 2005 15:49:02 -0400 Received: (from davej@localhost) by devserv.devel.redhat.com (8.12.11/8.12.11/Submit) id j58Jn2cu029192; Wed, 8 Jun 2005 15:49:02 -0400 X-Authentication-Warning: devserv.devel.redhat.com: davej set sender to davej@redhat.com using -f Date: Wed, 8 Jun 2005 15:49:02 -0400 From: Dave Jones To: "David S. Miller" Cc: jketreno@linux.intel.com, vda@ilport.com.ua, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem Message-ID: <20050608194902.GK876@redhat.com> Mail-Followup-To: Dave Jones , "David S. 
Miller" , jketreno@linux.intel.com, vda@ilport.com.ua, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com References: <20050608142310.GA2339@elf.ucw.cz> <200506081744.20687.vda@ilport.com.ua> <42A7268D.9020402@linux.intel.com> <20050608.124332.85408883.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050608.124332.85408883.davem@davemloft.net> User-Agent: Mutt/1.4.1i X-archive-position: 2254 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davej@redhat.com Precedence: bulk X-list: netdev Content-Length: 721 Lines: 19 On Wed, Jun 08, 2005 at 12:43:32PM -0700, David S. Miller wrote: > I am likely to always take the position that device firmware > belongs in the kernel proper, not via these userland and filesystem > loading mechanism, none of which may be even _available_ when > we first need to get the device going. FWIW, I agree, though the licensing of the Intel firmware prevents that iirc. The biggest problem I face with this driver in Fedora kernels is users mismatching firmware rev with the driver version. Another problem that disappears if the two are shipped together. Of course this would then bring out the armchair lawyers on the list and cause another 500 emails debating whether it violates the gpl. 
Dave From davem@davemloft.net Wed Jun 8 12:55:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 12:55:43 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58JteXq021636 for ; Wed, 8 Jun 2005 12:55:40 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dg6d1-0002KA-7h; Wed, 08 Jun 2005 12:54:19 -0700 Date: Wed, 08 Jun 2005 12:54:19 -0700 (PDT) Message-Id: <20050608.125419.18305230.davem@davemloft.net> To: davej@redhat.com Cc: jketreno@linux.intel.com, vda@ilport.com.ua, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem From: "David S. Miller" In-Reply-To: <20050608194902.GK876@redhat.com> References: <42A7268D.9020402@linux.intel.com> <20050608.124332.85408883.davem@davemloft.net> <20050608194902.GK876@redhat.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2255 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 579 Lines: 14 From: Dave Jones Date: Wed, 8 Jun 2005 15:49:02 -0400 > FWIW, I agree, though the licensing of the Intel firmware > prevents that iirc. The biggest problem I face with this driver > in Fedora kernels is users mismatching firmware rev with the > driver version. Another problem that disappears if the two > are shipped together. Yep, this license definitely hurts users, in many many ways. If this kind of external firmware requirement existed for a popular SCSI controller, more people would be up in arms about this and understand the issue more clearly. 
From tgraf@suug.ch Wed Jun 8 13:01:32 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 13:01:36 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58K1WXq022362 for ; Wed, 8 Jun 2005 13:01:32 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 3D2C71C0F2; Wed, 8 Jun 2005 22:00:48 +0200 (CEST) Date: Wed, 8 Jun 2005 22:00:48 +0200 From: Thomas Graf To: Ralf Baechle Cc: jamal , "David S. Miller" , netdev@oss.sgi.com Subject: Re: netdev munching messages again? Message-ID: <20050608200048.GP20969@postel.suug.ch> References: <20050607140842.778143000@axs> <20050607140901.632982000@axs> <20050607213621.GG20969@postel.suug.ch> <20050607.144237.93024273.davem@davemloft.net> <20050608132953.GK20969@postel.suug.ch> <1118238264.6382.43.camel@localhost.localdomain> <20050608160444.GA17777@linux-mips.org> <20050608161314.GM20969@postel.suug.ch> <20050608172809.GF5520@linux-mips.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050608172809.GF5520@linux-mips.org> X-archive-position: 2256 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 731 Lines: 14 * Ralf Baechle <20050608172809.GF5520@linux-mips.org> 2005-06-08 18:28 > Whatever it was, it was probably a separate issue from this spamfilter > faux-pas. Note there is a firewall in front of oss.sgi.com which will > accept the SMTP TCP connection only to drop the connection shortly after > if it can't build a connection to the "real" oss. I'm not worried about a refused connection once in a while, I'm worried that exactly those two messages that have been sitting in _my_ queue due to a refused connection to oss.sgi.com have been successfully delivered and the others have not. 
Is there a messages/time limit somewhere? If so what's the limit? I tried with 30 seconds delay between each patch but that didn't help. From davem@davemloft.net Wed Jun 8 13:12:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 13:12:10 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58KC5Xq023261 for ; Wed, 8 Jun 2005 13:12:06 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dg6su-0002OO-Ox; Wed, 08 Jun 2005 13:10:44 -0700 Date: Wed, 08 Jun 2005 13:10:44 -0700 (PDT) Message-Id: <20050608.131044.31642070.davem@davemloft.net> To: tgraf@suug.ch Cc: ralf@linux-mips.org, hadi@cyberus.ca, netdev@oss.sgi.com Subject: Re: netdev munching messages again? From: "David S. Miller" In-Reply-To: <20050608200048.GP20969@postel.suug.ch> References: <20050608161314.GM20969@postel.suug.ch> <20050608172809.GF5520@linux-mips.org> <20050608200048.GP20969@postel.suug.ch> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2257 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1219 Lines: 31 From: Thomas Graf Date: Wed, 8 Jun 2005 22:00:48 +0200 > I'm not worried about a refused connection once in a while, I'm > worried that exactly those two messages that have been > sitting in _my_ queue due to a refused connection to oss.sgi.com > have been successfully delivered and the others have not. > > Is there a messages/time limit somewhere? If so what's the limit? > I tried with 30 seconds delay between each patch but that didn't > help. I see the delay due to SGI's firewall when I send postings out too, and it's very annoying. 
The fact that I can send an email faster to Herbert Xu in Australia (several thousand miles away) than to oss.sgi.com (which is a short drive away) would be an amusing anecdote if it didn't negatively impact my work. I think it's time to move this list to a more reliable and efficient place. Ralf, thanks for all of your effort and time maintaining oss.sgi.com for our stay as guests via the netdev list. I've created netdev@vger.kernel.org, and folks can start to join up there. If someone knows the appropriate archive maintainers to contact (marc.theaimsgroup.com et al.) please let them know about this transition. It would be much appreciated. From rdunlap@xenotime.net Wed Jun 8 13:32:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 13:32:16 -0700 (PDT) Received: from titan.genwebhost.com (titan.genwebhost.com [209.9.226.66]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58KWBXq024450 for ; Wed, 8 Jun 2005 13:32:11 -0700 Received: from pool-71-111-140-4.ptldor.dsl-w.verizon.net ([71.111.140.4] helo=midway.verizon.net) by titan.genwebhost.com with esmtpa (Exim 4.51) id 1Dg74K-00043A-K9; Wed, 08 Jun 2005 16:22:33 -0400 Date: Wed, 8 Jun 2005 13:30:53 -0700 From: randy_dunlap To: "David S. Miller" Cc: tgraf@suug.ch, ralf@linux-mips.org, hadi@cyberus.ca, netdev@oss.sgi.com Subject: Re: netdev munching messages again? 
Message-Id: <20050608133053.6976e4a0.rdunlap@xenotime.net> In-Reply-To: <20050608.131044.31642070.davem@davemloft.net> References: <20050608161314.GM20969@postel.suug.ch> <20050608172809.GF5520@linux-mips.org> <20050608200048.GP20969@postel.suug.ch> <20050608.131044.31642070.davem@davemloft.net> Organization: YPO4 X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - titan.genwebhost.com X-AntiAbuse: Original Domain - oss.sgi.com X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - xenotime.net X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2258 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rdunlap@xenotime.net Precedence: bulk X-list: netdev Content-Length: 548 Lines: 18 On Wed, 08 Jun 2005 13:10:44 -0700 (PDT) David S. Miller wrote: | Ralf, thanks for all of your effort and time maintaining | oss.sgi.com for our stay as guests via the netdev list. Thanks from here also. | I've created netdev@vger.kernel.org, and folks can start | to join up there. If someone knows the appropriate | archive maintainers to contact (marc.theaimsgroup.com | et al.) please let them know about this transition. It | would be much appreciated. I've done that before and now. 
(made the request, not acked by Hank yet) --- ~Randy From pavel@atrey.karlin.mff.cuni.cz Wed Jun 8 13:34:14 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 13:34:18 -0700 (PDT) Received: from atrey.karlin.mff.cuni.cz (atrey.karlin.mff.cuni.cz [195.113.31.123]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58KYBXq024921 for ; Wed, 8 Jun 2005 13:34:14 -0700 Received: by atrey.karlin.mff.cuni.cz (Postfix, from userid 512) id 3BBF44B442E; Wed, 8 Jun 2005 22:33:01 +0200 (CEST) Date: Wed, 8 Jun 2005 18:29:48 +0200 From: Pavel Machek To: Jirka Bohac Cc: Denis Vlasenko , Pavel Machek , Jeff Garzik , Netdev list , kernel list Subject: Re: ipw2100: firmware problem Message-ID: <20050608162947.GB3969@openzaurus.ucw.cz> References: <20050608142310.GA2339@elf.ucw.cz> <200506081744.20687.vda@ilport.com.ua> <20050608145653.GA8844@dwarf.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050608145653.GA8844@dwarf.suse.cz> User-Agent: Mutt/1.3.27i X-archive-position: 2259 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@suse.cz Precedence: bulk X-list: netdev Content-Length: 1271 Lines: 41 Hi! > > Do you want to associate to an AP when your kernel boots, > > _before_ any iwconfig had a chance to configure anything? > > That's strange. > > > > My position is that wifi drivers must start up in an "OFF" mode. > > Do not send anything. Do not join APs or start IBSS. > > Agreed. Me too ;-). > > Thus, no need to load fw in early boot. > > I don't think this is true. Loading the firmware on the first > "ifconfig up" is problematic. Often, people want to rename the > device from ethX/wlanX/... to something stable. This is usually > based on the adapter's MAC address, which is not visible until > the firmware is loaded. > > Prism54 does it this way and it really sucks. 
You need to bring > the adapter up to load the firmware, then bring it back down, > rename it, and bring it up again. > > Denis: any plans for this to be fixed? > > I agree that drivers should initialize the adapter in the OFF > state, but the firmware needs to be loaded earlier than the > first ifconfig up. > > How about loading the firmware when the first ioctl touches the > device? This way, it would get loaded just before the MAC address > is retrieved. That's really ugly :-(. Pavel -- 64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms From davem@davemloft.net Wed Jun 8 14:26:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 14:26:03 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58LPxXq027659 for ; Wed, 8 Jun 2005 14:26:00 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dg82Y-0004cS-2v; Wed, 08 Jun 2005 14:24:46 -0700 Date: Wed, 08 Jun 2005 14:24:45 -0700 (PDT) Message-Id: <20050608.142445.126941321.davem@davemloft.net> To: shemminger@osdl.org Cc: mitch.a.williams@intel.com, netdev@oss.sgi.com, john.ronciak@intel.com, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: [PATCH] net: allow controlling NAPI weight with sysfs From: "David S. 
Miller" In-Reply-To: <20050602111437.1c492138@dxpl.pdx.osdl.net> References: <20050602111437.1c492138@dxpl.pdx.osdl.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2260 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 321 Lines: 9 From: Stephen Hemminger Date: Thu, 2 Jun 2005 11:14:37 -0700 > Simple interface to allow changing network device scheduling weight > with sysfs. Please consider this for 2.6.12, since risk/impact is small. > > Signed-off-by: Stephen Hemminger Patch applied, thanks Stephen. From pavel@ucw.cz Wed Jun 8 14:28:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 14:28:24 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58LSGXq028102 for ; Wed, 8 Jun 2005 14:28:18 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id E9EC78B8A7; Wed, 8 Jun 2005 23:27:07 +0200 (CEST) Date: Wed, 8 Jun 2005 23:27:07 +0200 From: Pavel Machek To: James Ketrenos Cc: Jeff Garzik , Netdev list , kernel list , "James P. Ketrenos" Subject: Re: ipw2100: firmware problem Message-ID: <20050608212707.GA2535@elf.ucw.cz> References: <20050608142310.GA2339@elf.ucw.cz> <42A723D3.3060001@linux.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <42A723D3.3060001@linux.intel.com> X-Warning: Reading this can be dangerous to your mental health. User-Agent: Mutt/1.5.9i X-archive-position: 2261 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 1383 Lines: 44 Hi! 
> >I'm fighting with firmware problem: if ipw2100 is compiled into > >kernel, it is loaded while kernel boots and firmware loader is not yet > >available. That leads to uninitialized (=> useless) adapter. > > > > > We've been looking into whether the initrd can have the firmware affixed > to the end w/ some magic bytes to identify it. If it works, enhancing > the request_firmware to support both hotplug and an initrd approach may > be reasonable. That seems pretty ugly to me... imagine more than one driver does this :-(. > >What's the prefered way to solve this one? Only load firmware when > >user does ifconfig eth1 up? [It is wifi, it looks like it would be > >better to start firmware sooner so that it can associate to the > >AP...]. > > > > > The debate goes back and forth on whether devices should come up only > after they are told, or initialize and start looking for a network as > soon as the module is loaded. > > I lean more toward having the driver just do what it is told, defaulting > to trying to scan and associate so link is ready as soon as possible. > We've added module parameters to change that behavior (disable and > associate for the ipw2100). Having a parameter to control this seems a bit too complex to me. How is insmod ipw2100 enable=1 different from insmod ipw2100 iwconfig eth1 start_scanning_or_whatever ? Pavel From jheffner@psc.edu Wed Jun 8 14:41:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 14:41:42 -0700 (PDT) Received: from mailer1.psc.edu (mailer1.psc.edu [128.182.58.100]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58LfdXq029223 for ; Wed, 8 Jun 2005 14:41:40 -0700 Received: from homer.psc.edu (homer.psc.edu [128.182.61.117]) by mailer1.psc.edu (8.13.3/8.13.3) with ESMTP id j58LeOmZ016033 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Wed, 8 Jun 2005 17:40:25 -0400 (EDT) From: John Heffner Organization: PSC To: "David S. 
Miller" Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO Date: Wed, 8 Jun 2005 17:40:10 -0400 User-Agent: KMail/1.8 Cc: netdev@oss.sgi.com, herbert@gondor.apana.org.au References: <20050606.210846.07641049.davem@davemloft.net> In-Reply-To: <20050606.210846.07641049.davem@davemloft.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506081740.11292.jheffner@psc.edu> X-archive-position: 2262 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jheffner@psc.edu Precedence: bulk X-list: netdev Content-Length: 583 Lines: 13 On Tuesday 07 June 2005 12:08 am, David S. Miller wrote: > Some folks, notable the S2IO guys, get performance degradation > from the Super TSO v2 patch (they get it from the first version > as well). It's a real pain to spot what causes such things > in such a huge patch... so I started splitting things up in > a very fine grained manner so we can catch regressions more > precisely. I'm curious about the details of this. Is there decreased performance relative to current TSO? Relative to no TSO? Sending to just one receiver or many, and is it receiver limited? 
-John From buytenh@wantstofly.org Wed Jun 8 14:46:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 14:46:53 -0700 (PDT) Received: from xi.wantstofly.org (alephnull.demon.nl [212.238.201.82]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58LkkXq029883 for ; Wed, 8 Jun 2005 14:46:47 -0700 Received: by xi.wantstofly.org (Postfix, from userid 500) id E6FDA945E0; Wed, 8 Jun 2005 23:45:40 +0200 (MEST) Date: Wed, 8 Jun 2005 23:45:40 +0200 From: Lennert Buytenhek To: netdev@oss.sgi.com Subject: Re: [PATCH 1/1] sysctl configurable icmperror sourceaddress Message-ID: <20050608214540.GF28207@xi.wantstofly.org> References: <1118136384.10479.15.camel@jeroens.office.netland.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1118136384.10479.15.camel@jeroens.office.netland.nl> User-Agent: Mutt/1.4.1i X-archive-position: 2263 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: buytenh@wantstofly.org Precedence: bulk X-list: netdev Content-Length: 2577 Lines: 82 On Tue, Jun 07, 2005 at 11:26:23AM +0200, J. Simonetti wrote: > This patch allows you to change the source address of icmp error > messages. It applies cleanly to 2.6.11.11 and retains the default > behaviour. > > In the old (default) behaviour icmp error messages are sent with the ip > of the exiting interface. > With the new behaviour (when the sysctl variable is toggled on), it will send > the message with the ip of the interface that received the packet that > caused the icmp error. This is the behaviour network administrators will > expect from a router. It makes debugging complicated network layouts > much easier. Also, all 'vendor routers' I know of have the latter > behaviour. Can this patch go in, pretty please? 
Here's the patch again for reference:

--- include/linux/sysctl.h.orig	2004-12-24 22:34:58.000000000 +0100
+++ include/linux/sysctl.h	2005-06-07 10:16:39.730585288 +0200
@@ -345,6 +345,7 @@
 	NET_TCP_MODERATE_RCVBUF=106,
 	NET_TCP_TSO_WIN_DIVISOR=107,
 	NET_TCP_BIC_BETA=108,
+	NET_IPV4_ICMP_ERRORS_USE_INBOUND_IFADDR=109,
 };
 
 enum {
--- net/ipv4/icmp.c.orig	2004-12-24 22:35:28.000000000 +0100
+++ net/ipv4/icmp.c	2005-06-07 10:15:42.645263576 +0200
@@ -207,6 +207,7 @@
 
 int sysctl_icmp_ratelimit = 1 * HZ;
 int sysctl_icmp_ratemask = 0x1818;
+int sysctl_icmp_errors_use_inbound_ifaddr = 0;
 
 /*
  *	ICMP control array. This specifies what to do with each ICMP.
@@ -511,8 +512,12 @@
 	 */
 
 	saddr = iph->daddr;
-	if (!(rt->rt_flags & RTCF_LOCAL))
-		saddr = 0;
+	if (!(rt->rt_flags & RTCF_LOCAL)) {
+		if(sysctl_icmp_errors_use_inbound_ifaddr)
+			saddr = inet_select_addr(skb_in->dev, 0, RT_SCOPE_LINK);
+		else
+			saddr = 0;
+	}
 
 	tos = icmp_pointers[type].error ? ((iph->tos & IPTOS_TOS_MASK) |
 					   IPTOS_PREC_INTERNETCONTROL) :
--- net/ipv4/sysctl_net_ipv4.c.orig	2004-12-24 22:35:23.000000000 +0100
+++ net/ipv4/sysctl_net_ipv4.c	2005-06-07 10:19:44.538490216 +0200
@@ -23,6 +23,7 @@
 extern int sysctl_icmp_echo_ignore_all;
 extern int sysctl_icmp_echo_ignore_broadcasts;
 extern int sysctl_icmp_ignore_bogus_error_responses;
+extern int sysctl_icmp_errors_use_inbound_ifaddr;
 
 /* From ip_fragment.c */
 extern int sysctl_ipfrag_low_thresh;
@@ -396,6 +397,14 @@
 		.proc_handler	= &proc_dointvec
 	},
 	{
+		.ctl_name	= NET_IPV4_ICMP_ERRORS_USE_INBOUND_IFADDR,
+		.procname	= "icmp_errors_use_inbound_ifaddr",
+		.data		= &sysctl_icmp_errors_use_inbound_ifaddr,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
+	{
 		.ctl_name	= NET_IPV4_ROUTE,
 		.procname	= "route",
 		.maxlen		= 0,

From jketreno@linux.intel.com Wed Jun 8 14:47:12 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 14:47:15 -0700 (PDT) Received: from orsfmr004.jf.intel.com (fmr19.intel.com [134.134.136.18]) by oss.sgi.com 
(8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58LlBXq030021 for ; Wed, 8 Jun 2005 14:47:12 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr004.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j58Lk6Y7032176; Wed, 8 Jun 2005 21:46:07 GMT Received: from [192.168.1.154] (hdlrvguser-123.hd.intel.com [10.127.52.142]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j58Lk2Yf011245; Wed, 8 Jun 2005 21:46:02 GMT Message-ID: <42A76719.2060700@linux.intel.com> Date: Wed, 08 Jun 2005 16:46:01 -0500 From: James Ketrenos User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050519 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Pavel Machek CC: Jeff Garzik , Netdev list , kernel list , "James P. Ketrenos" Subject: Re: ipw2100: firmware problem References: <20050608142310.GA2339@elf.ucw.cz> <42A723D3.3060001@linux.intel.com> <20050608212707.GA2535@elf.ucw.cz> In-Reply-To: <20050608212707.GA2535@elf.ucw.cz> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2264 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jketreno@linux.intel.com Precedence: bulk X-list: netdev Content-Length: 1395 Lines: 49 Pavel Machek wrote: >>We've been looking into whether the initrd can have the firmware affixed >>to the end w/ some magic bytes to identify it. If it works, enhancing >>the request_firmware to support both hotplug and an initrd approach may >>be reasonable. >> >> > >That seems pretty ugly to me... imagine more than one driver does this >:-(. > > Not ideal, but not *that bad* if there is a standard way to stick the data on the initrd image. 
It's annoying to have to do it, but it does enable the most usage models and allows the network to be brought up as early as possible--which other components in the system may be relying on. >Having a parameter to control this seems a bit too complex to me. > >How is > >insmod ipw2100 enable=1 > >different from > >insmod ipw2100 >iwconfig eth1 start_scanning_or_whatever > >? > > It defaults to enabled, so you just need to do: insmod ipw2100 and it will auto associate with an open network. For the use case where users want the device to load but not initialize, they can use insmod ipw2100 disable=1 If hotplug and firmware loading worked early in the init sequence, no one would have issue with the current model; it works as users expect it to work. It magically finds and associates to networks, and your network scripts can then kick off DHCP, all with little to no special crafting or utility interfacing. James From davem@davemloft.net Wed Jun 8 14:50:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 14:50:29 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58LoLXq031012 for ; Wed, 8 Jun 2005 14:50:25 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dg8Q7-00074o-1b; Wed, 08 Jun 2005 14:49:07 -0700 Date: Wed, 08 Jun 2005 14:49:06 -0700 (PDT) Message-Id: <20050608.144906.77057282.davem@davemloft.net> To: jheffner@psc.edu Cc: netdev@oss.sgi.com, herbert@gondor.apana.org.au Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO From: "David S. 
Miller" In-Reply-To: <200506081740.11292.jheffner@psc.edu> References: <20050606.210846.07641049.davem@davemloft.net> <200506081740.11292.jheffner@psc.edu> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2265 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1048 Lines: 24 From: John Heffner Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO Date: Wed, 8 Jun 2005 17:40:10 -0400 > On Tuesday 07 June 2005 12:08 am, David S. Miller wrote: > > Some folks, notable the S2IO guys, get performance degradation > > from the Super TSO v2 patch (they get it from the first version > > as well). It's a real pain to spot what causes such things > > in such a huge patch... so I started splitting things up in > > a very fine grained manner so we can catch regressions more > > precisely. > > I'm curious about the details of this. Is there decreased performance > relative to current TSO? Relative to no TSO? Sending to just one receiver > or many, and is it receiver limited? The receiver is limited in their tests. No current generation systems can fill a 10gbit pipe fully, especially at 1500 byte MTU. Performance went down, with both TSO enabled and disabled, compared to not having the patches applied. That's why I'm going through this entire exercise of doing things one piece at a time. 
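
As a rough illustration of why a 1500 byte MTU is so much harder on the receiver than jumbo frames, the per-frame load at 10 Gbit/s can be estimated with a few lines of Python. This is only a back-of-the-envelope sketch: the 38 bytes of per-frame Ethernet overhead (preamble, header, FCS, inter-frame gap) and the function name are assumptions made for illustration, not figures from the tests discussed here.

```python
# Assumed per-frame Ethernet overhead in bytes (not taken from the thread):
# 8 preamble/SFD + 14 header + 4 FCS + 12 inter-frame gap.
ETH_OVERHEAD = 8 + 14 + 4 + 12
LINK_BPS = 10_000_000_000  # nominal 10 Gbit/s line rate


def frames_per_second(mtu, link_bps=LINK_BPS):
    """Frames per second needed to fill the link with MTU-sized frames."""
    wire_bits = (mtu + ETH_OVERHEAD) * 8
    return link_bps / wire_bits


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: about {frames_per_second(mtu):,.0f} frames/s")
```

Under these assumptions a receiver has to handle roughly six times as many frames per second at 1500 bytes as at 9000 bytes for the same bit rate.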
From davem@davemloft.net Wed Jun 8 14:50:52 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 14:50:56 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58LopXq031247 for ; Wed, 8 Jun 2005 14:50:52 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dg8Qf-00075G-Jc; Wed, 08 Jun 2005 14:49:41 -0700 Date: Wed, 08 Jun 2005 14:49:41 -0700 (PDT) Message-Id: <20050608.144941.26530856.davem@davemloft.net> To: buytenh@wantstofly.org Cc: netdev@oss.sgi.com Subject: Re: [PATCH 1/1] sysctl configurable icmperror sourceaddress From: "David S. Miller" In-Reply-To: <20050608214540.GF28207@xi.wantstofly.org> References: <1118136384.10479.15.camel@jeroens.office.netland.nl> <20050608214540.GF28207@xi.wantstofly.org> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2266 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 242 Lines: 9 From: Lennert Buytenhek Date: Wed, 8 Jun 2005 23:45:40 +0200 > Can this patch go in, pretty please? > > Here's the patch again for reference: We have a similar sysctl for ARP handling, why don't we make use of it? 
From herbert@gondor.apana.org.au Wed Jun 8 15:12:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 15:12:26 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58MCKXq000411 for ; Wed, 8 Jun 2005 15:12:21 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1Dg8l8-00036b-00; Thu, 09 Jun 2005 08:10:50 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1Dg8l5-0003Mu-00; Thu, 09 Jun 2005 08:10:47 +1000 Date: Thu, 9 Jun 2005 08:10:47 +1000 To: "David S. Miller" Cc: jheffner@psc.edu, netdev@oss.sgi.com Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO Message-ID: <20050608221047.GA12920@gondor.apana.org.au> References: <20050606.210846.07641049.davem@davemloft.net> <200506081740.11292.jheffner@psc.edu> <20050608.144906.77057282.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050608.144906.77057282.davem@davemloft.net> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2267 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 483 Lines: 12 On Wed, Jun 08, 2005 at 02:49:06PM -0700, David S. Miller wrote: > > Performance went down, with both TSO enabled and disabled, compared to > not having the patches applied. What was the receiver running? Was the performance degradation more pronounced with TSO enabled? 
-- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From leonid.grossman@neterion.com Wed Jun 8 15:21:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 15:21:40 -0700 (PDT) Received: from ns1.s2io.com (ns1.s2io.com [142.46.200.198]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58MLZXq001285 for ; Wed, 8 Jun 2005 15:21:36 -0700 Received: from guinness.s2io.com (sentry.s2io.com [142.46.200.199]) by ns1.s2io.com (8.12.10/8.12.10) with ESMTP id j58MJhOC029074; Wed, 8 Jun 2005 18:19:43 -0400 (EDT) Received: from lgt40 ([10.16.16.168]) by guinness.s2io.com (8.12.6/8.12.6) with ESMTP id j58MJYVG002386; Wed, 8 Jun 2005 18:19:35 -0400 (EDT) Message-Id: <200506082219.j58MJYVG002386@guinness.s2io.com> From: "Leonid Grossman" To: "'David S. Miller'" , Cc: , Subject: RE: [PATCH 0/9]: TCP: The Road to Super TSO Date: Wed, 8 Jun 2005 15:19:34 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook, Build 11.0.5510 In-Reply-To: <20050608.144906.77057282.davem@davemloft.net> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409 Thread-Index: AcVsc/eKPrIVD8LCTByU1U/ZF/fzMAAA7bDQ X-Scanned-By: MIMEDefang 2.34 X-archive-position: 2268 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: leonid.grossman@neterion.com Precedence: bulk X-list: netdev Content-Length: 1588 Lines: 46 > -----Original Message----- > From: netdev-bounce@oss.sgi.com > [mailto:netdev-bounce@oss.sgi.com] On Behalf Of David S. 
Miller > Sent: Wednesday, June 08, 2005 2:49 PM > To: jheffner@psc.edu > Cc: netdev@oss.sgi.com; herbert@gondor.apana.org.au > Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO > > From: John Heffner > Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO > Date: Wed, 8 Jun 2005 17:40:10 -0400 > > > On Tuesday 07 June 2005 12:08 am, David S. Miller wrote: > > > Some folks, notable the S2IO guys, get performance > degradation from > > > the Super TSO v2 patch (they get it from the first > version as well). > > > It's a real pain to spot what causes such things in such a huge > > > patch... so I started splitting things up in a very fine grained > > > manner so we can catch regressions more precisely. > > > > I'm curious about the details of this. Is there decreased > performance > > relative to current TSO? Relative to no TSO? Sending to just one > > receiver or many, and is it receiver limited? > > The receiver is limited in their tests. No current > generation systems can fill a 10gbit pipe fully, especially > at 1500 byte MTU. With jumbo frames, a single receiver can handle 10GbE line rate. With 1500 mtu, a single receiver becomes a bottleneck. I will forward the numbers later today. > > Performance went down, with both TSO enabled and disabled, > compared to not having the patches applied. > > That's why I'm going through this entire exercise of doing > things one piece at a time. > > From pavel@ucw.cz Wed Jun 8 15:36:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 15:36:05 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58MZqXq002290 for ; Wed, 8 Jun 2005 15:35:56 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id 3DF2A8B8A7; Thu, 9 Jun 2005 00:34:37 +0200 (CEST) Date: Thu, 9 Jun 2005 00:34:37 +0200 From: Pavel Machek To: James Ketrenos Cc: Jeff Garzik , Netdev list , kernel list , "James P. 
Ketrenos" Subject: Re: ipw2100: firmware problem Message-ID: <20050608223437.GB2614@elf.ucw.cz> References: <20050608142310.GA2339@elf.ucw.cz> <42A723D3.3060001@linux.intel.com> <20050608212707.GA2535@elf.ucw.cz> <42A76719.2060700@linux.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <42A76719.2060700@linux.intel.com> X-Warning: Reading this can be dangerous to your mental health. User-Agent: Mutt/1.5.9i X-archive-position: 2269 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 1115 Lines: 36 Hi! > >Having a parameter to control this seems a bit too complex to me. > > > >How is > > > >insmod ipw2100 enable=1 > > > >different from > > > >insmod ipw2100 > >iwconfig eth1 start_scanning_or_whatever > > > >? > It defaults to enabled, so you just need to do: > > insmod ipw2100 > > and it will auto associate with an open network. For the use case where > users want the device to load but not initialize, they can use > > insmod ipw2100 disable=1 > > If hotplug and firmware loading worked early in the init sequence, no > one would have issue with the current model; it works as users expect it > to work. It magically finds and associates to networks, and your > network scripts can then kick off DHCP, all with little to no special > crafting or utility interfacing. Actually it would still transmit when user did not want it to. I believe that staying "quiet" is right thing, long-term. And it could solve firmware-loading problems, short-term... How long does association with AP take? Anyway it should be easy to tell driver to associate ASAP, just after the insmod... 
Pavel From davem@davemloft.net Wed Jun 8 15:47:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 15:47:32 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58MlNXq003171 for ; Wed, 8 Jun 2005 15:47:23 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dg9JL-0000bx-OJ; Wed, 08 Jun 2005 15:46:11 -0700 Date: Wed, 08 Jun 2005 15:46:11 -0700 (PDT) Message-Id: <20050608.154611.71090840.davem@davemloft.net> To: niv@us.ibm.com Cc: rdunlap@xenotime.net, netdev@oss.sgi.com Subject: Re: netdev moved to vger; please subscribe From: "David S. Miller" In-Reply-To: <42A77446.3030102@us.ibm.com> References: <20050608.131044.31642070.davem@davemloft.net> <20050608133053.6976e4a0.rdunlap@xenotime.net> <42A77446.3030102@us.ibm.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2270 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 262 Lines: 7 From: Nivedita Singhvi Date: Wed, 08 Jun 2005 15:42:14 -0700 > Is the intention to phase out netdev@oss.sgi.com gradually? It should not be used any longer as of today. I'm changing oss.sgi.com to vger.kernel.org in every posting I reply to. 
From buytenh@wantstofly.org Wed Jun 8 16:41:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 16:41:39 -0700 (PDT) Received: from xi.wantstofly.org (alephnull.demon.nl [212.238.201.82]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j58NfZXq010129 for ; Wed, 8 Jun 2005 16:41:36 -0700 Received: by xi.wantstofly.org (Postfix, from userid 500) id ABB29945D5; Thu, 9 Jun 2005 01:40:29 +0200 (MEST) Date: Thu, 9 Jun 2005 01:40:29 +0200 From: Lennert Buytenhek To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: Re: [PATCH 1/1] sysctl configurable icmperror sourceaddress Message-ID: <20050608234029.GJ28207@xi.wantstofly.org> References: <1118136384.10479.15.camel@jeroens.office.netland.nl> <20050608214540.GF28207@xi.wantstofly.org> <20050608.144941.26530856.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050608.144941.26530856.davem@davemloft.net> User-Agent: Mutt/1.4.1i X-archive-position: 2271 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: buytenh@wantstofly.org Precedence: bulk X-list: netdev Content-Length: 2039 Lines: 70 On Wed, Jun 08, 2005 at 02:49:41PM -0700, David S. Miller wrote: > > Can this patch go in, pretty please? > > > > Here's the patch again for reference: > > We have a similar sysctl for ARP handling, why don't > we make use of it? Which one do you mean, the arp_{filter,announce,ignore} ones? IMHO this is a very different issue, this patch just selects which source address we use when we reply to a packet with an ICMP. 
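
To make the selection concrete, here is a minimal Python model of the choice the patch makes. It is not kernel code: the function and parameter names are made up for illustration, and the default case is simplified (in the kernel, saddr = 0 leaves the choice to the routing code, which this sketch represents by passing in the address of the outgoing interface).

```python
def icmp_error_saddr(orig_daddr, route_is_local, inbound_ifaddr,
                     outbound_ifaddr, use_inbound_ifaddr=False):
    """Pick the source address for an ICMP error (simplified model)."""
    if route_is_local:
        # The offending packet was addressed to this host, so the reply
        # comes from the address it was sent to.
        return orig_daddr
    if use_inbound_ifaddr:
        # Behaviour with the sysctl on: address of the interface that
        # received the offending packet.
        return inbound_ifaddr
    # Default behaviour: saddr is left as 0 and routing picks the address
    # of the interface the reply leaves through (modelled as a parameter).
    return outbound_ifaddr


if __name__ == "__main__":
    # router2 from the example in this message: the packet to 12.0.0.1
    # arrives on 11.0.0.100, the reply leaves via 12.0.0.100.
    for flag in (False, True):
        print(flag, icmp_error_saddr("12.0.0.1", False,
                                     "11.0.0.100", "12.0.0.100", flag))
```

With the flag off this prints 12.0.0.100 for router2, and with it on 11.0.0.100, which is the difference visible at hop 2 of the two traceroutes.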
In the case below, if all routers are linux routers, a traceroute from
source to dest will show something like this:

 1  10.0.0.254  x.xxx ms  x.xxx ms  x.xxx ms
 2  12.0.0.100  x.xxx ms  x.xxx ms  x.xxx ms
 3  12.0.0.1  x.xxx ms  x.xxx ms  x.xxx ms

Whereas we'd prefer seeing this, partly because it makes more sense,
partly because a very large fraction of networking hardware does it
this way too:

 1  10.0.0.254  x.xxx ms  x.xxx ms  x.xxx ms
 2  11.0.0.100  x.xxx ms  x.xxx ms  x.xxx ms   <===
 3  12.0.0.1  x.xxx ms  x.xxx ms  x.xxx ms

I used to work at an ISP and there are a number of practical cases
where the linux behavior is rather confusing.


cheers,
Lennert


+------------+
|   source   |
|  10.0.0.1  |
+-----+------+
      |
      V
      |
      |
+-----+------+
| 10.0.0.254 |
|            |
|  router1   +---------------------+
|            |                     |
| 11.0.0.254 |                     |
+-----+------+                     |
      |                            |
      V                            ^  asymmetric route
      |                            |  back to source
      |                            |
+-----+------+               +-----+------+
| 11.0.0.100 |               | 13.0.0.13  |
|            |               |            |
|  router2   |               |  router3   |
|            |               |            |
| 12.0.0.100 |               | 12.0.0.101 |
+-----+------+               +-----+------+
      |                            |
      +------------->--------------+
      V
      |
+-----+------+
|    dest    |
|  12.0.0.1  |
+------------+

From yi.zhu@intel.com Wed Jun 8 20:37:21 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 20:37:27 -0700 (PDT) Received: from fmsfmr002.fm.intel.com (fmr14.intel.com [192.55.52.68]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j593bKXq024029 for ; Wed, 8 Jun 2005 20:37:20 -0700 Received: from fmsfmr101.fm.intel.com (fmsfmr101.fm.intel.com [10.1.192.59]) by fmsfmr002.fm.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j593aDvB009177; Thu, 9 Jun 2005 03:36:14 GMT Received: from fmsmsxvs040.fm.intel.com (fmsmsxvs040.fm.intel.com [132.233.42.124]) by fmsfmr101.fm.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j593aCSa020695; Thu, 9 Jun 2005 03:36:13 GMT Received: from debian.sh.intel.com ([172.16.219.38]) by fmsmsxvs040.fm.intel.com (SAVSMTP 3.1.7.47) 
with SMTP id M2005060820360732582 ; Wed, 08 Jun 2005 20:36:11 -0700 Subject: Re: ipw2100: firmware problem From: Zhu Yi To: Pavel Machek Cc: James Ketrenos , Jeff Garzik , Netdev list , kernel list , "James P. Ketrenos" In-Reply-To: <20050608223437.GB2614@elf.ucw.cz> References: <20050608142310.GA2339@elf.ucw.cz> <42A723D3.3060001@linux.intel.com> <20050608212707.GA2535@elf.ucw.cz> <42A76719.2060700@linux.intel.com> <20050608223437.GB2614@elf.ucw.cz> Content-Type: text/plain Organization: Intel Corp. Date: Thu, 09 Jun 2005 11:33:10 +0800 Message-Id: <1118287990.10234.114.camel@debian.sh.intel.com> Mime-Version: 1.0 X-Mailer: Evolution 2.2.2 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2272 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yi.zhu@intel.com Precedence: bulk X-list: netdev Content-Length: 661 Lines: 21 Hi Pavel, On Thu, 2005-06-09 at 00:34 +0200, Pavel Machek wrote: > Actually it would still transmit when user did not want it to. I > believe that staying "quiet" is right thing, long-term. And it could > solve firmware-loading problems, short-term... If ipw2100 is built into kernel, you can disable it by kernel parameter ipw2100.disable=1. Then you can enable it with: $ echo 0 > /sys/bus/pci/drivers/ipw2100/*/rf_kill > How long does association with AP take? Anyway it should be easy to > tell driver to associate ASAP, just after the insmod... Are you suggesting by default it is disabled for built into kernel but enabled as a module? 
Thanks, -yi From Valdis.Kletnieks@vt.edu Wed Jun 8 21:24:46 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 21:24:48 -0700 (PDT) Received: from h80ad254e.async.vt.edu (h80ad254e.async.vt.edu [128.173.37.78]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j594OhXq026584 for ; Wed, 8 Jun 2005 21:24:44 -0700 Received: from turing-police.cc.vt.edu (localhost [127.0.0.1]) by turing-police.cc.vt.edu (8.13.4/8.13.4) with ESMTP id j594NWts004829; Thu, 9 Jun 2005 00:23:32 -0400 Message-Id: <200506090423.j594NWts004829@turing-police.cc.vt.edu> X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.1-RC3 To: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: 2.6.12-rc6-mm1 OOPS in tcp_push_one() From: Valdis.Kletnieks@vt.edu Mime-Version: 1.0 Content-Type: multipart/signed; boundary="==_Exmh_1118291011_3588P"; micalg=pgp-sha1; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit Date: Thu, 09 Jun 2005 00:23:32 -0400 X-archive-position: 2273 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Valdis.Kletnieks@vt.edu Precedence: bulk X-list: netdev Content-Length: 3722 Lines: 70 --==_Exmh_1118291011_3588P Content-Type: text/plain; charset=us-ascii Am at home, running PPP over a modem. I get a request to push a patch to a Sourceforge project I have CVS commit access to. So I do a 'export CVS_RSH=ssh' and then do a 'cvs commit', and ker-blammo. Very reproducible - this is about the 6th time in the past hour, always on a cvs-over-ssh. Oddly enough, if I trigger it while logged on one of the virtual consoles, I can c-a-f2 to another console, login, and run dmesg to capture the wreckage. Doing it from an xterm window with the X server running causes the system to lock up hard - I'm betting the oops dies with a lock held, and the X server immediately hangs because it tries to do some networking/socket stuff.... 
[17179956.772000] Unable to handle kernel paging request at virtual address a56b6b75 [17179956.776000] printing eip: [17179956.788000] c0307f4c [17179956.792000] *pde = 00000000 [17179956.804000] Oops: 0000 [#1] [17179956.804000] PREEMPT [17179956.804000] Modules linked in: ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc tcp_bic orinoco_cs orinoco hermes pcmcia firmware_class ip_conntrack_ftp ipt_pkttype ipt_REJECT ipt_state ip_conntrack ipt_LOG ipt_limit ipt_u32 iptable_filter ip_tables ip6t_LOG ip6t_limit ip6table_filter ip6_tables thermal processor fan button battery ac i8k ohci1394 ieee1394 yenta_socket rsrc_nonstatic pcmcia_core intel_agp agpgart floppy [17179956.804000] CPU: 0 [17179956.804000] EIP: 0060:[] Not tainted VLI [17179956.804000] EFLAGS: 00010202 (2.6.12-rc6-mm1) [17179956.804000] EIP is at tcp_transmit_skb+0x568/0x62b [17179956.804000] eax: a56b6b6b ebx: 000004df ecx: 00000000 edx: c64de048 [17179956.804000] esi: c1ff2b88 edi: 00000001 ebp: c1c29dd4 esp: c1c29da0 [17179956.804000] ds: 007b es: 007b ss: 0068 [17179956.804000] Process ssh (pid: 2981, threadinfo=c1c28000 task=c3126a60) [17179956.804000] Stack: c64de048 c64de080 00000020 00000000 c1ff2b88 c64deb88 c64debc0 c1c29dd4 [17179956.804000] c1fc71bc c64deb88 c1ff2b88 c64deb88 c64debc0 c1c29df4 c0308f8c 000005a8 [17179956.804000] 0001f742 00000001 c64deb88 c1ff2b88 000005a8 c1c29e5c c02ffba8 00000000 [17179956.804000] Call Trace: [17179956.804000] [] show_stack+0x7a/0x83 [17179956.804000] [] show_registers+0x130/0x1a1 [17179956.804000] [] die+0xd0/0x150 [17179956.804000] [] do_page_fault+0x454/0x5e0 [17179956.804000] [] error_code+0x4f/0x54 [17179956.804000] [] tcp_push_one+0xea/0x190 [17179956.804000] [] tcp_sendmsg+0x71f/0x8f6 [17179956.804000] [] inet_sendmsg+0x3c/0x49 [17179956.804000] [] sock_aio_write+0x117/0x124 [17179956.804000] [] do_sync_write+0x89/0xb9 [17179956.804000] [] vfs_write+0xbe/0x156 [17179956.804000] [] sys_write+0x3b/0x60 [17179956.804000] [] 
syscall_call+0x7/0xb [17179956.804000] Code: e0 04 8b 80 50 bb 4f c0 ff 40 2c 8b 8e d8 02 00 00 31 d2 8b 45 cc ff 11 89 c7 85 c0 0f 8e c2 00 00 00 8b 55 cc 8b 82 a4 00 00 00 <0f> b7 58 0a 66 c7 86 1a 03 00 00 00 00 80 be 26 02 00 00 01 0f [17179956.804000] This look familiar to anybody? (On a related note, how did tcp_bic get loaded? I requested all the new congestion stuff be built as modules, didn't specifically request any of them to actually be loaded.... --==_Exmh_1118291011_3588P Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Exmh version 2.5 07/13/2001 iD8DBQFCp8RDcC3lWbTT17ARAr0GAKDZ9d3f4vKePmA6jo/wS1nHllRmwwCgx3bv uMi0pYbHo4FqUDVK90C58kg= =GQ+6 -----END PGP SIGNATURE----- --==_Exmh_1118291011_3588P-- From leonid.grossman@neterion.com Wed Jun 8 21:32:08 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 21:32:13 -0700 (PDT) Received: from ns1.s2io.com (ns1.s2io.com [142.46.200.198]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j594W7Xq027295 for ; Wed, 8 Jun 2005 21:32:08 -0700 Received: from guinness.s2io.com (sentry.s2io.com [142.46.200.199]) by ns1.s2io.com (8.12.10/8.12.10) with ESMTP id j594UUOC000351; Thu, 9 Jun 2005 00:30:30 -0400 (EDT) Received: from lgt40 ([10.16.16.168]) by guinness.s2io.com (8.12.6/8.12.6) with ESMTP id j594USVG002685; Thu, 9 Jun 2005 00:30:29 -0400 (EDT) Message-Id: <200506090430.j594USVG002685@guinness.s2io.com> From: "Leonid Grossman" To: "'David S. 
Miller'" Cc: Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO Date: Wed, 8 Jun 2005 21:30:27 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook, Build 11.0.5510 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409 Thread-Index: AcVse3IKhpQmkW9OSuqMtDinBPPPEgAIuo1QAAM43KA= X-Scanned-By: MIMEDefang 2.34 X-archive-position: 2274 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: leonid.grossman@neterion.com Precedence: bulk X-list: netdev Content-Length: 1134 Lines: 31 FYI, looks like the code in the nine patches is not responsible for the performance drop; the problem is elsewhere in the Super TSO code. -----Original Message----- From: kshaw [mailto:kim.shaw@neterion.com] Sent: Wednesday, June 08, 2005 8:34 PM To: 'David S. Miller' Cc: ravinandan.arakali@neterion.com; leonid.grossman@neterion.com Subject: RE: test Super TSO David, I have applied all 9 patches (6-9 are done by editing source files), I don't see Tx performance drop from any patch, Tx throughput remains at 6.17 Gb/s - 6.18 Gb/s. The following is configuration: 4 way Opteron system .247 with shipping kernel 2.6.12-rc5 as TX system , 4 way Opteron system .226 with kernel 2.6.11.5 as Rx system, NIC driver REL_1-7-7-7_LX installed on both systems, Mtu is set to 9000 on both systems. Systems are connected back to back. Run 8 nttcp connections from Tx system to Rx system for 60 seconds. TSO is set to default on in both systems. I also re-tested the original TSO patch which I used weeks ago, With above same hardware, kernel 2.6.12-rc4 applied with original TSO patch on Tx System, Tx throughput drops to 5.28 Gb/s. 
From leonid.grossman@neterion.com Wed Jun 8 21:57:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 21:57:13 -0700 (PDT) Received: from ns1.s2io.com (ns1.s2io.com [142.46.200.198]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j594v5Xq028739 for ; Wed, 8 Jun 2005 21:57:06 -0700 Received: from guinness.s2io.com (sentry.s2io.com [142.46.200.199]) by ns1.s2io.com (8.12.10/8.12.10) with ESMTP id j594tJOC000402; Thu, 9 Jun 2005 00:55:20 -0400 (EDT) Received: from lgt40 ([10.16.16.168]) by guinness.s2io.com (8.12.6/8.12.6) with ESMTP id j594tDVG006745; Thu, 9 Jun 2005 00:55:14 -0400 (EDT) Message-Id: <200506090455.j594tDVG006745@guinness.s2io.com> From: "Leonid Grossman" To: "'Herbert Xu'" , "'David S. Miller'" Cc: , Subject: RE: [PATCH 0/9]: TCP: The Road to Super TSO Date: Wed, 8 Jun 2005 21:55:13 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook, Build 11.0.5510 In-Reply-To: <20050608221047.GA12920@gondor.apana.org.au> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409 Thread-Index: AcVsdxF5GenJI1Q7Rq+LwCJg3iPQ3QANgyvw X-Scanned-By: MIMEDefang 2.34 X-archive-position: 2275 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: leonid.grossman@neterion.com Precedence: bulk X-list: netdev Content-Length: 7723 Lines: 171 Some of the original data that we got couple weeks ago are attached. On questions from Herbert and others: - The performance drop from the "super-TSO" with TSO OFF is marginal, with TSO ON is quite noticeable. - The numbers are similar in back-to-back and switch based (sender vs two receivers) tests. - The numbers are relative; we tested in pci-x 1.0 slots where ~7.5Gbps is a practical bus limit For TCP traffic. 
In pci-x 2.0 slots, the numbers are ~10Gbps with either Jumbo frames Or with 1500 mtu + TSO, (against two 1500 mtu receivers), at a fraction of a single Opteron %cpu - David is correct, with 1500 mtu the single receiver %cpu becomes a bottleneck; the best throughput with 1500 mtu I've seen was ~5Gbps. So, in B2B setup with 1500 mtu the advantages of TSO are mostly wasted since there is no TSO counterpart on the receive side. Receive side stateless offloads fix this, but we did not get around to deploy these ASIC capabilities in Linux yet. Anyway, here it goes: ---------------------------------------------------------- Configuration: Dual Opteron system .243 as Rx, dual Opteron system .117 as Rx, four way Opteron system .247 as Tx, connected via CISCO switch. .243 and .117 kernel source are patched with tcp_ack26.diff, .247 kernel source are patched with tcp_super_tso.diff. Run 8 nttcp connections from Tx system to each Rx system, Use package size 65535 for mtu 1500, Use package size 300000 for mtu 9000. 
Tx throughput on four way Opteron system .247:

2.6.12-rc4
              Tx-1500   CPU usage      Tx-9000   CPU usage
              ----------------         ------------------
TSO off       2.5Gb/s   55%(note 1)    5.3       40%(3)
TSO on        4.0       47%(2)         6.1       35%(4)
==========================================================
2.6.12-rc4 with tcp_super_tso.diff patch
              Tx-1500   CPU usage      Tx-9000   CPU usage
              ----------------         ------------------
TSO off       2.4Gb/s   60%(5)         5.0       41%(7)
TSO on        3.5       45%(6)         5.7       35%(8)

Note(1): 1500 tso off
top - 08:45:41 up 13 min, 2 users, load average: 2.03, 1.01, 0.54
Tasks: 90 total, 3 running, 87 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 50.7% hi, 49.3% s
Cpu1 : 0.3% us, 29.2% sy, 0.0% ni, 53.2% id, 0.0% wa, 0.0% hi, 17.3% s
Cpu2 : 0.3% us, 27.9% sy, 0.0% ni, 53.2% id, 0.0% wa, 0.0% hi, 18.6% s
Cpu3 : 0.3% us, 23.6% sy, 0.0% ni, 59.5% id, 0.0% wa, 0.0% hi, 16.6% s
Mem: 2055724k total, 203172k used, 1852552k free, 24112k buffers
Swap: 2040244k total, 0k used, 2040244k free, 79384k cached

Note(2): 1500 tso on
top - 08:48:19 up 16 min, 2 users, load average: 0.74, 0.71, 0.49
Tasks: 90 total, 4 running, 86 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.3% us, 1.1% sy, 0.0% ni, 71.9% id, 0.6% wa, 12.2% hi, 13.8% s
Cpu1 : 0.5% us, 7.8% sy, 0.0% ni, 88.2% id, 0.5% wa, 0.0% hi, 3.0% s
Cpu2 : 0.4% us, 8.1% sy, 0.0% ni, 88.2% id, 0.5% wa, 0.0% hi, 2.9% s
Cpu3 : 0.3% us, 6.6% sy, 0.0% ni, 90.3% id, 0.1% wa, 0.0% hi, 2.7% s
Mem: 2055724k total, 203652k used, 1852072k free, 25308k buffers
Swap: 2040244k total, 0k used, 2040244k free, 79412k cached

Note(3): 9000 off
top - 08:58:19 up 6 min, 2 users, load average: 0.88, 0.47, 0.21
Tasks: 90 total, 2 running, 88 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.8% us, 8.8% sy, 0.0% ni, 79.1% id, 1.4% wa, 3.5% hi, 6.4% si
Cpu1 : 0.7% us, 7.3% sy, 0.0% ni, 90.8% id, 0.4% wa, 0.0% hi, 0.8% si
Cpu2 : 0.7% us, 6.9% sy, 0.0% ni, 90.8% id, 1.0% wa, 0.1% hi, 0.5% si
Cpu3 : 0.5% us, 5.1% sy, 0.0% ni, 93.9% id, 0.3% wa, 0.0% hi, 0.2% si
Mem: 2055724k total, 378620k used, 1677104k free, 18400k buffers
Swap: 2040244k total, 0k used, 2040244k free, 72788k cached

Note(4): 9000 on
top - 08:55:55 up 4 min, 2 users, load average: 0.53, 0.26, 0.12
Tasks: 90 total, 2 running, 88 sleeping, 0 stopped, 0 zombie
Cpu0 : 1.1% us, 4.4% sy, 0.0% ni, 89.2% id, 2.2% wa, 1.2% hi, 1.9% si
Cpu1 : 1.0% us, 3.5% sy, 0.0% ni, 94.3% id, 0.6% wa, 0.0% hi, 0.5% si
Cpu2 : 1.1% us, 6.4% sy, 0.0% ni, 90.7% id, 1.6% wa, 0.1% hi, 0.2% si
Cpu3 : 0.8% us, 5.3% sy, 0.0% ni, 93.5% id, 0.4% wa, 0.0% hi, 0.1% si
Mem: 2055724k total, 375892k used, 1679832k free, 17424k buffers
Swap: 2040244k total, 0k used, 2040244k free, 72676k cached

Note (5): 1500 tso off
top - 05:54:20 up 10 min, 2 users, load average: 1.48, 0.62, 0.29
Tasks: 91 total, 3 running, 88 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.5% us, 0.5% sy, 0.0% ni, 81.3% id, 0.9% wa, 7.6% hi, 9.1%
Cpu1 : 0.7% us, 5.4% sy, 0.0% ni, 91.5% id, 0.7% wa, 0.0% hi, 1.8%
Cpu2 : 0.6% us, 6.5% sy, 0.0% ni, 90.2% id, 0.7% wa, 0.0% hi, 2.0%
Cpu3 : 0.4% us, 5.5% sy, 0.0% ni, 92.1% id, 0.2% wa, 0.0% hi, 1.8%
Mem: 2055724k total, 204100k used, 1851624k free, 24056k buffers
Swap: 2040244k total, 0k used, 2040244k free, 79440k cached

Note (6): 1500 tso on
top - 05:49:36 up 6 min, 2 users, load average: 1.28, 0.45, 0.18
Tasks: 91 total, 6 running, 85 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 41.5% hi, 58.5%
Cpu1 : 0.0% us, 26.4% sy, 0.0% ni, 69.9% id, 0.0% wa, 0.0% hi, 3.7%
Cpu2 : 0.3% us, 24.3% sy, 0.0% ni, 71.3% id, 0.0% wa, 0.0% hi, 4.0%
Cpu3 : 0.0% us, 19.1% sy, 0.0% ni, 77.6% id, 0.0% wa, 0.0% hi, 3.3%
Mem: 2055724k total, 200496k used, 1855228k free, 22644k buffers
Swap: 2040244k total, 0k used, 2040244k free, 79288k cached

Note (7): 9000 off
top - 06:03:13 up 19 min, 2 users, load average: 0.52, 0.27, 0.23
Tasks: 91 total, 3 running, 88 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.3% us, 1.0% sy, 0.0% ni, 86.0% id, 0.5% wa, 5.3% hi, 6.8%
Cpu1 : 0.4% us, 4.3% sy, 0.0% ni, 93.7% id, 0.4% wa, 0.0% hi, 1.3%
Cpu2 : 0.3% us, 4.5% sy, 0.0% ni, 93.2% id, 0.4% wa, 0.0% hi, 1.5%
Cpu3 : 0.2% us, 3.8% sy, 0.0% ni, 94.7% id, 0.1% wa, 0.0% hi, 1.2%
Mem: 2055724k total, 399540k used, 1656184k free, 25816k buffers
Swap: 2040244k total, 0k used, 2040244k free, 79516k cached

Note (8): 9000 on
top - 06:05:16 up 21 min, 2 users, load average: 0.79, 0.42, 0.29
Tasks: 91 total, 1 running, 90 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.3% us, 2.5% sy, 0.0% ni, 83.5% id, 0.5% wa, 5.6% hi, 7.7%
Cpu1 : 0.4% us, 5.1% sy, 0.0% ni, 92.9% id, 0.3% wa, 0.0% hi, 1.3%
Cpu2 : 0.3% us, 4.9% sy, 0.0% ni, 92.9% id, 0.4% wa, 0.0% hi, 1.4%
Cpu3 : 0.2% us, 3.9% sy, 0.0% ni, 94.7% id, 0.1% wa, 0.0% hi, 1.2%
Mem: 2055724k total, 397784k used, 1657940k free, 26892k buffers
Swap: 2040244k total, 0k used, 2040244k free, 79528k cached

> -----Original Message-----
> From: netdev-bounce@oss.sgi.com
> [mailto:netdev-bounce@oss.sgi.com] On Behalf Of Herbert Xu
> Sent: Wednesday, June 08, 2005 3:11 PM
> To: David S. Miller
> Cc: jheffner@psc.edu; netdev@oss.sgi.com
> Subject: Re: [PATCH 0/9]: TCP: The Road to Super TSO
>
> On Wed, Jun 08, 2005 at 02:49:06PM -0700, David S. Miller wrote:
> >
> > Performance went down, with both TSO enabled and disabled, compared to
> > not having the patches applied.
>
> What was the receiver running? Was the performance
> degradation more pronounced with TSO enabled?
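
[Archive note] The relative change per configuration can be worked out from the two throughput tables in this message. The sketch below is only an illustration and is not part of the original mail: the struct and function names are invented here, and it assumes that Tx-1500 and Tx-9000 denote the MTU and that every throughput figure is in Gb/s (only the first column of each table carries a unit).

```c
#include <assert.h>
#include <stddef.h>

/* One table cell pair: throughput before and after the patch (assumed Gb/s). */
struct tput_row {
    const char *label;
    double base_gbps;     /* 2.6.12-rc4 */
    double patched_gbps;  /* 2.6.12-rc4 with tcp_super_tso.diff */
};

/* Figures copied from the tables in this message; labels assume MTU. */
const struct tput_row tput_rows[] = {
    { "MTU 1500, TSO off", 2.5, 2.4 },
    { "MTU 1500, TSO on",  4.0, 3.5 },
    { "MTU 9000, TSO off", 5.3, 5.0 },
    { "MTU 9000, TSO on",  6.1, 5.7 },
};

#define TPUT_NROWS (sizeof(tput_rows) / sizeof(tput_rows[0]))

/* Relative change in percent; a negative value means the patched kernel is slower. */
double pct_change(double base, double patched)
{
    assert(base > 0.0);
    return (patched - base) / base * 100.0;
}

/* Index of the row with the largest relative drop. */
size_t worst_row(void)
{
    size_t worst = 0;

    for (size_t i = 1; i < TPUT_NROWS; i++) {
        double cur = pct_change(tput_rows[i].base_gbps,
                                tput_rows[i].patched_gbps);
        double low = pct_change(tput_rows[worst].base_gbps,
                                tput_rows[worst].patched_gbps);
        if (cur < low)
            worst = i;
    }
    return worst;
}
```

With the figures above this gives roughly -4.0% (1500, TSO off), -12.5% (1500, TSO on), -5.7% (9000, TSO off) and -6.6% (9000, TSO on), so in this run the drop is largest at 1500 with TSO on.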
> -- > Visit Openswan at http://www.openswan.org/ > Email: Herbert Xu ~{PmV>HI~} > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt > > From davem@davemloft.net Wed Jun 8 22:59:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 22:59:36 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j595xRXq031452 for ; Wed, 8 Jun 2005 22:59:27 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgG3V-00020K-Bg; Wed, 08 Jun 2005 22:58:17 -0700 Date: Wed, 08 Jun 2005 22:58:17 -0700 (PDT) Message-Id: <20050608.225817.112619139.davem@davemloft.net> To: Valdis.Kletnieks@vt.edu Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 2.6.12-rc6-mm1 OOPS in tcp_push_one() From: "David S. Miller" In-Reply-To: <200506090423.j594NWts004829@turing-police.cc.vt.edu> References: <200506090423.j594NWts004829@turing-police.cc.vt.edu> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2276 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 339 Lines: 9 From: Valdis.Kletnieks@vt.edu Date: Thu, 09 Jun 2005 00:23:32 -0400 > (On a related note, how did tcp_bic get loaded? I requested all the new > congestion stuff be built as modules, didn't specifically request any of > them to actually be loaded.... It's the default algorithm, so when you open the first TCP socket it tries to load it. 
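
[Archive note] On later kernels a program can ask which congestion control algorithm a newly created TCP socket was given. This is a minimal sketch, not something from the thread: it assumes a Linux system whose headers provide the TCP_CONGESTION socket option (tcp(7) documents it as available since Linux 2.6.13, so it postdates this discussion), and the function name is invented for the example.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_CONGESTION
#define TCP_CONGESTION 13   /* value used by the Linux UAPI headers */
#endif

/*
 * Copy the name of the congestion control algorithm attached to a
 * freshly created TCP socket into buf.
 * Returns 0 on success, -1 on any error.
 */
int default_tcp_congestion(char *buf, size_t buflen)
{
    socklen_t len;
    int fd, rc;

    assert(buf != NULL);
    if (buflen < 2)
        return -1;
    memset(buf, 0, buflen);

    /* Creating the socket is the step that makes the kernel pick an algorithm. */
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    len = (socklen_t)(buflen - 1);   /* keep room for the terminating NUL */
    rc = getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len);
    close(fd);

    return rc < 0 ? -1 : 0;
}
```

A caller would pass a small buffer (16 bytes is enough for the names Linux uses) and print the result; the name reported depends on the running kernel's configuration.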
From vda@ilport.com.ua Wed Jun 8 23:05:10 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:05:13 -0700 (PDT) Received: from port.imtp.ilyichevsk.odessa.ua (167.imtp.Ilyichevsk.Odessa.UA [195.66.192.167] (may be forged)) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j59656Xq031971 for ; Wed, 8 Jun 2005 23:05:08 -0700 Received: (qmail 20950 invoked by alias); 9 Jun 2005 06:04:01 -0000 Received: from unknown (172.17.13.22) by 0 (195.66.192.170) with ESMTP; 09 Jun 2005 06:03:55 -0000 From: Denis Vlasenko To: "David S. Miller" , jketreno@linux.intel.com Subject: Re: ipw2100: firmware problem Date: Thu, 9 Jun 2005 09:03:49 +0300 User-Agent: KMail/1.5.4 Cc: pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com References: <20050608142310.GA2339@elf.ucw.cz> <42A7268D.9020402@linux.intel.com> <20050608.124332.85408883.davem@davemloft.net> In-Reply-To: <20050608.124332.85408883.davem@davemloft.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506090903.49295.vda@ilport.com.ua> X-archive-position: 2277 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vda@ilport.com.ua Precedence: bulk X-list: netdev Content-Length: 1046 Lines: 25 On Wednesday 08 June 2005 22:43, David S. Miller wrote: > From: James Ketrenos > Date: Wed, 08 Jun 2005 12:10:37 -0500 > > > My approach is to make the driver so it supports as many usage models as > > possible, leaving policy to other components of the system. > > I don't see how this kind of firmware load setup handles something > like an NFS root over such a device that requires firmware. You practically cannot avoid having initrd because you are very likely to need to do some wifi config (at least ESSID and mode). 
Well, you can, but it gets more arcane with each turn (essid=,mode= module parameters - in each and every wifi driver! and what if you need to set basic rates? Yet another parameter?). It's analogous to DHCP+NFS_root boot - we do have ugly hack of kernelspace dhcp client, but IIRC it is agreed that the Right Thing is to do such things in userspace (if needed, via initrd/initramfs). It simply allows for way more options what you can do in early boot if you have early userspace. -- vda From jgarzik@pobox.com Wed Jun 8 23:07:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:07:39 -0700 (PDT) Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5967YXq032606 for ; Wed, 8 Jun 2005 23:07:35 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1DgGB6-0004Xx-8g; Thu, 09 Jun 2005 06:06:10 +0000 Message-ID: <42A7DC4D.7000008@pobox.com> Date: Thu, 09 Jun 2005 02:06:05 -0400 From: Jeff Garzik User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: jketreno@linux.intel.com, vda@ilport.com.ua, pavel@ucw.cz, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem References: <20050608142310.GA2339@elf.ucw.cz> <200506081744.20687.vda@ilport.com.ua> <42A7268D.9020402@linux.intel.com> <20050608.124332.85408883.davem@davemloft.net> In-Reply-To: <20050608.124332.85408883.davem@davemloft.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2278 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 1825 Lines: 49 David S. 
Miller wrote:
> From: James Ketrenos
> Date: Wed, 08 Jun 2005 12:10:37 -0500
>
>
>>My approach is to make the driver so it supports as many usage models as
>>possible, leaving policy to other components of the system.
>
>
> I don't see how this kind of firmware load setup handles something
> like an NFS root over such a device that requires firmware.
>
> And let's not mention that I have to set up an initrd to make that
> work, that's ridiculous.
>
> This is the kind of crap that happens when drivers in the kernel
> are not self contained, and need "external stuff" to work properly.
> It means that simple things like NFS root over the device do not
> work in a straightforward, simple, and elegant manner.

Actually these questions have already been answered (though I know you
will probably grumble a bit :))

"early userspace" is the long term answer. usr/* in the current kernel
tree is a placeholder for an image that is shipped with the kernel,
which provides things (kernel modules, userspace programs, firmware)
that are necessary to boot.

The key is that it is shipped with the kernel source tree, and built
into the kernel image, and _dropped from memory_ after init. The entire
process should be automatic.

Linus ack'd the current stuff (by merging it, after some discussion) and
would have merged klibc too, had it any users.

...

As to $current_thread, initramfs exists but "early userspace" does not.
There isn't AFAIK any infrastructure to automatically add firmware to
initrd in any standard distribution (corrections welcome!). So today,
initrd+firmware is just a big pain.
Therefore, the easiest way to make things work today is to poke Intel to fix their firmware license so that we can distribute it with the kernel :) Jeff From vda@ilport.com.ua Wed Jun 8 23:11:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:11:17 -0700 (PDT) Received: from port.imtp.ilyichevsk.odessa.ua (167.imtp.Ilyichevsk.Odessa.UA [195.66.192.167] (may be forged)) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j596BBXq000730 for ; Wed, 8 Jun 2005 23:11:13 -0700 Received: (qmail 26515 invoked by alias); 9 Jun 2005 06:10:06 -0000 Received: from unknown (172.17.13.22) by 0 (195.66.192.170) with ESMTP; 09 Jun 2005 06:09:59 -0000 From: Denis Vlasenko To: , "'Pavel Machek'" , "'Jeff Garzik'" , "'Netdev list'" , "'kernel list'" , "'James P. Ketrenos'" Subject: Re: ipw2100: firmware problem Date: Thu, 9 Jun 2005 09:09:55 +0300 User-Agent: KMail/1.5.4 References: <002901c56c3b$8216cdd0$600cc60a@amer.sykes.com> In-Reply-To: <002901c56c3b$8216cdd0$600cc60a@amer.sykes.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506090909.55889.vda@ilport.com.ua> X-archive-position: 2279 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vda@ilport.com.ua Precedence: bulk X-list: netdev Content-Length: 1715 Lines: 46 On Wednesday 08 June 2005 18:05, Alejandro Bonilla wrote: > > > On Wednesday 08 June 2005 17:23, Pavel Machek wrote: > > > Hi! > > > > > > I'm fighting with firmware problem: if ipw2100 is compiled into > > > kernel, it is loaded while kernel boots and firmware loader > > is not yet > > > available. That leads to uninitialized (=> useless) adapter. > > Pavel, > > I might be lost here but... How is the firmware loaded when using the > ipw2100-1.0.0/patches Kernel patch? > > That patch normally works fine. 
It might not be the way you kernel > developers would like it, but maybe that could work the same way? > > > > > > > > What's the prefered way to solve this one? Only load firmware when > > > user does ifconfig eth1 up? [It is wifi, it looks like it would be > > > better to start firmware sooner so that it can associate to the > > > AP...]. > > > > Do you want to associate to an AP when your kernel boots, > > _before_ any iwconfig had a chance to configure anything? > > That's strange. > > Currently, when we install the driver, it associates to any open network on > boot. This is good, cause we don't want to be typing the commands all the > time just to associate. It works this way now and is pretty nice. What is so nice about this? That Linux novice user with his new lappie will join a neighbor's network every time he powers up the lappie, even without knowing that? That will be analogous to me plugging ethernet cable into the switch and wanting it to work, without any IP addr config, even without DHCP client. Just power up the box (or modprobe an eth module) and it works! Cool, eh? For some reason, we do not do this for wired nets. Why should wireless be different? 
-- vda From davem@davemloft.net Wed Jun 8 23:12:02 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:12:06 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j596C2Xq000810 for ; Wed, 8 Jun 2005 23:12:02 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgGFZ-00057c-Tm; Wed, 08 Jun 2005 23:10:45 -0700 Date: Wed, 08 Jun 2005 23:10:45 -0700 (PDT) Message-Id: <20050608.231045.48808548.davem@davemloft.net> To: vda@ilport.com.ua Cc: jketreno@linux.intel.com, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem From: "David S. Miller" In-Reply-To: <200506090903.49295.vda@ilport.com.ua> References: <42A7268D.9020402@linux.intel.com> <20050608.124332.85408883.davem@davemloft.net> <200506090903.49295.vda@ilport.com.ua> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2280 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 458 Lines: 12 From: Denis Vlasenko Date: Thu, 9 Jun 2005 09:03:49 +0300 > You practically cannot avoid having initrd because you are very likely > to need to do some wifi config (at least ESSID and mode). I need neither at home. It comes up by default just fine with ifconfig. Your points are valid, but they do not detract from the fact that pieced up drivers, half in the kernel half somewhere else, is total madness. It is a lose for the user. 
From michael@ellerman.id.au Wed Jun 8 23:13:06 2005
Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:13:10 -0700 (PDT)
Received: from ozlabs.org (ozlabs.org [203.10.76.45]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j596D5Xq001560 for ; Wed, 8 Jun 2005 23:13:06 -0700
Received: from localhost (localhost [127.0.0.1]) by ozlabs.org (Postfix) with ESMTP id EBD8667A06; Thu, 9 Jun 2005 16:11:59 +1000 (EST)
From: Michael Ellerman
Reply-To: michael@ellerman.id.au
To: Jeff Garzik , Andrew Morton
Subject: [PATCH] iseries_veth: Supress spurious WARN_ON() at module unload
Date: Thu, 9 Jun 2005 16:11:59 +1000
User-Agent: KMail/1.8
Cc: netdev@oss.sgi.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200506091611.59648.michael@ellerman.id.au>
X-archive-position: 2281
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: michael@ellerman.id.au
Precedence: bulk
X-list: netdev
Content-Length: 1651
Lines: 50

Hi Andrew, Jeff,

My patch from a few weeks back (now in mainline), called "Cleanup skbs
to prevent unregister_netdevice() hanging", can cause our TX timeout code
to fire on machines with lots of VLANs (because it takes > 2 seconds
between when we stop the queues and when we're finished stopping the
connections).

When that happens the TX timeout code freaks out and does a WARN_ON()
because as far as it's concerned there shouldn't be a TX timeout
happening, which is fair enough.

I have a "proper" fix for this, which is to a) do refcounting on
connections and b) implement a proper ack timer so we don't keep unacked
skbs lying around forever. But for 2.6.12 I propose just suppressing the
WARN_ON().

Users will still see the "NETDEV WATCHDOG" warning, but that's not nearly
as bad as a WARN_ON() which users interpret as an Oops.
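
[Archive note] The proposal amounts to replacing an assertion with a quiet early return when nothing is pending. The following is a user-space model of that control flow only, not driver code: the struct, the function name, the 32-connection limit, the reading of the mask as "one bit per connection with an unacked packet", and the clearing of the mask after the reset are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-port state. */
struct port_model {
    uint32_t pending_lpmask;  /* assumed: bit i set means connection i has a pending packet */
    unsigned int resets;      /* number of connection resets performed so far */
};

/*
 * Model of a Tx timeout handler that tolerates a spurious timeout.
 * Returns true if at least one connection was reset.
 */
bool tx_timeout_model(struct port_model *port)
{
    assert(port != NULL);

    /* Spurious timeout: nothing is pending, so return instead of warning. */
    if (port->pending_lpmask == 0)
        return false;

    /* Genuine timeout: reset every connection that has its bit set. */
    for (unsigned int i = 0; i < 32; i++) {
        if (port->pending_lpmask & (UINT32_C(1) << i))
            port->resets++;
    }
    port->pending_lpmask = 0;  /* assumption: a reset clears the pending state */
    return true;
}
```

The point of the guard is that an empty mask is treated as a harmless race with module unload rather than as a bug worth a stack trace.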
cheers

--
Suppress a spurious WARN_ON() in the iseries_veth driver which can occur at
module unload on machines with many VLANs.

Signed-off-by: Michael Ellerman

--- a/drivers/net/iseries_veth.c
+++ b/drivers/net/iseries_veth.c
@@ -802,12 +802,13 @@
 
 	spin_lock_irqsave(&port->pending_gate, flags);
 
+	if (! port->pending_lpmask) {
+		spin_unlock_irqrestore(&port->pending_gate, flags);
+		return;
+	}
+
 	printk(KERN_WARNING "%s: Tx timeout! Resetting lp connections: %08x\n",
 	       dev->name, port->pending_lpmask);
-
-	/* If we've timed out the queue must be stopped, which should
-	 * only ever happen when there is a pending packet. */
-	WARN_ON(! port->pending_lpmask);
 
 	for (i = 0; i < HVMAXARCHITECTEDLPS; i++) {
 		struct veth_lpar_connection *cnx = veth_cnx[i];

From davem@davemloft.net Wed Jun 8 23:14:31 2005
Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:14:34 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j596EVXq002295 for ; Wed, 8 Jun 2005 23:14:31 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgGI4-0005a3-1u; Wed, 08 Jun 2005 23:13:20 -0700
Date: Wed, 08 Jun 2005 23:13:19 -0700 (PDT)
Message-Id: <20050608.231319.95056824.davem@davemloft.net>
To: jgarzik@pobox.com
Cc: jketreno@linux.intel.com, vda@ilport.com.ua, pavel@ucw.cz, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com
Subject: Re: ipw2100: firmware problem
From: "David S.
Miller"
In-Reply-To: <42A7DC4D.7000008@pobox.com>
References: <42A7268D.9020402@linux.intel.com> <20050608.124332.85408883.davem@davemloft.net> <42A7DC4D.7000008@pobox.com>
X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2282
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev
Content-Length: 593
Lines: 13

From: Jeff Garzik
Date: Thu, 09 Jun 2005 02:06:05 -0400

> Therefore, the easiest way to make things work today is to poke Intel to
> fix their firmware license so that we can distribute it with the kernel :)

Firmware separate from the in-kernel driver is a big headache for
users. As DaveJ has stated, people make mistakes and try to match up
the wrong firmware version with the driver and stuff like that. And
he should know as he has to sift through bogus bug reports from
people running into this problem.

If it's integrated, there are no problems like this.

From davem@davemloft.net Wed Jun 8 23:18:14 2005
Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:18:17 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j596IEXq002895 for ; Wed, 8 Jun 2005 23:18:14 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgGLZ-0006Ax-MY; Wed, 08 Jun 2005 23:16:57 -0700
Date: Wed, 08 Jun 2005 23:16:57 -0700 (PDT)
Message-Id: <20050608.231657.59660080.davem@davemloft.net>
To: vda@ilport.com.ua
Cc: abonilla@linuxwireless.org, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com
Subject: Re: ipw2100: firmware problem
From: "David S.
Miller" In-Reply-To: <200506090909.55889.vda@ilport.com.ua> References: <002901c56c3b$8216cdd0$600cc60a@amer.sykes.com> <200506090909.55889.vda@ilport.com.ua> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2283 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 911 Lines: 25 From: Denis Vlasenko Date: Thu, 9 Jun 2005 09:09:55 +0300 > On Wednesday 08 June 2005 18:05, Alejandro Bonilla wrote: > > Currently, when we install the driver, it associates to any open network on > > boot. This is good, cause we don't want to be typing the commands all the > > time just to associate. It works this way now and is pretty nice. > > What is so nice about this? That Linux novice user with his new lappie > will join a neighbor's network every time he powers up the lappie, > even without knowing that? I love this behavior, because it means that I don't have to do anything special to get my setup at home working. Me caveman Me plug in wireless router Me watch pretty lights Me turn on computer Me up interface Computer work Me no care other cavemen use wireless link Configuration knobs are _madness_. Things should work with minimal intervention and configuration. From vda@ilport.com.ua Wed Jun 8 23:19:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:19:07 -0700 (PDT) Received: from port.imtp.ilyichevsk.odessa.ua (167.imtp.Ilyichevsk.Odessa.UA [195.66.192.167] (may be forged)) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j596IgXq002966 for ; Wed, 8 Jun 2005 23:18:47 -0700 Received: (qmail 27120 invoked by alias); 9 Jun 2005 06:17:34 -0000 Received: from unknown (172.17.13.22) by 0 (195.66.192.170) with ESMTP; 09 Jun 2005 06:17:27 -0000 From: Denis Vlasenko To: "David S. 
Miller"
Subject: Re: ipw2100: firmware problem
Date: Thu, 9 Jun 2005 09:17:23 +0300
User-Agent: KMail/1.5.4
Cc: jketreno@linux.intel.com, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com
References: <42A7268D.9020402@linux.intel.com> <200506090903.49295.vda@ilport.com.ua> <20050608.231045.48808548.davem@davemloft.net>
In-Reply-To: <20050608.231045.48808548.davem@davemloft.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200506090917.23853.vda@ilport.com.ua>
X-archive-position: 2284
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: vda@ilport.com.ua
Precedence: bulk
X-list: netdev
Content-Length: 1003
Lines: 28

On Thursday 09 June 2005 09:10, David S. Miller wrote:
> From: Denis Vlasenko
> Date: Thu, 9 Jun 2005 09:03:49 +0300
>
> > You practically cannot avoid having initrd because you are very likely
> > to need to do some wifi config (at least ESSID and mode).
>
> I need neither at home. It comes up by default just fine with
> ifconfig.
>
> Your points are valid, but they do not detract from the fact that
> pieced up drivers, half in the kernel half somewhere else, is total
> madness. It is a lose for the user.

Here I totally agree. I would like to not have to mess with
separate firmware files. I don't even want binary firmware,
gimme the source!

Sadly, realities are such that we have to live somehow
with closed-source firmware. Worse, sometimes it isn't even
freely redistributable (the vendor did not explicitly allow that),
and thus we have to ship the driver, but users must obtain the firmware
elsewhere themselves.

Thus so far we cannot avoid having split drivers.
-- vda From davem@davemloft.net Wed Jun 8 23:21:34 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:21:36 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j596LYXq003981 for ; Wed, 8 Jun 2005 23:21:34 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgGOr-0006zq-0v; Wed, 08 Jun 2005 23:20:21 -0700 Date: Wed, 08 Jun 2005 23:20:20 -0700 (PDT) Message-Id: <20050608.232020.115912376.davem@davemloft.net> To: vda@ilport.com.ua Cc: jketreno@linux.intel.com, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem From: "David S. Miller" In-Reply-To: <200506090917.23853.vda@ilport.com.ua> References: <200506090903.49295.vda@ilport.com.ua> <20050608.231045.48808548.davem@davemloft.net> <200506090917.23853.vda@ilport.com.ua> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2285 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 446 Lines: 12 From: Denis Vlasenko Date: Thu, 9 Jun 2005 09:17:23 +0300 > Sadly, realities are such that we have to live somehow > with closed-source firmware. You have a choice, buy products from friendly vendors. I use prism54 cards in my laptops for this reason. If you like a vendor's products who aren't friendly, try to voice intelligently your opinion to them as to why users will benefit from them fixing the firmware situation. 
From vda@ilport.com.ua Wed Jun 8 23:26:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:26:51 -0700 (PDT) Received: from port.imtp.ilyichevsk.odessa.ua (167.imtp.Ilyichevsk.Odessa.UA [195.66.192.167] (may be forged)) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j596QPXq004623 for ; Wed, 8 Jun 2005 23:26:40 -0700 Received: (qmail 27785 invoked by alias); 9 Jun 2005 06:25:17 -0000 Received: from unknown (172.17.13.22) by 0 (195.66.192.170) with ESMTP; 09 Jun 2005 06:25:11 -0000 From: Denis Vlasenko To: "David S. Miller" Subject: Re: ipw2100: firmware problem Date: Thu, 9 Jun 2005 09:25:07 +0300 User-Agent: KMail/1.5.4 Cc: abonilla@linuxwireless.org, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com References: <002901c56c3b$8216cdd0$600cc60a@amer.sykes.com> <200506090909.55889.vda@ilport.com.ua> <20050608.231657.59660080.davem@davemloft.net> In-Reply-To: <20050608.231657.59660080.davem@davemloft.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506090925.07495.vda@ilport.com.ua> X-archive-position: 2286 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vda@ilport.com.ua Precedence: bulk X-list: netdev Content-Length: 1315 Lines: 36 On Thursday 09 June 2005 09:16, David S. Miller wrote: > From: Denis Vlasenko > Date: Thu, 9 Jun 2005 09:09:55 +0300 > > > On Wednesday 08 June 2005 18:05, Alejandro Bonilla wrote: > > > Currently, when we install the driver, it associates to any open network on > > > boot. This is good, cause we don't want to be typing the commands all the > > > time just to associate. It works this way now and is pretty nice. > > > > What is so nice about this? 
That Linux novice user with his new lappie
> > will join a neighbor's network every time he powers up the lappie,
> > even without knowing that?
>
> I love this behavior, because it means that I don't have to do
> anything special to get my setup at home working.
>
> Me caveman
> Me plug in wireless router
> Me watch pretty lights
> Me turn on computer
> Me up interface

You need to up interface? And surely you need ip addr? That's a knob also! :)

> Computer work
> Me no care other cavemen use wireless link
>
> Configuration knobs are _madness_. Things should work with minimal
> intervention and configuration.

Sure. I consider "iwconfig essid MyCave mode managed" a minimal
intervention, just like I consider
"ip a a dev eth0 123.123.123.2/24 brd +; ip l set dev eth0 up"
a minimal intervention if I need IP configured.
--
vda

From davem@davemloft.net Wed Jun 8 23:29:37 2005
Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:29:41 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j596TXXq005261 for ; Wed, 8 Jun 2005 23:29:37 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgGWY-0000m7-R8; Wed, 08 Jun 2005 23:28:18 -0700
Date: Wed, 08 Jun 2005 23:28:18 -0700 (PDT)
Message-Id: <20050608.232818.31644993.davem@davemloft.net>
To: vda@ilport.com.ua
Cc: abonilla@linuxwireless.org, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com
Subject: Re: ipw2100: firmware problem
From: "David S.
Miller" In-Reply-To: <200506090925.07495.vda@ilport.com.ua> References: <200506090909.55889.vda@ilport.com.ua> <20050608.231657.59660080.davem@davemloft.net> <200506090925.07495.vda@ilport.com.ua> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2287 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 277 Lines: 7 From: Denis Vlasenko Date: Thu, 9 Jun 2005 09:25:07 +0300 > You need to up interface? And surely you need ip addr? That's a knob also! :) There's this thing called DHCP which takes care of this for me. With IPV6, even less configuration can be necessary. From jgarzik@pobox.com Wed Jun 8 23:30:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:31:01 -0700 (PDT) Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j596UuXq005631 for ; Wed, 8 Jun 2005 23:30:57 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1DgGXx-0004YP-TU; Thu, 09 Jun 2005 06:29:46 +0000 Message-ID: <42A7E1D6.3070509@pobox.com> Date: Thu, 09 Jun 2005 02:29:42 -0400 From: Jeff Garzik User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. 
Miller"
CC: jketreno@linux.intel.com, vda@ilport.com.ua, pavel@ucw.cz, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com
Subject: Re: ipw2100: firmware problem
References: <42A7268D.9020402@linux.intel.com> <20050608.124332.85408883.davem@davemloft.net> <42A7DC4D.7000008@pobox.com> <20050608.231319.95056824.davem@davemloft.net>
In-Reply-To: <20050608.231319.95056824.davem@davemloft.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-archive-position: 2288
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: jgarzik@pobox.com
Precedence: bulk
X-list: netdev
Content-Length: 972
Lines: 28

David S. Miller wrote:
> From: Jeff Garzik
> Date: Thu, 09 Jun 2005 02:06:05 -0400
>
>
>>Therefore, the easiest way to make things work today is to poke Intel to
>>fix their firmware license so that we can distribute it with the kernel :)
>
>
> Firmware separate from the in-kernel driver is a big headache for
> users. As DaveJ has stated, people make mistakes and try to match up
> the wrong firmware version with the driver and stuff like that. And
> he should know as he has to sift through bogus bug reports from
> people running into this problem.
>
> If it's integrated, there are no problems like this.

Early userspace is (a) shipped with the kernel source tree and (b)
linked into vmlinux. That's integrated.

The firmware images will be separate from the .c files (as they should
be), but the kernel hacker still controls what gets loaded, and when.

But like I said, that's where we're going, not where we are now.
Jeff From vda@ilport.com.ua Wed Jun 8 23:32:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:32:09 -0700 (PDT) Received: from port.imtp.ilyichevsk.odessa.ua (167.imtp.Ilyichevsk.Odessa.UA [195.66.192.167] (may be forged)) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j596VdXq006000 for ; Wed, 8 Jun 2005 23:31:55 -0700 Received: (qmail 28292 invoked by alias); 9 Jun 2005 06:30:32 -0000 Received: from unknown (172.17.13.22) by 0 (195.66.192.170) with ESMTP; 09 Jun 2005 06:30:26 -0000 From: Denis Vlasenko To: "David S. Miller" Subject: Re: ipw2100: firmware problem Date: Thu, 9 Jun 2005 09:30:22 +0300 User-Agent: KMail/1.5.4 Cc: jketreno@linux.intel.com, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com References: <200506090903.49295.vda@ilport.com.ua> <200506090917.23853.vda@ilport.com.ua> <20050608.232020.115912376.davem@davemloft.net> In-Reply-To: <20050608.232020.115912376.davem@davemloft.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506090930.22274.vda@ilport.com.ua> X-archive-position: 2289 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vda@ilport.com.ua Precedence: bulk X-list: netdev Content-Length: 728 Lines: 23 On Thursday 09 June 2005 09:20, David S. Miller wrote: > From: Denis Vlasenko > Date: Thu, 9 Jun 2005 09:17:23 +0300 > > > Sadly, realities are such that we have to live somehow > > with closed-source firmware. > > You have a choice, buy products from friendly vendors. I am trying! So far, I have Prism2.5, Prism54 and acx111 cards, and all of them require closed binary fw. > I use prism54 cards in my laptops for this reason. ?! As far as I remember, it needs a fw and fw is not open... did that change recently? 
> If you like a vendor's products who aren't friendly, try > to voice intelligently your opinion to them as to why users > will benefit from them fixing the firmware situation. -- vda From davem@davemloft.net Wed Jun 8 23:37:02 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:37:04 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j596b2Xq006924 for ; Wed, 8 Jun 2005 23:37:02 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgGdp-0003ml-Kw; Wed, 08 Jun 2005 23:35:49 -0700 Date: Wed, 08 Jun 2005 23:35:49 -0700 (PDT) Message-Id: <20050608.233549.122030692.davem@davemloft.net> To: vda@ilport.com.ua Cc: jketreno@linux.intel.com, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem From: "David S. Miller" In-Reply-To: <200506090930.22274.vda@ilport.com.ua> References: <200506090917.23853.vda@ilport.com.ua> <20050608.232020.115912376.davem@davemloft.net> <200506090930.22274.vda@ilport.com.ua> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2290 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 326 Lines: 10 From: Denis Vlasenko Date: Thu, 9 Jun 2005 09:30:22 +0300 > On Thursday 09 June 2005 09:20, David S. Miller wrote: > > I use prism54 cards in my laptops for this reason. > > ?! As far as I remember, it needs a fw and fw is not open... > did that change recently? My bad, they are not. 
You're right :-/ From vda@ilport.com.ua Wed Jun 8 23:38:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 08 Jun 2005 23:38:50 -0700 (PDT) Received: from port.imtp.ilyichevsk.odessa.ua (167.imtp.Ilyichevsk.Odessa.UA [195.66.192.167] (may be forged)) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j596chXq007422 for ; Wed, 8 Jun 2005 23:38:45 -0700 Received: (qmail 29019 invoked by alias); 9 Jun 2005 06:37:36 -0000 Received: from unknown (172.17.13.22) by 0 (195.66.192.170) with ESMTP; 09 Jun 2005 06:37:31 -0000 From: Denis Vlasenko To: "David S. Miller" Subject: Re: ipw2100: firmware problem Date: Thu, 9 Jun 2005 09:37:25 +0300 User-Agent: KMail/1.5.4 Cc: abonilla@linuxwireless.org, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com References: <200506090909.55889.vda@ilport.com.ua> <200506090925.07495.vda@ilport.com.ua> <20050608.232818.31644993.davem@davemloft.net> In-Reply-To: <20050608.232818.31644993.davem@davemloft.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506090937.25634.vda@ilport.com.ua> X-archive-position: 2291 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vda@ilport.com.ua Precedence: bulk X-list: netdev Content-Length: 527 Lines: 16 On Thursday 09 June 2005 09:28, David S. Miller wrote: > From: Denis Vlasenko > Date: Thu, 9 Jun 2005 09:25:07 +0300 > > > You need to up interface? And surely you need ip addr? That's a knob also! :) > > There's this thing called DHCP which takes care of this for me. > With IPV6, even less configuration can be necessary. But DHCP does not start by itself, and it shouldn't. You start dhcp client. That is a "minimal config" in this case. Anyway, I think I'm starting to troll... I should stop now.
-- vda From wichert@levante.wiggy.net Thu Jun 9 01:37:17 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 01:37:23 -0700 (PDT) Received: from mx1.wiggy.net (levante.wiggy.net [195.85.225.139]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j598bHXq018303 for ; Thu, 9 Jun 2005 01:37:17 -0700 Received: from wichert by mx1.wiggy.net with local (Exim 4.50) id 1DgIW8-0000LW-J6; Thu, 09 Jun 2005 10:36:00 +0200 Date: Thu, 9 Jun 2005 10:36:00 +0200 From: Wichert Akkerman To: Denis Vlasenko Cc: "David S. Miller" , abonilla@linuxwireless.org, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem Message-ID: <20050609083600.GE1478@wiggy.net> Mail-Followup-To: Denis Vlasenko , "David S. Miller" , abonilla@linuxwireless.org, pavel@ucw.cz, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com References: <200506090909.55889.vda@ilport.com.ua> <200506090925.07495.vda@ilport.com.ua> <20050608.232818.31644993.davem@davemloft.net> <200506090937.25634.vda@ilport.com.ua> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200506090937.25634.vda@ilport.com.ua> User-Agent: Mutt/1.5.9i X-SA-Exim-Connect-IP: X-archive-position: 2292 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: wichert@wiggy.net Precedence: bulk X-list: netdev Content-Length: 404 Lines: 12 Previously Denis Vlasenko wrote: > But DHCP does not start by itself, and it shouldn't. It does in most modern distros as far as I know. They use ifplugd or a similar tool to monitor link status and configure the interface if a link is detected. Wichert. -- Wichert Akkerman It is simple to make things. http://www.wiggy.net/ It is hard to make things simple.
From pavel@ucw.cz Thu Jun 9 03:43:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 03:43:24 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59AhGXq028685 for ; Thu, 9 Jun 2005 03:43:19 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id 14DF08B8A7; Thu, 9 Jun 2005 12:42:05 +0200 (CEST) Date: Thu, 9 Jun 2005 12:42:05 +0200 From: Pavel Machek To: "David S. Miller" Cc: vda@ilport.com.ua, abonilla@linuxwireless.org, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem Message-ID: <20050609104205.GD3169@elf.ucw.cz> References: <002901c56c3b$8216cdd0$600cc60a@amer.sykes.com> <200506090909.55889.vda@ilport.com.ua> <20050608.231657.59660080.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050608.231657.59660080.davem@davemloft.net> X-Warning: Reading this can be dangerous to your mental health. User-Agent: Mutt/1.5.9i X-archive-position: 2293 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 1340 Lines: 39 Hi! > > > Currently, when we install the driver, it associates to any open network on > > > boot. This is good, cause we don't want to be typing the commands all the > > > time just to associate. It works this way now and is pretty nice. > > > > What is so nice about this? That Linux novice user with his new lappie > > will join a neighbor's network every time he powers up the lappie, > > even without knowing that? > > I love this behavior, because it means that I don't have to do > anything special to get my setup at home working. 
> > Me caveman > Me plug in wireless router > Me watch pretty lights > Me turn on computer > Me up interface > Computer work > Me no care other cavemen use wireless link > > Configuration knobs are _madness_. Things should work with minimal > intervention and configuration. I'm not saying it should not work automagically. But it is wrong to start transmitting on wireless as soon as kernel boots. It should stay quiet in the radio until it is either told to talk or until interface is upped. That way * above still works, only radio chat begins one step later * if you are in environment where you absolutely do not want it to talk on the radio (airplane, BlackHatCon with APs trying to hack you all around), you can make it quiet without needing kernel/module parameters. Pavel From pavel@ucw.cz Thu Jun 9 03:57:34 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 03:57:38 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59AvUXq029888 for ; Thu, 9 Jun 2005 03:57:32 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id E2AAA8B8A7; Thu, 9 Jun 2005 12:56:19 +0200 (CEST) Date: Thu, 9 Jun 2005 12:56:19 +0200 From: Pavel Machek To: Zhu Yi Cc: James Ketrenos , Jeff Garzik , Netdev list , kernel list , "James P. Ketrenos" Subject: Re: ipw2100: firmware problem Message-ID: <20050609105619.GH3169@elf.ucw.cz> References: <20050608142310.GA2339@elf.ucw.cz> <42A723D3.3060001@linux.intel.com> <20050608212707.GA2535@elf.ucw.cz> <42A76719.2060700@linux.intel.com> <20050608223437.GB2614@elf.ucw.cz> <1118287990.10234.114.camel@debian.sh.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1118287990.10234.114.camel@debian.sh.intel.com> X-Warning: Reading this can be dangerous to your mental health. 
User-Agent: Mutt/1.5.9i X-archive-position: 2294 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@suse.cz Precedence: bulk X-list: netdev Content-Length: 827 Lines: 24 Hi! > > Actually it would still transmit when user did not want it to. I > > believe that staying "quiet" is right thing, long-term. And it could > > solve firmware-loading problems, short-term... > > If ipw2100 is built into kernel, you can disable it by kernel parameter > ipw2100.disable=1. Then you can enable it with: > > $ echo 0 > /sys/bus/pci/drivers/ipw2100/*/rf_kill > > > How long does association with AP take? Anyway it should be easy to > > tell driver to associate ASAP, just after the insmod... > > Are you suggesting by default it is disabled for built into kernel but > enabled as a module? I'm suggesting that by default it is disabled (in kernel or as a module) and it's automatically enabled during ifconfig up. That way we can drop the kernel parameter and always do the right thing. Pavel From hadi@cyberus.ca Thu Jun 9 05:45:09 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 05:45:15 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59Cj8Xq007207 for ; Thu, 9 Jun 2005 05:45:09 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1DgMOC-00052A-Gw for netdev@oss.sgi.com; Thu, 09 Jun 2005 08:44:04 -0400 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DgMO8-0002HT-MK; Thu, 09 Jun 2005 08:44:00 -0400 Subject: Re: netdev munching messages again? From: jamal Reply-To: hadi@cyberus.ca To: Ralf Baechle Cc: netdev@vger.kernel.org, Thomas Graf , "David S. 
Miller" , netdev@oss.sgi.com In-Reply-To: <20050609122325.GE4927@linux-mips.org> References: <20050607140842.778143000@axs> <20050607140901.632982000@axs> <20050607213621.GG20969@postel.suug.ch> <20050607.144237.93024273.davem@davemloft.net> <20050608132953.GK20969@postel.suug.ch> <1118238264.6382.43.camel@localhost.localdomain> <20050608160444.GA17777@linux-mips.org> <20050608161314.GM20969@postel.suug.ch> <20050608172809.GF5520@linux-mips.org> <20050608200048.GP20969@postel.suug.ch> <20050609122325.GE4927@linux-mips.org> Content-Type: text/plain Organization: unknown Date: Thu, 09 Jun 2005 08:43:57 -0400 Message-Id: <1118321037.6270.31.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2296 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 646 Lines: 23 On Thu, 2005-09-06 at 13:23 +0100, Ralf Baechle wrote: > On Wed, Jun 08, 2005 at 10:00:48PM +0200, Thomas Graf wrote: > > > Is there a messages/time limit somewhere? If so what's the limit? > > I tried with 30 seconds delay between each patch but that didn't > > help. > > No limits except on very large messages. Ralf, Since DaveM moved the list, could you perhaps set up forwarding to netdev@vger.kernel.org? We should also probably announce the change of address in the usual suspect lists. BTW, I too would like to join the masses of people who are thankful for all your great efforts. You sir are a hacker and a gentleman.
cheers, jamal From ralf@linux-mips.org Thu Jun 9 06:32:17 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 06:32:25 -0700 (PDT) Received: from bacchus.net.dhis.org (extgw-uk.mips.com [62.254.210.129]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59DWEXq010622 for ; Thu, 9 Jun 2005 06:32:15 -0700 Received: from dea.linux-mips.net (localhost.localdomain [127.0.0.1]) by bacchus.net.dhis.org (8.13.1/8.13.1) with ESMTP id j59DRXtQ013192; Thu, 9 Jun 2005 14:27:33 +0100 Received: (from ralf@localhost) by dea.linux-mips.net (8.13.1/8.13.1/Submit) id j59DRUx8013191; Thu, 9 Jun 2005 14:27:30 +0100 Date: Thu, 9 Jun 2005 14:27:30 +0100 From: Ralf Baechle To: "David S. Miller" Cc: tgraf@suug.ch, hadi@cyberus.ca, netdev@oss.sgi.com, Andrew Morton Subject: [PATCH] Re: netdev munching messages again? Message-ID: <20050609132730.GF4927@linux-mips.org> References: <20050608161314.GM20969@postel.suug.ch> <20050608172809.GF5520@linux-mips.org> <20050608200048.GP20969@postel.suug.ch> <20050608.131044.31642070.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20050608.131044.31642070.davem@davemloft.net> User-Agent: Mutt/1.4.1i X-archive-position: 2297 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralf@linux-mips.org Precedence: bulk X-list: netdev Content-Length: 7878 Lines: 293 On Wed, Jun 08, 2005 at 01:10:44PM -0700, David S. Miller wrote: Akpm: Patch with netdev address change below. > I see the delay due to SGI's firewall when I send postings > out too, and it's very annoying. Correction on that, while the bloody PIX is still installed, its SMTP proxy is disabled. 
> The fact that I can send an email faster to Herbert Xu > in Australia (several thousand miles away) than oss.sgi.com > (which is a short drive away) would be an amusing anecdote > if it didn't negatively impact my work. Performance is partially explained by running sendsnail - against my resistance. > I think it's time to move this list to a more reliable and > efficient place. > > Ralf, thanks for all of your effort and time maintaining > oss.sgi.com for our stay as guests via the netdev list. > > I've created netdev@vger.kernel.org, and folks can start > to join up there. How about I simply give you the old subscriber list and turn netdev@oss into a forward or autoresponder? > If someone knows the appropriate > archive maintainers to contact (marc.theaimsgroup.com > et al.) please let them know about this transition. It > would be much appreciated. There are archives on oss itself at http://oss.sgi.com/archives/netdev as well. Ralf Documentation/networking/vortex.txt | 2 - MAINTAINERS | 48 ++++++++++++++++++------------------ drivers/net/r8169.c | 2 - net/sched/act_api.c | 2 - 4 files changed, 27 insertions(+), 27 deletions(-) Index: linux-cvs/drivers/net/r8169.c =================================================================== --- linux-cvs.orig/drivers/net/r8169.c 2005-05-20 12:29:58.000000000 +0100 +++ linux-cvs/drivers/net/r8169.c 2005-06-09 14:21:10.000000000 +0100 @@ -415,7 +415,7 @@ struct work_struct task; }; -MODULE_AUTHOR("Realtek and the Linux r8169 crew "); +MODULE_AUTHOR("Realtek and the Linux r8169 crew "); MODULE_DESCRIPTION("RealTek RTL-8169 Gigabit Ethernet driver"); module_param_array(media, int, &num_media, 0); module_param(rx_copybreak, int, 0); Index: linux-cvs/Documentation/networking/vortex.txt =================================================================== --- linux-cvs.orig/Documentation/networking/vortex.txt 2004-03-11 16:46:40.000000000 +0000 +++ linux-cvs/Documentation/networking/vortex.txt 2005-06-09 14:21:10.000000000 +0100 @@ 
-12,7 +12,7 @@ Please report problems to one or more of: Andrew Morton - Netdev mailing list + Netdev mailing list Linux kernel mailing list Please note the 'Reporting and Diagnosing Problems' section at the end Index: linux-cvs/net/sched/act_api.c =================================================================== --- linux-cvs.orig/net/sched/act_api.c 2005-05-20 12:30:23.000000000 +0100 +++ linux-cvs/net/sched/act_api.c 2005-06-09 14:21:10.000000000 +0100 @@ -881,7 +881,7 @@ link_p[RTM_GETACTION-RTM_BASE].dumpit = tc_dump_action; } - printk("TC classifier action (bugs to netdev@oss.sgi.com cc " + printk("TC classifier action (bugs to netdev@vger.kernel.org cc " "hadi@cyberus.ca)\n"); return 0; } Index: linux-cvs/MAINTAINERS =================================================================== --- linux-cvs.orig/MAINTAINERS 2005-05-20 12:29:29.000000000 +0100 +++ linux-cvs/MAINTAINERS 2005-06-09 14:21:10.000000000 +0100 @@ -73,7 +73,7 @@ 3C359 NETWORK DRIVER P: Mike Phillips M: mikep@linuxtr.net -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org L: linux-tr@linuxtr.net W: http://www.linuxtr.net S: Maintained @@ -81,13 +81,13 @@ 3C505 NETWORK DRIVER P: Philip Blundell M: philb@gnu.org -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained 3CR990 NETWORK DRIVER P: David Dillow M: dave@thedillows.org -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained 3W-XXXX ATA-RAID CONTROLLER DRIVER @@ -130,7 +130,7 @@ 8169 10/100/1000 GIGABIT ETHERNET DRIVER P: Francois Romieu M: romieu@fr.zoreil.com -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained 8250/16?50 (AND CLONE UARTS) SERIAL DRIVER @@ -143,7 +143,7 @@ 8390 NETWORK DRIVERS [WD80x3/SMC-ELITE, SMC-ULTRA, NE2000, 3C503, etc.] 
P: Paul Gortmaker M: p_gortmaker@yahoo.com -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained A2232 SERIAL BOARD DRIVER @@ -326,7 +326,7 @@ ARPD SUPPORT P: Jonathan Layes -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained ASUS ACPI EXTRAS DRIVER @@ -700,7 +700,7 @@ DIGI RIGHTSWITCH NETWORK DRIVER P: Rick Richardson -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org W: http://www.digi.com S: Orphaned @@ -806,7 +806,7 @@ ETHEREXPRESS-16 NETWORK DRIVER P: Philip Blundell M: philb@gnu.org -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained ETHERNET BRIDGE @@ -869,7 +869,7 @@ FRAME RELAY DLCI/FRAD (Sangoma drivers too) P: Mike McLagan M: mike.mclagan@linux.org -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained FREEVXFS FILESYSTEM @@ -1209,7 +1209,7 @@ IPX NETWORK LAYER P: Arnaldo Carvalho de Melo M: acme@conectiva.com.br -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained IRDA SUBSYSTEM @@ -1476,7 +1476,7 @@ P: Manish Lachwani M: Manish_Lachwani@pmc-sierra.com L: linux-mips@linux-mips.org -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Supported MATROX FRAMEBUFFER DRIVER @@ -1586,13 +1586,13 @@ M: akpm@osdl.org P: Jeff Garzik M: jgarzik@pobox.com -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained NETWORKING [GENERAL] P: Networking Team -M: netdev@oss.sgi.com -L: netdev@oss.sgi.com +M: netdev@vger.kernel.org +L: netdev@vger.kernel.org S: Maintained NETWORKING [IPv4/IPv6] @@ -1608,7 +1608,7 @@ M: yoshfuji@linux-ipv6.org P: Patrick McHardy M: kaber@coreworks.de -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained IPVS @@ -1628,7 +1628,7 @@ P: Jan-Pascal van Best and Andreas Mohr M: Jan-Pascal van Best M: Andreas Mohr <100.30936@germany.net> -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained NINJA SCSI-3 / NINJA SCSI-32Bi (16bit/CardBus) PCMCIA SCSI HOST ADAPTER DRIVER @@ -1670,7 +1670,7 @@ M: p2@ace.ulyssis.student.kuleuven.ac.be P: Mike Phillips M: mikep@linuxtr.net -L: 
netdev@oss.sgi.com +L: netdev@vger.kernel.org L: linux-tr@linuxtr.net W: http://www.linuxtr.net S: Maintained @@ -1777,7 +1777,7 @@ PCNET32 NETWORK DRIVER P: Thomas Bogendörfer M: tsbogend@alpha.franken.de -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained PHRAM MTD DRIVER @@ -1789,7 +1789,7 @@ POSIX CLOCKS and TIMERS P: George Anzinger M: george@mvista.com -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Supported PNP SUPPORT @@ -1824,7 +1824,7 @@ PRISM54 WIRELESS DRIVER P: Prism54 Development Team M: prism54-private@prism54.org -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org W: http://prism54.org S: Maintained @@ -2041,7 +2041,7 @@ P: Daniele Venzano M: venza@brownhat.org W: http://www.brownhat.org/sis900.html -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained SIS FRAMEBUFFER DRIVER @@ -2100,7 +2100,7 @@ SONIC NETWORK DRIVER P: Thomas Bogendoerfer M: tsbogend@alpha.franken.de -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained SONY VAIO CONTROL DEVICE DRIVER @@ -2157,7 +2157,7 @@ SPX NETWORK LAYER P: Jay Schulist M: jschlst@samba.org -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Supported SRM (Alpha) environment access @@ -2236,7 +2236,7 @@ TOKEN-RING NETWORK DRIVER P: Mike Phillips M: mikep@linuxtr.net -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org L: linux-tr@linuxtr.net W: http://www.linuxtr.net S: Maintained From ak@muc.de Thu Jun 9 06:49:05 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 06:49:07 -0700 (PDT) Received: from one.firstfloor.org (one.firstfloor.org [213.235.205.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59Dn4Xq011716 for ; Thu, 9 Jun 2005 06:49:05 -0700 Received: by one.firstfloor.org (Postfix, from userid 502) id 170BFD033E; Thu, 9 Jun 2005 15:47:53 +0200 (CEST) To: Hans Henrik Happe Subject: Re: PROBLEM: High TCP latency References: <200506061135.09869.hhh@imada.sdu.dk> From: Andi Kleen Cc: netdev@oss.sgi.com Date: Thu, 09 Jun 2005 15:47:53 +0200 
In-Reply-To: <200506061135.09869.hhh@imada.sdu.dk> (Hans Henrik Happe's message of "Mon, 6 Jun 2005 11:35:09 +0200") Message-ID: User-Agent: Gnus/5.110002 (No Gnus v0.2) Emacs/21.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2298 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@muc.de Precedence: bulk X-list: netdev Content-Length: 56 Lines: 4 Try echo 1 > /proc/sys/net/ipv4/tcp_low_latency -Andi From ak@muc.de Thu Jun 9 06:57:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 06:57:26 -0700 (PDT) Received: from one.firstfloor.org (one.firstfloor.org [213.235.205.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59DvLXq012825 for ; Thu, 9 Jun 2005 06:57:21 -0700 Received: by one.firstfloor.org (Postfix, from userid 502) id 7B2BED033E; Thu, 9 Jun 2005 15:56:15 +0200 (CEST) To: James Ketrenos Cc: Jeff Garzik , Netdev list , kernel list , "James P. Ketrenos" Subject: Re: ipw2100: firmware problem References: <20050608142310.GA2339@elf.ucw.cz> <42A723D3.3060001@linux.intel.com> From: Andi Kleen Date: Thu, 09 Jun 2005 15:56:15 +0200 In-Reply-To: <42A723D3.3060001@linux.intel.com> (James Ketrenos's message of "Wed, 08 Jun 2005 11:58:59 -0500") Message-ID: User-Agent: Gnus/5.110002 (No Gnus v0.2) Emacs/21.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2299 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@muc.de Precedence: bulk X-list: netdev Content-Length: 479 Lines: 13 James Ketrenos writes: >> > We've been looking into whether the initrd can have the firmware affixed > to the end w/ some magic bytes to identify it. If it works, enhancing > the request_firmware to support both hotplug and an initrd approach may > be reasonable. 
That space is already used in common distributions for replacement DSDTs. I guess at some point we will need a file system in there, but - oops - we already have one, don't we? :) -Andi From hhh@imada.sdu.dk Thu Jun 9 07:05:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 07:05:37 -0700 (PDT) Received: from berlioz.imada.sdu.dk (berlioz.imada.sdu.dk [130.225.128.12]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59E5YXq013744 for ; Thu, 9 Jun 2005 07:05:34 -0700 Received: from localhost (localhost [127.0.0.1]) by localhost.imada.sdu.dk (Postfix) with ESMTP id 3620A6276C; Thu, 9 Jun 2005 16:04:28 +0200 (CEST) Received: from berlioz.imada.sdu.dk ([127.0.0.1]) by localhost (berlioz.imada.sdu.dk [127.0.0.1]) (amavisd-new, port 10025) with ESMTP id 20552-09; Thu, 9 Jun 2005 14:04:27 +0000 (UTC) Received: from [139.91.76.194] (unknown [139.91.76.194]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) by berlioz.imada.sdu.dk (Postfix) with ESMTP id 9051D62743; Thu, 9 Jun 2005 16:04:27 +0200 (CEST) From: Hans Henrik Happe To: netdev@vger.kernel.org Subject: Re: PROBLEM: High TCP latency Date: Thu, 9 Jun 2005 16:04:29 +0200 User-Agent: KMail/1.7.2 References: <200506061135.09869.hhh@imada.sdu.dk> In-Reply-To: Cc: netdev@oss.sgi.com MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506091604.30412.hhh@imada.sdu.dk> X-archive-position: 2300 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hhh@imada.sdu.dk Precedence: bulk X-list: netdev Content-Length: 138 Lines: 5 On Thursday 09 June 2005 15:47, you wrote: > > Try echo 1 > /proc/sys/net/ipv4/tcp_low_latency I have already tried that. Doesn't help.
From abonilla@linuxwireless.org Thu Jun 9 07:33:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 07:33:18 -0700 (PDT) Received: from linuxwireless.org.ve.carpathiahost.net (linuxwireless.org.ve.carpathiahost.net [66.117.45.234]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59EXFXq015292 for ; Thu, 9 Jun 2005 07:33:15 -0700 Received: from WCRSJO2KPAB047 ([200.9.49.66]) by linuxwireless.org.ve.carpathiahost.net (8.12.10/8.12.10) with SMTP id j59EVusQ018689; Thu, 9 Jun 2005 10:31:56 -0400 Reply-To: From: "Alejandro Bonilla" To: "'Denis Vlasenko'" , "'Pavel Machek'" , "'Jeff Garzik'" , "'Netdev list'" , "'kernel list'" , "'James P. Ketrenos'" Subject: RE: ipw2100: firmware problem Date: Thu, 9 Jun 2005 08:31:51 -0600 Message-ID: <002a01c56cff$fb64ba70$600cc60a@amer.sykes.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook CWS, Build 9.0.6604 (9.0.2911.0) In-Reply-To: <200506090909.55889.vda@ilport.com.ua> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1478 Importance: Normal X-archive-position: 2301 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: abonilla@linuxwireless.org Precedence: bulk X-list: netdev Content-Length: 849 Lines: 31 > What is so nice about this? That Linux novice user with his new lappie > will join a neighbor's network every time he powers up the lappie, > even without knowing that? > > That will be analogous to me plugging ethernet cable into the > switch and > wanting it to work, without any IP addr config, even without > DHCP client. > Just power up the box (or modprobe an eth module) and it > works! Cool, eh? > You want things one way, I like them in another way. Whoever makes this decision should just know that we would like to have an option to make it load with or without the ASSOC on. 
James already said to use the options ipw2100 disable=1 if you don't want it to associate every time on boot. At the end, who decides this? .Alejandro > For some reason, we do not do this for wired nets. Why should wireless > be different? > -- > vda > From Valdis.Kletnieks@vt.edu Thu Jun 9 08:24:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 08:25:07 -0700 (PDT) Received: from turing-police.cc.vt.edu (turing-police.cc.vt.edu [128.173.14.107]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59FOvXq021618 for ; Thu, 9 Jun 2005 08:24:58 -0700 Received: from turing-police.cc.vt.edu (localhost [127.0.0.1]) by turing-police.cc.vt.edu (8.13.4/8.13.4) with ESMTP id j59FNmsr008443; Thu, 9 Jun 2005 11:23:48 -0400 Message-Id: <200506091523.j59FNmsr008443@turing-police.cc.vt.edu> X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.1-RC3 To: "David S. Miller" Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: tcp_bic (was Re: 2.6.12-rc6-mm1 OOPS in tcp_push_one() In-Reply-To: Your message of "Wed, 08 Jun 2005 22:58:17 PDT." <20050608.225817.112619139.davem@davemloft.net> From: Valdis.Kletnieks@vt.edu References: <200506090423.j594NWts004829@turing-police.cc.vt.edu> <20050608.225817.112619139.davem@davemloft.net> Mime-Version: 1.0 Content-Type: multipart/signed; boundary="==_Exmh_1118330628_3931P"; micalg=pgp-sha1; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit Date: Thu, 09 Jun 2005 11:23:48 -0400 X-archive-position: 2302 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Valdis.Kletnieks@vt.edu Precedence: bulk X-list: netdev Content-Length: 2212 Lines: 74 --==_Exmh_1118330628_3931P Content-Type: text/plain; charset="us-ascii" Content-Id: <8426.1118330609.1@turing-police.cc.vt.edu> On Wed, 08 Jun 2005 22:58:17 PDT, "David S. 
Miller" said: > From: Valdis.Kletnieks@vt.edu > Date: Thu, 09 Jun 2005 00:23:32 -0400 > > > (On a related note, how did tcp_bic get loaded? I requested all the new > > congestion stuff be built as modules, didn't specifically request any of > > them to actually be loaded.... > > It's the default algorithm, so when you open the first TCP > socket it tries to load it. Ahh.. I was reading the Kconfig, which says this: menu "TCP congestion control" # TCP Reno is builtin (required as fallback) config TCP_CONG_BIC tristate "Binary Increase Congestion (BIC) control" depends on INET default y and I built with: CONFIG_TCP_CONG_BIC=m CONFIG_TCP_CONG_WESTWOOD=m CONFIG_TCP_CONG_HTCP=m CONFIG_TCP_CONG_HSTCP=m CONFIG_TCP_CONG_HYBLA=m CONFIG_TCP_CONG_VEGAS=m CONFIG_TCP_CONG_SCALABLE=m so what I *expected* was a kernel with Reno built-in, and the others as modules if I got ambitious and loaded one or another. How do people feel about this: --- linux-2.6.12-rc6-mm1/net/ipv4/Kconfig.bic 2005-06-07 12:55:41.000000000 -0400 +++ linux-2.6.12-rc6-mm1/net/ipv4/Kconfig 2005-06-09 11:12:26.000000000 -0400 @@ -425,6 +425,10 @@ config TCP_CONG_BIC increase provides TCP friendliness. See http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/ + This is the default TCP congestion control and the kernel will + attempt to load it if possible. If it is unable to initialize + tcp_bic, the TCP Reno algorithms will be used as a fallback. + config TCP_CONG_WESTWOOD tristate "TCP Westwood+" select IP_TCPDIAG (although that *still* doesn't document what's really going on with the tcp_init_congestion_control() function, and how that sysctl value interacts with things.... 
--==_Exmh_1118330628_3931P Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Exmh version 2.5 07/13/2001 iD8DBQFCqF8EcC3lWbTT17ARAk5GAKCzjeGzsCyi0hBWcMQo9FK4k0sytgCg5TDe uwHHr63Nw5oH4/5oVZuQ+RE= =AB5D -----END PGP SIGNATURE----- --==_Exmh_1118330628_3931P-- From shemminger@osdl.org Thu Jun 9 09:16:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 09:16:45 -0700 (PDT) Received: from smtp.osdl.org (smtp.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59GGeXq024657 for ; Thu, 9 Jun 2005 09:16:41 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j59GFTjA014740 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 9 Jun 2005 09:15:29 -0700 Received: from unknown-215.office.pdx.osdl.net (unknown-215.office.pdx.osdl.net [10.8.0.215]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j59GFSrg008781; Thu, 9 Jun 2005 09:15:28 -0700 Date: Thu, 9 Jun 2005 09:15:28 -0700 From: Stephen Hemminger To: Valdis.Kletnieks@vt.edu Cc: "David S. 
Miller" , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: tcp_bic (was Re: 2.6.12-rc6-mm1 OOPS in tcp_push_one() Message-ID: <20050609091528.1bc1940e@unknown-215.office.pdx.osdl.net> In-Reply-To: <200506091523.j59FNmsr008443@turing-police.cc.vt.edu> References: <200506090423.j594NWts004829@turing-police.cc.vt.edu> <20050608.225817.112619139.davem@davemloft.net> <200506091523.j59FNmsr008443@turing-police.cc.vt.edu> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 2303 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 789 Lines: 15 This was all changed in 2.6.12-rc6-tcp1, which is the next version going into -mm. The default congestion control will be the last one registered (LIFO), so if you built everything as modules, the default will be reno. If you build with the defaults from Kconfig, bic will be built in (not a module) and it will end up as the default. If you really want a particular default, you will need to set it with a sysctl. If you use a sysctl, the module will be autoloaded if needed and you will get the expected protocol. If you ask for an unknown congestion method, the sysctl attempt will fail. If you remove a TCP congestion control module, you will get the next available one. Since reno cannot be built as a module and cannot be deleted, it will always be available.
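The selection rules described in this message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the names `ca_list`, `ca_register`, `ca_unregister`, `ca_set_default` and `ca_default` are made up for illustration and are not the kernel's API, module autoloading is left out, and treating the sysctl as "move the named entry to the top of the list" is an assumption rather than something the message states.

```c
#include <stddef.h>
#include <string.h>

#define CA_MAX 8

/* Hypothetical userspace model of a LIFO congestion-control list.
 * names[count - 1] is the most recently registered entry and is
 * treated as the default. None of these names are kernel symbols. */
struct ca_list {
    const char *names[CA_MAX];
    size_t count;
};

/* reno is built in, so it is always the first entry. */
static void ca_init(struct ca_list *l)
{
    l->names[0] = "reno";
    l->count = 1;
}

/* Return the index of name, or -1 if it is not registered. */
static int ca_find(const struct ca_list *l, const char *name)
{
    size_t i;

    for (i = 0; i < l->count; i++)
        if (strcmp(l->names[i], name) == 0)
            return (int)i;
    return -1;
}

/* Registering pushes onto the top, so the newcomer becomes the default. */
static int ca_register(struct ca_list *l, const char *name)
{
    if (l->count == CA_MAX || ca_find(l, name) >= 0)
        return -1;
    l->names[l->count++] = name;
    return 0;
}

/* Close the gap left by entry i, keeping the order of the rest. */
static void ca_remove_at(struct ca_list *l, size_t i)
{
    for (; i + 1 < l->count; i++)
        l->names[i] = l->names[i + 1];
    l->count--;
}

/* Removing an entry exposes the next available one; reno cannot go. */
static int ca_unregister(struct ca_list *l, const char *name)
{
    int i;

    if (strcmp(name, "reno") == 0)
        return -1;
    i = ca_find(l, name);
    if (i < 0)
        return -1;
    ca_remove_at(l, (size_t)i);
    return 0;
}

/* Assumed sysctl behaviour: fail on unknown names, else move to top. */
static int ca_set_default(struct ca_list *l, const char *name)
{
    int i = ca_find(l, name);
    const char *keep;

    if (i < 0)
        return -1;
    keep = l->names[i];
    ca_remove_at(l, (size_t)i);
    l->names[l->count++] = keep;
    return 0;
}

static const char *ca_default(const struct ca_list *l)
{
    return l->names[l->count - 1];
}
```

With only `ca_init()` called (everything else built as modules and never loaded), `ca_default()` returns "reno". Registering "bic" afterwards makes it the default, and unregistering it falls back to whatever was registered before it.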
From ralf@linux-mips.org Thu Jun 9 09:23:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 09:23:43 -0700 (PDT) Received: from bacchus.net.dhis.org (extgw-uk.mips.com [62.254.210.129]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59GNZXq025453 for ; Thu, 9 Jun 2005 09:23:36 -0700 Received: from dea.linux-mips.net (localhost.localdomain [127.0.0.1]) by bacchus.net.dhis.org (8.13.1/8.13.1) with ESMTP id j59GIcuf020104; Thu, 9 Jun 2005 17:18:38 +0100 Received: (from ralf@localhost) by dea.linux-mips.net (8.13.1/8.13.1/Submit) id j59GIWKq020103; Thu, 9 Jun 2005 17:18:32 +0100 Date: Thu, 9 Jun 2005 17:18:32 +0100 From: Ralf Baechle To: "David S. Miller" Cc: tgraf@suug.ch, hadi@cyberus.ca, netdev@oss.sgi.com, Marcelo Tosatti Subject: [PATCH] Re: netdev munching messages again? Message-ID: <20050609161832.GJ4927@linux-mips.org> References: <20050608161314.GM20969@postel.suug.ch> <20050608172809.GF5520@linux-mips.org> <20050608200048.GP20969@postel.suug.ch> <20050608.131044.31642070.davem@davemloft.net> <20050609132730.GF4927@linux-mips.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050609132730.GF4927@linux-mips.org> User-Agent: Mutt/1.4.1i X-archive-position: 2304 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ralf@linux-mips.org Precedence: bulk X-list: netdev Content-Length: 2030 Lines: 68 Change the address of netdev in 2.4 also. 
Documentation/networking/vortex.txt | 2 +- MAINTAINERS | 10 +++++----- 2 files changed, 6 insertions(+), 6 deletions(-) Index: linux-cvs-2.4/Documentation/networking/vortex.txt =================================================================== --- linux-cvs-2.4.orig/Documentation/networking/vortex.txt 2002-06-26 23:35:01.000000000 +0100 +++ linux-cvs-2.4/Documentation/networking/vortex.txt 2005-06-09 14:53:43.000000000 +0100 @@ -12,7 +12,7 @@ Please report problems to one or more of: Andrew Morton - Netdev mailing list + Netdev mailing list Linux kernel mailing list Please note the 'Reporting and Diagnosing Problems' section at the end Index: linux-cvs-2.4/MAINTAINERS =================================================================== --- linux-cvs-2.4.orig/MAINTAINERS 2005-05-05 10:36:01.000000000 +0100 +++ linux-cvs-2.4/MAINTAINERS 2005-06-09 14:53:43.000000000 +0100 @@ -116,7 +116,7 @@ 8169 10/100/1000 GIGABIT ETHERNET DRIVER P: Francois Romieu M: romieu@fr.zoreil.com -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained 8250/16?50 (AND CLONE UARTS) SERIAL DRIVER @@ -1186,7 +1186,7 @@ P: Manish Lachwani M: Manish_Lachwani@pmc-sierra.com L: linux-mips@linux-mips.org -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Supported MARVELL YUKON / SYSKONNECT DRIVER @@ -1315,7 +1315,7 @@ NETWORKING [GENERAL] P: Networking Team -M: netdev@oss.sgi.com +M: netdev@vger.kernel.org L: linux-net@vger.kernel.org S: Maintained @@ -1332,7 +1332,7 @@ M: yoshfuji@linux-ipv6.org P: Patrick McHardy M: kaber@coreworks.de -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org S: Maintained NFS CLIENT @@ -1529,7 +1529,7 @@ PRISM54 WIRELESS DRIVER P: Prism54 Development Team M: prism54-private@prism54.org -L: netdev@oss.sgi.com +L: netdev@vger.kernel.org W: http://prism54.org S: Maintained From rahulhsaxena@gmail.com Thu Jun 9 12:07:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 12:07:57 -0700 (PDT) Received: from zproxy.gmail.com (zproxy.gmail.com 
[64.233.162.203]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59J7rXq000325 for ; Thu, 9 Jun 2005 12:07:54 -0700 Received: by zproxy.gmail.com with SMTP id 34so180175nzf for ; Thu, 09 Jun 2005 12:06:45 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=mlhFi115450AKEZC5WDQ8nwSVgXUdsH+Cyr7GfgXElGm5x7Iq6um5UIoY4sU9JiE0GOzuE9kYi9W5yY+Iz+qYc8BsDJm2qNGkHAtaHNU8OljKJD2B0Do/5vgF4WyKPM0CLaYmswaLL3uQSenv6GHJtb3bSlXZpHcws2i4sQQkBU= Received: by 10.36.9.5 with SMTP id 5mr632411nzi; Thu, 09 Jun 2005 12:06:45 -0700 (PDT) Received: by 10.36.4.6 with HTTP; Thu, 9 Jun 2005 12:06:45 -0700 (PDT) Message-ID: <4532f31705060912065c2917ef@mail.gmail.com> Date: Fri, 10 Jun 2005 00:36:45 +0530 From: Rahul Hari Reply-To: rahul.hari@cse06.itbhu.org To: tgraf@suug.ch, hadi@znyx.com, netdev@oss.sgi.com, diffserv-general-request@lists.sourceforge.net Subject: Tools for observing the effect of changes to sch_gred.c Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j59J7rXq000325 X-archive-position: 2305 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rahulhsaxena@gmail.com Precedence: bulk X-list: netdev Content-Length: 863 Lines: 28 Hi, I am a newbie to kernel programming and am making changes to sch_gred.c so that the first virtual queue gets absolute priority while dequeuing (starving the others) and the other virtual queues dequeue the packets following a wrr algorithm. Are there any tools that might help me in testing the effect of my changes ... or i have to emulate the entire router,server,client setup to test the effects. 
Regards, Rahul -- ---------------------- "The fear you let build up in your mind is worse than the situation that actually exists" from "who moved my cheese" --------------------------------------------------------------------------------- Rahul Hari Senior Under Grad. Student, Department of CSE, ITBHU, Varanasi. Ph: +91-9845347020 rahul.hari@cse06.itbhu.org ------------------------------------------------------------------------------------------ From marcelo.tosatti@cyclades.com Thu Jun 9 12:50:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 12:50:09 -0700 (PDT) Received: from parcelfarce.linux.theplanet.co.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59Jo1Xq007668 for ; Thu, 9 Jun 2005 12:50:06 -0700 Received: from [127.0.0.1] (helo=logos.cnet) by parcelfarce.linux.theplanet.co.uk with esmtp (Exim 4.43) id 1DgT24-0003VV-Hz; Thu, 09 Jun 2005 20:49:48 +0100 Received: by logos.cnet (Postfix, from userid 500) id 45265123173; Thu, 9 Jun 2005 12:00:26 -0300 (BRT) Date: Thu, 9 Jun 2005 12:00:26 -0300 From: Marcelo Tosatti To: Manfred Schwarb Cc: linux-kernel@vger.kernel.org, davem@redhat.com, netdev@oss.sgi.com, herbert@gondor.apana.org.au Subject: Re: 2.4.30-hf1 do_IRQ stack overflows Message-ID: <20050609150026.GA7900@logos.cnet> References: <20050511124640.GE8541@logos.cnet> <13943.1118147881@www19.gmx.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <13943.1118147881@www19.gmx.net> User-Agent: Mutt/1.5.5.1i X-archive-position: 2306 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: marcelo.tosatti@cyclades.com Precedence: bulk X-list: netdev Content-Length: 6387 Lines: 145 Hi, On Tue, Jun 07, 2005 at 02:38:01PM +0200, Manfred Schwarb wrote: > > > > > > Hi Manfred, > > > > On Wed, May 11, 2005 at 10:15:02AM +0200, Manfred Schwarb wrote: > > > 
Hi, > > > with recent versions of the 2.4 kernel (Vanilla), I get an increasing > > amount of do_IRQ stack overflows. > > > This night, I got 3 of them. > > > With 2.4.28 I got an overflow about twice a year, with 2.4.29 nearly > > once a month and with > > > 2.4.30 nearly every day 8-(( > > > > The system is getting dangerously close to an actual stack overflow, which > > would > > crash the system. > > > > "do_IRQ: stack overflow: " indicates how many bytes are still available. > > > > The traces show huge networking execution paths. > > > > It seems you are using some packet scheduler (CONFIG_NET_SCHED)? Pretty > > much all > > traces show functions from sch_generic.c. Can you disable that for a test? > > > > Sorry to bother you again, but the problem didn't vanish completely. > This morning, I caught another one. I built a new kernel with > CONFIG_NET_SCHED=n as suggested, uptime is now 25 days, and the following > is the first do_IRQ since then (ksymoops -i): > > Jun 7 03:55:01 tp-meteodat7 kernel: f3238830 00000280 f49e7b80 00000000 > 00000042 cca1388e f4116980 f17aa000 > Jun 7 03:55:01 tp-meteodat7 kernel: c010d948 00000042 f4116980 > 00000000 cca1388e f4116980 f17aa000 00000042 > Jun 7 03:55:01 tp-meteodat7 kernel: 00000018 f61d0018 ffffff14 > c023a039 00000010 00000246 ee5ea480 00000000 > Jun 7 03:55:01 tp-meteodat7 kernel: Call Trace: [call_do_IRQ+5/13] > [skb_copy_and_csum_dev+73/256] > [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256445916/96] > [qdisc_restart+114/432] [dev_queue_xmit+383/880] > Jun 7 03:55:01 tp-meteodat7 kernel: Call Trace: [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 
kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Jun 7 03:55:01 tp-meteodat7 kernel: [] [] > [] [] [] [] > Warning (Oops_read): Code line not seen, dumping what data is available Do you have the "do_IRQ stack overflow" output and the amount of bytes left it informs? > Trace; c010d948 > Trace; c023a039 > Trace; f90df5dc <[8139too]rtl8139_start_xmit+6c/180> > Trace; c0248402 > Trace; c023cc7f > Trace; c02561a8 > Trace; c02560f0 > Trace; c02560f0 I can't explain the "ip_finish_output2+0" entries. Odd. > Trace; c024760e > Trace; c02560f0 > Trace; c025492e > Trace; c02560f0 > Trace; c0256315 > Trace; c0256240 > Trace; c0256240 > Trace; c024760e > Trace; c0256240 > Trace; c0254d0d > Trace; c0256240 > Trace; c026daf0 > Trace; c0267c99 > Trace; c026a6f4 > Trace; c0259370 > Trace; c0259370 > Trace; c02661ca > Trace; c026edaa > Trace; c026f48e > Trace; c025174f > Trace; c02515f0 > Trace; c024760e > Trace; c02515f0 > Trace; c0251790 > Trace; c02510df > Trace; c02515f0 > Trace; c0251790 > Trace; c0251969 > Trace; c0251790 > Trace; c024760e > Trace; c0251790 > Trace; c02514b8 > Trace; c0251790 > Trace; c023d4d5 > Trace; c023d5a3 > Trace; c023d73a > Trace; c01254c6 > Trace; c010b094 > Trace; c010d948 I dont see any huge stack consumers on this callchain. David, Herbert, any clues what might be going on here? 
From davem@davemloft.net Thu Jun 9 12:54:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 12:54:50 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59JsjXq008237 for ; Thu, 9 Jun 2005 12:54:45 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgT5g-0006fT-I3; Thu, 09 Jun 2005 12:53:24 -0700 Date: Thu, 09 Jun 2005 12:53:24 -0700 (PDT) Message-Id: <20050609.125324.88476545.davem@davemloft.net> To: pavel@ucw.cz Cc: vda@ilport.com.ua, abonilla@linuxwireless.org, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem From: "David S. Miller" In-Reply-To: <20050609104205.GD3169@elf.ucw.cz> References: <200506090909.55889.vda@ilport.com.ua> <20050608.231657.59660080.davem@davemloft.net> <20050609104205.GD3169@elf.ucw.cz> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2307 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 603 Lines: 16 From: Pavel Machek Date: Thu, 9 Jun 2005 12:42:05 +0200 > I'm not saying it should not work automagically. But it is wrong to > start transmitting on wireless as soon as kernel boots. It should stay > quiet in the radio until it is either told to talk or until interface > is upped. I agree. There is a similar problem in the Acenic driver, it brings the link up and receives broadcast packets as soon as the driver is loaded. Mostly this is because the driver inits the chip and registers the IRQ handler at probe time, whereas nearly every other driver does this at ->open() time. 
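The difference between doing the work at probe time and at ->open() time can be illustrated with a small userspace model. This is a sketch only: `mock_nic`, `mock_probe`, `mock_open`, `mock_stop` and `mock_can_receive` are hypothetical names invented here, not code from the Acenic driver or any other real driver, and the "chip" is just three flags.

```c
#include <stdbool.h>

/* Hypothetical device state; purely illustrative. */
struct mock_nic {
    bool chip_initialized;
    bool irq_registered;
    bool link_up;
};

/* Probe only discovers the device and records its state.
 * It deliberately leaves the chip quiet: no IRQ handler, no link. */
static void mock_probe(struct mock_nic *nic)
{
    nic->chip_initialized = false;
    nic->irq_registered = false;
    nic->link_up = false;
}

/* Open is where the interface is brought up on request:
 * init the chip, hook the IRQ handler, then raise the link. */
static int mock_open(struct mock_nic *nic)
{
    nic->chip_initialized = true;
    nic->irq_registered = true;
    nic->link_up = true;
    return 0;
}

/* Stop undoes open in reverse order and makes the device quiet again. */
static void mock_stop(struct mock_nic *nic)
{
    nic->link_up = false;
    nic->irq_registered = false;
    nic->chip_initialized = false;
}

/* Packets can only arrive once the link is up and the IRQ is hooked. */
static bool mock_can_receive(const struct mock_nic *nic)
{
    return nic->link_up && nic->irq_registered;
}
```

In this model a device that has only been probed cannot receive anything, which is the behaviour being asked for: traffic starts only after the interface is explicitly opened.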
From afleming@freescale.com Thu Jun 9 13:06:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 13:06:26 -0700 (PDT) Received: from az33egw01.freescale.net (az33egw01.freescale.net [192.88.158.102]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59K6GXq009172 for ; Thu, 9 Jun 2005 13:06:19 -0700 Received: from az33smr02.freescale.net (az33smr02.freescale.net [10.64.34.200]) by az33egw01.freescale.net (8.12.11/az33egw01) with ESMTP id j59KAtoV004170; Thu, 9 Jun 2005 13:10:55 -0700 (MST) Received: from mailserv2.am.freescale.net (mailserv2.am.freescale.net [10.82.65.62]) by az33smr02.freescale.net (8.13.1/8.13.0) with ESMTP id j59K8Wm7001843; Thu, 9 Jun 2005 15:08:33 -0500 (CDT) Received: from cde-tx32-ldt113.am.freescale.net (cde-tx32-ldt113.am.freescale.net [10.82.107.148]) by mailserv2.am.freescale.net (8.13.3/8.13.3) with ESMTP id j59K57tn008137; Thu, 9 Jun 2005 15:05:07 -0500 (CDT) Received: from cde-tx32-ldt113.sps.mot.com (localhost [127.0.0.1]) by cde-tx32-ldt113.am.freescale.net (Postfix) with ESMTP id A274FC2DD1; Thu, 9 Jun 2005 15:05:06 -0500 (CDT) Received: from localhost (afleming@localhost) by cde-tx32-ldt113.sps.mot.com (8.12.11/8.12.11/Submit) with ESMTP id j59K554b031949; Thu, 9 Jun 2005 15:05:06 -0500 X-Authentication-Warning: cde-tx32-ldt113.sps.mot.com: afleming owned process doing -bs Date: Thu, 9 Jun 2005 15:05:03 -0500 (CDT) From: Andy Fleming X-X-Sender: afleming@cde-tx32-ldt113.sps.mot.com To: Netdev Cc: galak@freescale.com, Jeff Garzik Subject: [patch] 8548 support for eTSEC Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2308 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: afleming@freescale.com Precedence: bulk X-list: netdev Content-Length: 66994 Lines: 2068 This patch adds support for eTSEC features on the 8548: * TCP/IP/UDP checksumming and verification * VLAN tag 
insertion/extraction * Larger multicast hash-table * Padding to align IP headers Also added: * msg lvl support * Some whitespace cleanup Index: drivers/net/gianfar.c =================================================================== RCS file: /proj/ppc/sysperf/cvsroot/cvs_root/pq38/linux-2.6/drivers/net/gianfar.c,v retrieving revision 1.1.1.1 retrieving revision 1.3 diff -u -r1.1.1.1 -r1.3 --- drivers/net/gianfar.c 21 Apr 2005 00:03:14 -0000 1.1.1.1 +++ drivers/net/gianfar.c 7 Jun 2005 22:48:50 -0000 1.3 @@ -1,4 +1,4 @@ -/* +/* * drivers/net/gianfar.c * * Gianfar Ethernet Driver @@ -22,10 +22,9 @@ * B-V +1.62 * * Theory of operation - * This driver is designed for the Triple-speed Ethernet - * controllers on the Freescale 8540/8560 integrated processors, - * as well as the Fast Ethernet Controller on the 8540. - * + * This driver is designed for the non-CPM ethernet controllers + * on the 85xx and 83xx family of integrated processors + * * The driver is initialized through platform_device. Structures which * define the configuration needed by the board are defined in a * board structure in arch/ppc/platforms (though I do not @@ -39,12 +38,12 @@ * * The Gianfar Ethernet Controller uses a ring of buffer * descriptors. The beginning is indicated by a register - * pointing to the physical address of the start of the ring. - * The end is determined by a "wrap" bit being set in the + * pointing to the physical address of the start of the ring. + * The end is determined by a "wrap" bit being set in the * last descriptor of the ring. * * When a packet is received, the RXF bit in the - * IEVENT register is set, triggering an interrupt when the + * IEVENT register is set, triggering an interrupt when the * corresponding bit in the IMASK register is also set (if * interrupt coalescing is active, then the interrupt may not * happen immediately, but will wait until either a set number @@ -52,7 +51,7 @@ * interrupt handler will signal there is work to be done, and * exit. 
Without NAPI, the packet(s) will be handled * immediately. Both methods will start at the last known empty - * descriptor, and process every subsequent descriptor until there + * descriptor, and process every subsequent descriptor until there * are none left with data (NAPI will stop after a set number of * packets to give time to other tasks, but will eventually * process all the packets). The data arrives inside a @@ -83,9 +82,13 @@ #include #include #include +#include #include #include #include +#include +#include +#include #include #include @@ -123,7 +126,7 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu); static irqreturn_t gfar_error(int irq, void *dev_id, struct pt_regs *regs); static irqreturn_t gfar_transmit(int irq, void *dev_id, struct pt_regs *regs); -irqreturn_t gfar_receive(int irq, void *dev_id, struct pt_regs *regs); +static irqreturn_t gfar_receive(int irq, void *dev_id, struct pt_regs *regs); static irqreturn_t gfar_interrupt(int irq, void *dev_id, struct pt_regs *regs); static irqreturn_t phy_interrupt(int irq, void *dev_id, struct pt_regs *regs); static void gfar_phy_change(void *data); @@ -139,9 +142,12 @@ #ifdef CONFIG_GFAR_NAPI static int gfar_poll(struct net_device *dev, int *budget); #endif -static int gfar_clean_rx_ring(struct net_device *dev, int rx_work_limit); +int gfar_clean_rx_ring(struct net_device *dev, int rx_work_limit); static int gfar_process_frame(struct net_device *dev, struct sk_buff *skb, int length); static void gfar_phy_startup_timer(unsigned long data); +static void gfar_vlan_rx_register(struct net_device *netdev, + struct vlan_group *grp); +static void gfar_vlan_rx_kill_vid(struct net_device *netdev, uint16_t vid); extern struct ethtool_ops gfar_ethtool_ops; @@ -149,6 +155,13 @@ MODULE_DESCRIPTION("Gianfar Ethernet Driver"); MODULE_LICENSE("GPL"); +int gfar_uses_fcb(struct gfar_private *priv) +{ + if (priv->vlan_enable || priv->rx_csum_enable) + return 1; + else + return 0; +} static int 
gfar_probe(struct device *device) { u32 tempval; @@ -159,7 +172,6 @@ struct resource *r; int idx; int err = 0; - int dev_ethtool_ops = 0; einfo = (struct gianfar_platform_data *) pdev->dev.platform_data; @@ -265,15 +277,69 @@ dev->mtu = 1500; dev->set_multicast_list = gfar_set_multi; - /* Index into the array of possible ethtool - * ops to catch all 4 possibilities */ - if((priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_RMON) == 0) - dev_ethtool_ops += 1; + dev->ethtool_ops = &gfar_ethtool_ops; + + if (priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_CSUM) { + priv->rx_csum_enable = 1; + dev->features |= NETIF_F_IP_CSUM; + } else + priv->rx_csum_enable = 0; + + priv->vlgrp = NULL; + + if (priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_VLAN) { + dev->vlan_rx_register = gfar_vlan_rx_register; + dev->vlan_rx_kill_vid = gfar_vlan_rx_kill_vid; + + dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX; + + priv->vlan_enable = 1; + } + + if (priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_EXTENDED_HASH) { + priv->extended_hash = 1; + priv->hash_width = 9; + + priv->hash_regs[0] = &priv->regs->igaddr0; + priv->hash_regs[1] = &priv->regs->igaddr1; + priv->hash_regs[2] = &priv->regs->igaddr2; + priv->hash_regs[3] = &priv->regs->igaddr3; + priv->hash_regs[4] = &priv->regs->igaddr4; + priv->hash_regs[5] = &priv->regs->igaddr5; + priv->hash_regs[6] = &priv->regs->igaddr6; + priv->hash_regs[7] = &priv->regs->igaddr7; + priv->hash_regs[8] = &priv->regs->gaddr0; + priv->hash_regs[9] = &priv->regs->gaddr1; + priv->hash_regs[10] = &priv->regs->gaddr2; + priv->hash_regs[11] = &priv->regs->gaddr3; + priv->hash_regs[12] = &priv->regs->gaddr4; + priv->hash_regs[13] = &priv->regs->gaddr5; + priv->hash_regs[14] = &priv->regs->gaddr6; + priv->hash_regs[15] = &priv->regs->gaddr7; + + } else { + priv->extended_hash = 0; + priv->hash_width = 8; + + priv->hash_regs[0] = &priv->regs->gaddr0; + priv->hash_regs[1] = &priv->regs->gaddr1; + priv->hash_regs[2] = &priv->regs->gaddr2; + 
priv->hash_regs[3] = &priv->regs->gaddr3; + priv->hash_regs[4] = &priv->regs->gaddr4; + priv->hash_regs[5] = &priv->regs->gaddr5; + priv->hash_regs[6] = &priv->regs->gaddr6; + priv->hash_regs[7] = &priv->regs->gaddr7; + } + + if (priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_PADDING) + priv->padding = DEFAULT_PADDING; + else + priv->padding = 0; - if((priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_COALESCE) == 0) - dev_ethtool_ops += 2; + dev->hard_header_len += priv->padding; - dev->ethtool_ops = gfar_op_array[dev_ethtool_ops]; + if (dev->features & NETIF_F_IP_CSUM) + dev->hard_header_len += GMAC_FCB_LEN; priv->rx_buffer_size = DEFAULT_RX_BUFFER_SIZE; #ifdef CONFIG_GFAR_BUFSTASH @@ -289,6 +355,9 @@ priv->rxcount = DEFAULT_RXCOUNT; priv->rxtime = DEFAULT_RXTIME; + /* Enable most messages by default */ + priv->msg_enable = (NETIF_MSG_IFUP << 1 ) - 1; + err = register_netdev(dev); if (err) { @@ -360,8 +429,9 @@ GFP_KERNEL); if(NULL == mii_info) { - printk(KERN_ERR "%s: Could not allocate mii_info\n", - dev->name); + if (netif_msg_ifup(priv)) + printk(KERN_ERR "%s: Could not allocate mii_info\n", + dev->name); return -ENOMEM; } @@ -410,7 +480,8 @@ curphy = get_phy_info(priv->mii_info); if (curphy == NULL) { - printk(KERN_ERR "%s: No PHY found\n", dev->name); + if (netif_msg_ifup(priv)) + printk(KERN_ERR "%s: No PHY found\n", dev->name); err = -1; goto no_phy; } @@ -421,7 +492,7 @@ if(curphy->init) { err = curphy->init(priv->mii_info); - if (err) + if (err) goto phy_init_fail; } @@ -446,14 +517,14 @@ gfar_write(&priv->regs->imask, IMASK_INIT_CLEAR); /* Init hash registers to zero */ - gfar_write(&priv->regs->iaddr0, 0); - gfar_write(&priv->regs->iaddr1, 0); - gfar_write(&priv->regs->iaddr2, 0); - gfar_write(&priv->regs->iaddr3, 0); - gfar_write(&priv->regs->iaddr4, 0); - gfar_write(&priv->regs->iaddr5, 0); - gfar_write(&priv->regs->iaddr6, 0); - gfar_write(&priv->regs->iaddr7, 0); + gfar_write(&priv->regs->igaddr0, 0); + gfar_write(&priv->regs->igaddr1, 0); + 
gfar_write(&priv->regs->igaddr2, 0); + gfar_write(&priv->regs->igaddr3, 0); + gfar_write(&priv->regs->igaddr4, 0); + gfar_write(&priv->regs->igaddr5, 0); + gfar_write(&priv->regs->igaddr6, 0); + gfar_write(&priv->regs->igaddr7, 0); gfar_write(&priv->regs->gaddr0, 0); gfar_write(&priv->regs->gaddr1, 0); @@ -464,9 +535,6 @@ gfar_write(&priv->regs->gaddr6, 0); gfar_write(&priv->regs->gaddr7, 0); - /* Zero out rctrl */ - gfar_write(&priv->regs->rctrl, 0x00000000); - /* Zero out the rmon mib registers if it has them */ if (priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_RMON) { memset((void *) &(priv->regs->rmon), 0, @@ -497,20 +565,14 @@ gfar_write(&priv->regs->tbipa, TBIPA_VALUE); } -void stop_gfar(struct net_device *dev) + +/* Halt the receive and transmit queues */ +void gfar_halt(struct net_device *dev) { struct gfar_private *priv = netdev_priv(dev); struct gfar *regs = priv->regs; - unsigned long flags; u32 tempval; - /* Lock it down */ - spin_lock_irqsave(&priv->lock, flags); - - /* Tell the kernel the link is down */ - priv->mii_info->link = 0; - adjust_link(dev); - /* Mask all interrupts */ gfar_write(&regs->imask, IMASK_INIT_CLEAR); @@ -533,13 +595,29 @@ tempval = gfar_read(&regs->maccfg1); tempval &= ~(MACCFG1_RX_EN | MACCFG1_TX_EN); gfar_write(&regs->maccfg1, tempval); +} + +void stop_gfar(struct net_device *dev) +{ + struct gfar_private *priv = netdev_priv(dev); + struct gfar *regs = priv->regs; + unsigned long flags; + + /* Lock it down */ + spin_lock_irqsave(&priv->lock, flags); + + /* Tell the kernel the link is down */ + priv->mii_info->link = 0; + adjust_link(dev); + + gfar_halt(dev); if (priv->einfo->board_flags & FSL_GIANFAR_BRD_HAS_PHY_INTR) { /* Clear any pending interrupts */ mii_clear_phy_interrupt(priv->mii_info); /* Disable PHY Interrupts */ - mii_configure_phy_interrupt(priv->mii_info, + mii_configure_phy_interrupt(priv->mii_info, MII_INTERRUPT_DISABLED); } @@ -566,7 +644,7 @@ sizeof(struct txbd8)*priv->tx_ring_size + sizeof(struct
rxbd8)*priv->rx_ring_size, priv->tx_bd_base, - gfar_read(&regs->tbase)); + gfar_read(&regs->tbase0)); } /* If there are any tx skbs or rx skbs still around, free them. @@ -620,6 +698,34 @@ } } +void gfar_start(struct net_device *dev) +{ + struct gfar_private *priv = netdev_priv(dev); + struct gfar *regs = priv->regs; + u32 tempval; + + /* Enable Rx and Tx in MACCFG1 */ + tempval = gfar_read(&regs->maccfg1); + tempval |= (MACCFG1_RX_EN | MACCFG1_TX_EN); + gfar_write(&regs->maccfg1, tempval); + + /* Initialize DMACTRL to have WWR and WOP */ + tempval = gfar_read(&priv->regs->dmactrl); + tempval |= DMACTRL_INIT_SETTINGS; + gfar_write(&priv->regs->dmactrl, tempval); + + /* Clear THLT, so that the DMA starts polling now */ + gfar_write(&regs->tstat, TSTAT_CLEAR_THALT); + + /* Make sure we aren't stopped */ + tempval = gfar_read(&priv->regs->dmactrl); + tempval &= ~(DMACTRL_GRS | DMACTRL_GTS); + gfar_write(&priv->regs->dmactrl, tempval); + + /* Unmask the interrupts we look for */ + gfar_write(&regs->imask, IMASK_DEFAULT); +} + /* Bring the controller up and running */ int startup_gfar(struct net_device *dev) { @@ -630,33 +736,34 @@ int i; struct gfar_private *priv = netdev_priv(dev); struct gfar *regs = priv->regs; - u32 tempval; int err = 0; + u32 rctrl = 0; gfar_write(&regs->imask, IMASK_INIT_CLEAR); /* Allocate memory for the buffer descriptors */ - vaddr = (unsigned long) dma_alloc_coherent(NULL, + vaddr = (unsigned long) dma_alloc_coherent(NULL, sizeof (struct txbd8) * priv->tx_ring_size + sizeof (struct rxbd8) * priv->rx_ring_size, &addr, GFP_KERNEL); if (vaddr == 0) { - printk(KERN_ERR "%s: Could not allocate buffer descriptors!\n", - dev->name); + if (netif_msg_ifup(priv)) + printk(KERN_ERR "%s: Could not allocate buffer descriptors!\n", + dev->name); return -ENOMEM; } priv->tx_bd_base = (struct txbd8 *) vaddr; /* enet DMA only understands physical addresses */ - gfar_write(&regs->tbase, addr); + gfar_write(&regs->tbase0, addr); /* Start the rx descriptor ring where the tx ring leaves off
*/ addr = addr + sizeof (struct txbd8) * priv->tx_ring_size; vaddr = vaddr + sizeof (struct txbd8) * priv->tx_ring_size; priv->rx_bd_base = (struct rxbd8 *) vaddr; - gfar_write(&regs->rbase, addr); + gfar_write(&regs->rbase0, addr); /* Setup the skbuff rings */ priv->tx_skbuff = @@ -664,8 +771,9 @@ priv->tx_ring_size, GFP_KERNEL); if (priv->tx_skbuff == NULL) { - printk(KERN_ERR "%s: Could not allocate tx_skbuff\n", - dev->name); + if (netif_msg_ifup(priv)) + printk(KERN_ERR "%s: Could not allocate tx_skbuff\n", + dev->name); err = -ENOMEM; goto tx_skb_fail; } @@ -678,8 +786,9 @@ priv->rx_ring_size, GFP_KERNEL); if (priv->rx_skbuff == NULL) { - printk(KERN_ERR "%s: Could not allocate rx_skbuff\n", - dev->name); + if (netif_msg_ifup(priv)) + printk(KERN_ERR "%s: Could not allocate rx_skbuff\n", + dev->name); err = -ENOMEM; goto rx_skb_fail; } @@ -726,12 +835,13 @@ /* If the device has multiple interrupts, register for * them. Otherwise, only register for the one */ if (priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_MULTI_INTR) { - /* Install our interrupt handlers for Error, + /* Install our interrupt handlers for Error, * Transmit, and Receive */ if (request_irq(priv->interruptError, gfar_error, 0, "enet_error", dev) < 0) { - printk(KERN_ERR "%s: Can't get IRQ %d\n", - dev->name, priv->interruptError); + if (netif_msg_intr(priv)) + printk(KERN_ERR "%s: Can't get IRQ %d\n", + dev->name, priv->interruptError); err = -1; goto err_irq_fail; @@ -739,8 +849,9 @@ if (request_irq(priv->interruptTransmit, gfar_transmit, 0, "enet_tx", dev) < 0) { - printk(KERN_ERR "%s: Can't get IRQ %d\n", - dev->name, priv->interruptTransmit); + if (netif_msg_intr(priv)) + printk(KERN_ERR "%s: Can't get IRQ %d\n", + dev->name, priv->interruptTransmit); err = -1; @@ -749,8 +860,9 @@ if (request_irq(priv->interruptReceive, gfar_receive, 0, "enet_rx", dev) < 0) { - printk(KERN_ERR "%s: Can't get IRQ %d (receive0)\n", - dev->name, priv->interruptReceive); + if (netif_msg_intr(priv)) +
printk(KERN_ERR "%s: Can't get IRQ %d (receive0)\n", + dev->name, priv->interruptReceive); err = -1; goto rx_irq_fail; @@ -758,8 +870,9 @@ } else { if (request_irq(priv->interruptTransmit, gfar_interrupt, 0, "gfar_interrupt", dev) < 0) { - printk(KERN_ERR "%s: Can't get IRQ %d\n", - dev->name, priv->interruptError); + if (netif_msg_intr(priv)) + printk(KERN_ERR "%s: Can't get IRQ %d\n", + dev->name, priv->interruptError); err = -1; goto err_irq_fail; @@ -787,28 +900,22 @@ else gfar_write(&regs->rxic, 0); - init_waitqueue_head(&priv->rxcleanupq); + if (priv->rx_csum_enable) + rctrl |= RCTRL_CHECKSUMMING; - /* Enable Rx and Tx in MACCFG1 */ - tempval = gfar_read(&regs->maccfg1); - tempval |= (MACCFG1_RX_EN | MACCFG1_TX_EN); - gfar_write(&regs->maccfg1, tempval); + if (priv->extended_hash) + rctrl |= RCTRL_EXTHASH; - /* Initialize DMACTRL to have WWR and WOP */ - tempval = gfar_read(&priv->regs->dmactrl); - tempval |= DMACTRL_INIT_SETTINGS; - gfar_write(&priv->regs->dmactrl, tempval); + if (priv->vlan_enable) + rctrl |= RCTRL_VLAN; - /* Clear THLT, so that the DMA starts polling now */ - gfar_write(&regs->tstat, TSTAT_CLEAR_THALT); + /* Init rctrl based on our settings */ + gfar_write(&priv->regs->rctrl, rctrl); - /* Make sure we aren't stopped */ - tempval = gfar_read(&priv->regs->dmactrl); - tempval &= ~(DMACTRL_GRS | DMACTRL_GTS); - gfar_write(&priv->regs->dmactrl, tempval); + if (dev->features & NETIF_F_IP_CSUM) + gfar_write(&priv->regs->tctrl, TCTRL_INIT_CSUM); - /* Unmask the interrupts we look for */ - gfar_write(&regs->imask, IMASK_DEFAULT); + gfar_start(dev); return 0; @@ -824,7 +931,7 @@ sizeof(struct txbd8)*priv->tx_ring_size + sizeof(struct rxbd8)*priv->rx_ring_size, priv->tx_bd_base, - gfar_read(&regs->tbase)); + gfar_read(&regs->tbase0)); if (priv->mii_info->phyinfo->close) priv->mii_info->phyinfo->close(priv->mii_info); @@ -857,11 +964,62 @@ return err; } +static struct txfcb *gfar_add_fcb(struct sk_buff *skb, struct txbd8 *bdp) +{ + struct txfcb *fcb = (struct txfcb 
*)skb_push (skb, GMAC_FCB_LEN); + + memset(fcb, 0, GMAC_FCB_LEN); + + /* Flag the bd so the controller looks for the FCB */ + bdp->status |= TXBD_TOE; + + return fcb; +} + +static inline void gfar_tx_checksum(struct sk_buff *skb, struct txfcb *fcb) +{ + int len; + + /* If we're here, it's a IP packet with a TCP or UDP + * payload. We set it to checksum, using a pseudo-header + * we provide + */ + fcb->ip = 1; + fcb->tup = 1; + fcb->ctu = 1; + fcb->nph = 1; + + /* Notify the controller what the protocol is */ + if (skb->nh.iph->protocol == IPPROTO_UDP) + fcb->udp = 1; + + /* l3os is the distance between the start of the + * frame (skb->data) and the start of the IP hdr. + * l4os is the distance between the start of the + * l3 hdr and the l4 hdr */ + fcb->l3os = (u16)(skb->nh.raw - skb->data - GMAC_FCB_LEN); + fcb->l4os = (u16)(skb->h.raw - skb->nh.raw); + + len = skb->nh.iph->tot_len - fcb->l4os; + + /* Provide the pseudoheader csum */ + fcb->phcs = ~csum_tcpudp_magic(skb->nh.iph->saddr, + skb->nh.iph->daddr, len, + skb->nh.iph->protocol, 0); +} + +void gfar_tx_vlan(struct sk_buff *skb, struct txfcb *fcb) +{ + fcb->vln = 1; + fcb->vlctl = vlan_tx_tag_get(skb); +} + /* This is called by the kernel when a frame is ready for transmission. 
*/ /* It is pointed to by the dev->hard_start_xmit function pointer */ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct gfar_private *priv = netdev_priv(dev); + struct txfcb *fcb = NULL; struct txbd8 *txbdp; /* Update transmit stats */ @@ -876,9 +1034,24 @@ /* Clear all but the WRAP status flags */ txbdp->status &= TXBD_WRAP; + /* Set up checksumming */ + if ((dev->features & NETIF_F_IP_CSUM) + && (CHECKSUM_HW == skb->ip_summed)) { + fcb = gfar_add_fcb(skb, txbdp); + gfar_tx_checksum(skb, fcb); + } + + if (priv->vlan_enable && + unlikely(priv->vlgrp && vlan_tx_tag_present(skb))) { + if (NULL == fcb) + fcb = gfar_add_fcb(skb, txbdp); + + gfar_tx_vlan(skb, fcb); + } + /* Set buffer length and pointer */ txbdp->length = skb->len; - txbdp->bufPtr = dma_map_single(NULL, skb->data, + txbdp->bufPtr = dma_map_single(NULL, skb->data, skb->len, DMA_TO_DEVICE); /* Save the skb pointer so we can free it later */ @@ -972,15 +1145,78 @@ } +/* Enables and disables VLAN insertion/extraction */ +static void gfar_vlan_rx_register(struct net_device *dev, + struct vlan_group *grp) +{ + struct gfar_private *priv = netdev_priv(dev); + unsigned long flags; + u32 tempval; + + spin_lock_irqsave(&priv->lock, flags); + + priv->vlgrp = grp; + + if (grp) { + /* Enable VLAN tag insertion */ + tempval = gfar_read(&priv->regs->tctrl); + tempval |= TCTRL_VLINS; + + gfar_write(&priv->regs->tctrl, tempval); + + /* Enable VLAN tag extraction */ + tempval = gfar_read(&priv->regs->rctrl); + tempval |= RCTRL_VLEX; + gfar_write(&priv->regs->rctrl, tempval); + } else { + /* Disable VLAN tag insertion */ + tempval = gfar_read(&priv->regs->tctrl); + tempval &= ~TCTRL_VLINS; + gfar_write(&priv->regs->tctrl, tempval); + + /* Disable VLAN tag extraction */ + tempval = gfar_read(&priv->regs->rctrl); + tempval &= ~RCTRL_VLEX; + gfar_write(&priv->regs->rctrl, tempval); + } + + spin_unlock_irqrestore(&priv->lock, flags); +} + + +static void gfar_vlan_rx_kill_vid(struct net_device 
*dev, uint16_t vid) +{ + struct gfar_private *priv = netdev_priv(dev); + unsigned long flags; + + spin_lock_irqsave(&priv->lock, flags); + + if (priv->vlgrp) + priv->vlgrp->vlan_devices[vid] = NULL; + + spin_unlock_irqrestore(&priv->lock, flags); +} + + static int gfar_change_mtu(struct net_device *dev, int new_mtu) { int tempsize, tempval; struct gfar_private *priv = netdev_priv(dev); int oldsize = priv->rx_buffer_size; - int frame_size = new_mtu + 18; + int frame_size = new_mtu + ETH_HLEN; + + if (priv->vlan_enable) + frame_size += VLAN_ETH_HLEN; + + if (gfar_uses_fcb(priv)) + frame_size += GMAC_FCB_LEN; + + frame_size += priv->padding; if ((frame_size < 64) || (frame_size > JUMBO_FRAME_SIZE)) { - printk(KERN_ERR "%s: Invalid MTU setting\n", dev->name); + if (netif_msg_drv(priv)) + printk(KERN_ERR "%s: Invalid MTU setting\n", + dev->name); return -EINVAL; } @@ -1120,7 +1356,7 @@ skb->dev = dev; bdp->bufPtr = dma_map_single(NULL, skb->data, - priv->rx_buffer_size + RXBUF_ALIGNMENT, + priv->rx_buffer_size + RXBUF_ALIGNMENT, DMA_FROM_DEVICE); bdp->length = 0; @@ -1190,11 +1426,10 @@ __netif_rx_schedule(dev); } else { -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_DEBUG "%s: receive called twice (%x)[%x]\n", - dev->name, gfar_read(&priv->regs->ievent), - gfar_read(&priv->regs->imask)); -#endif + if (netif_msg_rx_err(priv)) + printk(KERN_DEBUG "%s: receive called twice (%x)[%x]\n", + dev->name, gfar_read(&priv->regs->ievent), + gfar_read(&priv->regs->imask)); } #else @@ -1209,15 +1444,43 @@ else gfar_write(&priv->regs->rxic, 0); - /* Just in case we need to wake the ring param changer */ - priv->rxclean = 1; - spin_unlock(&priv->lock); #endif return IRQ_HANDLED; } +static inline int gfar_rx_vlan(struct sk_buff *skb, + struct vlan_group *vlgrp, unsigned short vlctl) +{ +#ifdef CONFIG_GFAR_NAPI + return vlan_hwaccel_receive_skb(skb, vlgrp, vlctl); +#else + return vlan_hwaccel_rx(skb, vlgrp, vlctl); +#endif +} + +static inline void gfar_rx_checksum(struct sk_buff *skb, struct 
rxfcb *fcb) +{ + /* If valid headers were found, and valid sums + * were verified, then we tell the kernel that no + * checksumming is necessary. Otherwise, it is */ + if (fcb->cip && !fcb->eip && fcb->ctu && !fcb->etu) + skb->ip_summed = CHECKSUM_UNNECESSARY; + else + skb->ip_summed = CHECKSUM_NONE; +} + + +static inline struct rxfcb *gfar_get_fcb(struct sk_buff *skb) +{ + struct rxfcb *fcb = (struct rxfcb *)skb->data; + + /* Remove the FCB from the skb */ + skb_pull(skb, GMAC_FCB_LEN); + + return fcb; +} /* gfar_process_frame() -- handle one incoming packet if skb * isn't NULL. */ @@ -1225,35 +1488,51 @@ int length) { struct gfar_private *priv = netdev_priv(dev); + struct rxfcb *fcb = NULL; if (skb == NULL) { -#ifdef BRIEF_GFAR_ERRORS - printk(KERN_WARNING "%s: Missing skb!!.\n", - dev->name); -#endif + if (netif_msg_rx_err(priv)) + printk(KERN_WARNING "%s: Missing skb!!.\n", dev->name); priv->stats.rx_dropped++; priv->extra_stats.rx_skbmissing++; } else { + int ret; + /* Prep the skb for the packet */ skb_put(skb, length); + /* Grab the FCB if there is one */ + if (gfar_uses_fcb(priv)) + fcb = gfar_get_fcb(skb); + + /* Remove the padded bytes, if there are any */ + if (priv->padding) + skb_pull(skb, priv->padding); + + if (priv->rx_csum_enable) + gfar_rx_checksum(skb, fcb); + /* Tell the skb what kind of packet this is */ skb->protocol = eth_type_trans(skb, dev); /* Send the packet up the stack */ - if (RECEIVE(skb) == NET_RX_DROP) { + if (unlikely(priv->vlgrp && fcb->vln)) + ret = gfar_rx_vlan(skb, priv->vlgrp, fcb->vlctl); + else + ret = RECEIVE(skb); + + if (NET_RX_DROP == ret) priv->extra_stats.kernel_dropped++; - } } return 0; } /* gfar_clean_rx_ring() -- Processes each frame in the rx ring - * until the budget/quota has been reached. Returns the number + * until the budget/quota has been reached. 
Returns the number * of frames handled */ -static int gfar_clean_rx_ring(struct net_device *dev, int rx_work_limit) +int gfar_clean_rx_ring(struct net_device *dev, int rx_work_limit) { struct rxbd8 *bdp; struct sk_buff *skb; @@ -1355,9 +1634,6 @@ mk_ic_value(priv->rxcount, priv->rxtime)); else gfar_write(&priv->regs->rxic, 0); - - /* Signal to the ring size changer that it's safe to go */ - priv->rxclean = 1; } return (rx_work_limit < 0) ? 1 : 0; @@ -1393,10 +1669,8 @@ if (events & IEVENT_CRL) priv->stats.tx_aborted_errors++; if (events & IEVENT_XFUN) { -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_WARNING "%s: tx underrun. dropped packet\n", - dev->name); -#endif + if (netif_msg_tx_err(priv)) + printk(KERN_WARNING "%s: tx underrun. dropped packet\n", dev->name); priv->stats.tx_dropped++; priv->extra_stats.tx_underrun++; @@ -1415,36 +1689,30 @@ gfar_write(&priv->regs->rstat, RSTAT_CLEAR_RHALT); #endif -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_DEBUG "%s: busy error (rhalt: %x)\n", dev->name, - gfar_read(&priv->regs->rstat)); -#endif + if (netif_msg_rx_err(priv)) + printk(KERN_DEBUG "%s: busy error (rhalt: %x)\n", + dev->name, + gfar_read(&priv->regs->rstat)); } if (events & IEVENT_BABR) { priv->stats.rx_errors++; priv->extra_stats.rx_babr++; -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_DEBUG "%s: babbling error\n", dev->name); -#endif + if (netif_msg_rx_err(priv)) + printk(KERN_DEBUG "%s: babbling error\n", dev->name); } if (events & IEVENT_EBERR) { priv->extra_stats.eberr++; -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_DEBUG "%s: EBERR\n", dev->name); -#endif - } - if (events & IEVENT_RXC) { -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_DEBUG "%s: control frame\n", dev->name); -#endif + if (netif_msg_rx_err(priv)) + printk(KERN_DEBUG "%s: EBERR\n", dev->name); } + if ((events & IEVENT_RXC) && (netif_msg_rx_err(priv))) + printk(KERN_DEBUG "%s: control frame\n", dev->name); if (events & IEVENT_BABT) { priv->extra_stats.tx_babt++; -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_DEBUG 
"%s: babt error\n", dev->name); -#endif + if (netif_msg_rx_err(priv)) + printk(KERN_DEBUG "%s: babt error\n", dev->name); } return IRQ_HANDLED; @@ -1510,7 +1778,7 @@ * If, after GFAR_AN_TIMEOUT seconds, it has not * finished, we switch to forced. * Either way, once the process has completed, we either - * request the interrupt, or switch the timer over to + * request the interrupt, or switch the timer over to * using gfar_phy_timer to check status */ static void gfar_phy_startup_timer(unsigned long data) { @@ -1535,8 +1803,9 @@ /* Forcing failed! Give up */ if(result) { - printk(KERN_ERR "%s: Forcing failed!\n", - mii_info->dev->name); + if (netif_msg_link(priv)) + printk(KERN_ERR "%s: Forcing failed!\n", + mii_info->dev->name); return; } } @@ -1546,16 +1815,17 @@ /* Grab the PHY interrupt, if necessary/possible */ if (priv->einfo->board_flags & FSL_GIANFAR_BRD_HAS_PHY_INTR) { - if (request_irq(priv->einfo->interruptPHY, + if (request_irq(priv->einfo->interruptPHY, phy_interrupt, - SA_SHIRQ, - "phy_interrupt", + SA_SHIRQ, + "phy_interrupt", mii_info->dev) < 0) { - printk(KERN_ERR "%s: Can't get IRQ %d (PHY)\n", - mii_info->dev->name, + if (netif_msg_intr(priv)) + printk(KERN_ERR "%s: Can't get IRQ %d (PHY)\n", + mii_info->dev->name, priv->einfo->interruptPHY); } else { - mii_configure_phy_interrupt(priv->mii_info, + mii_configure_phy_interrupt(priv->mii_info, MII_INTERRUPT_ENABLED); return; } @@ -1592,15 +1862,17 @@ tempval &= ~(MACCFG2_FULL_DUPLEX); gfar_write(&regs->maccfg2, tempval); - printk(KERN_INFO "%s: Half Duplex\n", - dev->name); + if (netif_msg_link(priv)) + printk(KERN_INFO "%s: Half Duplex\n", + dev->name); } else { tempval = gfar_read(&regs->maccfg2); tempval |= MACCFG2_FULL_DUPLEX; gfar_write(&regs->maccfg2, tempval); - printk(KERN_INFO "%s: Full Duplex\n", - dev->name); + if (netif_msg_link(priv)) + printk(KERN_INFO "%s: Full Duplex\n", + dev->name); } priv->oldduplex = mii_info->duplex; @@ -1622,27 +1894,32 @@ gfar_write(&regs->maccfg2, tempval); break; 
default: - printk(KERN_WARNING - "%s: Ack! Speed (%d) is not 10/100/1000!\n", - dev->name, mii_info->speed); + if (netif_msg_link(priv)) + printk(KERN_WARNING + "%s: Ack! Speed (%d) is not 10/100/1000!\n", + dev->name, mii_info->speed); break; } - printk(KERN_INFO "%s: Speed %dBT\n", dev->name, - mii_info->speed); + if (netif_msg_link(priv)) + printk(KERN_INFO "%s: Speed %dBT\n", dev->name, + mii_info->speed); priv->oldspeed = mii_info->speed; } if (!priv->oldlink) { - printk(KERN_INFO "%s: Link is up\n", dev->name); + if (netif_msg_link(priv)) + printk(KERN_INFO "%s: Link is up\n", dev->name); priv->oldlink = 1; netif_carrier_on(dev); netif_schedule(dev); } } else { if (priv->oldlink) { - printk(KERN_INFO "%s: Link is down\n", dev->name); + if (netif_msg_link(priv)) + printk(KERN_INFO "%s: Link is down\n", + dev->name); priv->oldlink = 0; priv->oldspeed = 0; priv->oldduplex = -1; @@ -1664,8 +1941,9 @@ u32 tempval; if(dev->flags & IFF_PROMISC) { - printk(KERN_INFO "%s: Entering promiscuous mode.\n", - dev->name); + if (netif_msg_drv(priv)) + printk(KERN_INFO "%s: Entering promiscuous mode.\n", + dev->name); /* Set RCTRL to PROM */ tempval = gfar_read(&regs->rctrl); tempval |= RCTRL_PROM; @@ -1679,6 +1957,14 @@ if(dev->flags & IFF_ALLMULTI) { /* Set the hash to rx all multicast frames */ + gfar_write(&regs->igaddr0, 0xffffffff); + gfar_write(&regs->igaddr1, 0xffffffff); + gfar_write(&regs->igaddr2, 0xffffffff); + gfar_write(&regs->igaddr3, 0xffffffff); + gfar_write(&regs->igaddr4, 0xffffffff); + gfar_write(&regs->igaddr5, 0xffffffff); + gfar_write(&regs->igaddr6, 0xffffffff); + gfar_write(&regs->igaddr7, 0xffffffff); gfar_write(&regs->gaddr0, 0xffffffff); gfar_write(&regs->gaddr1, 0xffffffff); gfar_write(&regs->gaddr2, 0xffffffff); @@ -1689,6 +1975,14 @@ gfar_write(&regs->gaddr7, 0xffffffff); } else { /* zero out the hash */ + gfar_write(&regs->igaddr0, 0x0); + gfar_write(&regs->igaddr1, 0x0); + gfar_write(&regs->igaddr2, 0x0); + gfar_write(&regs->igaddr3, 0x0); + gfar_write(&regs->igaddr4, 0x0); + 
gfar_write(&regs->igaddr5, 0x0); + gfar_write(&regs->igaddr6, 0x0); + gfar_write(&regs->igaddr7, 0x0); gfar_write(&regs->gaddr0, 0x0); gfar_write(&regs->gaddr1, 0x0); gfar_write(&regs->gaddr2, 0x0); @@ -1727,16 +2021,15 @@ { u32 tempval; struct gfar_private *priv = netdev_priv(dev); - struct gfar *regs = priv->regs; - u32 *hash = &regs->gaddr0; u32 result = ether_crc(MAC_ADDR_LEN, addr); - u8 whichreg = ((result >> 29) & 0x7); - u8 whichbit = ((result >> 24) & 0x1f); + int width = priv->hash_width; + u8 whichbit = (result >> (32 - width)) & 0x1f; + u8 whichreg = result >> (32 - width + 5); u32 value = (1 << (31-whichbit)); - tempval = gfar_read(&hash[whichreg]); + tempval = gfar_read(priv->hash_regs[whichreg]); tempval |= value; - gfar_write(&hash[whichreg], tempval); + gfar_write(priv->hash_regs[whichreg], tempval); return; } @@ -1754,10 +2047,9 @@ gfar_write(&priv->regs->ievent, IEVENT_ERR_MASK); /* Hmm... */ -#if defined (BRIEF_GFAR_ERRORS) || defined (VERBOSE_GFAR_ERRORS) - printk(KERN_DEBUG "%s: error interrupt (ievent=0x%08x imask=0x%08x)\n", - dev->name, events, gfar_read(&priv->regs->imask)); -#endif + if (netif_msg_rx_err(priv) || netif_msg_tx_err(priv)) + printk(KERN_DEBUG "%s: error interrupt (ievent=0x%08x imask=0x%08x)\n", + dev->name, events, gfar_read(&priv->regs->imask)); /* Update the error counters */ if (events & IEVENT_TXE) { @@ -1768,19 +2060,17 @@ if (events & IEVENT_CRL) priv->stats.tx_aborted_errors++; if (events & IEVENT_XFUN) { -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_DEBUG "%s: underrun. packet dropped.\n", - dev->name); -#endif + if (netif_msg_tx_err(priv)) + printk(KERN_DEBUG "%s: underrun. 
packet dropped.\n", + dev->name); priv->stats.tx_dropped++; priv->extra_stats.tx_underrun++; /* Reactivate the Tx Queues */ gfar_write(&priv->regs->tstat, TSTAT_CLEAR_THALT); } -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_DEBUG "%s: Transmit Error\n", dev->name); -#endif + if (netif_msg_tx_err(priv)) + printk(KERN_DEBUG "%s: Transmit Error\n", dev->name); } if (events & IEVENT_BSY) { priv->stats.rx_errors++; @@ -1793,35 +2083,31 @@ gfar_write(&priv->regs->rstat, RSTAT_CLEAR_RHALT); #endif -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_DEBUG "%s: busy error (rhalt: %x)\n", dev->name, - gfar_read(&priv->regs->rstat)); -#endif + if (netif_msg_rx_err(priv)) + printk(KERN_DEBUG "%s: busy error (rhalt: %x)\n", + dev->name, + gfar_read(&priv->regs->rstat)); } if (events & IEVENT_BABR) { priv->stats.rx_errors++; priv->extra_stats.rx_babr++; -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_DEBUG "%s: babbling error\n", dev->name); -#endif + if (netif_msg_rx_err(priv)) + printk(KERN_DEBUG "%s: babbling error\n", dev->name); } if (events & IEVENT_EBERR) { priv->extra_stats.eberr++; -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_DEBUG "%s: EBERR\n", dev->name); -#endif + if (netif_msg_rx_err(priv)) + printk(KERN_DEBUG "%s: EBERR\n", dev->name); } - if (events & IEVENT_RXC) -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_DEBUG "%s: control frame\n", dev->name); -#endif + if ((events & IEVENT_RXC) && netif_msg_rx_status(priv)) + if (netif_msg_rx_status(priv)) + printk(KERN_DEBUG "%s: control frame\n", dev->name); if (events & IEVENT_BABT) { priv->extra_stats.tx_babt++; -#ifdef VERBOSE_GFAR_ERRORS - printk(KERN_DEBUG "%s: babt error\n", dev->name); -#endif + if (netif_msg_tx_err(priv)) + printk(KERN_DEBUG "%s: babt error\n", dev->name); } return IRQ_HANDLED; } Index: drivers/net/gianfar_ethtool.c =================================================================== RCS file: /proj/ppc/sysperf/cvsroot/cvs_root/pq38/linux-2.6/drivers/net/gianfar_ethtool.c,v retrieving revision 1.1.1.1 retrieving revision 1.2 
diff -u -r1.1.1.1 -r1.2 --- drivers/net/gianfar_ethtool.c 21 Apr 2005 00:03:14 -0000 1.1.1.1 +++ drivers/net/gianfar_ethtool.c 7 Jun 2005 23:02:37 -0000 1.2 @@ -46,16 +46,18 @@ extern int startup_gfar(struct net_device *dev); extern void stop_gfar(struct net_device *dev); -extern void gfar_receive(int irq, void *dev_id, struct pt_regs *regs); +extern void gfar_halt(struct net_device *dev); +extern void gfar_start(struct net_device *dev); +extern int gfar_clean_rx_ring(struct net_device *dev, int rx_work_limit); -void gfar_fill_stats(struct net_device *dev, struct ethtool_stats *dummy, +static void gfar_fill_stats(struct net_device *dev, struct ethtool_stats *dummy, u64 * buf); -void gfar_gstrings(struct net_device *dev, u32 stringset, u8 * buf); -int gfar_gcoalesce(struct net_device *dev, struct ethtool_coalesce *cvals); -int gfar_scoalesce(struct net_device *dev, struct ethtool_coalesce *cvals); -void gfar_gringparam(struct net_device *dev, struct ethtool_ringparam *rvals); -int gfar_sringparam(struct net_device *dev, struct ethtool_ringparam *rvals); -void gfar_gdrvinfo(struct net_device *dev, struct ethtool_drvinfo *drvinfo); +static void gfar_gstrings(struct net_device *dev, u32 stringset, u8 * buf); +static int gfar_gcoalesce(struct net_device *dev, struct ethtool_coalesce *cvals); +static int gfar_scoalesce(struct net_device *dev, struct ethtool_coalesce *cvals); +static void gfar_gringparam(struct net_device *dev, struct ethtool_ringparam *rvals); +static int gfar_sringparam(struct net_device *dev, struct ethtool_ringparam *rvals); +static void gfar_gdrvinfo(struct net_device *dev, struct ethtool_drvinfo *drvinfo); static char stat_gstrings[][ETH_GSTRING_LEN] = { "rx-dropped-by-kernel", @@ -118,57 +120,56 @@ "tx-fragmented-frames", }; +/* Fill in a buffer with the strings which correspond to the + * stats */ +static void gfar_gstrings(struct net_device *dev, u32 stringset, u8 * buf) +{ + struct gfar_private *priv = netdev_priv(dev); + + if 
(priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_RMON) + memcpy(buf, stat_gstrings, GFAR_STATS_LEN * ETH_GSTRING_LEN); + else + memcpy(buf, stat_gstrings, + GFAR_EXTRA_STATS_LEN * ETH_GSTRING_LEN); +} + /* Fill in an array of 64-bit statistics from various sources. * This array will be appended to the end of the ethtool_stats * structure, and returned to user space */ -void gfar_fill_stats(struct net_device *dev, struct ethtool_stats *dummy, u64 * buf) +static void gfar_fill_stats(struct net_device *dev, struct ethtool_stats *dummy, u64 * buf) { int i; struct gfar_private *priv = netdev_priv(dev); - u32 *rmon = (u32 *) & priv->regs->rmon; u64 *extra = (u64 *) & priv->extra_stats; - struct gfar_stats *stats = (struct gfar_stats *) buf; - for (i = 0; i < GFAR_RMON_LEN; i++) { - stats->rmon[i] = (u64) (rmon[i]); - } - - for (i = 0; i < GFAR_EXTRA_STATS_LEN; i++) { - stats->extra[i] = extra[i]; - } + if (priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_RMON) { + u32 *rmon = (u32 *) & priv->regs->rmon; + struct gfar_stats *stats = (struct gfar_stats *) buf; + + for (i = 0; i < GFAR_RMON_LEN; i++) + stats->rmon[i] = (u64) (rmon[i]); + + for (i = 0; i < GFAR_EXTRA_STATS_LEN; i++) + stats->extra[i] = extra[i]; + } else + for (i = 0; i < GFAR_EXTRA_STATS_LEN; i++) + buf[i] = extra[i]; } /* Returns the number of stats (and their corresponding strings) */ -int gfar_stats_count(struct net_device *dev) -{ - return GFAR_STATS_LEN; -} - -void gfar_gstrings_normon(struct net_device *dev, u32 stringset, u8 * buf) -{ - memcpy(buf, stat_gstrings, GFAR_EXTRA_STATS_LEN * ETH_GSTRING_LEN); -} - -void gfar_fill_stats_normon(struct net_device *dev, - struct ethtool_stats *dummy, u64 * buf) +static int gfar_stats_count(struct net_device *dev) { - int i; struct gfar_private *priv = netdev_priv(dev); - u64 *extra = (u64 *) & priv->extra_stats; - for (i = 0; i < GFAR_EXTRA_STATS_LEN; i++) { - buf[i] = extra[i]; - } + if (priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_RMON) + return 
GFAR_STATS_LEN; + else + return GFAR_EXTRA_STATS_LEN; } - -int gfar_stats_count_normon(struct net_device *dev) -{ - return GFAR_EXTRA_STATS_LEN; -} /* Fills in the drvinfo structure with some basic info */ -void gfar_gdrvinfo(struct net_device *dev, struct +static void gfar_gdrvinfo(struct net_device *dev, struct ethtool_drvinfo *drvinfo) { strncpy(drvinfo->driver, DRV_NAME, GFAR_INFOSTR_LEN); @@ -182,7 +183,7 @@ } /* Return the current settings in the ethtool_cmd structure */ -int gfar_gsettings(struct net_device *dev, struct ethtool_cmd *cmd) +static int gfar_gsettings(struct net_device *dev, struct ethtool_cmd *cmd) { struct gfar_private *priv = netdev_priv(dev); uint gigabit_support = @@ -216,13 +217,13 @@ } /* Return the length of the register structure */ -int gfar_reglen(struct net_device *dev) +static int gfar_reglen(struct net_device *dev) { return sizeof (struct gfar); } /* Return a dump of the GFAR register space */ -void gfar_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *regbuf) +static void gfar_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *regbuf) { int i; struct gfar_private *priv = netdev_priv(dev); @@ -233,13 +234,6 @@ buf[i] = theregs[i]; } -/* Fill in a buffer with the strings which correspond to the - * stats */ -void gfar_gstrings(struct net_device *dev, u32 stringset, u8 * buf) -{ - memcpy(buf, stat_gstrings, GFAR_STATS_LEN * ETH_GSTRING_LEN); -} - /* Convert microseconds to ethernet clock ticks, which changes * depending on what speed the controller is running at */ static unsigned int gfar_usecs2ticks(struct gfar_private *priv, unsigned int usecs) @@ -291,9 +285,12 @@ /* Get the coalescing parameters, and put them in the cvals * structure. 
*/ -int gfar_gcoalesce(struct net_device *dev, struct ethtool_coalesce *cvals) +static int gfar_gcoalesce(struct net_device *dev, struct ethtool_coalesce *cvals) { struct gfar_private *priv = netdev_priv(dev); + + if (!(priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_COALESCE)) + return -EOPNOTSUPP; cvals->rx_coalesce_usecs = gfar_ticks2usecs(priv, priv->rxtime); cvals->rx_max_coalesced_frames = priv->rxcount; @@ -337,10 +334,13 @@ * Both cvals->*_usecs and cvals->*_frames have to be > 0 * in order for coalescing to be active */ -int gfar_scoalesce(struct net_device *dev, struct ethtool_coalesce *cvals) +static int gfar_scoalesce(struct net_device *dev, struct ethtool_coalesce *cvals) { struct gfar_private *priv = netdev_priv(dev); + if (!(priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_COALESCE)) + return -EOPNOTSUPP; + /* Set up rx coalescing */ if ((cvals->rx_coalesce_usecs == 0) || (cvals->rx_max_coalesced_frames == 0)) @@ -379,7 +379,7 @@ /* Fills in rvals with the current ring parameters. Currently, * rx, rx_mini, and rx_jumbo rings are the same size, as mini and * jumbo are ignored by the driver */ -void gfar_gringparam(struct net_device *dev, struct ethtool_ringparam *rvals) +static void gfar_gringparam(struct net_device *dev, struct ethtool_ringparam *rvals) { struct gfar_private *priv = netdev_priv(dev); @@ -401,9 +401,8 @@ * necessary so that we don't mess things up while we're in * motion. We wait for the ring to be clean before reallocating * the rings. 
*/ -int gfar_sringparam(struct net_device *dev, struct ethtool_ringparam *rvals) +static int gfar_sringparam(struct net_device *dev, struct ethtool_ringparam *rvals) { - u32 tempval; struct gfar_private *priv = netdev_priv(dev); int err = 0; @@ -425,37 +424,54 @@ return -EINVAL; } - /* Stop the controller so we don't rx any more frames */ - /* But first, make sure we clear the bits */ - tempval = gfar_read(&priv->regs->dmactrl); - tempval &= ~(DMACTRL_GRS | DMACTRL_GTS); - gfar_write(&priv->regs->dmactrl, tempval); - - tempval = gfar_read(&priv->regs->dmactrl); - tempval |= (DMACTRL_GRS | DMACTRL_GTS); - gfar_write(&priv->regs->dmactrl, tempval); + if (dev->flags & IFF_UP) { + unsigned long flags; - while (!(gfar_read(&priv->regs->ievent) & (IEVENT_GRSC | IEVENT_GTSC))) - cpu_relax(); + /* Halt TX and RX, and process the frames which + * have already been received */ + spin_lock_irqsave(&priv->lock, flags); + gfar_halt(dev); + gfar_clean_rx_ring(dev, priv->rx_ring_size); + spin_unlock_irqrestore(&priv->lock, flags); - /* Note that rx is not clean right now */ - priv->rxclean = 0; + /* Now we take down the rings to rebuild them */ + stop_gfar(dev); + } - if (dev->flags & IFF_UP) { - /* Tell the driver to process the rest of the frames */ - gfar_receive(0, (void *) dev, NULL); + /* Change the size */ + priv->rx_ring_size = rvals->rx_pending; + priv->tx_ring_size = rvals->tx_pending; + + /* Rebuild the rings with the new size */ + if (dev->flags & IFF_UP) + err = startup_gfar(dev); + + return err; +} + +static int gfar_set_rx_csum(struct net_device *dev, uint32_t data) +{ + struct gfar_private *priv = netdev_priv(dev); + int err = 0; - /* Now wait for it to be done */ - wait_event_interruptible(priv->rxcleanupq, priv->rxclean); + if (!(priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_CSUM)) + return -EOPNOTSUPP; - /* Ok, all packets have been handled. 
Now we bring it down, - * change the ring size, and bring it up */ + if (dev->flags & IFF_UP) { + unsigned long flags; + /* Halt TX and RX, and process the frames which + * have already been received */ + spin_lock_irqsave(&priv->lock, flags); + gfar_halt(dev); + gfar_clean_rx_ring(dev, priv->rx_ring_size); + spin_unlock_irqrestore(&priv->lock, flags); + + /* Now we take down the rings to rebuild them */ stop_gfar(dev); } - priv->rx_ring_size = rvals->rx_pending; - priv->tx_ring_size = rvals->tx_pending; + priv->rx_csum_enable = data; if (dev->flags & IFF_UP) err = startup_gfar(dev); @@ -463,6 +479,61 @@ return err; } +static uint32_t gfar_get_rx_csum(struct net_device *dev) +{ + struct gfar_private *priv = netdev_priv(dev); + + if (!(priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_CSUM)) + return 0; + + return priv->rx_csum_enable; +} + +static int gfar_set_tx_csum(struct net_device *dev, uint32_t data) +{ + unsigned long flags; + struct gfar_private *priv = netdev_priv(dev); + + if (!(priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_CSUM)) + return -EOPNOTSUPP; + + spin_lock_irqsave(&priv->lock, flags); + gfar_halt(dev); + + if (data) + dev->features |= NETIF_F_IP_CSUM; + else + dev->features &= ~NETIF_F_IP_CSUM; + + gfar_start(dev); + spin_unlock_irqrestore(&priv->lock, flags); + + return 0; +} + +static uint32_t gfar_get_tx_csum(struct net_device *dev) +{ + struct gfar_private *priv = netdev_priv(dev); + + if (!(priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_CSUM)) + return 0; + + return (dev->features & NETIF_F_IP_CSUM) != 0; +} + +static uint32_t gfar_get_msglevel(struct net_device *dev) +{ + struct gfar_private *priv = netdev_priv(dev); + return priv->msg_enable; +} + +static void gfar_set_msglevel(struct net_device *dev, uint32_t data) +{ + struct gfar_private *priv = netdev_priv(dev); + priv->msg_enable = data; +} + + struct ethtool_ops gfar_ethtool_ops = { .get_settings = gfar_gsettings, .get_drvinfo = gfar_gdrvinfo, @@ -476,52 +547,10 @@ .get_strings = 
gfar_gstrings, .get_stats_count = gfar_stats_count, .get_ethtool_stats = gfar_fill_stats, -}; - -struct ethtool_ops gfar_normon_nocoalesce_ethtool_ops = { - .get_settings = gfar_gsettings, - .get_drvinfo = gfar_gdrvinfo, - .get_regs_len = gfar_reglen, - .get_regs = gfar_get_regs, - .get_link = ethtool_op_get_link, - .get_ringparam = gfar_gringparam, - .set_ringparam = gfar_sringparam, - .get_strings = gfar_gstrings_normon, - .get_stats_count = gfar_stats_count_normon, - .get_ethtool_stats = gfar_fill_stats_normon, -}; - -struct ethtool_ops gfar_nocoalesce_ethtool_ops = { - .get_settings = gfar_gsettings, - .get_drvinfo = gfar_gdrvinfo, - .get_regs_len = gfar_reglen, - .get_regs = gfar_get_regs, - .get_link = ethtool_op_get_link, - .get_ringparam = gfar_gringparam, - .set_ringparam = gfar_sringparam, - .get_strings = gfar_gstrings, - .get_stats_count = gfar_stats_count, - .get_ethtool_stats = gfar_fill_stats, -}; - -struct ethtool_ops gfar_normon_ethtool_ops = { - .get_settings = gfar_gsettings, - .get_drvinfo = gfar_gdrvinfo, - .get_regs_len = gfar_reglen, - .get_regs = gfar_get_regs, - .get_link = ethtool_op_get_link, - .get_coalesce = gfar_gcoalesce, - .set_coalesce = gfar_scoalesce, - .get_ringparam = gfar_gringparam, - .set_ringparam = gfar_sringparam, - .get_strings = gfar_gstrings_normon, - .get_stats_count = gfar_stats_count_normon, - .get_ethtool_stats = gfar_fill_stats_normon, -}; - -struct ethtool_ops *gfar_op_array[] = { - &gfar_ethtool_ops, - &gfar_normon_ethtool_ops, - &gfar_nocoalesce_ethtool_ops, - &gfar_normon_nocoalesce_ethtool_ops + .get_rx_csum = gfar_get_rx_csum, + .get_tx_csum = gfar_get_tx_csum, + .set_rx_csum = gfar_set_rx_csum, + .set_tx_csum = gfar_set_tx_csum, + .get_msglevel = gfar_get_msglevel, + .set_msglevel = gfar_set_msglevel, }; Index: drivers/net/gianfar.h =================================================================== RCS file: /proj/ppc/sysperf/cvsroot/cvs_root/pq38/linux-2.6/drivers/net/gianfar.h,v retrieving revision 
1.1.1.1 retrieving revision 1.3 diff -u -r1.1.1.1 -r1.3 --- drivers/net/gianfar.h 21 Apr 2005 00:03:14 -0000 1.1.1.1 +++ drivers/net/gianfar.h 7 Jun 2005 23:03:30 -0000 1.3 @@ -1,4 +1,4 @@ -/* +/* * drivers/net/gianfar.h * * Gianfar Ethernet Driver @@ -53,6 +53,12 @@ /* The maximum number of packets to be handled in one call of gfar_poll */ #define GFAR_DEV_WEIGHT 64 +/* Length for FCB */ +#define GMAC_FCB_LEN 8 + +/* Default padding amount */ +#define DEFAULT_PADDING 2 + /* Number of bytes to align the rx bufs to */ #define RXBUF_ALIGNMENT 64 @@ -91,7 +97,7 @@ #define JUMBO_FRAME_SIZE 9600 /* Latency of interface clock in nanoseconds */ -/* Interface clock latency , in this case, means the +/* Interface clock latency , in this case, means the * time described by a value of 1 in the interrupt * coalescing registers' time fields. Since those fields * refer to the time it takes for 64 clocks to pass, the @@ -166,9 +172,28 @@ mk_ic_icft(count) | \ mk_ic_ictt(time)) +#define RCTRL_PAL_MASK 0x001f0000 +#define RCTRL_VLEX 0x00002000 +#define RCTRL_FILREN 0x00001000 +#define RCTRL_GHTX 0x00000400 +#define RCTRL_IPCSEN 0x00000200 +#define RCTRL_TUCSEN 0x00000100 +#define RCTRL_PRSDEP_MASK 0x000000c0 +#define RCTRL_PRSDEP_INIT 0x000000c0 #define RCTRL_PROM 0x00000008 +#define RCTRL_CHECKSUMMING (RCTRL_IPCSEN \ + | RCTRL_TUCSEN | RCTRL_PRSDEP_INIT) +#define RCTRL_EXTHASH (RCTRL_GHTX) +#define RCTRL_VLAN (RCTRL_PRSDEP_INIT) + + #define RSTAT_CLEAR_RHALT 0x00800000 +#define TCTRL_IPCSEN 0x00004000 +#define TCTRL_TUCSEN 0x00002000 +#define TCTRL_VLINS 0x00001000 +#define TCTRL_INIT_CSUM (TCTRL_TUCSEN | TCTRL_IPCSEN) + #define IEVENT_INIT_CLEAR 0xffffffff #define IEVENT_BABR 0x80000000 #define IEVENT_RXC 0x40000000 @@ -187,12 +212,16 @@ #define IEVENT_RXB0 0x00008000 #define IEVENT_GRSC 0x00000100 #define IEVENT_RXF0 0x00000080 +#define IEVENT_FIR 0x00000008 +#define IEVENT_FIQ 0x00000004 +#define IEVENT_DPE 0x00000002 +#define IEVENT_PERR 0x00000001 #define IEVENT_RX_MASK 
(IEVENT_RXB0 | IEVENT_RXF0) #define IEVENT_TX_MASK (IEVENT_TXB | IEVENT_TXF) #define IEVENT_ERR_MASK \ (IEVENT_RXC | IEVENT_BSY | IEVENT_EBERR | IEVENT_MSRO | \ IEVENT_BABT | IEVENT_TXC | IEVENT_TXE | IEVENT_LC \ - | IEVENT_CRL | IEVENT_XFUN) + | IEVENT_CRL | IEVENT_XFUN | IEVENT_DPE | IEVENT_PERR) #define IMASK_INIT_CLEAR 0x00000000 #define IMASK_BABR 0x80000000 @@ -212,10 +241,15 @@ #define IMASK_RXB0 0x00008000 #define IMASK_GTSC 0x00000100 #define IMASK_RXFEN0 0x00000080 +#define IMASK_FIR 0x00000008 +#define IMASK_FIQ 0x00000004 +#define IMASK_DPE 0x00000002 +#define IMASK_PERR 0x00000001 #define IMASK_RX_DISABLED ~(IMASK_RXFEN0 | IMASK_BSY) #define IMASK_DEFAULT (IMASK_TXEEN | IMASK_TXFEN | IMASK_TXBEN | \ IMASK_RXFEN0 | IMASK_BSY | IMASK_EBERR | IMASK_BABR | \ - IMASK_XFUN | IMASK_RXC | IMASK_BABT) + IMASK_XFUN | IMASK_RXC | IMASK_BABT | IMASK_DPE \ + | IMASK_PERR) /* Attribute fields */ @@ -254,6 +288,18 @@ #define TXBD_RETRYLIMIT 0x0040 #define TXBD_RETRYCOUNTMASK 0x003c #define TXBD_UNDERRUN 0x0002 +#define TXBD_TOE 0x0002 + +/* Tx FCB param bits */ +#define TXFCB_VLN 0x80 +#define TXFCB_IP 0x40 +#define TXFCB_IP6 0x20 +#define TXFCB_TUP 0x10 +#define TXFCB_UDP 0x08 +#define TXFCB_CIP 0x04 +#define TXFCB_CTU 0x02 +#define TXFCB_NPH 0x01 +#define TXFCB_DEFAULT (TXFCB_IP|TXFCB_TUP|TXFCB_CTU|TXFCB_NPH) /* RxBD status field bits */ #define RXBD_EMPTY 0x8000 @@ -273,6 +319,18 @@ #define RXBD_TRUNCATED 0x0001 #define RXBD_STATS 0x01ff +/* Rx FCB status field bits */ +#define RXFCB_VLN 0x8000 +#define RXFCB_IP 0x4000 +#define RXFCB_IP6 0x2000 +#define RXFCB_TUP 0x1000 +#define RXFCB_CIP 0x0800 +#define RXFCB_CTU 0x0400 +#define RXFCB_EIP 0x0200 +#define RXFCB_ETU 0x0100 +#define RXFCB_PERR_MASK 0x000c +#define RXFCB_PERR_BADL3 0x0008 + struct txbd8 { u16 status; /* Status Fields */ @@ -280,6 +338,22 @@ u32 bufPtr; /* Buffer Pointer */ }; +struct txfcb { + u8 vln:1, + ip:1, + ip6:1, + tup:1, + udp:1, + cip:1, + ctu:1, + nph:1; + u8 reserved; + u8 l4os; /* Level 4 
Header Offset */ + u8 l3os; /* Level 3 Header Offset */ + u16 phcs; /* Pseudo-header Checksum */ + u16 vlctl; /* VLAN control word */ +}; + struct rxbd8 { u16 status; /* Status Fields */ @@ -287,6 +361,21 @@ u32 bufPtr; /* Buffer Pointer */ }; +struct rxfcb { + u16 vln:1, + ip:1, + ip6:1, + tup:1, + cip:1, + ctu:1, + eip:1, + etu:1; + u8 rq; /* Receive Queue index */ + u8 pro; /* Layer 4 Protocol */ + u16 reserved; + u16 vlctl; /* VLAN control word */ +}; + struct rmon_mib { u32 tr64; /* 0x.680 - Transmit and Receive 64-byte Frame Counter */ @@ -371,90 +460,191 @@ struct gfar { - u8 res1[16]; - u32 ievent; /* 0x.010 - Interrupt Event Register */ - u32 imask; /* 0x.014 - Interrupt Mask Register */ - u32 edis; /* 0x.018 - Error Disabled Register */ + u32 tsec_id; /* 0x.000 - Controller ID register */ + u8 res1[12]; + u32 ievent; /* 0x.010 - Interrupt Event Register */ + u32 imask; /* 0x.014 - Interrupt Mask Register */ + u32 edis; /* 0x.018 - Error Disabled Register */ u8 res2[4]; - u32 ecntrl; /* 0x.020 - Ethernet Control Register */ - u32 minflr; /* 0x.024 - Minimum Frame Length Register */ - u32 ptv; /* 0x.028 - Pause Time Value Register */ - u32 dmactrl; /* 0x.02c - DMA Control Register */ - u32 tbipa; /* 0x.030 - TBI PHY Address Register */ + u32 ecntrl; /* 0x.020 - Ethernet Control Register */ + u32 minflr; /* 0x.024 - Minimum Frame Length Register */ + u32 ptv; /* 0x.028 - Pause Time Value Register */ + u32 dmactrl; /* 0x.02c - DMA Control Register */ + u32 tbipa; /* 0x.030 - TBI PHY Address Register */ u8 res3[88]; - u32 fifo_tx_thr; /* 0x.08c - FIFO transmit threshold register */ + u32 fifo_tx_thr; /* 0x.08c - FIFO transmit threshold register */ u8 res4[8]; - u32 fifo_tx_starve; /* 0x.098 - FIFO transmit starve register */ + u32 fifo_tx_starve; /* 0x.098 - FIFO transmit starve register */ u32 fifo_tx_starve_shutoff; /* 0x.09c - FIFO transmit starve shutoff register */ - u8 res5[96]; - u32 tctrl; /* 0x.100 - Transmit Control Register */ - u32 tstat; /* 0x.104 
- Transmit Status Register */ - u8 res6[4]; - u32 tbdlen; /* 0x.10c - Transmit Buffer Descriptor Data Length Register */ - u32 txic; /* 0x.110 - Transmit Interrupt Coalescing Configuration Register */ - u8 res7[16]; - u32 ctbptr; /* 0x.124 - Current Transmit Buffer Descriptor Pointer Register */ - u8 res8[92]; - u32 tbptr; /* 0x.184 - Transmit Buffer Descriptor Pointer Low Register */ - u8 res9[124]; - u32 tbase; /* 0x.204 - Transmit Descriptor Base Address Register */ - u8 res10[168]; - u32 ostbd; /* 0x.2b0 - Out-of-Sequence Transmit Buffer Descriptor Register */ - u32 ostbdp; /* 0x.2b4 - Out-of-Sequence Transmit Data Buffer Pointer Register */ - u8 res11[72]; - u32 rctrl; /* 0x.300 - Receive Control Register */ - u32 rstat; /* 0x.304 - Receive Status Register */ - u8 res12[4]; - u32 rbdlen; /* 0x.30c - RxBD Data Length Register */ - u32 rxic; /* 0x.310 - Receive Interrupt Coalescing Configuration Register */ - u8 res13[16]; - u32 crbptr; /* 0x.324 - Current Receive Buffer Descriptor Pointer */ - u8 res14[24]; - u32 mrblr; /* 0x.340 - Maximum Receive Buffer Length Register */ - u8 res15[64]; - u32 rbptr; /* 0x.384 - Receive Buffer Descriptor Pointer */ - u8 res16[124]; - u32 rbase; /* 0x.404 - Receive Descriptor Base Address */ - u8 res17[248]; - u32 maccfg1; /* 0x.500 - MAC Configuration 1 Register */ - u32 maccfg2; /* 0x.504 - MAC Configuration 2 Register */ - u32 ipgifg; /* 0x.508 - Inter Packet Gap/Inter Frame Gap Register */ - u32 hafdup; /* 0x.50c - Half Duplex Register */ - u32 maxfrm; /* 0x.510 - Maximum Frame Length Register */ + u8 res5[4]; + u32 fifo_rx_pause; /* 0x.0a4 - FIFO receive pause threshold register */ + u32 fifo_rx_alarm; /* 0x.0a8 - FIFO receive alarm threshold register */ + u8 res6[84]; + u32 tctrl; /* 0x.100 - Transmit Control Register */ + u32 tstat; /* 0x.104 - Transmit Status Register */ + u32 dfvlan; /* 0x.108 - Default VLAN Control word */ + u32 tbdlen; /* 0x.10c - Transmit Buffer Descriptor Data Length Register */ + u32 txic; /* 
0x.110 - Transmit Interrupt Coalescing Configuration Register */ + u32 tqueue; /* 0x.114 - Transmit queue control register */ + u8 res7[40]; + u32 tr03wt; /* 0x.140 - TxBD Rings 0-3 round-robin weightings */ + u32 tr47wt; /* 0x.144 - TxBD Rings 4-7 round-robin weightings */ + u8 res8[52]; + u32 tbdbph; /* 0x.17c - Tx data buffer pointer high */ + u8 res9a[4]; + u32 tbptr0; /* 0x.184 - TxBD Pointer for ring 0 */ + u8 res9b[4]; + u32 tbptr1; /* 0x.18c - TxBD Pointer for ring 1 */ + u8 res9c[4]; + u32 tbptr2; /* 0x.194 - TxBD Pointer for ring 2 */ + u8 res9d[4]; + u32 tbptr3; /* 0x.19c - TxBD Pointer for ring 3 */ + u8 res9e[4]; + u32 tbptr4; /* 0x.1a4 - TxBD Pointer for ring 4 */ + u8 res9f[4]; + u32 tbptr5; /* 0x.1ac - TxBD Pointer for ring 5 */ + u8 res9g[4]; + u32 tbptr6; /* 0x.1b4 - TxBD Pointer for ring 6 */ + u8 res9h[4]; + u32 tbptr7; /* 0x.1bc - TxBD Pointer for ring 7 */ + u8 res9[64]; + u32 tbaseh; /* 0x.200 - TxBD base address high */ + u32 tbase0; /* 0x.204 - TxBD Base Address of ring 0 */ + u8 res10a[4]; + u32 tbase1; /* 0x.20c - TxBD Base Address of ring 1 */ + u8 res10b[4]; + u32 tbase2; /* 0x.214 - TxBD Base Address of ring 2 */ + u8 res10c[4]; + u32 tbase3; /* 0x.21c - TxBD Base Address of ring 3 */ + u8 res10d[4]; + u32 tbase4; /* 0x.224 - TxBD Base Address of ring 4 */ + u8 res10e[4]; + u32 tbase5; /* 0x.22c - TxBD Base Address of ring 5 */ + u8 res10f[4]; + u32 tbase6; /* 0x.234 - TxBD Base Address of ring 6 */ + u8 res10g[4]; + u32 tbase7; /* 0x.23c - TxBD Base Address of ring 7 */ + u8 res10[192]; + u32 rctrl; /* 0x.300 - Receive Control Register */ + u32 rstat; /* 0x.304 - Receive Status Register */ + u8 res12[8]; + u32 rxic; /* 0x.310 - Receive Interrupt Coalescing Configuration Register */ + u32 rqueue; /* 0x.314 - Receive queue control register */ + u8 res13[24]; + u32 rbifx; /* 0x.330 - Receive bit field extract control register */ + u32 rqfar; /* 0x.334 - Receive queue filing table address register */ + u32 rqfcr; /* 0x.338 - Receive queue 
filing table control register */ + u32 rqfpr; /* 0x.33c - Receive queue filing table property register */ + u32 mrblr; /* 0x.340 - Maximum Receive Buffer Length Register */ + u8 res14[56]; + u32 rbdbph; /* 0x.37c - Rx data buffer pointer high */ + u8 res15a[4]; + u32 rbptr0; /* 0x.384 - RxBD pointer for ring 0 */ + u8 res15b[4]; + u32 rbptr1; /* 0x.38c - RxBD pointer for ring 1 */ + u8 res15c[4]; + u32 rbptr2; /* 0x.394 - RxBD pointer for ring 2 */ + u8 res15d[4]; + u32 rbptr3; /* 0x.39c - RxBD pointer for ring 3 */ + u8 res15e[4]; + u32 rbptr4; /* 0x.3a4 - RxBD pointer for ring 4 */ + u8 res15f[4]; + u32 rbptr5; /* 0x.3ac - RxBD pointer for ring 5 */ + u8 res15g[4]; + u32 rbptr6; /* 0x.3b4 - RxBD pointer for ring 6 */ + u8 res15h[4]; + u32 rbptr7; /* 0x.3bc - RxBD pointer for ring 7 */ + u8 res16[64]; + u32 rbaseh; /* 0x.400 - RxBD base address high */ + u32 rbase0; /* 0x.404 - RxBD base address of ring 0 */ + u8 res17a[4]; + u32 rbase1; /* 0x.40c - RxBD base address of ring 1 */ + u8 res17b[4]; + u32 rbase2; /* 0x.414 - RxBD base address of ring 2 */ + u8 res17c[4]; + u32 rbase3; /* 0x.41c - RxBD base address of ring 3 */ + u8 res17d[4]; + u32 rbase4; /* 0x.424 - RxBD base address of ring 4 */ + u8 res17e[4]; + u32 rbase5; /* 0x.42c - RxBD base address of ring 5 */ + u8 res17f[4]; + u32 rbase6; /* 0x.434 - RxBD base address of ring 6 */ + u8 res17g[4]; + u32 rbase7; /* 0x.43c - RxBD base address of ring 7 */ + u8 res17[192]; + u32 maccfg1; /* 0x.500 - MAC Configuration 1 Register */ + u32 maccfg2; /* 0x.504 - MAC Configuration 2 Register */ + u32 ipgifg; /* 0x.508 - Inter Packet Gap/Inter Frame Gap Register */ + u32 hafdup; /* 0x.50c - Half Duplex Register */ + u32 maxfrm; /* 0x.510 - Maximum Frame Length Register */ u8 res18[12]; - u32 miimcfg; /* 0x.520 - MII Management Configuration Register */ - u32 miimcom; /* 0x.524 - MII Management Command Register */ - u32 miimadd; /* 0x.528 - MII Management Address Register */ - u32 miimcon; /* 0x.52c - MII Management 
Control Register */ - u32 miimstat; /* 0x.530 - MII Management Status Register */ - u32 miimind; /* 0x.534 - MII Management Indicator Register */ + u32 miimcfg; /* 0x.520 - MII Management Configuration Register */ + u32 miimcom; /* 0x.524 - MII Management Command Register */ + u32 miimadd; /* 0x.528 - MII Management Address Register */ + u32 miimcon; /* 0x.52c - MII Management Control Register */ + u32 miimstat; /* 0x.530 - MII Management Status Register */ + u32 miimind; /* 0x.534 - MII Management Indicator Register */ u8 res19[4]; - u32 ifstat; /* 0x.53c - Interface Status Register */ - u32 macstnaddr1; /* 0x.540 - Station Address Part 1 Register */ - u32 macstnaddr2; /* 0x.544 - Station Address Part 2 Register */ - u8 res20[312]; - struct rmon_mib rmon; - u8 res21[192]; - u32 iaddr0; /* 0x.800 - Indivdual address register 0 */ - u32 iaddr1; /* 0x.804 - Indivdual address register 1 */ - u32 iaddr2; /* 0x.808 - Indivdual address register 2 */ - u32 iaddr3; /* 0x.80c - Indivdual address register 3 */ - u32 iaddr4; /* 0x.810 - Indivdual address register 4 */ - u32 iaddr5; /* 0x.814 - Indivdual address register 5 */ - u32 iaddr6; /* 0x.818 - Indivdual address register 6 */ - u32 iaddr7; /* 0x.81c - Indivdual address register 7 */ + u32 ifstat; /* 0x.53c - Interface Status Register */ + u32 macstnaddr1; /* 0x.540 - Station Address Part 1 Register */ + u32 macstnaddr2; /* 0x.544 - Station Address Part 2 Register */ + u32 mac01addr1; /* 0x.548 - MAC exact match address 1, part 1 */ + u32 mac01addr2; /* 0x.54c - MAC exact match address 1, part 2 */ + u32 mac02addr1; /* 0x.550 - MAC exact match address 2, part 1 */ + u32 mac02addr2; /* 0x.554 - MAC exact match address 2, part 2 */ + u32 mac03addr1; /* 0x.558 - MAC exact match address 3, part 1 */ + u32 mac03addr2; /* 0x.55c - MAC exact match address 3, part 2 */ + u32 mac04addr1; /* 0x.560 - MAC exact match address 4, part 1 */ + u32 mac04addr2; /* 0x.564 - MAC exact match address 4, part 2 */ + u32 mac05addr1; /* 0x.568 
- MAC exact match address 5, part 1 */ + u32 mac05addr2; /* 0x.56c - MAC exact match address 5, part 2 */ + u32 mac06addr1; /* 0x.570 - MAC exact match address 6, part 1 */ + u32 mac06addr2; /* 0x.574 - MAC exact match address 6, part 2 */ + u32 mac07addr1; /* 0x.578 - MAC exact match address 7, part 1 */ + u32 mac07addr2; /* 0x.57c - MAC exact match address 7, part 2 */ + u32 mac08addr1; /* 0x.580 - MAC exact match address 8, part 1 */ + u32 mac08addr2; /* 0x.584 - MAC exact match address 8, part 2 */ + u32 mac09addr1; /* 0x.588 - MAC exact match address 9, part 1 */ + u32 mac09addr2; /* 0x.58c - MAC exact match address 9, part 2 */ + u32 mac10addr1; /* 0x.590 - MAC exact match address 10, part 1*/ + u32 mac10addr2; /* 0x.594 - MAC exact match address 10, part 2*/ + u32 mac11addr1; /* 0x.598 - MAC exact match address 11, part 1*/ + u32 mac11addr2; /* 0x.59c - MAC exact match address 11, part 2*/ + u32 mac12addr1; /* 0x.5a0 - MAC exact match address 12, part 1*/ + u32 mac12addr2; /* 0x.5a4 - MAC exact match address 12, part 2*/ + u32 mac13addr1; /* 0x.5a8 - MAC exact match address 13, part 1*/ + u32 mac13addr2; /* 0x.5ac - MAC exact match address 13, part 2*/ + u32 mac14addr1; /* 0x.5b0 - MAC exact match address 14, part 1*/ + u32 mac14addr2; /* 0x.5b4 - MAC exact match address 14, part 2*/ + u32 mac15addr1; /* 0x.5b8 - MAC exact match address 15, part 1*/ + u32 mac15addr2; /* 0x.5bc - MAC exact match address 15, part 2*/ + u8 res20[192]; + struct rmon_mib rmon; /* 0x.680-0x.73c */ + u32 rrej; /* 0x.740 - Receive filer rejected packet counter */ + u8 res21[188]; + u32 igaddr0; /* 0x.800 - Indivdual/Group address register 0*/ + u32 igaddr1; /* 0x.804 - Indivdual/Group address register 1*/ + u32 igaddr2; /* 0x.808 - Indivdual/Group address register 2*/ + u32 igaddr3; /* 0x.80c - Indivdual/Group address register 3*/ + u32 igaddr4; /* 0x.810 - Indivdual/Group address register 4*/ + u32 igaddr5; /* 0x.814 - Indivdual/Group address register 5*/ + u32 igaddr6; /* 0x.818 - 
Indivdual/Group address register 6*/ + u32 igaddr7; /* 0x.81c - Indivdual/Group address register 7*/ u8 res22[96]; - u32 gaddr0; /* 0x.880 - Global address register 0 */ - u32 gaddr1; /* 0x.884 - Global address register 1 */ - u32 gaddr2; /* 0x.888 - Global address register 2 */ - u32 gaddr3; /* 0x.88c - Global address register 3 */ - u32 gaddr4; /* 0x.890 - Global address register 4 */ - u32 gaddr5; /* 0x.894 - Global address register 5 */ - u32 gaddr6; /* 0x.898 - Global address register 6 */ - u32 gaddr7; /* 0x.89c - Global address register 7 */ - u8 res23[856]; - u32 attr; /* 0x.bf8 - Attributes Register */ - u32 attreli; /* 0x.bfc - Attributes Extract Length and Extract Index Register */ + u32 gaddr0; /* 0x.880 - Group address register 0 */ + u32 gaddr1; /* 0x.884 - Group address register 1 */ + u32 gaddr2; /* 0x.888 - Group address register 2 */ + u32 gaddr3; /* 0x.88c - Group address register 3 */ + u32 gaddr4; /* 0x.890 - Group address register 4 */ + u32 gaddr5; /* 0x.894 - Group address register 5 */ + u32 gaddr6; /* 0x.898 - Group address register 6 */ + u32 gaddr7; /* 0x.89c - Group address register 7 */ + u8 res23a[352]; + u32 fifocfg; /* 0x.a00 - FIFO interface config register */ + u8 res23b[252]; + u8 res23c[248]; + u32 attr; /* 0x.bf8 - Attributes Register */ + u32 attreli; /* 0x.bfc - Attributes Extract Length and Extract Index Register */ u8 res24[1024]; }; @@ -496,6 +686,8 @@ struct txbd8 *cur_tx; /* Next free ring entry */ struct txbd8 *dirty_tx; /* The Ring entry to be freed. 
*/ struct gfar *regs; /* Pointer to the GFAR memory mapped Registers */ + u32 *hash_regs[16]; + int hash_width; struct gfar *phyregs; struct work_struct tq; struct timer_list phy_info_timer; @@ -506,9 +698,12 @@ unsigned int rx_stash_size; unsigned int tx_ring_size; unsigned int rx_ring_size; - wait_queue_head_t rxcleanupq; - unsigned int rxclean; + unsigned char vlan_enable:1, + rx_csum_enable:1, + extended_hash:1; + unsigned short padding; + struct vlan_group *vlgrp; /* Info structure initialized by board setup code */ unsigned int interruptTransmit; unsigned int interruptReceive; @@ -519,6 +714,8 @@ int oldspeed; int oldduplex; int oldlink; + + uint32_t msg_enable; }; extern inline u32 gfar_read(volatile unsigned *addr) From matti.aarnio@zmailer.org Thu Jun 9 13:54:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 13:54:59 -0700 (PDT) Received: from mail.zmailer.org (van-1-67.lab.dnainternet.fi [62.78.96.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59KsuXq011151 for ; Thu, 9 Jun 2005 13:54:56 -0700 Received: (mea@mea-ext) by mail.zmailer.org id S1621224AbVFIUxs (ORCPT ); Thu, 9 Jun 2005 23:53:48 +0300 Date: Thu, 9 Jun 2005 23:53:48 +0300 From: Matti Aarnio To: netdev@oss.sgi.com Subject: Ethernet driver general problem .... Message-ID: <20050609205348.GT4661@mea-ext.zmailer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-archive-position: 2309 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: matti.aarnio@zmailer.org Precedence: bulk X-list: netdev Content-Length: 1430 Lines: 45 Today I encountered a case where the default RX_RING_SIZE was too small for the traffic going through. (3c59x driver, and NAT-router application.) One related problem is that the RING_SIZE is a hard #define parameter in most driver sources, with no ability to adjust it at runtime. (e.g. 
"modprobe zzzz rx_ring_size=512") Possibly even an ifconfig:able parameter, which also works for statically compiled kernels. (But needs new ifconfig machinery...) The case at hand was solved by editing kernel sources and compiling a new kernel, but now I am looking for a more generic approach. Clearly pre-allocating 500 buffers of 1.6 kB each eats a fair amount of kernel memory that a typical workstation gets no benefit from, though a server or router would. Making rx_ring_size tunable via ifconfig would help with adjusting the interface for reception, but adding ifconfig parameters is ... not trivial. An alternative might be sysctl tunability -- even as a global 'dev.eth.rx_ring_size' (or 'net.dev.eth.rx_ring_size') which supplies the (default) ring size for initialization of the interface where such rings exist at all... Possibly there should also be tx_ring_size, which need not be the same size as the rx_ring_size. The unadjusted default value for both could be 16. All in all, this needs near-ubiquitous changes in driver init code across lots of network drivers. Comments? Suggestions? 
/Matti Aarnio From jketreno@linux.intel.com Thu Jun 9 14:03:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 14:03:39 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59L3YXq012239 for ; Thu, 9 Jun 2005 14:03:35 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j59L1c68025155; Thu, 9 Jun 2005 21:01:38 GMT Received: from [192.168.1.154] (hdlrvguser-273.hd.intel.com [10.127.53.36]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j59L1Ugw017524; Thu, 9 Jun 2005 21:01:33 GMT Message-ID: <42A8AE2A.4080104@linux.intel.com> Date: Thu, 09 Jun 2005 16:01:30 -0500 From: James Ketrenos User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050519 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: pavel@ucw.cz, vda@ilport.com.ua, abonilla@linuxwireless.org, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem References: <200506090909.55889.vda@ilport.com.ua> <20050608.231657.59660080.davem@davemloft.net> <20050609104205.GD3169@elf.ucw.cz> <20050609.125324.88476545.davem@davemloft.net> In-Reply-To: <20050609.125324.88476545.davem@davemloft.net> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2310 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jketreno@linux.intel.com Precedence: bulk X-list: netdev Content-Length: 1304 Lines: 39 David S. Miller wrote: >From: Pavel Machek >Date: Thu, 9 Jun 2005 12:42:05 +0200 > > > >>I'm not saying it should not work automagically. 
But it is wrong to >>start transmitting on wireless as soon as kernel boots. It should stay >>quiet in the radio until it is either told to talk or until interface >>is upped. >> >> > >I agree. > >There is a similar problem in the Acenic driver, it brings the >link up and receives broadcast packets as soon as the driver >is loaded. Mostly this is because the driver inits the chip >and registers the IRQ handler at probe time, whereas nearly >every other driver does this at ->open() time. > > The ipw2100 originally postponed doing any initialization until open was called. The problem at that time was that distributions were crafted to rely on link detection (I believe via ethtoolop's get_link) before they would bring the interface up. With a wireless device, you don't have link until you are associated... chicken and egg. The solution was to move initialization and association to the probe. I don't know if all the distributions have moved away from this model. If they have and the devices are brought up regardless of link, then going back to delaying radio initialization until the open() is called is workable. 
James From afleming@freescale.com Thu Jun 9 14:05:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 14:05:56 -0700 (PDT) Received: from az33egw01.freescale.net (az33egw01.freescale.net [192.88.158.102]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59L5rXq012850 for ; Thu, 9 Jun 2005 14:05:54 -0700 Received: from az33smr01.freescale.net (az33smr01.freescale.net [10.64.34.199]) by az33egw01.freescale.net (8.12.11/az33egw01) with ESMTP id j59LAWqg017880; Thu, 9 Jun 2005 14:10:33 -0700 (MST) Received: from [10.82.16.201] ([10.82.16.201]) by az33smr01.freescale.net (8.13.1/8.13.0) with ESMTP id j59L6XaS028051; Thu, 9 Jun 2005 16:06:33 -0500 (CDT) In-Reply-To: <20050609205348.GT4661@mea-ext.zmailer.org> References: <20050609205348.GT4661@mea-ext.zmailer.org> Mime-Version: 1.0 (Apple Message framework v730) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Cc: netdev@oss.sgi.com Content-Transfer-Encoding: 7bit From: Andy Fleming Subject: Re: Ethernet driver general problem .... Date: Thu, 9 Jun 2005 16:04:48 -0500 To: Matti Aarnio X-Mailer: Apple Mail (2.730) X-archive-position: 2311 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: afleming@freescale.com Precedence: bulk X-list: netdev Content-Length: 1645 Lines: 52 Look into ethtool. ethtool allows you to configure the ring sizes, among other things. On Jun 9, 2005, at 15:53, Matti Aarnio wrote: > I encountered today a case, where default RX_RING_SIZE > was too small for the traffic going thru. (3c59x driver, > and NAT-router application.) > > One related problem is, that the RING_SIZE is hard #define > parameter in most driver sources without ability to adjust > it during runtime. (e.g. "modprobe zzzz rx_ring_size=512") > Possibly even an ifconfig:able parameter, which works > also for statically compiled kernels. (But needs new > ifconfig machinery...) 
> > > The case at hand was solved by editing kernel sources, and > compiling a new kernel, but now I am looking for a more > generic approach. > > > Clearly pre-allocing 500 buffers of 1.6 kB in size does > eat a bit much of kernel memory which usual workstation > does not get benefit from, but server or router would get > benefits. > > Having the rx_ring_size ifconfig tunable would help with > adjusting interface for reception, but adding ifconfig > parameters is ... not trivial. > > Alternate might be sysctl tunability -- even as a global > 'dev.eth.rx_ring_size' (or 'net.dev.eth.rx_ring_size') > which supplies (default) ring size for initialization of > the interface where such rings exist at all... > > > Possibly there should also be tx_ring_size, which need not > be same size as the rx_ring_size. > > Unadjusted default value for both could be 16. > > > All in all, this needs near uniquitous changes in driver > init codes accross lots of network drivers. > > > Comments ? Suggestions ? > > /Matti Aarnio > From galibert@dspnet.fr.eu.org Thu Jun 9 14:13:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 14:14:01 -0700 (PDT) Received: from dspnet.fr.eu.org (dspnet.fr.eu.org [213.186.44.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59LDwXq013915 for ; Thu, 9 Jun 2005 14:13:58 -0700 Received: by dspnet.fr.eu.org (Postfix, from userid 1007) id C8C8934D20; Thu, 9 Jun 2005 23:12:42 +0200 (CEST) Date: Thu, 9 Jun 2005 23:12:42 +0200 From: Olivier Galibert To: Andi Kleen Cc: James Ketrenos , Jeff Garzik , Netdev list , kernel list , "James P. Ketrenos" Subject: Re: ipw2100: firmware problem Message-ID: <20050609211242.GA30319@dspnet.fr.eu.org> Mail-Followup-To: Olivier Galibert , Andi Kleen , James Ketrenos , Jeff Garzik , Netdev list , kernel list , "James P. 
Ketrenos" References: <20050608142310.GA2339@elf.ucw.cz> <42A723D3.3060001@linux.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-archive-position: 2313 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: galibert@pobox.com Precedence: bulk X-list: netdev Content-Length: 624 Lines: 20 On Thu, Jun 09, 2005 at 03:56:15PM +0200, Andi Kleen wrote: > I guess at some point we will need a file system in there, but - oops - > we already have one, dont we? :) Well, you could put .config in it too. Frankly, a filesystem that: - can be somehow linked with vmlinux and not separate like an initrd - editable post vmlinux-linking - gives files that can be accessed from request_firmware, acpi and friends even rather early in the boot process (i.e. well before any userland is allowed to exist) - accessible post-boot through mounting of a special fs and/or /proc or something would be quite useful. OG. From pavel@ucw.cz Thu Jun 9 14:12:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 14:12:40 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59LCTXq013632 for ; Thu, 9 Jun 2005 14:12:35 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id 5EB188B8CA; Thu, 9 Jun 2005 23:11:13 +0200 (CEST) Date: Thu, 9 Jun 2005 23:11:13 +0200 From: Pavel Machek To: James Ketrenos Cc: "David S. 
Miller" , vda@ilport.com.ua, abonilla@linuxwireless.org, jgarzik@pobox.com, netdev@oss.sgi.com, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem Message-ID: <20050609211113.GC4173@elf.ucw.cz> References: <200506090909.55889.vda@ilport.com.ua> <20050608.231657.59660080.davem@davemloft.net> <20050609104205.GD3169@elf.ucw.cz> <20050609.125324.88476545.davem@davemloft.net> <42A8AE2A.4080104@linux.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <42A8AE2A.4080104@linux.intel.com> X-Warning: Reading this can be dangerous to your mental health. User-Agent: Mutt/1.5.9i X-archive-position: 2312 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 1035 Lines: 27 Hi! > >I agree. > > > >There is a similar problem in the Acenic driver, it brings the > >link up and receives broadcast packets as soon as the driver > >is loaded. Mostly this is because the driver inits the chip > >and registers the IRQ handler at probe time, whereas nearly > >every other driver does this at ->open() time. > > > > > The ipw2100 originally postponed doing any initialization until open was > called. The problem at that time was that distributions were crafted to > rely on link detection (I believe via ethtoolop's get_link) before they > would bring the interface up. > > With a wireless device, you don't have link until you are associated... > chicken and egg. The solution was to move initialization and > association to the probe. > > I don't know if all the distributions have moved away from this model. > If they have and the devices are brought up regardless of link, then > going back to delaying radio initialization until the open() is called > is workable. Ook, great, I see. 
Pavel From romieu@fr.zoreil.com Thu Jun 9 14:17:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 14:17:06 -0700 (PDT) Received: from fr.zoreil.com (electric-eye.fr.zoreil.com [213.41.134.224]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59LGxXq014828 for ; Thu, 9 Jun 2005 14:16:59 -0700 Received: from electric-eye.fr.zoreil.com (localhost.localdomain [127.0.0.1]) by fr.zoreil.com (8.13.1/8.12.1) with ESMTP id j59LCXP6007694; Thu, 9 Jun 2005 23:12:33 +0200 Received: (from romieu@localhost) by electric-eye.fr.zoreil.com (8.13.1/8.13.1/Submit) id j59LCSbI007684; Thu, 9 Jun 2005 23:12:28 +0200 Date: Thu, 9 Jun 2005 23:12:26 +0200 From: Francois Romieu To: Matti Aarnio Cc: netdev@oss.sgi.com Subject: Re: Ethernet driver general problem .... Message-ID: <20050609211226.GA7387@electric-eye.fr.zoreil.com> References: <20050609205348.GT4661@mea-ext.zmailer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050609205348.GT4661@mea-ext.zmailer.org> User-Agent: Mutt/1.4.1i X-Organisation: Land of Sunshine Inc. X-archive-position: 2314 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: romieu@fr.zoreil.com Precedence: bulk X-list: netdev Content-Length: 150 Lines: 8 Matti Aarnio : [...] > Comments ? Suggestions ? See the use of ethtool_ops.set_ringparam() below drivers/net. 
-- Ueimor From SRS0+97f303dfd3a29984ca02+655+infradead.org+arjan@pentafluge.srs.infradead.org Thu Jun 9 14:17:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 14:17:36 -0700 (PDT) Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59LHUXq014977 for ; Thu, 9 Jun 2005 14:17:32 -0700 Received: from g133107.upc-g.chello.nl ([80.57.133.107] helo=[172.31.3.43]) by pentafluge.infradead.org with esmtpsa (Exim 4.43 #1 (Red Hat Linux)) id 1DgUNc-00058e-NO; Thu, 09 Jun 2005 22:16:01 +0100 Subject: Re: ipw2100: firmware problem From: Arjan van de Ven To: James Ketrenos Cc: "David S. Miller" , pavel@ucw.cz, vda@ilport.com.ua, abonilla@linuxwireless.org, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com In-Reply-To: <42A8AE2A.4080104@linux.intel.com> References: <200506090909.55889.vda@ilport.com.ua> <20050608.231657.59660080.davem@davemloft.net> <20050609104205.GD3169@elf.ucw.cz> <20050609.125324.88476545.davem@davemloft.net> <42A8AE2A.4080104@linux.intel.com> Content-Type: text/plain Date: Thu, 09 Jun 2005 23:15:54 +0200 Message-Id: <1118351754.5508.30.camel@laptopd505.fenrus.org> Mime-Version: 1.0 X-Mailer: Evolution 2.0.4 (2.0.4-4) Content-Transfer-Encoding: 7bit X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html X-archive-position: 2315 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: arjan@infradead.org Precedence: bulk X-list: netdev Content-Length: 350 Lines: 9 On Thu, 2005-06-09 at 16:01 -0500, James Ketrenos wrote: > I don't know if all the distributions have moved away from this model. > If they have and the devices are brought up regardless of link, then > going back to delaying radio initialization until the open() is called > is workable. 
wouldn't you want that anyway for power saving reasons? From afleming@freescale.com Thu Jun 9 14:33:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 14:33:36 -0700 (PDT) Received: from az33egw01.freescale.net (az33egw01.freescale.net [192.88.158.102]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59LXXXq016619 for ; Thu, 9 Jun 2005 14:33:33 -0700 Received: from az33smr02.freescale.net (az33smr02.freescale.net [10.64.34.200]) by az33egw01.freescale.net (8.12.11/az33egw01) with ESMTP id j59Lc2N5024495; Thu, 9 Jun 2005 14:38:03 -0700 (MST) Received: from [10.82.16.201] ([10.82.16.201]) by az33smr02.freescale.net (8.13.1/8.13.0) with ESMTP id j59LZeDN012945; Thu, 9 Jun 2005 16:35:41 -0500 (CDT) In-Reply-To: <42A360A0.1040902@aarnet.edu.au> References: <1107b64b01fb8e9a6c84359bb56881a6@freescale.com> <20050531105939.7486e071@dxpl.pdx.osdl.net> <92F1428A-0B26-428B-8C06-35C7E5B9EEE3@freescale.com> <20050601144123.2bc11c06@dxpl.pdx.osdl.net> <9A2D608A-D818-455B-96F4-ED42413556C0@freescale.com> <42A360A0.1040902@aarnet.edu.au> Mime-Version: 1.0 (Apple Message framework v730) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <0A9010B9-D24A-4762-8069-F19607ADD416@freescale.com> Cc: Stephen Hemminger , Netdev , Kumar Gala Content-Transfer-Encoding: 7bit From: Andy Fleming Subject: Re: RFC: PHY Abstraction Layer II Date: Thu, 9 Jun 2005 16:32:18 -0500 To: Glen Turner X-Mailer: Apple Mail (2.730) X-archive-position: 2316 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: afleming@freescale.com Precedence: bulk X-list: netdev Content-Length: 753 Lines: 23 On Jun 5, 2005, at 15:29, Glen Turner wrote: > > Operationally, it would be very useful if the PHY printed > the physical interface detail when detected (1000Base-LX, > etc). 
I was thinking that it would be easier for the ethernet driver to do this in the adjust_link() function, since it's going to need to track when these things change, anyway. But if the general consensus is that it should be in the generic code, I can see about adding it there. > > Also, it would be nice to be able to retrieve PHY data > independent of the interface status (eg, to retrieve > asset serial numbers, GBIC make/models, etc). I'm not sure what you mean, here. The driver can use phy_read/write to get/set information anytime it wants. Andy Fleming From jesse.brandeburg@intel.com Thu Jun 9 14:40:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 14:40:03 -0700 (PDT) Received: from orsfmr004.jf.intel.com (fmr19.intel.com [134.134.136.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59Le0Xq017350 for ; Thu, 9 Jun 2005 14:40:00 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr004.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j59LbYRL001979; Thu, 9 Jun 2005 21:37:34 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j59LbYOa016505; Thu, 9 Jun 2005 21:37:34 GMT Received: from ladlxr.jf.intel.com (ladlxr.jf.intel.com [10.23.35.110]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j59LbVSL019488; Thu, 9 Jun 2005 14:37:31 -0700 Date: Thu, 9 Jun 2005 14:37:31 -0700 (PDT) From: Jesse Brandeburg X-X-Sender: jbrandeb@ladlxr To: jamal cc: "David S. 
Miller" , "Brandeburg, Jesse" , "Ronciak, John" , shemminger@osdl.org, "Williams, Mitch A" , mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: <1118237775.6382.34.camel@localhost.localdomain> Message-ID: References: <468F3FDA28AA87429AD807992E22D07E0450C01F@orsmsx408> <20050607.132159.35660612.davem@davemloft.net> <20050607.204339.21591152.davem@davemloft.net> <1118237775.6382.34.camel@localhost.localdomain> ReplyTo: "Jesse Brandeburg" MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="8323328-264338517-1118353051=:10396" X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2317 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jesse.brandeburg@intel.com Precedence: bulk X-list: netdev Content-Length: 7965 Lines: 169 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --8323328-264338517-1118353051=:10396 Content-Type: TEXT/PLAIN; charset=iso-8859-1; format=flowed Content-Transfer-Encoding: 8BIT On Wed, 8 Jun 2005, jamal wrote: > > Something is up, if a single gigabit TCP stream can fully CPU > > load your machine.  10 gigabit, yeah, definitely all current > > generation machines are cpu limited over that link speed, but > > 1 gigabit should be no problem. > > > > Yes, sir. > BTW, all along i thought the sender and receiver are hooked up directly > (there was some mention of chariot a while back). 
Okay let me clear this up once and for all, here is our test setup:

* 10 1u rack machines (dual P3 - 1250MHz), with both windows and linux installed (running windows now)
* Extreme 1gig switch
* Dual 2.8 GHz P4 server, RHEL3 base, running 2.6.12-rc5 or supertso patch
* the test entails transferring 1MB files of zeros from memory to memory, using TCP, with each client doing primary either send or recv, not both.

> Even if they did have some smart ass thing in the middle that reorders,
> it is still suprising that such a fast CPU cant handle a mere one Gig of
> what seems to be MTU=1500 bytes sized packets.

It can handle a single thread (or even 6) just fine, its after that we get in trouble somewhere.

> I suppose a netstat -s would help for visualization in addition to those
> dumps.

Okay I have that data, do you want it for the old tso, supertso, or no tso at all?

> Heres what i am deducing from their data, correct me if i am wrong:
> ->The evidence is that something is expensive in their code path (duh).

Actually I've found that adding more threads (10 total) sending to the server, while keeping the transmit thread count constant yields an increase our throughput all the way to 1750+ Mb/s (with supertso)

> -> Whatever that expensive thing code is, it not helped by them
> replenishing the descriptors after all the budget is exhausted since the
> descriptor departure rate is much slower than packet arrival.

I'm running all my tests with the replenish patch mentioned earlier in this thread.

> ---> This is why they would be seeing that the reduction of weight
> improves performance since the replenishing happens sooner with a
> smaller weight.

seems like we're past the weight problem now, should i start a new thread?

> ------> Clearly the driver needs some fixing - if they could do what

I'm not convinced it is the driver that is having issues.
We might be having some complex interaction with the stack, but I definitely think we have a lot of onion layers to hack through here, all of which are probably relevant.

> Even if they SACKed for every packet, this still would not make any
> sense. So i think a profile of where the cycles are spent would also
> help. I am suspecting the driver at this point but i could be wrong.

I have profile data, here is an example of 5tx/5rx threads, where the throughput was 1236Mb/s total, 936tx, 300rx, on 2.6.12-rc5 with old TSO (the original problem case) we are at 100% cpu and generating 3289 ints/s, with no hardware drops reported prolly due to my replenish patch

CPU: P4 / Xeon with 2 hyper-threads, speed 2791.36 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples   %       image name       symbol name
533687    8.1472  vmlinux          pskb_expand_head
428726    6.5449  vmlinux          __copy_user_zeroing_intel
349934    5.3421  vmlinux          _read_lock_irqsave
313667    4.7884  vmlinux          csum_partial
218870    3.3413  vmlinux          _spin_lock
214302    3.2715  vmlinux          __copy_user_intel
193662    2.9564  vmlinux          skb_release_data
177755    2.7136  vmlinux          ipt_do_table
148445    2.2662  vmlinux          _write_lock_irqsave
148080    2.2606  vmlinux          _read_unlock_bh
143308    2.1877  vmlinux          tcp_sendmsg
115745    1.7670  vmlinux          ip_queue_xmit
111487    1.7020  vmlinux          __kfree_skb
108383    1.6546  vmlinux          _spin_lock_irqsave
108071    1.6498  e1000.ko         e1000_xmit_frame
107850    1.6464  vmlinux          tcp_clean_rtx_queue
104552    1.5961  e1000.ko         e1000_clean_tx_irq
101308    1.5466  e1000.ko         e1000_clean_rx_irq
94297     1.4395  vmlinux          __copy_from_user_ll
85170     1.3002  vmlinux          kfree
76730     1.1714  vmlinux          tcp_transmit_skb
70976     1.0835  vmlinux          eth_type_trans
67381     1.0286  vmlinux          tcp_rcv_established
64670     0.9872  vmlinux          sub_preempt_count
64451     0.9839  vmlinux          dev_queue_xmit
64010     0.9772  vmlinux          skb_clone
62314     0.9513  vmlinux          tcp_v4_rcv
61980     0.9462  vmlinux          nf_iterate
60374     0.9217  vmlinux          ip_finish_output
57407     0.8764  vmlinux          _write_unlock_bh
56165     0.8574  vmlinux          mark_offset_tsc
54673     0.8346  endpoint         (no symbols)
52662     0.8039  vmlinux          __kmalloc
50112     0.7650  vmlinux          sock_wfree
50001     0.7633  vmlinux          _spin_trylock
47053     0.7183  vmlinux          _read_lock_bh
45988     0.7021  vmlinux          tcp_write_xmit
44229     0.6752  vmlinux          kmem_cache_alloc
43506     0.6642  vmlinux          smp_processor_id
42401     0.6473  vmlinux          ip_conntrack_find_get
42095     0.6426  vmlinux          alloc_skb
40619     0.6201  vmlinux          tcp_in_window
38098     0.5816  vmlinux          add_preempt_count
37701     0.5755  vmlinux          __copy_to_user_ll
31529     0.4813  vmlinux          ip_conntrack_in
31314     0.4780  vmlinux          kmem_cache_free
30954     0.4725  vmlinux          __ip_conntrack_find
30863     0.4712  vmlinux          local_bh_enable
30774     0.4698  vmlinux          tcp_packet
29426     0.4492  vmlinux          _spin_unlock_irqrestore
28716     0.4384  vmlinux          hash_conntrack
27073     0.4133  vmlinux          ip_route_input
26540     0.4052  e1000.ko         e1000_clean
25817     0.3941  vmlinux          nf_hook_slow
23395     0.3571  vmlinux          schedule
22981     0.3508  vmlinux          tcp_v4_send_check
22139     0.3380  vmlinux          __mod_timer
22126     0.3378  vmlinux          timer_interrupt
21511     0.3284  vmlinux          cache_alloc_refill
21161     0.3230  vmlinux          netif_receive_skb
20418     0.3117  vmlinux          _write_lock_bh
19443     0.2968  vmlinux          skb_copy_datagram_iovec
19100     0.2916  vmlinux          ip_nat_fn
18784     0.2868  vmlinux          ip_local_deliver
18251     0.2786  vmlinux          _read_lock
17513     0.2674  vmlinux          nat_packet
17124     0.2614  e1000.ko         e1000_intr
16357     0.2497  vmlinux          default_idle
15358     0.2345  vmlinux          qdisc_restart
14564     0.2223  vmlinux          _read_unlock
14360     0.2192  vmlinux          tcp_recvmsg
13853     0.2115  oprofiled        odb_insert
13374     0.2042  e1000.ko         e1000_alloc_rx_buffers
13321     0.2034  vmlinux          apic_timer_interrupt
12668     0.1934  vmlinux          pfifo_fast_enqueue
12618     0.1926  vmlinux          tcp_sack
12180     0.1859  vmlinux          ip_nat_local_fn
11434     0.1746  vmlinux          system_call
11426     0.1744  vmlinux          free_block
11377     0.1737  vmlinux          try_to_wake_up
11138     0.1700  vmlinux          irq_entries_start
11017     0.1682  vmlinux          ipt_route_hook
10987     0.1677  vmlinux          dev_queue_xmit_nit
10970     0.1675  vmlinux          tcp_push_one
10508     0.1604  vmlinux          tcp_error
10365     0.1582  vmlinux          pfifo_fast_dequeue
10323     0.1576  vmlinux          ip_rcv
10022     0.1530  vmlinux          ip_output
--8323328-264338517-1118353051=:10396--
From shemminger@osdl.org Thu Jun 9 15:07:52 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 15:07:55 -0700 (PDT) Received: from smtp.osdl.org (smtp.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59M7oXq019186 for ; Thu, 9 Jun 2005 15:07:51 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j59M5ljA009094 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 9 Jun 2005 15:05:48 -0700 Received: from unknown-215.office.pdx.osdl.net (unknown-215.office.pdx.osdl.net [10.8.0.215]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j59M5kjG024762; Thu, 9 Jun 2005 15:05:46 -0700 Date: Thu, 9 Jun 2005 15:05:46 -0700 From: Stephen Hemminger To: Jesse Brandeburg Cc: jamal , "David S. Miller" , "Brandeburg, Jesse" , "Ronciak, John" , "Williams, Mitch A" , mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" Subject: Re: RFC: NAPI packet weighting patch Message-ID: <20050609150546.61b0fee7@unknown-215.office.pdx.osdl.net> In-Reply-To: References: <468F3FDA28AA87429AD807992E22D07E0450C01F@orsmsx408> <20050607.132159.35660612.davem@davemloft.net> <20050607.204339.21591152.davem@davemloft.net> <1118237775.6382.34.camel@localhost.localdomain> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.109 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 2318 X-ecartis-version: Ecartis
v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 1219 Lines: 25 > I have profile data, here is an example of 5tx/5rx threads, where the > throughput was 1236Mb/s total, 936tx, 300rx, on 2.6.12-rc5 with old TSO > (the original problem case) we are at 100% cpu and generating 3289 ints/s, > with no hardware drops reported prolly due to my replenish patch > CPU: P4 / Xeon with 2 hyper-threads, speed 2791.36 MHz (estimated) > Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000 > samples % image name symbol name > 533687 8.1472 vmlinux pskb_expand_head > 428726 6.5449 vmlinux __copy_user_zeroing_intel > 349934 5.3421 vmlinux _read_lock_irqsave We should kill all reader/writer locks in the fastpath. reader locks are more expensive than spinlocks unless they are going to be held for a fairly large window. > 313667 4.7884 vmlinux csum_partial > 218870 3.3413 vmlinux _spin_lock > 214302 3.2715 vmlinux __copy_user_intel > 193662 2.9564 vmlinux skb_release_data > 177755 2.7136 vmlinux ipt_do_table You are probably benchmarking iptables/netfilter! How many rules do you have? 
From davem@davemloft.net Thu Jun 9 15:12:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 15:12:31 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59MCPXq020019 for ; Thu, 9 Jun 2005 15:12:25 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgVEy-0000km-JU; Thu, 09 Jun 2005 15:11:08 -0700 Date: Thu, 09 Jun 2005 15:11:08 -0700 (PDT) Message-Id: <20050609.151108.92584111.davem@davemloft.net> To: jketreno@linux.intel.com Cc: pavel@ucw.cz, vda@ilport.com.ua, abonilla@linuxwireless.org, jgarzik@pobox.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem From: "David S. Miller" In-Reply-To: <42A8AE2A.4080104@linux.intel.com> References: <20050609104205.GD3169@elf.ucw.cz> <20050609.125324.88476545.davem@davemloft.net> <42A8AE2A.4080104@linux.intel.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2319 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 656 Lines: 15 From: James Ketrenos Date: Thu, 09 Jun 2005 16:01:30 -0500 > The ipw2100 originally postponed doing any initialization until open was > called. The problem at that time was that distributions were crafted to > rely on link detection (I believe via ethtoolop's get_link) before they > would bring the interface up. Yes, I see, and that does work for most ethernet devices. I noticed that Debian's 3.1 installer used this to determine which ethernet device it should use as the default in its network device dialogue.
One idea, returning true for get_link when the device is down, may not be a bad idea for the wireless case. From jesse.brandeburg@intel.com Thu Jun 9 15:14:39 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 15:15:45 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59MEdXq020631 for ; Thu, 9 Jun 2005 15:14:39 -0700 Received: from orsfmr101.jf.intel.com (orsfmr101.jf.intel.com [10.7.209.17]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j59MCA68005889; Thu, 9 Jun 2005 22:12:10 GMT Received: from nwlxmail01.jf.intel.com (nwlxmail01.jf.intel.com [10.7.171.40]) by orsfmr101.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j59MCAOa006506; Thu, 9 Jun 2005 22:12:10 GMT Received: from ladlxr.jf.intel.com (ladlxr.jf.intel.com [10.23.35.110]) by nwlxmail01.jf.intel.com (8.12.10/8.12.9/MailSET/Hub) with ESMTP id j59MC9SL021746; Thu, 9 Jun 2005 15:12:09 -0700 Date: Thu, 9 Jun 2005 15:12:09 -0700 (PDT) From: Jesse Brandeburg X-X-Sender: jbrandeb@ladlxr To: Stephen Hemminger cc: "Brandeburg, Jesse" , jamal , "David S. 
Miller" , "Ronciak, John" , "Williams, Mitch A" , mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: <20050609150546.61b0fee7@unknown-215.office.pdx.osdl.net> Message-ID: References: <468F3FDA28AA87429AD807992E22D07E0450C01F@orsmsx408><20050607.132159.35660612.davem@davemloft.net><20050607.204339.21591152.davem@davemloft.net><1118237775.6382.34.camel@localhost.localdomain> <20050609150546.61b0fee7@unknown-215.office.pdx.osdl.net> ReplyTo: "Jesse Brandeburg" MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="8323328-601160196-1118355129=:16917" X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2320 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jesse.brandeburg@intel.com Precedence: bulk X-list: netdev Content-Length: 875 Lines: 22 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --8323328-601160196-1118355129=:16917 Content-Type: TEXT/PLAIN; charset=iso-8859-1; format=flowed Content-Transfer-Encoding: 8BIT On Thu, 9 Jun 2005, Stephen Hemminger wrote: > > 313667    4.7884  vmlinux          csum_partial > > 218870    3.3413  vmlinux          _spin_lock > > 214302    3.2715  vmlinux          __copy_user_intel > > 193662    2.9564  vmlinux          skb_release_data > > 177755    2.7136  vmlinux          ipt_do_table > > You are probably benchmarking iptables/netfilter! How many rules do you > have? I saw that... somehow iptables got compiled into kernel statically. no rules are active or installed iptables -L -n shows nothing in any chain. 
Jesse --8323328-601160196-1118355129=:16917-- From hadi@cyberus.ca Thu Jun 9 15:21:46 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 15:21:57 -0700 (PDT) Received: from mx01.cybersurf.com (mx01.cybersurf.com [209.197.145.104]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59MLkXq021391 for ; Thu, 9 Jun 2005 15:21:46 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx01.cybersurf.com with esmtp (Exim 4.30) id 1DgVOA-0000jL-6H for netdev@oss.sgi.com; Thu, 09 Jun 2005 16:20:38 -0600 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DgVOD-0006Bf-68; Thu, 09 Jun 2005 18:20:41 -0400 Subject: Re: RFC: NAPI packet weighting patch From: jamal Reply-To: hadi@cyberus.ca To: Jesse Brandeburg Cc: "David S. Miller" , "Ronciak, John" , shemminger@osdl.org, "Williams, Mitch A" , mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" In-Reply-To: References: <468F3FDA28AA87429AD807992E22D07E0450C01F@orsmsx408> <20050607.132159.35660612.davem@davemloft.net> <20050607.204339.21591152.davem@davemloft.net> <1118237775.6382.34.camel@localhost.localdomain> Content-Type: text/plain Organization: unknown Date: Thu, 09 Jun 2005 18:20:36 -0400 Message-Id: <1118355636.12573.32.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2321 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 3185 Lines: 91 On Thu, 2005-09-06 at 14:37 -0700, Jesse Brandeburg wrote: > Okay let me clear this up once and for all, here is our test setup: > > * 10 1u rack machines (dual P3 - 1250MHz), with both windows and linux > installed (running windows now) > * Extreme 1gig switch > * 
Dual 2.8 GHz P4 server, RHEL3 base, running 2.6.12-rc5 or supertso patch > > * the test entails transferring 1MB files of zeros from memory to memory, > using TCP, with each client doing primary either send or recv, not both. Linux as sender? > > Even if they did have some smart ass thing in the middle that reorders, > > it is still suprising that such a fast CPU cant handle a mere one Gig of > > what seems to be MTU=1500 bytes sized packets. > > It can handle a single thread (or even 6) just fine, its after that we get > in trouble somewhere. > Certainly interesting details? > > I suppose a netstat -s would help for visualization in addition to those > > dumps. > > Okay I have that data, do you want it for the old tso, supertso, or no tso > at all? > hrmph - dont know. Dave could tell you. I would say whatever you are running thats latest and greatest and causes you trouble? > > Heres what i am deducing from their data, correct me if i am wrong: > > ->The evidence is that something is expensive in their code path (duh). > > Actually I've found that adding more threads (10 total) sending to the > server, while keeping the transmit thread count constant yields an > increase our throughput all the way to 1750+ Mb/s (with supertso) > Interesting tidbit > > -> Whatever that expensive thing code is, it not helped by them > > replenishing the descriptors after all the budget is exhausted since the > > descriptor departure rate is much slower than packet arrival. > > I'm running all my tests with the replenish patch mentioned earlier in > this thread. > Ok. When i said " in the data path" - it could be anything from the driver all the way to the socket. If you have some pig along that path - it would mean you get back less often to replenish the descriptors. > > ---> This is why they would be seeing that the reduction of weight > > improves performance since the replenishing happens sooner with a > > smaller weight. 
> > seems like we're past the weight problem now, should i start a new thread? > I think so. > > ------> Clearly the driver needs some fixing - if they could do what > > I'm not convinced it is the driver that is having issues. We might be > having some complex interaction with the stack, but I definitely think we > have a lot of onion layers to hack through here, all of which are probably > relevant. > I agree. But the driver could have some improvement as well if you did what the other driver does ;-> > I have profile data, here is an example of 5tx/5rx threads, where the > throughput was 1236Mb/s total, 936tx, 300rx, on 2.6.12-rc5 with old TSO > (the original problem case) we are at 100% cpu and generating 3289 ints/s, > with no hardware drops reported prolly due to my replenish patch Hrm, reading Stephen's email as well ;-> Can you turn netfilter off totally? Most importantly, remove conntrack. cheers, jamal From davem@davemloft.net Thu Jun 9 15:22:44 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 15:22:47 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59MMiXq021741 for ; Thu, 9 Jun 2005 15:22:44 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgVOp-0000mf-87; Thu, 09 Jun 2005 15:21:19 -0700 Date: Thu, 09 Jun 2005 15:21:19 -0700 (PDT) Message-Id: <20050609.152119.55508630.davem@davemloft.net> To: jesse.brandeburg@intel.com Cc: shemminger@osdl.org, hadi@cyberus.ca, john.ronciak@intel.com, mitch.a.williams@intel.com, mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S.
Miller" In-Reply-To: References: <20050609150546.61b0fee7@unknown-215.office.pdx.osdl.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2322 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 353 Lines: 8 From: Jesse Brandeburg Date: Thu, 9 Jun 2005 15:12:09 -0700 (PDT) > I saw that... somehow iptables got compiled into kernel statically. no > rules are active or installed iptables -L -n shows nothing in any chain. Netfilter can kill performance, even if no rules are loaded at all. Please take that out of your kernel. From hadi@cyberus.ca Thu Jun 9 15:22:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 15:22:50 -0700 (PDT) Received: from mx02.cybersurf.com (mx02.cybersurf.com [209.197.145.105]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59MMiXq021744 for ; Thu, 9 Jun 2005 15:22:44 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx02.cybersurf.com with esmtp (Exim 4.30) id 1DgVPC-0004uQ-Uj for netdev@oss.sgi.com; Thu, 09 Jun 2005 18:21:42 -0400 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DgVPA-0006H1-Up; Thu, 09 Jun 2005 18:21:41 -0400 Subject: Re: RFC: NAPI packet weighting patch From: jamal Reply-To: hadi@cyberus.ca To: Jesse Brandeburg Cc: Stephen Hemminger , "David S. 
Miller" , "Ronciak, John" , "Williams, Mitch A" , mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, "Venkatesan, Ganesh" In-Reply-To: References: <468F3FDA28AA87429AD807992E22D07E0450C01F@orsmsx408> <20050607.132159.35660612.davem@davemloft.net> <20050607.204339.21591152.davem@davemloft.net> <1118237775.6382.34.camel@localhost.localdomain> <20050609150546.61b0fee7@unknown-215.office.pdx.osdl.net> Content-Type: text/plain Organization: unknown Date: Thu, 09 Jun 2005 18:21:35 -0400 Message-Id: <1118355695.12573.34.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2323 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 724 Lines: 20 On Thu, 2005-09-06 at 15:12 -0700, Jesse Brandeburg wrote: > On Thu, 9 Jun 2005, Stephen Hemminger wrote: > > > 313667 4.7884 vmlinux csum_partial > > > 218870 3.3413 vmlinux _spin_lock > > > 214302 3.2715 vmlinux __copy_user_intel > > > 193662 2.9564 vmlinux skb_release_data > > > 177755 2.7136 vmlinux ipt_do_table > > > > You are probably benchmarking iptables/netfilter! How many rules do you > > have? > > I saw that... somehow iptables got compiled into kernel statically. no > rules are active or installed iptables -L -n shows nothing in any chain. Conntrack is a much worse problem. Just turn off netfilter altogether.
cheers, jamal From davem@davemloft.net Thu Jun 9 15:23:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 15:23:47 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59MNfXq022703 for ; Thu, 9 Jun 2005 15:23:41 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgVPw-0000nE-9t; Thu, 09 Jun 2005 15:22:28 -0700 Date: Thu, 09 Jun 2005 15:22:28 -0700 (PDT) Message-Id: <20050609.152228.112623409.davem@davemloft.net> To: shemminger@osdl.org Cc: jesse.brandeburg@intel.com, hadi@cyberus.ca, john.ronciak@intel.com, mitch.a.williams@intel.com, mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: <20050609150546.61b0fee7@unknown-215.office.pdx.osdl.net> References: <1118237775.6382.34.camel@localhost.localdomain> <20050609150546.61b0fee7@unknown-215.office.pdx.osdl.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2324 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1123 Lines: 20 From: Stephen Hemminger Date: Thu, 9 Jun 2005 15:05:46 -0700 > > I have profile data, here is an example of 5tx/5rx threads, where the > > throughput was 1236Mb/s total, 936tx, 300rx, on 2.6.12-rc5 with old TSO > > (the original problem case) we are at 100% cpu and generating 3289 ints/s, > > with no hardware drops reported prolly due to my replenish patch > > CPU: P4 / Xeon with 2 hyper-threads, speed 2791.36 MHz (estimated) > > Counted GLOBAL_POWER_EVENTS events (time during 
which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000 > > samples % image name symbol name > > 533687 8.1472 vmlinux pskb_expand_head > > 428726 6.5449 vmlinux __copy_user_zeroing_intel > > 349934 5.3421 vmlinux _read_lock_irqsave > > We should kill all reader/writer locks in the fastpath. reader locks are > more expensive than spinlocks unless they are going to be held for a fairly > large window. True, but I see no reason why it should have any influence here. Let's not get distracted by this in our analysis of the problem. From khc@pm.waw.pl Thu Jun 9 16:00:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 16:00:08 -0700 (PDT) Received: from khc.piap.pl (khc.piap.pl [195.187.100.11]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59N04Xq025459 for ; Thu, 9 Jun 2005 16:00:06 -0700 Received: by khc.piap.pl (Postfix, from userid 500) id CA56D34105; Fri, 10 Jun 2005 00:58:55 +0200 (CEST) To: Subject: TCP stalls, 2.6.12pre6 From: Krzysztof Halasa Date: Fri, 10 Jun 2005 00:58:55 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2325 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: khc@pm.waw.pl Precedence: bulk X-list: netdev Content-Length: 1559 Lines: 49 Hi, I got this three times recently, with 2.6.12 pre6 and possibly pre5 (4?) kernels on both sides: intrepid is X11-server with ssh connection to defiant (X11-forwarding, EPIC100 PCI NIC). defiant is an older notebook machine and it was running XEmacs and Firefox with ssh/X11 (cardbus DEC 21143). Both on the same Ethernet subnet. Both standard MTU etc. 
intrepid# netstat -to
tcp 0 0 intrepid:3457 defiant:ssh ESTABLISHED keepalive (5615.71/0/0)

defiant# netstat -to (not 100% sure about this data)
      Send-Q
tcp 0 32800 defiant:ssh intrepid:3457 ESTABLISHED keepalive (20.5/0/0)

Partial dump:

intrepid# tcpdump -r qwe -vve port not 4840
00:08:32.603003 len 1514: IP (tos 0x10, ttl 64, id 17467, offset 0, flags [DF], proto 6, length: 1500) defiant.ssh > intrepid.3457: . [bad tcp cksum 2fa (->8fbe)!] 2999401769:2999403217(1448) ack 1597770543 win 16022
00:08:55.750326 len 158: IP (tos 0x10, ttl 64, id 13187, offset 0, flags [DF], proto 6, length: 144) intrepid.3457 > defiant.ssh: P [tcp sum ok] 1:81(80) ack 0 win 18856
00:08:55.750603 len 66: IP (tos 0x10, ttl 64, id 17469, offset 0, flags [DF], proto 6, length: 52) defiant.ssh > intrepid.3457: . [tcp sum ok] 16816:16816(0) ack 81 win 16022

Send-Q is stuck at the fixed value. Other TCP connections (including similar ones between the two hosts) are not affected.

Ideas?
-- Krzysztof Halasa
From jgarzik@pobox.com Thu Jun 9 16:15:12 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 16:15:15 -0700 (PDT) Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59NFBXq026628 for ; Thu, 9 Jun 2005 16:15:12 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1DgWDh-00050s-27; Thu, 09 Jun 2005 23:13:53 +0000 Message-ID: <42A8CD2E.7020504@pobox.com> Date: Thu, 09 Jun 2005 19:13:50 -0400 From: Jeff Garzik User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Olivier Galibert CC: Andi Kleen , James Ketrenos , Netdev list , kernel list , "James P.
Ketrenos" Subject: Re: ipw2100: firmware problem References: <20050608142310.GA2339@elf.ucw.cz> <42A723D3.3060001@linux.intel.com> <20050609211242.GA30319@dspnet.fr.eu.org> In-Reply-To: <20050609211242.GA30319@dspnet.fr.eu.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2326 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 752 Lines: 28 Olivier Galibert wrote: > On Thu, Jun 09, 2005 at 03:56:15PM +0200, Andi Kleen wrote: > >>I guess at some point we will need a file system in there, but - oops - >>we already have one, dont we? :) > > > Well, you could put .config in it too. > > Frankly, a filesystem that: > - can be somehow linked with vmlinux and not separate like an initrd > > - editable post vmlinux-linking > > - gives files that can be accessed from request_firmware, acpi and > friends even rather early in the boot process (i.e. well before any > userland is allowed to exist) > > - accessible post-boot through mounting of a special fs and/or /proc or something > > would be quite useful. This exists. It's called initramfs. 
Read the kernel code :) Jeff From grundler@cup.hp.com Thu Jun 9 16:27:10 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 16:27:15 -0700 (PDT) Received: from palrel10.hp.com (palrel10.hp.com [156.153.255.245]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59NRAXq031429 for ; Thu, 9 Jun 2005 16:27:10 -0700 Received: from esmail.cup.hp.com (esmail.cup.hp.com [15.0.65.164]) by palrel10.hp.com (Postfix) with ESMTP id 0BF661D01; Thu, 9 Jun 2005 16:26:02 -0700 (PDT) Received: from localhost.localdomain (postfix@debian.cup.hp.com [15.244.57.47]) by esmail.cup.hp.com (8.9.3 (PHNE_29774)/8.8.6) with ESMTP id QAA11449; Thu, 9 Jun 2005 16:20:01 -0700 (PDT) Received: by localhost.localdomain (Postfix, from userid 1000) id 009068FBDE; Thu, 9 Jun 2005 16:28:30 -0700 (PDT) Date: Thu, 9 Jun 2005 16:28:30 -0700 From: Grant Grundler To: Michael Chan Cc: davem@davemloft.net, iod00d@hp.com, peterc@gelato.unsw.edu.au, netdev@oss.sgi.com Subject: Re: [PATCH] tg3: Fix link failure in 5701 Message-ID: <20050609232830.GC12434@esmail.cup.hp.com> References: <1118086942.5008.14.camel@rh4> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1118086942.5008.14.camel@rh4> User-Agent: Mutt/1.5.9i X-archive-position: 2327 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: iod00d@hp.com Precedence: bulk X-list: netdev Content-Length: 2933 Lines: 80 On Mon, Jun 06, 2005 at 12:42:22PM -0700, Michael Chan wrote: > On some 5701 devices with older bootcode, the LED configuration bits in > SRAM may be invalid with value zero. The fix is to check for invalid > bits (0) and default to PHY 1 mode. Incorrect LED mode will lead to > error in programming the PHY. Michael, David, I confirmed this patch in fact fixes the problem on currently shipping rx7620 and rx8620 "Core LAN" cards. 
I expected it would but now have nice warm fuzzies that it really 100% does. One minor issue: I unloaded the unpatched tg3 v3.29 driver and then did not get a link when loaded the patched tg3 v3.30 driver. Unplugging and replugging the cable made this work. v3.30 continued to work fine after a reboot. lspci for the offending rx8620 NIC is: [root@n2 net]# lspci -vs 00:01.0 00:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethe rnet (rev 15) Subsystem: Hewlett-Packard Company HP IOX Core Lan 1000Base-T [A7109AX] Flags: 66Mhz, medium devsel, IRQ 50 Memory at 00000f0100000000 (64-bit, non-prefetchable) [size=64K] Capabilities: [40] PCI-X non-bridge device. Capabilities: [48] Power Management version 2 Capabilities: [50] Vital Product Data Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable (aka Subsystem 103c:12c1.) rx7620 Core LAN has Subsystem Device ID 0x1300 and is also expected to have this problem. (I've just now submitted an entry to pciids.sf.net) I'm told another "Core LAN" NIC from an older platform (rx5670) _may_ also have bad "boot code". But I don't have any to test with and don't know what the SubSys DevID is. BTW, I am still pushing for a recipe to update the bootcode. This is just painfully slow. Not surprising given the number of organizations involved. hth, grant > > Thanks to Grant Grundler for debugging the problem. > > >From Grant: > | In May, 2004, tg3 v3.4 changed how MAC_LED_CTRL (0x40c) was getting > | programmed and how to determine what to program into LED_CTRL. The new > | code trusted NIC_SRAM_DATA_CFG (0x00000b58) to indicate what to write > | to LED_CTRL and MII EXT_CTRL registers. On "IOX Core Lan", SRAM was > | saying MODE_MAC (0x0) and that doesn't work. 
>
> Signed-off-by: Michael Chan
>
> diff -Nru led1/drivers/net/tg3.c led2/drivers/net/tg3.c
> --- led1/drivers/net/tg3.c 2005-06-06 10:19:56.692541944 -0700
> +++ led2/drivers/net/tg3.c 2005-06-06 10:34:49.251852304 -0700
> @@ -8555,6 +8555,16 @@
>  
>          case NIC_SRAM_DATA_CFG_LED_MODE_MAC:
>                  tp->led_ctrl = LED_CTRL_MODE_MAC;
> +
> +                /* Default to PHY_1_MODE if 0 (MAC_MODE) is
> +                 * read on some older 5700/5701 bootcode.
> +                 */
> +                if (GET_ASIC_REV(tp->pci_chip_rev_id) ==
> +                    ASIC_REV_5700 ||
> +                    GET_ASIC_REV(tp->pci_chip_rev_id) ==
> +                    ASIC_REV_5701)
> +                        tp->led_ctrl = LED_CTRL_MODE_PHY_1;
> +
>                  break;
>  
>          case SHASTA_EXT_LED_SHARED:
>
>
>

From mchan@broadcom.com Thu Jun 9 16:54:46 2005
Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 16:54:49 -0700 (PDT)
Received: from MMS2.broadcom.com (mms2.broadcom.com [216.31.210.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j59NskXq000762 for ; Thu, 9 Jun 2005 16:54:46 -0700
Received: from 10.10.64.121 by MMS2.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Thu, 09 Jun 2005 16:53:25 -0700
X-Server-Uuid: 1F20ACF3-9CAF-44F7-AB47-F294E2D5B4EA
Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Thu, 9 Jun 2005 16:53:24 -0700
Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BDG15109; Thu, 9 Jun 2005 16:53:23 -0700 (PDT)
Received: from nt-irva-0741.brcm.ad.broadcom.com ( nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with ESMTP id QAA20047; Thu, 9 Jun 2005 16:53:22 -0700 (PDT)
Received: from 10.7.18.177 ([10.7.18.177]) by NT-IRVA-0741.brcm.ad.broadcom.com ([10.8.194.54]) with Microsoft Exchange Server HTTP-DAV ; Thu, 9 Jun 2005 23:53:21 +0000
Received: from rh4 by nt-irva-0741; 09 Jun 2005 15:55:52 -0700
Subject: Re: [PATCH] tg3: Fix
link failure in 5701 From: "Michael Chan" To: "Grant Grundler" cc: davem@davemloft.net, peterc@gelato.unsw.edu.au, netdev@oss.sgi.com, netdev@vger.kernel.org In-Reply-To: <20050609232830.GC12434@esmail.cup.hp.com> References: <1118086942.5008.14.camel@rh4> <20050609232830.GC12434@esmail.cup.hp.com> Date: Thu, 09 Jun 2005 15:55:52 -0700 Message-ID: <1118357752.5838.13.camel@rh4> MIME-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-3) X-WSS-ID: 6EB609FF1VO7129355-01-01 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-archive-position: 2328 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mchan@broadcom.com Precedence: bulk X-list: netdev Content-Length: 682 Lines: 20 On Thu, 2005-06-09 at 16:28 -0700, Grant Grundler wrote: > Michael, David, > I confirmed this patch in fact fixes the problem on currently > shipping rx7620 and rx8620 "Core LAN" cards. I expected it would > but now have nice warm fuzzies that it really 100% does. Thanks for testing, Grant. > > One minor issue: I unloaded the unpatched tg3 v3.29 driver and > then did not get a link when loaded the patched tg3 v3.30 driver. > Unplugging and replugging the cable made this work. > v3.30 continued to work fine after a reboot. > This is odd. May be the link partner is in a bad state when previously connected to unpatched tg3, and requires unplug to get out of that state. 
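
The decision the patch adds can be sketched outside the driver as a small pure function. This is only an illustration of the logic described in the patch: the enum names and the non-zero SRAM values below are hypothetical stand-ins, not the real tg3 constants, and the only facts taken from the thread are that an SRAM value of 0 means MAC mode and that 5700/5701 parts should fall back to PHY 1 mode in that case.

```c
/* Hypothetical stand-ins for the driver's ASIC revision and LED mode
 * constants; the real definitions live in the tg3 driver, not here. */
enum asic_rev { REV_5700, REV_5701, REV_5704 };
enum led_mode { LED_MODE_MAC, LED_MODE_PHY_1, LED_MODE_PHY_2 };

/* sram_led_cfg is the LED mode field read from NIC SRAM.
 * Per the thread, 0 means MAC mode; values 1 and 2 are assumed here
 * purely for illustration. */
static enum led_mode pick_led_mode(unsigned sram_led_cfg, enum asic_rev rev)
{
    switch (sram_led_cfg) {
    case 0:
        /* Older 5700/5701 bootcode may leave this field at zero,
         * so MAC mode cannot be trusted on those parts. */
        if (rev == REV_5700 || rev == REV_5701)
            return LED_MODE_PHY_1;
        return LED_MODE_MAC;
    case 2:
        return LED_MODE_PHY_2;
    case 1:
    default:
        return LED_MODE_PHY_1;
    }
}
```

With these assumed names, a zero SRAM field on a 5701 yields PHY 1 mode, while the same zero on a later chip is still taken at face value as MAC mode.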
From greearb@candelatech.com Thu Jun 9 17:39:29 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 17:39:33 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5A0dTXq003160 for ; Thu, 9 Jun 2005 17:39:29 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j5A1Ca5I023325; Thu, 9 Jun 2005 18:12:37 -0700 Message-ID: <42A8E0FE.3020708@candelatech.com> Date: Thu, 09 Jun 2005 17:38:22 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "'netdev@oss.sgi.com'" CC: mchan@broadcom.com Subject: BCM5704 performance questions. Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2329 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 1332 Lines: 37 Hello! I have a 4-port NIC by silicom-usa.com that uses the BCM5704 (rev 10) chipset. It's running in a PCI-X bus (100 or 133Mhz). CPUs are dual xeon 2.8Ghz, 1MB cache, 1GB RAM, etc). Kernel is 2.6.11 + my hacks (no hacks to tg3, minor hacks to e1000 and other parts of the networking stacks). I am trying to bridge as much traffic as possible across two interfaces, using a proprietary kernel module. The network traffic is 1514 byte packets, generated by a modified version of pktgen running on another machine with similar hardware (Intel NICs). With the BCM NIC I can get about 600Mbps in one direction and about 800Mbps in the other..with a great deal of dropped packets. 
With the Intel 4-port NIC (same machine, different PCI slot, and also from Silicom-usa.com) I can get 900+Mbps in both directions with virtually no drops. So: * Is the BCM5704 chipset/driver really that much slower? * Is there some information on tuning the tg3 somewhere? (I didn't see a Documentation/networking/tg3.txt file, for instance) * Is there a way to verify the bus speed that the NIC is running at? (ethtool -d ethX gives lots of meaningless (to me) hex) Please let me know if more information would be useful. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From davem@davemloft.net Thu Jun 9 17:56:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 17:56:08 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5A0u0Xq004320 for ; Thu, 9 Jun 2005 17:56:04 -0700 Received: from localhost ([127.0.0.1]) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgXnL-0000yd-Ug; Thu, 09 Jun 2005 17:54:47 -0700 Date: Thu, 09 Jun 2005 17:54:17 -0700 (PDT) Message-Id: <20050609.175417.108740435.davem@davemloft.net> To: greearb@candelatech.com Cc: netdev@oss.sgi.com, mchan@broadcom.com Subject: Re: BCM5704 performance questions. From: "David S. Miller" In-Reply-To: <42A8E0FE.3020708@candelatech.com> References: <42A8E0FE.3020708@candelatech.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2331 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 517 Lines: 13 From: Ben Greear Date: Thu, 09 Jun 2005 17:38:22 -0700 > I am trying to bridge as much traffic as possible across two interfaces, > using a proprietary kernel module. 
Ben, I'm going to just mention that I'm not going to
look into your bug report. You consistently come here
without a test case or setup that other developers can
use to reproduce or investigate your problem. You always
report things with your proprietary setup of this or that.

You have our code, we don't have yours.

From mchan@broadcom.com Thu Jun 9 17:55:14 2005
Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 17:55:17 -0700 (PDT)
Received: from MMS3.broadcom.com (mms3.broadcom.com [216.31.210.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5A0tCXq004243 for ; Thu, 9 Jun 2005 17:55:13 -0700
Received: from 10.10.64.121 by MMS3.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Thu, 09 Jun 2005 17:53:39 -0700
X-Server-Uuid: 35E76369-CF33-4172-911A-D1698BD5E887
Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Thu, 9 Jun 2005 17:53:48 -0700
Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BDG30180; Thu, 9 Jun 2005 17:53:47 -0700 (PDT)
Received: from nt-irva-0741.brcm.ad.broadcom.com ( nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with ESMTP id RAA08971; Thu, 9 Jun 2005 17:53:46 -0700 (PDT)
Received: from 10.7.18.177 ([10.7.18.177]) by NT-IRVA-0741.brcm.ad.broadcom.com ([10.8.194.54]) with Microsoft Exchange Server HTTP-DAV ; Fri, 10 Jun 2005 00:53:49 +0000
Received: from rh4 by nt-irva-0741; 09 Jun 2005 16:56:17 -0700
Subject: Re: BCM5704 performance questions.
From: "Michael Chan" To: "Ben Greear" cc: "'netdev@oss.sgi.com'" In-Reply-To: <42A8E0FE.3020708@candelatech.com> References: <42A8E0FE.3020708@candelatech.com> Date: Thu, 09 Jun 2005 16:56:16 -0700 Message-ID: <1118361376.5838.20.camel@rh4> MIME-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-3) X-WSS-ID: 6EB63B191XO57232-01-01 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-archive-position: 2330 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mchan@broadcom.com Precedence: bulk X-list: netdev Content-Length: 764 Lines: 22 On Thu, 2005-06-09 at 17:38 -0700, Ben Greear wrote: > > * Is the BCM5704 chipset/driver really that much slower? > Unfortunately, the 5704 requires the "ONE_DMA" workaround which will limit throughput in a PCIX 100/133 bus. If you comment out the line that sets the DMA_RWCTRL_ONE_DMA flag in tg3.c, you should see improved performance. However, you may run into some DMA issues on certain systems. > * Is there some information on tuning the tg3 somewhere? > (I didn't see a Documentation/networking/tg3.txt file, for instance) > > * Is there a way to verify the bus speed that the NIC is running at? > (ethtool -d ethX gives lots of meaningless (to me) hex) > tg3 probing string for each device will tell you the bus type, width, and speed. 
From greearb@candelatech.com Thu Jun 9 18:21:44 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 18:21:47 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5A1LhXq006887 for ; Thu, 9 Jun 2005 18:21:43 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j5A1sm5I023776; Thu, 9 Jun 2005 18:54:48 -0700 Message-ID: <42A8EAE1.5030201@candelatech.com> Date: Thu, 09 Jun 2005 18:20:33 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com, mchan@broadcom.com Subject: Re: BCM5704 performance questions. References: <42A8E0FE.3020708@candelatech.com> <20050609.175417.108740435.davem@davemloft.net> In-Reply-To: <20050609.175417.108740435.davem@davemloft.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2332 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 1464 Lines: 45 David S. Miller wrote: > From: Ben Greear > Date: Thu, 09 Jun 2005 17:38:22 -0700 > > >>I am trying to bridge as much traffic as possible across two interfaces, >>using a proprietary kernel module. > > > Ben, I'm going to just mention that I'm not going to > look into your bug report. You consistently come here > without a test case or setup that other developers can > use to reproduce or investigate your problem. You always > report things with your proprietary setup of this or that. > > You have our code, we don't have your's. Fair enough. 
I ran a test using pktgen to (try to) send 82kpps, 1514 byte packets between two ports on the tg3 NIC. It can do about 780Mbps in one direction, and 880Mbps in the other direction. Lots of harmless hard-start xmit errors reported (tg3 may not stop it's tx queue correctly, or maybe pktgen is screwed up since e1000 reports similar errors). Intel e1000 can do about 960Mbps in both directions. My pktgen is modified. You can find my full patch against 2.6.11 here if you so wish: http://www.candelatech.com/oss/candela_2.6.11.patch If you need the exact arguments I used to configure pktgen, I can get those for you as well. I found the Mhz printout (thanks Michael!) The tg3 NIC is in the 133Mhz slot. That probably means the intel NIC is only running at 100Mhz. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb@candelatech.com Thu Jun 9 18:25:50 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 18:25:57 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5A1PoXq007543 for ; Thu, 9 Jun 2005 18:25:50 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j5A1wv5I023821; Thu, 9 Jun 2005 18:58:58 -0700 Message-ID: <42A8EBDA.6010306@candelatech.com> Date: Thu, 09 Jun 2005 18:24:42 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Michael Chan CC: "'netdev@oss.sgi.com'" Subject: Re: BCM5704 performance questions. 
References: <42A8E0FE.3020708@candelatech.com> <1118361376.5838.20.camel@rh4> In-Reply-To: <1118361376.5838.20.camel@rh4> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2333 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 1148 Lines: 35 Michael Chan wrote: > On Thu, 2005-06-09 at 17:38 -0700, Ben Greear wrote: > >>* Is the BCM5704 chipset/driver really that much slower? >> > Unfortunately, the 5704 requires the "ONE_DMA" workaround which will > limit throughput in a PCIX 100/133 bus. If you comment out the line that > sets the DMA_RWCTRL_ONE_DMA flag in tg3.c, you should see improved > performance. However, you may run into some DMA issues on certain > systems. Is there any way I can tell which systems are affected? It won't be an option for me to purposefully ship possibly busted drivers/hardware, but if I can be certain that my systems are immune, I will try this modification. >>* Is there some information on tuning the tg3 somewhere? >> (I didn't see a Documentation/networking/tg3.txt file, for instance) >> >>* Is there a way to verify the bus speed that the NIC is running at? >> (ethtool -d ethX gives lots of meaningless (to me) hex) >> > > > tg3 probing string for each device will tell you the bus type, width, > and speed. Thanks, I found it. 
Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From davem@davemloft.net Thu Jun 9 18:30:51 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 18:30:54 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5A1UpXq008236 for ; Thu, 9 Jun 2005 18:30:51 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgYL9-00012J-7x; Thu, 09 Jun 2005 18:29:43 -0700 Date: Thu, 09 Jun 2005 18:29:43 -0700 (PDT) Message-Id: <20050609.182943.29573731.davem@davemloft.net> To: greearb@candelatech.com Cc: netdev@oss.sgi.com, mchan@broadcom.com Subject: Re: BCM5704 performance questions. From: "David S. Miller" In-Reply-To: <42A8EAE1.5030201@candelatech.com> References: <42A8E0FE.3020708@candelatech.com> <20050609.175417.108740435.davem@davemloft.net> <42A8EAE1.5030201@candelatech.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2334 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 784 Lines: 17 From: Ben Greear Date: Thu, 09 Jun 2005 18:20:33 -0700 > I ran a test using pktgen to (try to) send 82kpps, 1514 byte packets between > two ports on the tg3 NIC. It can do about 780Mbps in one direction, > and 880Mbps in the other direction. Lots of harmless hard-start xmit errors reported > (tg3 may not stop it's tx queue correctly, or maybe pktgen is screwed up since > e1000 reports similar errors). There is a known race on SMP with drivers using NETIF_F_LLTX that is still not fixed. It will cause the error message to be reported from the driver's ->hard_start_xmit() routine when you hit this race. 
The e1000 driver hits the same race, it just doesn't print any message. You're definitely on an SMP system if you are triggering that message. From mchan@broadcom.com Thu Jun 9 18:37:09 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 18:37:12 -0700 (PDT) Received: from MMS2.broadcom.com (mms2.broadcom.com [216.31.210.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5A1b9Xq008943 for ; Thu, 9 Jun 2005 18:37:09 -0700 Received: from 10.10.64.121 by MMS2.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Thu, 09 Jun 2005 18:35:13 -0700 X-Server-Uuid: 1F20ACF3-9CAF-44F7-AB47-F294E2D5B4EA Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Thu, 9 Jun 2005 18:35:11 -0700 Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BDG39286; Thu, 9 Jun 2005 18:35:10 -0700 (PDT) Received: from nt-irva-0741.brcm.ad.broadcom.com ( nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with ESMTP id SAA20961; Thu, 9 Jun 2005 18:35:10 -0700 (PDT) Received: from 10.7.18.177 ([10.7.18.177]) by NT-IRVA-0741.brcm.ad.broadcom.com ([10.8.194.54]) with Microsoft Exchange Server HTTP-DAV ; Fri, 10 Jun 2005 01:35:11 +0000 Received: from rh4 by nt-irva-0741; 09 Jun 2005 17:37:41 -0700 Subject: Re: BCM5704 performance questions. 
From: "Michael Chan" To: "Ben Greear" cc: "'netdev@oss.sgi.com'" In-Reply-To: <42A8EBDA.6010306@candelatech.com> References: <42A8E0FE.3020708@candelatech.com> <1118361376.5838.20.camel@rh4> <42A8EBDA.6010306@candelatech.com> Date: Thu, 09 Jun 2005 17:37:41 -0700 Message-ID: <1118363861.5838.29.camel@rh4> MIME-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-3) X-WSS-ID: 6EB631DB1VO7150403-01-01 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-archive-position: 2335 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mchan@broadcom.com Precedence: bulk X-list: netdev Content-Length: 1037 Lines: 23 On Thu, 2005-06-09 at 18:24 -0700, Ben Greear wrote: > Michael Chan wrote: > > > > Unfortunately, the 5704 requires the "ONE_DMA" workaround which will > > limit throughput in a PCIX 100/133 bus. If you comment out the line that > > sets the DMA_RWCTRL_ONE_DMA flag in tg3.c, you should see improved > > performance. However, you may run into some DMA issues on certain > > systems. > > Is there any way I can tell which systems are affected? It won't be > an option for me to purposefully ship possibly busted drivers/hardware, > but if I can be certain that my systems are immune, I will try this > modification. > I mentioned this so that you could verify that the slow performance was indeed caused by ONE_DMA. Even if your system is affected, it's a very subtle problem that won't show up right away and should allow you to get some performance numbers. Unfortunately, if indeed it is ONE_DMA, there is no easy way for us to tell which system is affected. And the recommendation is to turn it on for all 5704 in PCIX 100/133. 
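
For context on the throughput figures quoted in this thread (1514-byte packets, 82 kpps, 780 to 960 Mbps), the gigabit Ethernet ceiling can be worked out with a little arithmetic. The sketch below assumes the 1514-byte figure excludes the 4-byte FCS and that each frame also costs 8 bytes of preamble and a 12-byte inter-frame gap on the wire; the function names are made up for this example.

```c
/* Per-frame overhead on the wire beyond the 1514 bytes counted above
 * (assumption: the 1514-byte figure does not include the FCS). */
#define ETH_FCS_BYTES      4   /* frame check sequence */
#define ETH_PREAMBLE_BYTES 8   /* preamble plus start-of-frame delimiter */
#define ETH_IFG_BYTES      12  /* minimum inter-frame gap */

/* Maximum frames per second a link of link_bps can carry for frames
 * of frame_bytes (FCS excluded). */
static double max_frames_per_sec(double link_bps, unsigned frame_bytes)
{
    unsigned wire_bytes = frame_bytes + ETH_FCS_BYTES +
                          ETH_PREAMBLE_BYTES + ETH_IFG_BYTES;
    return link_bps / (8.0 * wire_bytes);
}

/* Throughput in Mbit/s when only the frame_bytes themselves are counted. */
static double frame_mbps(double frames_per_sec, unsigned frame_bytes)
{
    return frames_per_sec * frame_bytes * 8.0 / 1e6;
}
```

Under these assumptions, max_frames_per_sec(1e9, 1514) is roughly 81,274 frames per second, which corresponds to about 984 Mbit/s of frame data. A requested rate of 82 kpps is therefore slightly above line rate, and the 960 Mbps reported for the e1000 is close to the ceiling.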
From jgarzik@pobox.com Thu Jun 9 19:14:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 19:14:59 -0700 (PDT) Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5A2EuXq012181 for ; Thu, 9 Jun 2005 19:14:57 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1DgZ1m-00054f-JT; Fri, 10 Jun 2005 02:13:46 +0000 Message-ID: <42A8F758.2060008@pobox.com> Date: Thu, 09 Jun 2005 22:13:44 -0400 From: Jeff Garzik User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: James Ketrenos CC: "David S. Miller" , pavel@ucw.cz, vda@ilport.com.ua, abonilla@linuxwireless.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem References: <200506090909.55889.vda@ilport.com.ua> <20050608.231657.59660080.davem@davemloft.net> <20050609104205.GD3169@elf.ucw.cz> <20050609.125324.88476545.davem@davemloft.net> <42A8AE2A.4080104@linux.intel.com> In-Reply-To: <42A8AE2A.4080104@linux.intel.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2336 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 489 Lines: 16 James Ketrenos wrote: > I don't know if all the distributions have moved away from this model. > If they have and the devices are brought up regardless of link, then > going back to delaying radio initialization until the open() is called > is workable. When the interface is not up, we ideally want the device to be as passive as possible. Most net drivers shut down as much as possible at dev->close() time, and it would really be good if wireless drivers followed suit. 
Jeff From greearb@candelatech.com Thu Jun 9 19:29:56 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 19:29:59 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5A2TuXq013287 for ; Thu, 9 Jun 2005 19:29:56 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j5A3335I024547; Thu, 9 Jun 2005 20:03:03 -0700 Message-ID: <42A8FADF.90309@candelatech.com> Date: Thu, 09 Jun 2005 19:28:47 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com, mchan@broadcom.com Subject: Re: BCM5704 performance questions. References: <42A8E0FE.3020708@candelatech.com> <20050609.175417.108740435.davem@davemloft.net> <42A8EAE1.5030201@candelatech.com> <20050609.182943.29573731.davem@davemloft.net> In-Reply-To: <20050609.182943.29573731.davem@davemloft.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2337 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 1035 Lines: 31 David S. Miller wrote: > From: Ben Greear > Date: Thu, 09 Jun 2005 18:20:33 -0700 > > >>I ran a test using pktgen to (try to) send 82kpps, 1514 byte packets between >>two ports on the tg3 NIC. It can do about 780Mbps in one direction, >>and 880Mbps in the other direction. Lots of harmless hard-start xmit errors reported >>(tg3 may not stop it's tx queue correctly, or maybe pktgen is screwed up since >>e1000 reports similar errors). 
> > > There is a known race on SMP with drivers using NETIF_F_LLTX that is > still not fixed. It will cause the error message to be reported from > the driver's ->hard_start_xmit() routine when you hit this race. Ok, it's not a big deal, since I can just retry the packet when the tx queue wakes up again. The message was coming from pktgen, btw. > You're definitely on an SMP system if you are triggering that message. Yes, dual xeon with HT as well. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From abonilla@linuxwireless.org Thu Jun 9 20:47:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 20:47:51 -0700 (PDT) Received: from linuxwireless.org.ve.carpathiahost.net (linuxwireless.org.ve.carpathiahost.net [66.117.45.234]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5A3liXq020636 for ; Thu, 9 Jun 2005 20:47:47 -0700 Received: from [192.168.1.12] ([200.91.94.134]) by linuxwireless.org.ve.carpathiahost.net (8.12.10/8.12.10) with ESMTP id j5A3kRnv022370; Thu, 9 Jun 2005 23:46:28 -0400 Message-ID: <42A8FF03.3010508@linuxwireless.org> Date: Thu, 09 Jun 2005 21:46:27 -0500 From: Alejandro Bonilla User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Debian/1.7.8-1 X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: James Ketrenos , "David S. 
Miller" , pavel@ucw.cz, vda@ilport.com.ua, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem References: <200506090909.55889.vda@ilport.com.ua> <20050608.231657.59660080.davem@davemloft.net> <20050609104205.GD3169@elf.ucw.cz> <20050609.125324.88476545.davem@davemloft.net> <42A8AE2A.4080104@linux.intel.com> <42A8F758.2060008@pobox.com> In-Reply-To: <42A8F758.2060008@pobox.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2338 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: abonilla@linuxwireless.org Precedence: bulk X-list: netdev Content-Length: 1715 Lines: 47 Jeff Garzik wrote: > James Ketrenos wrote: > >> I don't know if all the distributions have moved away from this >> model. If they have and the devices are brought up regardless of >> link, then >> going back to delaying radio initialization until the open() is called >> is workable. > > > > When the interface is not up, we ideally want the device to be as > passive as possible. > > Most net drivers shut down as much as possible at dev->close() time, > and it would really be good if wireless drivers followed suit. > > Jeff > > > OK. I understand the point and I totally agree with this. We really want the adapter to just do what the user or profiles ask the adapter to do. Yes, in an ideal world. Let's talk about easyness. These adapters are in laptops. You don't want to type a lot of stop everytime you move from access points, reboots and so on. In a server enviroment with the ethernet adapters, we really just want them to do what they do and we have scripts for it. So, again, with mobile is different. An association on boot is fair and really OK. You are not really doing dhcp requests on boot and trying to get the internet from people for free. 
You just want you adapter running faster, get connected and get over whatever you have to do to start working or whatsoever. Let's really think what would be the nicest way that the card should behave, after all if the adapter just associates, you are not really stealing any Internet. :) Association on boot is how it has worked all the time, and in the 18 months of the project, nobody has complained about it... So... I wonder, users are happy with it? (I know it might not be the correct way) Just a thought. .Alejandro From vda@ilport.com.ua Thu Jun 9 23:58:13 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 09 Jun 2005 23:58:16 -0700 (PDT) Received: from port.imtp.ilyichevsk.odessa.ua (167.imtp.Ilyichevsk.Odessa.UA [195.66.192.167] (may be forged)) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j5A6vrXq030707 for ; Thu, 9 Jun 2005 23:57:59 -0700 Received: (qmail 2471 invoked by alias); 10 Jun 2005 06:56:26 -0000 Received: from unknown (172.17.13.22) by 0 (195.66.192.170) with ESMTP; 10 Jun 2005 06:56:20 -0000 From: Denis Vlasenko To: , "'Pavel Machek'" , "'Jeff Garzik'" , "'Netdev list'" , "'kernel list'" , "'James P. Ketrenos'" Subject: Re: ipw2100: firmware problem Date: Fri, 10 Jun 2005 09:56:16 +0300 User-Agent: KMail/1.5.4 References: <002a01c56cff$fb64ba70$600cc60a@amer.sykes.com> In-Reply-To: <002a01c56cff$fb64ba70$600cc60a@amer.sykes.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506100956.16031.vda@ilport.com.ua> X-archive-position: 2339 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vda@ilport.com.ua Precedence: bulk X-list: netdev Content-Length: 1597 Lines: 42 On Thursday 09 June 2005 17:31, Alejandro Bonilla wrote: > > > What is so nice about this? 
That Linux novice user with his new lappie > > will join a neighbor's network every time he powers up the lappie, > > even without knowing that? > > > > That will be analogous to me plugging ethernet cable into the > > switch and > > wanting it to work, without any IP addr config, even without > > DHCP client. > > Just power up the box (or modprobe an eth module) and it > > works! Cool, eh? > > > > You want things one way, I like them in another way. Whoever makes this > decision should just know that we would like to have an option to make it > load with or without the ASSOC on. But you already _have_ the option to associate. Just issue appropriate iwconfig command (or embed one in startup script). > James already said to use the options ipw2100 disable=1 if you don't want it > to associate everytime on boot. Do we have to add such option to each and every wireless driver now? That would be wrong since iwconfig already exists. > At the end, who decides this? User. As I said, with no automatic assoc at module load user still may easily attain that with iwconfig. Adding kernel level wireless autoconfiguration duplicates the effort. Since I am not going to give up a requirement to be able to stay radio silent at boot (me too wants freedom, not only you), you need to add disable=1 module parameter to each driver, which adds to the mess. ALSA does the Right Thing. Sound is completely muted out at module load. It's a user freedom to set desired volume level after that. 
-- vda From manfred99@gmx.ch Fri Jun 10 01:11:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 01:11:55 -0700 (PDT) Received: from mail.gmx.net (imap.gmx.net [213.165.64.20]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j5A8BeXq007631 for ; Fri, 10 Jun 2005 01:11:41 -0700 Received: (qmail 20316 invoked by uid 0); 10 Jun 2005 08:10:33 -0000 Received: from 212.55.205.40 by www33.gmx.net with HTTP; Fri, 10 Jun 2005 10:10:32 +0200 (MEST) Date: Fri, 10 Jun 2005 10:10:32 +0200 (MEST) From: "Manfred Schwarb" To: Marcelo Tosatti Cc: linux-kernel@vger.kernel.org, davem@redhat.com, netdev@oss.sgi.com, herbert@gondor.apana.org.au MIME-Version: 1.0 References: <20050609150026.GA7900@logos.cnet> Subject: Re: 2.4.30-hf1 do_IRQ stack overflows X-Priority: 3 (Normal) X-Authenticated: #17170890 Message-ID: <8387.1118391032@www33.gmx.net> X-Mailer: WWW-Mail 1.6 (Global Message Exchange) X-Flags: 0001 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-archive-position: 2340 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: manfred99@gmx.ch Precedence: bulk X-list: netdev Content-Length: 10342 Lines: 211 > Hi, > > On Tue, Jun 07, 2005 at 02:38:01PM +0200, Manfred Schwarb wrote: > > > > > > > > > > Hi Manfred, > > > > > > On Wed, May 11, 2005 at 10:15:02AM +0200, Manfred Schwarb wrote: > > > > Hi, > > > > with recent versions of the 2.4 kernel (Vanilla), I get an > increasing > > > amount of do_IRQ stack overflows. > > > > This night, I got 3 of them. > > > > With 2.4.28 I got an overflow about twice a year, with 2.4.29 nearly > > > once a month and with > > > > 2.4.30 nearly every day 8-(( > > > > > > The system is getting dangerously close to an actual stack overflow, > which > > > would > > > crash the system. > > > > > > "do_IRQ: stack overflow: " indicates how many bytes are still > available. 
> > > > > > The traces show huge networking execution paths. > > > > > > It seems you are using some packet scheduler (CONFIG_NET_SCHED)? > Pretty > > > much all > > > traces show functions from sch_generic.c. Can you disable that for a > test? > > > > > > > Sorry to bother you again, but the problem didn't vanish completely. > > This morning, I caught another one. I built a new kernel with > > CONFIG_NET_SCHED=n as suggested, uptime is now 25 days, and the > following > > is the first do_IRQ since then (ksymoops -i): > > > > Jun 7 03:55:01 tp-meteodat7 kernel: f3238830 00000280 f49e7b80 00000000 <---------snip--------> > > [] [] [] [] > > Warning (Oops_read): Code line not seen, dumping what data is available > > Do you have the "do_IRQ stack overflow" output and the amount of bytes > left it informs? > Yes, it was a close one, 640. I append the original output to the end of this email. Thanks for looking at this. > > Trace; c010d948 > > Trace; c023a039 <---------snip--------> > > Trace; c010d948 > > I dont see any huge stack consumers on this callchain. > > David, Herbert, any clues what might be going on here? 
> > Jun 7 03:55:01 tp-meteodat7 kernel: do_IRQ: stack overflow: 640 Jun 7 03:55:01 tp-meteodat7 kernel: f3238830 00000280 f49e7b80 00000000 00000042 cca1388e f4116980 f17aa000 Jun 7 03:55:01 tp-meteodat7 kernel: c010d948 00000042 f4116980 00000000 cca1388e f4116980 f17aa000 00000042 Jun 7 03:55:01 tp-meteodat7 kernel: 00000018 f61d0018 ffffff14 c023a039 00000010 00000246 ee5ea480 00000000 Jun 7 03:55:01 tp-meteodat7 kernel: Call Trace: [call_do_IRQ+5/13] [skb_copy_and_csum_dev+73/256] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256445916/96] [qdisc_restart+114/432] [dev_queue_xmit+383/880] Jun 7 03:55:01 tp-meteodat7 kernel: Call Trace: [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [ip_finish_output2+184/336] [ip_finish_output2+0/336] [ip_finish_output2+0/336] [nf_hook_slow+478/528] [ip_finish_output2+0/336] [ip_output+334/480] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [ip_finish_output2+0/336] [ip_queue_xmit2+213/671] [ip_queue_xmit2+0/671] [ip_queue_xmit2+0/671] [nf_hook_slow+478/528] [ip_queue_xmit2+0/671] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [ip_queue_xmit+845/1536] [ip_queue_xmit2+0/671] [tcp_v4_send_check+160/240] [tcp_transmit_skb+1001/1792] [tcp_send_ack+132/208] [tcp_rfree+0/32] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [tcp_rfree+0/32] [tcp_rcv_established+2042/2640] [tcp_v4_do_rcv+314/352] [tcp_v4_rcv+1726/1952] [ip_local_deliver_finish+351/416] [ip_local_deliver_finish+0/416] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [nf_hook_slow+478/528] [ip_local_deliver_finish+0/416] [ip_rcv_finish+0/616] [ip_local_deliver+399/576] [ip_local_deliver_finish+0/416] [ip_rcv_finish+0/616] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [ip_rcv_finish+473/616] [ip_rcv_finish+0/616] [nf_hook_slow+478/528] 
[ip_rcv_finish+0/616] [ip_rcv+808/1120] [ip_rcv_finish+0/616] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [netif_receive_skb+485/544] [process_backlog+147/304] [net_rx_action+250/368] [do_softirq+118/224] [do_IRQ+244/304] [call_do_IRQ+5/13] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [set_ldt_desc+5/59] [schedule+650/1344] [schedule_timeout+84/160] [process_timeout+0/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256523927/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256524281/96] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256524459/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256530630/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256904584/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256908606/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256911957/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250526660/96] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250598640/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250526660/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250605583/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250526660/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250526660/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250602612/96] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250526660/96] 
[nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250525224/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250568899/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250613955/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250576507/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250526472/96] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250576301/96] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4250628604/96] [__kfree_skb+246/336] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256445940/96] [qdisc_restart+31/432] [dev_queue_xmit+383/880] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [ip_finish_output2+184/336] [__kfree_skb+246/336] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256445940/96] [__kfree_skb+246/336] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256445940/96] [qdisc_restart+31/432] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [dev_queue_xmit+383/880] [__kfree_skb+246/336] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256445940/96] [qdisc_restart+31/432] [dev_queue_xmit+383/880] [ip_finish_output2+184/336] Jun 7 03:55:01 tp-meteodat7 kernel: [ip_finish_output2+0/336] [ip_finish_output2+0/336] [nf_hook_slow+478/528] [ip_finish_output2+0/336] [ip_output+334/480] [ip_finish_output2+0/336] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [ip_queue_xmit2+213/671] [sock_def_readable+99/128] [tcp_rfree+0/32] [tcp_rfree+0/32] [tcp_rcv_established+1981/2640] [tcp_v4_do_rcv+314/352] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [tcp_v4_rcv+1726/1952] 
[ip_local_deliver_finish+351/416] [ip_local_deliver_finish+0/416] [nf_hook_slow+478/528] [ip_local_deliver_finish+0/416] [ip_rcv_finish+0/616] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [ip_local_deliver+399/576] [ip_local_deliver_finish+0/416] [ip_rcv_finish+0/616] [ip_rcv_finish+473/616] [ip_rcv_finish+0/616] [nf_hook_slow+478/528] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [ip_rcv_finish+0/616] [__alloc_pages+100/640] [poll_freewait+68/80] [do_select+569/592] [sys_select+660/1248] [sys_ioctl+677/785] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] [] Jun 7 03:55:01 tp-meteodat7 kernel: [sys_time+21/80] [system_call+51/56] Jun 7 03:55:01 tp-meteodat7 kernel: [] [] In my previous posting, I snipped /var/log/messages at the wrong place, and zapped the last line. This line results in two additional lines of the ksymoops output: Trace; c0124bb5 Trace; c0108fa7 -- Geschenkt: 3 Monate GMX ProMail gratis + 3 Ausgaben stern gratis ++ Jetzt anmelden & testen ++ http://www.gmx.net/de/go/promail ++ From zoran.bosic@ericsson.com Fri Jun 10 01:28:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 01:28:22 -0700 (PDT) Received: from mailgw3.ericsson.se (mailgw3.ericsson.se [193.180.251.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5A8SJXq008782 for ; Fri, 10 Jun 2005 01:28:19 -0700 Received: from localhost (localhost.localdomain [127.0.0.1]) by delivery_mailgw3.ericsson.se (Symantec Mail Security) with ESMTP id 56932D1F for ; Fri, 10 Jun 2005 10:27:10 +0200 (CEST) Received: from esealmw127.eemea.ericsson.se (unknown [153.88.254.122]) by outbound_mailgw3.ericsson.se (Symantec Mail Security) with ESMTP id 4D2F1D1C for ; Fri, 10 Jun 2005 10:27:10 +0200 (CEST) Received: from esealmw127.eemea.ericsson.se ([153.88.254.175]) by esealmw127.eemea.ericsson.se with Microsoft SMTPSVC(6.0.3790.211); Fri, 10 Jun 2005 10:27:09 +0200 Received: from 
ehrzgmw300.eemea.ericsson.se ([159.107.224.60]) by esealmw127.eemea.ericsson.se with Microsoft SMTPSVC(6.0.3790.211); Fri, 10 Jun 2005 10:27:08 +0200 X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-2" Subject: Date: Fri, 10 Jun 2005 10:27:06 +0200 Message-ID: <9619FF757C08744C9D0772E2658DBD011E1AD3@ehrzgmw300.eemea.ericsson.se> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Index: AcVtli+cEj7HMRvJRUuRooMozA1ytQ== From: "Zoran Bosic (ZG/ETK)" To: X-OriginalArrivalTime: 10 Jun 2005 08:27:09.0343 (UTC) FILETIME=[3187C2F0:01C56D96] X-Brightmail-Tracker: AAAAAA== Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j5A8SJXq008782 X-archive-position: 2341 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: zoran.bosic@ericsson.com Precedence: bulk X-list: netdev Content-Length: 1 Lines: 1 From pavel@ucw.cz Fri Jun 10 02:01:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 02:01:43 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5A91aXq011041 for ; Fri, 10 Jun 2005 02:01:38 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id 181028B8CA; Fri, 10 Jun 2005 11:00:23 +0200 (CEST) Date: Fri, 10 Jun 2005 11:00:23 +0200 From: Pavel Machek To: Alejandro Bonilla Cc: Jeff Garzik , James Ketrenos , "David S. 
Miller" , vda@ilport.com.ua, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com Subject: Re: ipw2100: firmware problem Message-ID: <20050610090022.GF4173@elf.ucw.cz> References: <200506090909.55889.vda@ilport.com.ua> <20050608.231657.59660080.davem@davemloft.net> <20050609104205.GD3169@elf.ucw.cz> <20050609.125324.88476545.davem@davemloft.net> <42A8AE2A.4080104@linux.intel.com> <42A8F758.2060008@pobox.com> <42A8FF03.3010508@linuxwireless.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <42A8FF03.3010508@linuxwireless.org> X-Warning: Reading this can be dangerous to your mental health. User-Agent: Mutt/1.5.9i X-archive-position: 2342 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 469 Lines: 13 Hi! > OK. I understand the point and I totally agree with this. We really want > the adapter to just do what the user or profiles ask the adapter to do. > Yes, in an ideal world. > > Let's talk about easyness. These adapters are in laptops. You don't want > to type a lot of stop everytime you move from access points, reboots > and We are not trying to make it hard to the users. Lets do the right thing in kernel, and let userspace make it easy. 
Pavel From john@stoffel.org Fri Jun 10 06:02:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 06:02:13 -0700 (PDT) Received: from mxsf24.cluster1.charter.net (mxsf24.cluster1.charter.net [209.225.28.224]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5AD25Xq030628 for ; Fri, 10 Jun 2005 06:02:06 -0700 Received: from mxip10a.cluster1.charter.net (mxip10a.cluster1.charter.net [209.225.28.140]) by mxsf24.cluster1.charter.net (8.12.11/8.12.11) with ESMTP id j5AD0vhb009524 for ; Fri, 10 Jun 2005 09:00:57 -0400 Received: from 24-241-23-121.dhcp.oxfr.ma.charter.com (HELO jfsnew) (24.241.23.121) by mxip10a.cluster1.charter.net with ESMTP; 10 Jun 2005 09:00:55 -0400 X-IronPort-AV: i="3.93,189,1115006400"; d="scan'208"; a="989869222:sNHT707798202" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17065.36616.214742.580727@smtp.charter.net> Date: Fri, 10 Jun 2005 09:00:56 -0400 From: "John Stoffel" To: Pavel Machek Cc: Alejandro Bonilla , Jeff Garzik , James Ketrenos , "David S. 
Miller" , vda@ilport.com.ua, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, ipw2100-admin@linux.intel.com
Subject: Re: ipw2100: firmware problem
In-Reply-To: <20050610090022.GF4173@elf.ucw.cz>
References: <200506090909.55889.vda@ilport.com.ua> <20050608.231657.59660080.davem@davemloft.net> <20050609104205.GD3169@elf.ucw.cz> <20050609.125324.88476545.davem@davemloft.net> <42A8AE2A.4080104@linux.intel.com> <42A8F758.2060008@pobox.com> <42A8FF03.3010508@linuxwireless.org> <20050610090022.GF4173@elf.ucw.cz>
X-Mailer: VM 7.19 under Emacs 21.4.1
X-archive-position: 2344
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: john@stoffel.org
Precedence: bulk
X-list: netdev
Content-Length: 451
Lines: 13

I'd like to chime in here and say that from my point of view, not
enabling the wireless network adaptor until asked by userspace is the
way to go. It reduces power requirements, and it pushes the
configuration details out to userspace, where they can be handled
according to the policy set up by the distro/user.

Having my laptop boot up, turn on the wireless card, and join an AP
without my explicitly asking is a bad thing to have happen.

John

From abonilla@linuxwireless.org Fri Jun 10 06:24:56 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 06:25:00 -0700 (PDT)
Received: from linuxwireless.org.ve.carpathiahost.net (linuxwireless.org.ve.carpathiahost.net [66.117.45.234]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5ADOsXq032462 for ; Fri, 10 Jun 2005 06:24:56 -0700
Received: from WCRSJO2KPAB047 ([200.9.49.66]) by linuxwireless.org.ve.carpathiahost.net (8.12.10/8.12.10) with SMTP id j5ADNnCY018307; Fri, 10 Jun 2005 09:23:49 -0400
Reply-To:
From: "Alejandro Bonilla"
To: "'Denis Vlasenko'" , , "'Pavel Machek'" , "'Jeff Garzik'" , "'Netdev list'" , "'kernel list'" , "'James P. 
Ketrenos'" Subject: RE: ipw2100: firmware problem Date: Fri, 10 Jun 2005 07:23:34 -0600 Message-ID: <000f01c56dbf$9b15de90$600cc60a@amer.sykes.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook CWS, Build 9.0.6604 (9.0.2911.0) In-Reply-To: <200506100956.16031.vda@ilport.com.ua> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1478 Importance: Normal X-archive-position: 2345 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: abonilla@linuxwireless.org Precedence: bulk X-list: netdev Content-Length: 1650 Lines: 43 > > Adding kernel level wireless autoconfiguration duplicates the effort. > Since I am not going to give up a requirement to be able to stay radio > silent at boot (me too wants freedom, not only you), you need to add > disable=1 module parameter to each driver, which adds to the mess. > > ALSA does the Right Thing. Sound is completely muted out at > module load. > It's a user freedom to set desired volume level after that. Yeah right. I remember I had to google for 10 minutes to find the answer for this one. Why would you install something, for it to not work? It thing of Mute in ALSA is stupid. If you want Sound, you install the Sound and enable it. Why would it make you google for more things to do? ALSA mute on install is WAY way, not OK. You *will* have to use a How-To with ALSA, nobody knows that your sound would be off because some people decided it. But this is out of the Topic. I agree with you all, but as I mentioned in a more current email, this is a laptop, not a server. Things behave differently and you want things faster. (Yes, I could have a script) What I'm saying, is that just as ALSA, you will have to google even more just to be able to look for the boot param for the driver for it to ASSOC on boot like the Original drive does. 
Instead, if you simply don't want to associate then turn off the Radio. It's a simple FN+F2 or depends on your laptop. Let's not make this a bigger thread, just decide and then do it that way. I'm looking at this on the side of a supporter, seeing the emails from users... "how do I make it behave as it was before" "it won't assoc on boot anymore" .Alejandro > -- > vda > From abonilla@linuxwireless.org Fri Jun 10 06:34:39 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 06:34:45 -0700 (PDT) Received: from linuxwireless.org.ve.carpathiahost.net (linuxwireless.org.ve.carpathiahost.net [66.117.45.234]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5ADYcXq000835 for ; Fri, 10 Jun 2005 06:34:39 -0700 Received: from WCRSJO2KPAB047 ([200.9.49.66]) by linuxwireless.org.ve.carpathiahost.net (8.12.10/8.12.10) with SMTP id j5ADXNd2010018; Fri, 10 Jun 2005 09:33:23 -0400 Reply-To: From: "Alejandro Bonilla" To: "'Pavel Machek'" Cc: "'Jeff Garzik'" , "'James Ketrenos'" , "'David S. Miller'" , , , Subject: RE: ipw2100: firmware problem Date: Fri, 10 Jun 2005 07:33:08 -0600 Message-ID: <001001c56dc0$f12b6ab0$600cc60a@amer.sykes.com> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook CWS, Build 9.0.6604 (9.0.2911.0) In-Reply-To: <20050610090022.GF4173@elf.ucw.cz> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1478 Importance: Normal X-archive-position: 2346 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: abonilla@linuxwireless.org Precedence: bulk X-list: netdev Content-Length: 529 Lines: 21 > Hi! > > > OK. I understand the point and I totally agree with this. > We really want > > the adapter to just do what the user or profiles ask the > adapter to do. > > Yes, in an ideal world. > > > > Let's talk about easyness. These adapters are in laptops. 
> You don't want > > to type a lot of stop everytime you move from access points, reboots > > and > > We are not trying to make it hard to the users. Lets do the right > thing in kernel, and let userspace make it easy. > Pavel Pavel, Agreed then. From linux-netdev@gmane.org Fri Jun 10 07:15:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 07:15:33 -0700 (PDT) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5AEFQXq003197 for ; Fri, 10 Jun 2005 07:15:27 -0700 Received: from list by ciao.gmane.org with local (Exim 4.43) id 1DgkAO-0002qn-Hy for netdev@oss.sgi.com; Fri, 10 Jun 2005 16:07:24 +0200 Received: from 69.15.40.50 ([69.15.40.50]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 10 Jun 2005 16:07:24 +0200 Received: from lunz by 69.15.40.50 with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 10 Jun 2005 16:07:24 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: netdev@oss.sgi.com From: Jason Lunz Subject: Re: BCM5704 performance questions. Date: Fri, 10 Jun 2005 14:03:21 +0000 (UTC) Organization: PBR Streetgang Message-ID: References: <42A8E0FE.3020708@candelatech.com> <1118361376.5838.20.camel@rh4> X-Complaints-To: usenet@sea.gmane.org X-Gmane-NNTP-Posting-Host: 69.15.40.50 User-Agent: slrn/0.9.8.1 (Debian) X-archive-position: 2347 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: lunz@falooley.org Precedence: bulk X-list: netdev Content-Length: 2011 Lines: 54 mchan@broadcom.com said: > On Thu, 2005-06-09 at 17:38 -0700, Ben Greear wrote: > >> >> * Is the BCM5704 chipset/driver really that much slower? >> > > Unfortunately, the 5704 requires the "ONE_DMA" workaround which will > limit throughput in a PCIX 100/133 bus. If you comment out the line that > sets the DMA_RWCTRL_ONE_DMA flag in tg3.c, you should see improved > performance. 
However, you may run into some DMA issues on certain
> systems.
>
>> * Is there some information on tuning the tg3 somewhere?
>> (I didn't see a Documentation/networking/tg3.txt file, for instance)
>>
>> * Is there a way to verify the bus speed that the NIC is running at?
>> (ethtool -d ethX gives lots of meaningless (to me) hex)
>>
>
> tg3 probing string for each device will tell you the bus type, width,
> and speed.

The patch below from http://article.gmane.org/gmane.linux.network/18734
does this too for e1000. Lennert Buytenhek posted it a while back.

Jason

diff -urpN linux-pf/drivers/net/e1000/e1000_main.c linux-bs/drivers/net/e1000/e1000_main.c
--- linux-pf/drivers/net/e1000/e1000_main.c Fri Apr 8 15:06:34 2005
+++ linux-bs/drivers/net/e1000/e1000_main.c Fri Apr 8 15:29:34 2005
@@ -617,6 +617,21 @@ e1000_probe(struct pci_dev *pdev,
 	if(eeprom_data & eeprom_apme_mask)
 		adapter->wol |= E1000_WUFC_MAG;

+	/* print bus type/speed/width info */
+	printk(KERN_INFO "%s: e1000 (PCI%s:%s:%s) ", netdev->name,
+		((adapter->hw.bus_type == e1000_bus_type_pcix) ? "X" : ""),
+		((adapter->hw.bus_speed == e1000_bus_speed_133) ? "133MHz" :
+		 (adapter->hw.bus_speed == e1000_bus_speed_120) ? "120MHz" :
+		 (adapter->hw.bus_speed == e1000_bus_speed_100) ? "100MHz" :
+		 (adapter->hw.bus_speed == e1000_bus_speed_66) ? "66MHz" :
+		 "33MHz"),
+		((adapter->hw.bus_width == e1000_bus_width_64) ? "64-bit" :
+		 "32-bit"));
+
+	for (i = 0; i < 6; i++)
+		printk("%2.2x%c", netdev->dev_addr[i],
+			i == 5 ? '\n' : ':');
+
 	/* reset the hardware with the new settings */
 	e1000_reset(adapter);

From linville@bilbo.tuxdriver.com Fri Jun 10 07:28:19 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 07:28:28 -0700 (PDT)
Received: from apollo.tuxdriver.com (apollo.tuxdriver.com [24.172.12.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5AESGXq004188 for ; Fri, 10 Jun 2005 07:28:19 -0700
Received: from bilbo.tuxdriver.com (azure.tuxdriver.com [24.172.12.5]) by apollo.tuxdriver.com (8.12.11/8.12.11) with ESMTP id j5ADPQIM021123; Fri, 10 Jun 2005 09:25:26 -0400
Received: from bilbo.tuxdriver.com (localhost.localdomain [127.0.0.1]) by bilbo.tuxdriver.com (8.13.1/8.13.1) with ESMTP id j5AER3jU023443; Fri, 10 Jun 2005 10:27:03 -0400
Received: (from linville@localhost) by bilbo.tuxdriver.com (8.13.1/8.13.1/Submit) id j5AER3SQ023442; Fri, 10 Jun 2005 10:27:03 -0400
Date: Fri, 10 Jun 2005 10:27:02 -0400
From: "John W. Linville"
To: netdev@oss.sgi.com, linux-kernel@vger.kernel.org
Cc: akpm@osdl.org, jgarzik@pobox.com
Subject: [patch 2.6.12-rc6] 3c59x: remove superfluous vortex_debug test from boomerang_start_xmit
Message-ID: <20050610142702.GC10449@tuxdriver.com>
Mail-Followup-To: netdev@oss.sgi.com, linux-kernel@vger.kernel.org, akpm@osdl.org, jgarzik@pobox.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.1i
X-archive-position: 2348
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: linville@tuxdriver.com
Precedence: bulk
X-list: netdev
Content-Length: 908
Lines: 28

Remove the superfluous test of "if (vortex_debug > 3)" inside the
"if (vortex_debug > 6)" clause early in boomerang_start_xmit.

Signed-off-by: John W. Linville
---

I stumbled across this while looking at something else...

 drivers/net/3c59x.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -2202,9 +2202,8 @@ boomerang_start_xmit(struct sk_buff *skb

 	if (vortex_debug > 6) {
 		printk(KERN_DEBUG "boomerang_start_xmit()\n");
-		if (vortex_debug > 3)
-			printk(KERN_DEBUG "%s: Trying to send a packet, Tx index %d.\n",
-				   dev->name, vp->cur_tx);
+		printk(KERN_DEBUG "%s: Trying to send a packet, Tx index %d.\n",
+			   dev->name, vp->cur_tx);
 	}

 	if (vp->cur_tx - vp->dirty_tx >= TX_RING_SIZE) {
--
John W. Linville
linville@tuxdriver.com

From rlrevell@joe-job.com Fri Jun 10 13:26:41 2005
Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 13:26:45 -0700 (PDT)
Received: from mustang.oldcity.dca.net (mustang.oldcity.dca.net [216.158.38.3]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j5AKQeXq009679 for ; Fri, 10 Jun 2005 13:26:41 -0700
Received: (qmail 19550 invoked from network); 10 Jun 2005 20:25:32 -0000
Received: from unknown (HELO ?192.168.0.55?) (216.158.29.193) by mustang with SMTP; 10 Jun 2005 20:25:32 -0000
Subject: RE: ipw2100: firmware problem
From: Lee Revell
To: abonilla@linuxwireless.org
Cc: "'Denis Vlasenko'" , "'Pavel Machek'" , "'Jeff Garzik'" , "'Netdev list'" , "'kernel list'" , "'James P. 
Ketrenos'"
In-Reply-To: <000f01c56dbf$9b15de90$600cc60a@amer.sykes.com>
References: <000f01c56dbf$9b15de90$600cc60a@amer.sykes.com>
Content-Type: text/plain
Date: Fri, 10 Jun 2005 16:26:27 -0400
Message-Id: <1118435188.6423.26.camel@mindpipe>
Mime-Version: 1.0
X-Mailer: Evolution 2.3.1
Content-Transfer-Encoding: 7bit
X-archive-position: 2349
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: rlrevell@joe-job.com
Precedence: bulk
X-list: netdev
Content-Length: 954
Lines: 23

On Fri, 2005-06-10 at 07:23 -0600, Alejandro Bonilla wrote:
> >
> > Adding kernel level wireless autoconfiguration duplicates the effort.
> > Since I am not going to give up a requirement to be able to stay radio
> > silent at boot (me too wants freedom, not only you), you need to add
> > disable=1 module parameter to each driver, which adds to the mess.
> >
> > ALSA does the Right Thing. Sound is completely muted out at
> > module load.
> > It's a user freedom to set desired volume level after that.
>
> Yeah right. I remember I had to google for 10 minutes to find the answer for
> this one. Why would you install something, for it to not work?
>
> It thing of Mute in ALSA is stupid. If you want Sound, you install the Sound
> and enable it. Why would it make you google for more things to do? ALSA mute
> on install is WAY way, not OK.

It took you 10 minutes of googling before you thought to try the mixer?
Sorry dude, this is PEBKAC.
Lee From abonilla@linuxwireless.org Fri Jun 10 14:01:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 14:01:28 -0700 (PDT) Received: from linuxwireless.org.ve.carpathiahost.net (linuxwireless.org.ve.carpathiahost.net [66.117.45.234]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5AL1PXq011940 for ; Fri, 10 Jun 2005 14:01:25 -0700 Received: from WCRSJO2KPAB047 ([200.9.49.66]) by linuxwireless.org.ve.carpathiahost.net (8.12.10/8.12.10) with SMTP id j5AL0KJb019682; Fri, 10 Jun 2005 17:00:20 -0400 Reply-To: From: "Alejandro Bonilla" To: "'Lee Revell'" Cc: "'Netdev list'" , "'kernel list'" Subject: RE: ipw2100: firmware problem Date: Fri, 10 Jun 2005 15:00:13 -0600 Message-ID: <003001c56dff$662fe4b0$600cc60a@amer.sykes.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook CWS, Build 9.0.6604 (9.0.2911.0) In-Reply-To: <1118435188.6423.26.camel@mindpipe> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1478 Importance: Normal X-archive-position: 2350 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: abonilla@linuxwireless.org Precedence: bulk X-list: netdev Content-Length: 740 Lines: 23 > > It thing of Mute in ALSA is stupid. If you want Sound, you > install the Sound > > and enable it. Why would it make you google for more things > to do? ALSA mute > > on install is WAY way, not OK. > > It took you 10 minutes of googling before you thought to try > the mixer? > Sorry dude, this is PEBKAC. > > Lee Riiiight. It could be. Or it could be that no where in the world I have seen something where the device would be disabled by default without notifying the user. Why would you Mute the driver? 
Is the driver that bad, that the developers would rather Mute the sound card, just in case if the sound cards starts making noises and shit when the driver is loaded? You are moving to another topic. Let's drop it. .Alejandro From rlrevell@joe-job.com Fri Jun 10 14:07:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 14:07:33 -0700 (PDT) Received: from mustang.oldcity.dca.net (mustang.oldcity.dca.net [216.158.38.3]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j5AL7TXq012724 for ; Fri, 10 Jun 2005 14:07:30 -0700 Received: (qmail 24264 invoked from network); 10 Jun 2005 21:06:22 -0000 Received: from unknown (HELO ?192.168.0.55?) (216.158.29.193) by mustang with SMTP; 10 Jun 2005 21:06:22 -0000 Subject: RE: ipw2100: firmware problem From: Lee Revell To: abonilla@linuxwireless.org Cc: "'Netdev list'" , "'kernel list'" In-Reply-To: <003001c56dff$662fe4b0$600cc60a@amer.sykes.com> References: <003001c56dff$662fe4b0$600cc60a@amer.sykes.com> Content-Type: text/plain Date: Fri, 10 Jun 2005 17:07:18 -0400 Message-Id: <1118437639.6423.65.camel@mindpipe> Mime-Version: 1.0 X-Mailer: Evolution 2.3.1 Content-Transfer-Encoding: 7bit X-archive-position: 2351 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rlrevell@joe-job.com Precedence: bulk X-list: netdev Content-Length: 1299 Lines: 36 On Fri, 2005-06-10 at 15:00 -0600, Alejandro Bonilla wrote: > > > It thing of Mute in ALSA is stupid. If you want Sound, you > > install the Sound > > > and enable it. Why would it make you google for more things > > to do? ALSA mute > > > on install is WAY way, not OK. > > > > It took you 10 minutes of googling before you thought to try > > the mixer? > > Sorry dude, this is PEBKAC. > > > > Lee > > Riiiight. It could be. Or it could be that no where in the world I have seen > something where the device would be disabled by default without notifying > the user. 
Why would you Mute the driver? Is the driver that bad, that the > developers would rather Mute the sound card, just in case if the sound cards > starts making noises and shit when the driver is loaded? > Userspace should handle it, doing this in the kernel is bloat. My Debian system initializes the mixer settings to a sane state just fine when the alsasound init script is run. Maybe you need a better distro. Users who compile ALSA from source are expected to know what they are doing. And, if you watch the "make install" output, it prints a big fat warning that all mixer controls are muted by default. > You are moving to another topic. Let's drop it. Agreed, but it was your OT rant that changed the topic... Lee From greearb@candelatech.com Fri Jun 10 14:10:10 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 14:10:15 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5ALAAXq013336 for ; Fri, 10 Jun 2005 14:10:10 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j5ALhL5I006330; Fri, 10 Jun 2005 14:43:21 -0700 Message-ID: <42AA016C.9050801@candelatech.com> Date: Fri, 10 Jun 2005 14:09:00 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Michael Chan CC: "'netdev@oss.sgi.com'" Subject: Re: BCM5704 performance questions. 
References: <42A8E0FE.3020708@candelatech.com> <1118361376.5838.20.camel@rh4> <42A8EBDA.6010306@candelatech.com> <1118363861.5838.29.camel@rh4> In-Reply-To: <1118363861.5838.29.camel@rh4> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2352 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 1273 Lines: 35 Michael Chan wrote: > On Thu, 2005-06-09 at 18:24 -0700, Ben Greear wrote: > >>Michael Chan wrote: >> >>>Unfortunately, the 5704 requires the "ONE_DMA" workaround which will >>>limit throughput in a PCIX 100/133 bus. If you comment out the line that >>>sets the DMA_RWCTRL_ONE_DMA flag in tg3.c, you should see improved >>>performance. However, you may run into some DMA issues on certain >>>systems. >> >>Is there any way I can tell which systems are affected? It won't be >>an option for me to purposefully ship possibly busted drivers/hardware, >>but if I can be certain that my systems are immune, I will try this >>modification. >> > > I mentioned this so that you could verify that the slow performance was > indeed caused by ONE_DMA. Even if your system is affected, it's a very > subtle problem that won't show up right away and should allow you to get > some performance numbers. I commented out the code and ran the pktgen test again. It may be a small bit better, but not much: 770Mbps in one direction, 750Mbps in the other. Have you done any tests with 2 tg3 NICs in a single machine to see if they can run at or near line speed (full duplex)? 
Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From rick.jones2@hp.com Fri Jun 10 14:35:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 14:35:08 -0700 (PDT) Received: from palrel12.hp.com (palrel12.hp.com [156.153.255.237]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5ALZ4Xq014827 for ; Fri, 10 Jun 2005 14:35:04 -0700 Received: from tardy.cup.hp.com (tardy.cup.hp.com [15.244.44.58]) by palrel12.hp.com (Postfix) with ESMTP id 0FF0C4028ED; Fri, 10 Jun 2005 14:33:57 -0700 (PDT) Received: from hp.com (localhost [127.0.0.1]) by tardy.cup.hp.com (8.9.3 (PHNE_28810)/8.9.3 SMKit7.02) with ESMTP id OAA13131; Fri, 10 Jun 2005 14:33:56 -0700 (PDT) Message-ID: <42AA0743.1020101@hp.com> Date: Fri, 10 Jun 2005 14:33:55 -0700 From: Rick Jones User-Agent: Mozilla/5.0 (X11; U; HP-UX 9000/785; en-US; rv:1.6) Gecko/20040304 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Ben Greear Cc: "'netdev@oss.sgi.com'" Subject: Re: BCM5704 performance questions. References: <42A8E0FE.3020708@candelatech.com> <1118361376.5838.20.camel@rh4> <42A8EBDA.6010306@candelatech.com> <1118363861.5838.29.camel@rh4> <42AA016C.9050801@candelatech.com> In-Reply-To: <42AA016C.9050801@candelatech.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2353 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rick.jones2@hp.com Precedence: bulk X-list: netdev Content-Length: 1084 Lines: 22 > Have you done any tests with 2 tg3 NICs in a single machine to see if they > can run at or near line speed (full duplex)? It isn't just a question of two tg3 NICs in the same box is it? You are running two NICs on the same bus right? And unless my dimm memory is mistaken, four ports on a card with 5704s means two 5704's a bridge chip right? 
So, it would be two tg3 NICs going through the same bridge chip, not just the same bus or same system. I'd be worrying about DMA latencies on the system and the bridge chip, and perhaps the efficiency of the PCI-X bus usage (not sure - is there anything in your system's chipset to extract that sort of information?) What happens when you turn pktgen around/insideout and source packets from the bridging system to each of the (two other?) systems? Since you are bridging, does having CKO enabled really matter? Mightn't that allow the firmware on the 5704(s) to run a triffle faster? Or does bridging already not request CKO (I suppose it might). Are your interface interrupts distributed across the CPUs? rick jones From greearb@candelatech.com Fri Jun 10 14:57:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 14:57:58 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5ALvsXq015969 for ; Fri, 10 Jun 2005 14:57:54 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j5AMV65I006926; Fri, 10 Jun 2005 15:31:07 -0700 Message-ID: <42AA0C9D.2060006@candelatech.com> Date: Fri, 10 Jun 2005 14:56:45 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Rick Jones CC: "'netdev@oss.sgi.com'" Subject: Re: BCM5704 performance questions. 
References: <42A8E0FE.3020708@candelatech.com> <1118361376.5838.20.camel@rh4> <42A8EBDA.6010306@candelatech.com> <1118363861.5838.29.camel@rh4> <42AA016C.9050801@candelatech.com> <42AA0743.1020101@hp.com> In-Reply-To: <42AA0743.1020101@hp.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2354 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 2862 Lines: 62 Rick Jones wrote: > >> Have you done any tests with 2 tg3 NICs in a single machine to see if >> they >> can run at or near line speed (full duplex)? > > > It isn't just a question of two tg3 NICs in the same box is it? You are > running two NICs on the same bus right? And unless my dimm memory is > mistaken, four ports on a card with 5704s means two 5704's a bridge chip > right? So, it would be two tg3 NICs going through the same bridge chip, > not just the same bus or same system. I'd be worrying about DMA > latencies on the system and the bridge chip, and perhaps the efficiency > of the PCI-X bus usage (not sure - is there anything in your system's > chipset to extract that sort of information?) There will be a bridge chip, and indeed I see better performance when I just use a 2-port Intel NIC as opposed to a 4 port, even if I am only actively using 2 of the 4 ports on the 4-port NIC. For the tg3 hardware I only have a 4-port NIC. I do assume that a 2-port tg3 NIC w/out a bridge chip would be faster..but probably not too much. > What happens when you turn pktgen around/insideout and source packets > from the bridging system to each of the (two other?) systems? I looped two ports on the same NIC together for the pktgen tests, so there is only a single machine in question. With Intel I can source/sink about 960Mbps on two ports simultaneously in this configuration. 
With the tg3 NIC I can only do about 750Mbps. And, the tg3 is in the faster PCI-X slot (133Mhz v/s 100Mhz). So, to me it appears that the tg3 hardware and/or driver can only handle about 80% of the performance that the intel e1000 can produce. It's possible I have a particularly sub-optimal configuration for tg3, or maybe a poorly designed NIC, which is why I'd like to know what others see... > Since you are bridging, does having CKO enabled really matter? Mightn't > that allow the firmware on the 5704(s) to run a triffle faster? Or does > bridging already not request CKO (I suppose it might). CKO == IP checksum offload? Since Dave doesn't want to debug my bridge setup (and I don't blame him), I am going to try to focus my testing/debug reports on the pktgen tests. If/when pktgen shows better performance with tg3, I can verify that I see the same speedups with my proprietary bridging module. I've no idea if CKO would help or hinder pktgen, nor have I tried to enable or disable it. > Are your interface interrupts distributed across the CPUs? I'm using FC2, basically a default install. It does seem to have an irq balance daemon running. But, I'm not specifically binding IRQs or anything like that. pktgen tx is running as a single thread, so the rx code could run mostly on the other CPU if locking allows... 
Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From rick.jones2@hp.com Fri Jun 10 15:04:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 15:04:14 -0700 (PDT) Received: from palrel11.hp.com (palrel11.hp.com [156.153.255.246]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5AM4BXq016837 for ; Fri, 10 Jun 2005 15:04:11 -0700 Received: from tardy.cup.hp.com (tardy.cup.hp.com [15.244.44.58]) by palrel11.hp.com (Postfix) with ESMTP id 141BD4616; Fri, 10 Jun 2005 15:03:04 -0700 (PDT) Received: from hp.com (localhost [127.0.0.1]) by tardy.cup.hp.com (8.9.3 (PHNE_28810)/8.9.3 SMKit7.02) with ESMTP id PAA13293; Fri, 10 Jun 2005 15:03:03 -0700 (PDT) Message-ID: <42AA0E17.8050201@hp.com> Date: Fri, 10 Jun 2005 15:03:03 -0700 From: Rick Jones User-Agent: Mozilla/5.0 (X11; U; HP-UX 9000/785; en-US; rv:1.6) Gecko/20040304 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Ben Greear Cc: "'netdev@oss.sgi.com'" Subject: Re: BCM5704 performance questions. References: <42A8E0FE.3020708@candelatech.com> <1118361376.5838.20.camel@rh4> <42A8EBDA.6010306@candelatech.com> <1118363861.5838.29.camel@rh4> <42AA016C.9050801@candelatech.com> <42AA0743.1020101@hp.com> <42AA0C9D.2060006@candelatech.com> In-Reply-To: <42AA0C9D.2060006@candelatech.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2355 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rick.jones2@hp.com Precedence: bulk X-list: netdev Content-Length: 1327 Lines: 35 > There will be a bridge chip, and indeed I see better performance when I just > use a 2-port Intel NIC as opposed to a 4 port, even if I am only actively > using 2 of the 4 ports on the 4-port NIC. For the tg3 hardware I only have a > 4-port NIC. I do assume that a 2-port tg3 NIC w/out a bridge chip would be > faster..but probably not too much. 
I have been taught by several wise old engineers that the proper spelling of assume is ass-u-me :) Bridge chips can in theory do all sorts of nasty things to performance. > CKO == IP checksum offload? Yes. > Since Dave doesn't want to debug my bridge setup (and I don't blame him), I > am going to try to focus my testing/debug reports on the pktgen tests. > If/when pktgen shows better performance with tg3, I can verify that I see the > same speedups with my proprietary bridging module. I've no idea if CKO would > help or hinder pktgen, nor have I tried to enable or disable it. > >> Are your interface interrupts distributed across the CPUs? > > > I'm using FC2, basically a default install. It does seem to have an irq > balance daemon running. But, I'm not specifically binding IRQs or anything > like that. pktgen tx is running as a single thread, so the rx code could run > mostly on the other CPU if locking allows... again, never ass-u-me. rick From mchan@broadcom.com Fri Jun 10 15:15:50 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 15:15:52 -0700 (PDT) Received: from MMS1.broadcom.com (mms1.broadcom.com [216.31.210.17]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5AMFlXq017755 for ; Fri, 10 Jun 2005 15:15:49 -0700 Received: from 10.10.64.121 by MMS1.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Fri, 10 Jun 2005 15:14:12 -0700 X-Server-Uuid: 146C3151-C1DE-4F71-9D02-C3BE503878DD Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Fri, 10 Jun 2005 15:14:11 -0700 Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BDL06367; Fri, 10 Jun 2005 15:14:11 -0700 (PDT) Received: from nt-irva-0741.brcm.ad.broadcom.com ( nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with 
ESMTP id PAA14928; Fri, 10 Jun 2005 15:14:11 -0700 (PDT) Received: from 10.7.17.4 ([10.7.17.4]) by NT-IRVA-0741.brcm.ad.broadcom.com ([10.8.194.54]) with Microsoft Exchange Server HTTP-DAV ; Fri, 10 Jun 2005 22:14:10 +0000 Received: from rh4 by nt-irva-0741; 10 Jun 2005 14:16:44 -0700 Subject: Re: BCM5704 performance questions. From: "Michael Chan" To: "Ben Greear" cc: "'netdev@oss.sgi.com'" In-Reply-To: <42AA016C.9050801@candelatech.com> References: <42A8E0FE.3020708@candelatech.com> <1118361376.5838.20.camel@rh4> <42A8EBDA.6010306@candelatech.com> <1118363861.5838.29.camel@rh4> <42AA016C.9050801@candelatech.com> Date: Fri, 10 Jun 2005 14:16:43 -0700 Message-ID: <1118438203.5294.7.camel@rh4> MIME-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-3) X-WSS-ID: 6EB4CF3E2V0481948-01-01 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-archive-position: 2356 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mchan@broadcom.com Precedence: bulk X-list: netdev Content-Length: 688 Lines: 18 On Fri, 2005-06-10 at 14:09 -0700, Ben Greear wrote: > I commented out the code and ran the pktgen test again. It may be a small > bit better, but not much: 770Mbps in one direction, 750Mbps in the other. > The latest tg3 driver will print out the dma_rwctrl at probe time. Can you check the value to make sure ONE_DMA is disabled? Bit 14 sets ONE_DMA. > Have you done any tests with 2 tg3 NICs in a single machine to see if they > can run at or near line speed (full duplex)? > Yes, we have years ago but not with the tg3 driver. We set up the 5704 to bridge (or route, I don't remember) one port to the other. It cannot bridge at line-rate with the ONE_DMA workaround enabled. 
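For anyone wanting to check a dma_rwctrl value by hand once the driver prints it, a minimal sketch follows. It assumes only what the message above states, namely that bit 14 is the ONE_DMA flag; the macro and function names are made up for illustration and are not taken from tg3.c.

```c
#include <stdint.h>

/* Assumption from the message above: bit 14 of dma_rwctrl is ONE_DMA.
 * ONE_DMA_BIT is an illustrative name, not the driver's own macro. */
#define ONE_DMA_BIT (UINT32_C(1) << 14)   /* 0x00004000 */

/* Returns 1 if the ONE_DMA workaround bit is set in a dma_rwctrl value,
 * 0 otherwise. */
static int one_dma_enabled(uint32_t dma_rwctrl)
{
    return (dma_rwctrl & ONE_DMA_BIT) != 0;
}
```

Feeding it the hexadecimal value reported at probe time should be enough to confirm whether the commented-out line actually took effect.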
From greearb@candelatech.com Fri Jun 10 15:26:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 15:26:19 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5AMQAXq018575 for ; Fri, 10 Jun 2005 15:26:10 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j5AMxL5I007212; Fri, 10 Jun 2005 15:59:21 -0700 Message-ID: <42AA133C.2000009@candelatech.com> Date: Fri, 10 Jun 2005 15:25:00 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Rick Jones CC: "'netdev@oss.sgi.com'" Subject: Re: BCM5704 performance questions. References: <42A8E0FE.3020708@candelatech.com> <1118361376.5838.20.camel@rh4> <42A8EBDA.6010306@candelatech.com> <1118363861.5838.29.camel@rh4> <42AA016C.9050801@candelatech.com> <42AA0743.1020101@hp.com> <42AA0C9D.2060006@candelatech.com> <42AA0E17.8050201@hp.com> In-Reply-To: <42AA0E17.8050201@hp.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2357 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 2005 Lines: 52 Rick Jones wrote: > >> There will be a bridge chip, and indeed I see better performance when >> I just use a 2-port Intel NIC as opposed to a 4 port, even if I am >> only actively using 2 of the 4 ports on the 4-port NIC. For the tg3 >> hardware I only have a >> 4-port NIC. I do assume that a 2-port tg3 NIC w/out a bridge chip >> would be >> faster..but probably not too much. 
> > > I have been taught by several wise old engineers that the proper > spelling of assume is ass-u-me :) > > Bridge chips can in theory do all sorts of nasty things to performance. Sure...but the end result is that I need 2 port NICs and I need 4 port NICs. The 4-port ones need a bridge, so I'm stuck with a bridge. The Intel 4-port with a bridge works OK, the tg3 not so good. If someone else has a 2-port BCM NIC that handles full line speed, then that would be a good data point, but I'm unlikely to purchase one just to satisfy my curiousity. If someone wants to send me one, I'll happily stick it in my system and report the results. >> I'm using FC2, basically a default install. It does seem to have an irq >> balance daemon running. But, I'm not specifically binding IRQs or >> anything >> like that. pktgen tx is running as a single thread, so the rx code >> could run >> mostly on the other CPU if locking allows... > > again, never ass-u-me. I'm not assuming anything here...just reporting the setup. Truth is, the e1000 works really well for my application in most configurations. It was interesting for me to learn that some folks are getting very good tg3 performance for TCP transfers when the e1000 was dropping frames (see thread from the last week or so). So, I was a little supprised that I did not see such good tg3 numbers. If the answer is that the tg3 just can't do it, no shame there...but if my testing can help the tg3 driver improve, I will try to do my part. 
Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From greearb@candelatech.com Fri Jun 10 15:36:44 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 15:36:49 -0700 (PDT) Received: from www.lanforge.com (ns1.lanforge.com [66.165.47.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5AMaiXq019546 for ; Fri, 10 Jun 2005 15:36:44 -0700 Received: from [71.112.207.80] (pool-71-112-207-80.sttlwa.dsl-w.verizon.net [71.112.207.80]) (authenticated bits=0) by www.lanforge.com (8.12.8/8.12.8) with ESMTP id j5AN9u5I007345; Fri, 10 Jun 2005 16:09:57 -0700 Message-ID: <42AA15B7.8050407@candelatech.com> Date: Fri, 10 Jun 2005 15:35:35 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Michael Chan CC: "'netdev@oss.sgi.com'" Subject: Re: BCM5704 performance questions. References: <42A8E0FE.3020708@candelatech.com> <1118361376.5838.20.camel@rh4> <42A8EBDA.6010306@candelatech.com> <1118363861.5838.29.camel@rh4> <42AA016C.9050801@candelatech.com> <1118438203.5294.7.camel@rh4> In-Reply-To: <1118438203.5294.7.camel@rh4> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2358 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: greearb@candelatech.com Precedence: bulk X-list: netdev Content-Length: 1190 Lines: 35 Michael Chan wrote: > On Fri, 2005-06-10 at 14:09 -0700, Ben Greear wrote: >>I commented out the code and ran the pktgen test again. It may be a small >>bit better, but not much: 770Mbps in one direction, 750Mbps in the other. >> > The latest tg3 driver will print out the dma_rwctrl at probe time. Can > you check the value to make sure ONE_DMA is disabled? Bit 14 sets > ONE_DMA. 
I don't see any printout in /var/log/messages that looks like it relates to the dma_rwctrl. I'm using the driver in 2.6.11..maybe it is not recent enough to print this information out? >>Have you done any tests with 2 tg3 NICs in a single machine to see if they >>can run at or near line speed (full duplex)? >> > Yes, we have years ago but not with the tg3 driver. We set up the 5704 > to bridge (or route, I don't remember) one port to the other. It cannot > bridge at line-rate with the ONE_DMA workaround enabled. Ok. Do you have other chipsets/NICs that can handle faster GigE speeds? If you'd like to send me some multi-port NICs to play with I'll be happy to report my findings :) Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com From davem@davemloft.net Fri Jun 10 15:44:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 15:44:45 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5AMifXq020284 for ; Fri, 10 Jun 2005 15:44:42 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DgsDs-0001tj-MU; Fri, 10 Jun 2005 15:43:32 -0700 Date: Fri, 10 Jun 2005 15:43:32 -0700 (PDT) Message-Id: <20050610.154332.122616684.davem@davemloft.net> To: greearb@candelatech.com Cc: mchan@broadcom.com, netdev@oss.sgi.com Subject: Re: BCM5704 performance questions. From: "David S. 
Miller" In-Reply-To: <42AA15B7.8050407@candelatech.com> References: <42AA016C.9050801@candelatech.com> <1118438203.5294.7.camel@rh4> <42AA15B7.8050407@candelatech.com> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2359 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 796 Lines: 20 From: Ben Greear Date: Fri, 10 Jun 2005 15:35:35 -0700 > Michael Chan wrote: > > On Fri, 2005-06-10 at 14:09 -0700, Ben Greear wrote: > > >>I commented out the code and ran the pktgen test again. It may be a small > >>bit better, but not much: 770Mbps in one direction, 750Mbps in the other. > >> > > > The latest tg3 driver will print out the dma_rwctrl at probe time. Can > > you check the value to make sure ONE_DMA is disabled? Bit 14 sets > > ONE_DMA. > > I don't see any printout in /var/log/messages that looks like it > relates to the dma_rwctrl. I'm using the driver in 2.6.11..maybe > it is not recent enough to print this information out? Yes, your driver is too old. The one in Linus's current GIT tree has this, along with many other enhancements. 
From rahulhsaxena@gmail.com Fri Jun 10 17:40:49 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 17:40:55 -0700 (PDT) Received: from zproxy.gmail.com (zproxy.gmail.com [64.233.162.203]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5B0enXq028436 for ; Fri, 10 Jun 2005 17:40:49 -0700 Received: by zproxy.gmail.com with SMTP id 34so553810nzf for ; Fri, 10 Jun 2005 17:39:42 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=Tn+JLetwe2pmbttlsjws0jrA5R03SuIvA+zeFjBK3BX7SdTGOygmywXnHLP2OSO/55iej05n1tK+gI/TH5NgTrs44hO6BJHuldyoNGYnnFTgqXIMVSeKEPK648nMNPcGAi22EUsZibbfKkdvReUJFQ3PmrxR9ay4MOTfSEXlapA= Received: by 10.36.13.4 with SMTP id 4mr1675328nzm; Fri, 10 Jun 2005 17:39:42 -0700 (PDT) Received: by 10.36.4.6 with HTTP; Fri, 10 Jun 2005 17:39:42 -0700 (PDT) Message-ID: <4532f3170506101739702e31ad@mail.gmail.com> Date: Sat, 11 Jun 2005 06:09:42 +0530 From: Rahul Hari Reply-To: rahul.hari@cse06.itbhu.org To: netdev@oss.sgi.com, netdev@vger.kernel.org, lartc-request@mailman.ds9a.nl, diffserv-general@lists.sourceforge.net, linux.kernel@googlegroups.com Subject: testing techniques to confirm the effectiveness of changes made to sch_gred.c Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j5B0enXq028436 X-archive-position: 2360 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rahulhsaxena@gmail.com Precedence: bulk X-list: netdev Content-Length: 1612 Lines: 53 Hi, I have made some changes to the file sch_gred.c to modify the GRED queueing discipline to support the following features: 1) The first virtual queue should get absolute priority while dequeueing (not caring if the 
others get starved)
2) While in equalise mode and with RIO mode enabled, the packets in the first virtual queue should not be counted for calculating the qave.

I want to confirm whether the changes I have made are really effective. I would be grateful if someone could let me know about any testing techniques that can be followed for confirming that the changes are really effective. It would be great if someone could also let me know if the logic that I have applied to effect these changes is correct.

My logic is as follows:

1) Since the process deals with dequeueing, I have to make changes to gred_dequeue only. If t->tab[0] != 0 then we dequeue the packet, otherwise we do not dequeue it.

2)
    if (t->eqp && t->grio) {
        for (i=0;i<t->DPs;i++) {
            if ((!t->tab[i]) || (i==q->DP) || (i==0))
                continue;
            if ((t->tab[i] != q) && (PSCHED_IS_PASTPERFECT(t->tab[i]->qidlestart)))
                qave +=t->tab[i]->qave;
        }
    }

Regards,
Rahul

--
----------------------
"The fear you let build up in your mind is worse than the situation that actually exists" from "who moved my cheese"
---------------------------------------------------------------------------------
Rahul Hari
Senior Under Grad. Student, Department of CSE, ITBHU, Varanasi.
Ph: +91-9845347020 rahul.hari@cse06.itbhu.org ------------------------------------------------------------------------------------------ From mbp@sourcefrog.net Fri Jun 10 19:05:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 10 Jun 2005 19:05:04 -0700 (PDT) Received: from ozlabs.org (ozlabs.org [203.10.76.45]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5B250Xq031979 for ; Fri, 10 Jun 2005 19:05:01 -0700 Received: from hope.sourcefrog.net (57.16.168.202.velocitynet.com.au [202.168.16.57]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTP id AC5CC67B1A; Sat, 11 Jun 2005 12:03:52 +1000 (EST) Received: by hope.sourcefrog.net (Postfix, from userid 1000) id ABBFAF96; Sat, 11 Jun 2005 12:03:48 +1000 (EST) Date: Sat, 11 Jun 2005 12:03:48 +1000 From: Martin Pool To: Pavan K Cc: ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, Alan.Cox@linux.org Subject: Re: Sockets hang in FIN_WAIT1 state in linux 2.2.5 Message-ID: <20050611020348.GA27334@sourcefrog.net> References: <20050610105808.13626.qmail@web30909.mail.mud.yahoo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050610105808.13626.qmail@web30909.mail.mud.yahoo.com> User-Agent: Mutt/1.5.6+20040907i X-archive-position: 2361 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mbp@sourcefrog.net Precedence: bulk X-list: netdev Content-Length: 983 Lines: 20 On 10 Jun 2005, Pavan K wrote: > Hi, > This is Pavan from India, a s/w engineer stuck in a TCP problem. We have a > legendary linux 2.2.5 kernel (redhat ) system which cant be replaced in my > company. I have noticed that my server stops ACK ing the SYN packets after some > time . Also when i netstat i can see lots of sockets hung in FIN_WAIT1 state. > Upon googling, I found ur mails in the mailing list. 
Sounds that you too had > similar problems. > So please suggest me any solution to this problem if you have. I am in a > critical stage to fix this bug. I cant upgrade my system to newer kernel since > it needs lot of code porting. I've heard similar reports from people running distcc on 2.2 kernels; after a long time the machine stops accepting new connections. If you can't upgrade the kernel your options are very limited: either work out the precise patch, or reboot your machines regularly. The second is probably easier. -- Martin From willy@w.ods.org Sat Jun 11 00:45:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 11 Jun 2005 00:45:11 -0700 (PDT) Received: from willy.net1.nerim.net (willy.net1.nerim.net [62.212.114.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5B7j5Xq022836 for ; Sat, 11 Jun 2005 00:45:06 -0700 Date: Sat, 11 Jun 2005 09:43:50 +0200 From: Willy Tarreau To: "David S. Miller" Cc: xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) Message-ID: <20050611074350.GD28759@alpha.home.local> References: <42A9C607.4030209@unixtrix.com> <42A9BA87.4010600@stud.feec.vutbr.cz> <20050610222645.GA1317@pcw.home.local> <20050610.154248.130848042.davem@davemloft.net> <20050611062413.GA1324@pcw.home.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050611062413.GA1324@pcw.home.local> User-Agent: Mutt/1.4i X-archive-position: 2362 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: willy@w.ods.org Precedence: bulk X-list: netdev Content-Length: 5687 Lines: 125 Hi David, well, I could easily build a proof of concept demonstrating the security problem implied by the simultaneous connect support. For this, I have two machines on the LAN. 
One (wks, 10.0.3.9, 2.4.29) wants to connect to www.kernel.org:80
(204.152.191.5). It works as expected :

wks:willy$ printf "HEAD / HTTP/1.0\r\n\r\n" | nc -p 10000 204.152.191.5 80; echo "ret=$?"
HTTP/1.1 200 OK
Date: Sat, 11 Jun 2005 07:08:27 GMT
Server: Apache/2.0.52 (Fedora)
Accept-Ranges: bytes
Connection: close
Content-Type: text/html

ret=0

The other one (pcw) tries to prevent wks from connecting to www.kernel.org,
by sending to it about 10 SYNs per second spoofing kernel.org's port 80 :

pcw# hping2 -i u100000 -k -a 204.152.191.5 -s 80 -I eth0 10.0.3.9 -p 10000 -S -M 12345678

During this, the client cannot connect to www.kernel.org from this port
anymore :

wks$ printf "HEAD / HTTP/1.0\r\n\r\n" | nc -p 10000 204.152.191.5 80; echo "ret=$?"
ret=1

Capture on the victim (wks=victim, pcw=attacker, www=www.kernel.org):

wks 09:06:44.020809 10.0.3.9.10000 > 204.152.191.5.80: S 4010109823:4010109823(0) win 5840 (DF)
pcw 09:06:44.065589 204.152.191.5.80 > 10.0.3.9.10000: S 12345678:12345678(0) win 512
wks 09:06:44.065621 10.0.3.9.10000 > 204.152.191.5.80: S 4010109823:4010109823(0) ack 12345679 win 5840 (DF)
pcw 09:06:44.166544 204.152.191.5.80 > 10.0.3.9.10000: S 12345678:12345678(0) win 512
www 09:06:44.217896 204.152.191.5.80 > 10.0.3.9.10000: S 2774672577:2774672577(0) ack 4010109824 win 5840 (DF)
wks 09:06:44.217939 10.0.3.9.10000 > 204.152.191.5.80: . ack 12345679 win 5840 (DF)
wks 09:06:47.020040 10.0.3.9.10000 > 204.152.191.5.80: S 4010109823:4010109823(0) ack 12345679 win 5840 (DF)
...

=> cannot establish, because either my local firewall or www.kernel.org's
   blocks wrong ACKs. Without a firewall, wks would have got an RST.

With the attached patch, I can no longer block the communication : 09:31:23.004379 IP (tos 0x0, ttl 64, id 36202, offset 0, flags [DF], length: 60) 10.0.3.1.10000 > 204.152.191.5.80: S [tcp sum ok] 1176290222:1176290222(0) win 13920 09:31:23.051743 IP (tos 0x0, ttl 64, id 9074, offset 0, flags [none], length: 40) 204.152.191.5.80 > 10.0.3.1.10000: S [tcp sum ok] 12345678:12345678(0) win 512 09:31:23.102683 IP (tos 0x0, ttl 64, id 42364, offset 0, flags [none], length: 40) 204.152.191.5.80 > 10.0.3.1.10000: S [tcp sum ok] 12345678:12345678(0) win 512 09:31:23.203546 IP (tos 0x0, ttl 58, id 0, offset 0, flags [DF], length: 60) 204.152.191.5.80 > 10.0.3.1.10000: S [tcp sum ok] 3923636405:3923636405(0) ack 1176290223 win 5792 09:31:23.203625 IP (tos 0x0, ttl 64, id 36204, offset 0, flags [DF], length: 52) 10.0.3.1.10000 > 204.152.191.5.80: . [tcp sum ok] 1176290223:1176290223(0) ack 3923636406 win 3480 => the client ignores fake SYNs and the connection establishes normally. The proposed patch adds a "tcp_simult_connect "sysctl which is disabled by default to fix the problem for non-aware people. Those who know they need the simultaneous connect can enable it manually, but I doubt we can find many of them. Does it seem appropriate for mainline ? In this case, I would also backport it to 2.4 and send it to you for inclusion. 
Thanks, Willy diff -urN linux-2.6.11.11/include/linux/sysctl.h linux-2.6.11.11-tcp/include/linux/sysctl.h --- linux-2.6.11.11/include/linux/sysctl.h Mon Mar 28 07:06:45 2005 +++ linux-2.6.11.11-tcp/include/linux/sysctl.h Sat Jun 11 09:00:22 2005 @@ -345,6 +345,7 @@ NET_TCP_MODERATE_RCVBUF=106, NET_TCP_TSO_WIN_DIVISOR=107, NET_TCP_BIC_BETA=108, + NET_TCP_SIMULT_CONNECT=109, }; enum { diff -urN linux-2.6.11.11/include/net/tcp.h linux-2.6.11.11-tcp/include/net/tcp.h --- linux-2.6.11.11/include/net/tcp.h Mon Mar 28 07:06:45 2005 +++ linux-2.6.11.11-tcp/include/net/tcp.h Sat Jun 11 08:56:16 2005 @@ -608,6 +608,7 @@ extern int sysctl_tcp_bic_beta; extern int sysctl_tcp_moderate_rcvbuf; extern int sysctl_tcp_tso_win_divisor; +extern int sysctl_tcp_simult_connect; extern atomic_t tcp_memory_allocated; extern atomic_t tcp_sockets_allocated; diff -urN linux-2.6.11.11/net/ipv4/sysctl_net_ipv4.c linux-2.6.11.11-tcp/net/ipv4/sysctl_net_ipv4.c --- linux-2.6.11.11/net/ipv4/sysctl_net_ipv4.c Mon Mar 28 07:06:48 2005 +++ linux-2.6.11.11-tcp/net/ipv4/sysctl_net_ipv4.c Sat Jun 11 08:55:27 2005 @@ -690,6 +690,14 @@ .mode = 0644, .proc_handler = &proc_dointvec, }, + { + .ctl_name = NET_TCP_SIMULT_CONNECT, + .procname = "tcp_simult_connect", + .data = &sysctl_tcp_simult_connect, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, { .ctl_name = 0 } }; diff -urN linux-2.6.11.11/net/ipv4/tcp_input.c linux-2.6.11.11-tcp/net/ipv4/tcp_input.c --- linux-2.6.11.11/net/ipv4/tcp_input.c Fri Jun 10 22:49:43 2005 +++ linux-2.6.11.11-tcp/net/ipv4/tcp_input.c Sat Jun 11 08:58:54 2005 @@ -84,6 +84,7 @@ int sysctl_tcp_stdurg; int sysctl_tcp_rfc1337; +int sysctl_tcp_simult_connect; int sysctl_tcp_max_orphans = NR_FILE; int sysctl_tcp_frto; int sysctl_tcp_nometrics_save; @@ -4620,7 +4621,7 @@ if (tp->rx_opt.ts_recent_stamp && tp->rx_opt.saw_tstamp && tcp_paws_check(&tp->rx_opt, 0)) goto discard_and_undo; - if (th->syn) { + if (th->syn && sysctl_tcp_simult_connect) { /* We see 
SYN without ACK. It is attempt of * simultaneous connect with crossed SYNs. * Particularly, it can be connect to self. From vda@ilport.com.ua Sat Jun 11 05:45:50 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 11 Jun 2005 05:45:58 -0700 (PDT) Received: from port.imtp.ilyichevsk.odessa.ua (167.imtp.Ilyichevsk.Odessa.UA [195.66.192.167] (may be forged)) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j5BCjkXq004164 for ; Sat, 11 Jun 2005 05:45:49 -0700 Received: (qmail 28653 invoked by alias); 11 Jun 2005 12:44:35 -0000 Received: from unknown (172.17.13.22) by 0 (195.66.192.170) with ESMTP; 11 Jun 2005 12:44:29 -0000 From: Denis Vlasenko To: , "'Pavel Machek'" , "'Jeff Garzik'" , "'Netdev list'" , "'kernel list'" , "'James P. Ketrenos'" Subject: Re: ipw2100: firmware problem Date: Sat, 11 Jun 2005 15:44:25 +0300 User-Agent: KMail/1.5.4 References: <000f01c56dbf$9b15de90$600cc60a@amer.sykes.com> In-Reply-To: <000f01c56dbf$9b15de90$600cc60a@amer.sykes.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506111544.25462.vda@ilport.com.ua> X-archive-position: 2363 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vda@ilport.com.ua Precedence: bulk X-list: netdev Content-Length: 2576 Lines: 59 On Friday 10 June 2005 16:23, Alejandro Bonilla wrote: > > Adding kernel level wireless autoconfiguration duplicates the effort. > > Since I am not going to give up a requirement to be able to stay radio > > silent at boot (me too wants freedom, not only you), you need to add > > disable=1 module parameter to each driver, which adds to the mess. > > > > ALSA does the Right Thing. Sound is completely muted out at > > module load. > > It's a user freedom to set desired volume level after that. > > Yeah right. I remember I had to google for 10 minutes to find the answer for > this one. 
Why would you install something, for it to not work?
>
> It thing of Mute in ALSA is stupid. If you want Sound, you install the Sound
> and enable it. Why would it make you google for more things to do? ALSA mute
> on install is WAY way, not OK.
>
> You *will* have to use a How-To with ALSA, nobody knows that your sound
> would be off because some people decided it.

Well, which sound level shall be set instead? 100%? Maybe too loud for my
500 watt loudspeakers, eh? 50%? Still too much. 5%? Nah. My machine at
work has a headphone which is anything but loud. 5% is nearly silence for it.

See? It's not a kernel matter at which volume sound must be set.
It is impossible to decide on the 'right' default.

> But this is out of the Topic. I agree with you all, but as I mentioned in a
> more current email, this is a laptop, not a server. Things behave
> differently and you want things faster. (Yes, I could have a script)

Or a laptop-oriented distro can have a script for you, just like they
already have for DHCPizing all ifaces. Not kernel business.

> What I'm saying, is that just as ALSA, you will have to google even more
> just to be able to look for the boot param for the driver for it to ASSOC on
> boot like the Original drive does. Instead, if you simply don't want to
> associate then turn off the Radio.
>
> It's a simple FN+F2 or depends on your laptop.
>
> Let's not make this a bigger thread, just decide and then do it that way.
> I'm looking at this on the side of a supporter, seeing the emails from
> users... "how do I make it behave as it was before" "it won't assoc on boot
> anymore"

Users who cannot figure it out by themselves do not have much power in
dictating how kernel drivers are written. Sure we listen to users, but we
won't blindly follow any and all suggestions.

If users want it different, nobody prohibits them from writing their
own drivers, right? Or patching existing ones, for that matter.
Send patches to lkml for discussion.
-- vda From rdenis@simphalempin.com Sat Jun 11 06:14:02 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 11 Jun 2005 06:14:10 -0700 (PDT) Received: from durga.via.ecp.fr (durga.via.ecp.fr [138.195.130.75]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5BDE2Xq005662 for ; Sat, 11 Jun 2005 06:14:02 -0700 Received: from localhost (localhost.localdomain [127.0.0.1]) by durga.via.ecp.fr (Postfix) with ESMTP id 394AC20EC; Sat, 11 Jun 2005 15:12:37 +0200 (CEST) Received: from durga.via.ecp.fr ([127.0.0.1]) by localhost (durga [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 07115-19; Sat, 11 Jun 2005 15:12:36 +0200 (CEST) Received: from auguste.via.ecp.fr (auguste.via.ecp.fr [IPv6:2002:8ac3:802d:1242:20d:60ff:fe38:6d16]) by durga.via.ecp.fr (Postfix) with ESMTP id C51B32143; Sat, 11 Jun 2005 15:12:36 +0200 (CEST) From: =?utf-8?q?R=C3=A9mi_Denis-Courmont?= Organization: SimPhalempin.Com To: davem@davemloft.net, pekkas@netcore.fi Subject: [PATCH] networking: [IPv6] Don't generate temporary for TUN devices Date: Sat, 11 Jun 2005 15:12:40 +0200 User-Agent: KMail/1.7.2 Cc: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Message-Id: <200506111512.42592.rdenis@simphalempin.com> Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j5BDE2Xq005662 X-archive-position: 2364 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rdenis@simphalempin.com Precedence: bulk X-list: netdev Content-Length: 1380 Lines: 36 Hello, Userland layer-2 tunneling devices allocated through the TUNTAP driver (drivers/net/tun.c) have a type of ARPHRD_NONE, and have no link-layer address. 
The kernel complains at regular intervals when IPv6 Privacy extensions are
enabled because it can't find a hardware address :

Dec 29 11:02:04 auguste kernel: __ipv6_regen_rndid(idev=cb3e0c00):
cannot get EUI64 identifier; use random bytes.

IPv6 Privacy extensions should probably be disabled on that sort of
device. They won't work anyway. If userland wants a more usual
Ethernet-ish interface with usual IPv6 autoconfiguration, it will use a
TAP device with an emulated link-layer and a random hardware address
rather than a TUN device.

As far as I could find, the TUN virtual device from TUNTAP is the only
sort of device using ARPHRD_NONE as kernel device type.

Signed-off-by: Rémi Denis-Courmont

--- a/net/ipv6/addrconf.c.orig	2004-12-29 10:50:27.000000000 +0100
+++ b/net/ipv6/addrconf.c	2004-12-29 10:50:41.000000000 +0100
@@ -372,6 +372,7 @@
 	ndev->regen_timer.data = (unsigned long) ndev;
 	if ((dev->flags&IFF_LOOPBACK) ||
 	    dev->type == ARPHRD_TUNNEL ||
+	    dev->type == ARPHRD_NONE ||
 	    dev->type == ARPHRD_SIT) {
 		printk(KERN_INFO
 			"Disabled Privacy Extensions on device %p(%s)\n",

--
Rémi Denis-Courmont
http://www.simphalempin.com/home/

From yoshfuji@linux-ipv6.org Sat Jun 11 06:34:09 2005
Received: with ECARTIS (v1.0.0; list netdev); Sat, 11 Jun 2005 06:34:14 -0700 (PDT)
Received: from yue.st-paulia.net ([203.178.140.15])
	by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5BDY8Xq006725
	for ; Sat, 11 Jun 2005 06:34:09 -0700
Received: from localhost (localhost [127.0.0.1])
	by yue.st-paulia.net (Postfix) with ESMTP
	id 9F38E33CC2; Sat, 11 Jun 2005 22:33:02 +0900 (JST)
Date: Sat, 11 Jun 2005 22:32:57 +0900 (JST)
Message-Id: <20050611.223257.46118215.yoshfuji@linux-ipv6.org>
To: rdenis@simphalempin.com, davem@davemloft.net
Cc: davem@davemloft.net, pekkas@netcore.fi, netdev@oss.sgi.com
Subject: Re: [PATCH] networking: [IPv6] Don't generate temporary for TUN devices
From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=
In-Reply-To:
<200506111512.42592.rdenis@simphalempin.com> References: <200506111512.42592.rdenis@simphalempin.com> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j5BDY8Xq006725 X-archive-position: 2365 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Content-Length: 469 Lines: 12 In article <200506111512.42592.rdenis@simphalempin.com> (at Sat, 11 Jun 2005 15:12:40 +0200), Rémi Denis-Courmont says: > Dec 29 11:02:04 auguste kernel: __ipv6_regen_rndid(idev=cb3e0c00): > cannot get EUI64 identifier; use random bytes. : > As far as I could fine, TUN virtual device from TUNTAP is the very only > sort of device using ARPHRD_NONE as kernel device type. 
Acked-by: YOSHIFUJI Hideaki --yoshfuji From khc@pm.waw.pl Sat Jun 11 10:45:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 11 Jun 2005 10:46:02 -0700 (PDT) Received: from khc.piap.pl ([195.187.100.11]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5BHjsXq019220 for ; Sat, 11 Jun 2005 10:45:57 -0700 Received: by khc.piap.pl (Postfix, from userid 500) id 2A2D334107; Sat, 11 Jun 2005 19:44:29 +0200 (CEST) To: Subject: Re: TCP stalls - sack?, 2.6.12pre6 References: From: Krzysztof Halasa Date: Sat, 11 Jun 2005 19:44:28 +0200 In-Reply-To: (Krzysztof Halasa's message of "Fri, 10 Jun 2005 00:58:55 +0200") Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2366 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: khc@pm.waw.pl Precedence: bulk X-list: netdev Content-Length: 2036 Lines: 47 Hi, Another TCP stall (2.6.12pre6 on both ends): > intrepid is X11-server with ssh connection to defiant (X11-forwarding, > EPIC100 PCI NIC). defiant is an older notebook machine and it was running > XEmacs and Firefox with ssh/X11 (cardbus DEC 21143). > Both on the same Ethernet subnet. Both standard MTU etc. [DEF] is TCP defiant:ssh [INT] is TCP intrepid:4782 intrepid:~# netstat -to | grep 4782; tcpdump -w stall -s 1600 -i eth0 Sat Jun 11 15:55:51 CEST 2005 Recv-Q Send-Q tcp 0 0 [INT] [DEF] ESTABLISHED keepalive (675.46/0/0) tcpdump file basically shows (TCP already stalled): 15:56:27.1 [INT] > [DEF]: P 569644050:569644162(112) ack 3725119932 win 31164 15:56:27.1 [DEF] > [INT]: . ack 112 win 16022 15:56:27.1 [INT] > [DEF]: P 112:256(144) ack 1 win 31164 15:56:27.1 [DEF] > [INT]: . ack 256 win 16022 15:56:27.1 [INT] > [DEF]: P 256:336(80) ack 1 win 31164 15:56:27.1 [DEF] > [INT]: . ack 336 win 16022 ... 16:04:09.6 [DEF] > [INT]: . ack 35536 win 16022 16:04:09.7 [INT] > [DEF]: P 35536:35616(80) ack 1 win 31164 16:04:09.7 [DEF] > [INT]: . 
ack 35616 win 16022 16:05:12.7 [DEF] > [INT]: . 1:1449(1448) ack 35616 win 16022 16:08:14.3 [INT] > [DEF]: P 35616:35760(144) ack 1 win 31164 16:08:14.3 [DEF] > [INT]: R 3725119932:3725119932(0) win 0 defiant:~# netstat -to | grep 4782 Recv-Q Send-Q tcp 0 54512 [DEF] [INT] ESTABLISHED on (81.30/10/0) and Send-Q is constant until TCP reset. > Ideas? -- Krzysztof Halasa From herbert@gondor.apana.org.au Sat Jun 11 12:34:05 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 11 Jun 2005 12:34:12 -0700 (PDT) Received: from goliath.apana.org.au (goliath.apana.org.au [202.12.88.44]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5BJY4Xq028284 for ; Sat, 11 Jun 2005 12:34:05 -0700 Received: from arnor.apana.org.au ([203.14.152.115] ident=mail) by goliath.apana.org.au with esmtp (Exim 4.50) id 1DhBiq-0004FF-Sh; Sun, 12 Jun 2005 05:32:49 +1000 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DhBii-0006vg-00; Sun, 12 Jun 2005 05:32:40 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DhBic-0005dp-00; Sun, 12 Jun 2005 05:32:34 +1000 From: Herbert Xu To: willy@w.ods.org (Willy Tarreau) Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Organization: Core In-Reply-To: <20050611074350.GD28759@alpha.home.local> X-Newsgroups: apana.lists.os.linux.kernel,apana.lists.os.linux.netdev User-Agent: tin/1.7.4-20040225 ("Benbecula") (UNIX) (Linux/2.4.27-hx-1-686-smp (i686)) Message-Id: Date: Sun, 12 Jun 2005 05:32:34 +1000 X-SA-Exim-Connect-IP: 203.14.152.115 X-SA-Exim-Mail-From: herbert@gondor.apana.org.au X-SA-Exim-Scanned: No (on goliath.apana.org.au); SAEximRunCond expanded to false X-archive-position: 2367 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 517 Lines: 14 Willy Tarreau wrote: > > During this, the client cannot connect to www.kernel.org from this port > anymore : > wks$ printf "HEAD / HTTP/1.0\r\n\r\n" | nc -p 10000 204.152.191.5 80; echo "ret=$?" > ret=1 What if you let the client connect from a random port which is what it should do? 
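The suggestion above (let the kernel pick the source port rather than forcing one with "nc -p 10000") can be sketched in user space. This is only an illustration under stated assumptions: the helper name kernel_chosen_source_port() is hypothetical, and it talks to a throwaway loopback listener instead of www.kernel.org so that it needs no outside network. The point is simply that the client socket never calls bind(), so the source port is whatever the stack assigns at connect() time.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): open a loopback listener,
 * connect to it WITHOUT calling bind() on the client socket, and return
 * the source port the kernel assigned to the client (host byte order),
 * or -1 on error.
 */
int kernel_chosen_source_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int port = -1;

    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    int csock = socket(AF_INET, SOCK_STREAM, 0);
    if (lsock < 0 || csock < 0)
        goto out;

    /* Listener on 127.0.0.1, port 0 = "any free port". */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;

    if (bind(lsock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lsock, 1) < 0 ||
        getsockname(lsock, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    /* No bind() on csock: the kernel selects the source port here. */
    if (connect(csock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* Read back which local port the client ended up with. */
    len = sizeof(addr);
    if (getsockname(csock, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);

out:
    if (csock >= 0)
        close(csock);
    if (lsock >= 0)
        close(lsock);
    return port;
}
```

Calling the helper a few times and printing the result should show the port changing between runs on most systems, which is what makes blind spoofing against a fixed port/address pair harder; how the port is chosen is up to the stack and is not shown here.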
-- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From bunk@stusta.de Sat Jun 11 12:38:51 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 11 Jun 2005 12:38:54 -0700 (PDT) Received: from mailout.stusta.mhn.de (mailout.stusta.mhn.de [141.84.69.5]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j5BJcnXq028897 for ; Sat, 11 Jun 2005 12:38:51 -0700 Received: (qmail 5924 invoked from network); 11 Jun 2005 19:37:30 -0000 Received: from r063144.stusta.swh.mhn.de (10.150.63.144) by mailout.stusta.mhn.de with SMTP; 11 Jun 2005 19:37:30 -0000 Received: by r063144.stusta.swh.mhn.de (Postfix, from userid 1000) id 7B24DBB804; Sat, 11 Jun 2005 21:37:29 +0200 (CEST) Date: Sat, 11 Jun 2005 21:37:29 +0200 From: Adrian Bunk To: John covici Cc: linux-kernel@vger.kernel.org, cramerj@intel.com, john.ronciak@intel.com, ganesh.venkatesan@intel.com, netdev@oss.sgi.com Subject: Re: e1000 not working using 2.6.11 Message-ID: <20050611193729.GJ3770@stusta.de> References: <17064.41290.461755.920152@ccs.covici.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <17064.41290.461755.920152@ccs.covici.com> User-Agent: Mutt/1.5.9i X-archive-position: 2368 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: bunk@stusta.de Precedence: bulk X-list: netdev Content-Length: 1417 Lines: 38 On Thu, Jun 09, 2005 at 04:06:34PM -0400, John covici wrote: > Hi. I am not getting good results on a box I have which uses an Intel > Pro gigabit Ethernet driver for its network connection. What happens > is that I get messages like watchdog xmit timeout and lots of errors > out of ifconfig. Here is the listpci entry for that card. It works > under the other OS, so I imagine the hardware is OK. > > 0000:01:0b.0 Ethernet controller: Intel Corp. 
82540EM Gigabit Ethernet Controller (rev 02) > Subsystem: Intel Corp.: Unknown device 3013 > Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 19 > Memory at e7000000 (64-bit, non-prefetchable) [size=128K] > Memory at e7020000 (64-bit, non-prefetchable) [size=64K] > I/O ports at b400 [size=64] > Capabilities: [dc] Power Management version 2 > Capabilities: [e4] PCI-X non-bridge device. > Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable- > > > Any assistance would be appreciated. Does 2.6.12-rc6 work? If not, the maintainers of this driver (Cc'ed) might be able to help you. > John Covici cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed From willy@w.ods.org Sat Jun 11 12:53:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 11 Jun 2005 12:53:20 -0700 (PDT) Received: from willy.net1.nerim.net (willy.net1.nerim.net [62.212.114.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5BJr2Xq029727 for ; Sat, 11 Jun 2005 12:53:16 -0700 Date: Sat, 11 Jun 2005 21:51:44 +0200 From: Willy Tarreau To: Herbert Xu Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Message-ID: <20050611195144.GF28759@alpha.home.local> References: <20050611074350.GD28759@alpha.home.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4i X-archive-position: 2369 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: willy@w.ods.org Precedence: bulk X-list: netdev Content-Length: 1006 Lines: 27 Hi Herbert, On Sun, Jun 12, 2005 at 05:32:34AM +1000, Herbert Xu wrote: > Willy Tarreau wrote: > > > > During this, the client cannot connect to www.kernel.org from this port > > anymore : > > wks$ printf "HEAD / HTTP/1.0\r\n\r\n" | nc -p 10000 204.152.191.5 80; echo "ret=$?" > > ret=1 > > What if you let the client connect from a random port which is what it > should do? Of course, if the port chosen by the client is not in the range probed by the attacker, everything's OK. My point is that relying *only* on a port number is a bit limitative. It is even more when some protocols only bind to privileged source ports, or always use the same port range at boot (eg: a router establishing a BGP connection to the ISP's router). Please note that if I only called it "small DoS", it's clearly because I don't consider this critical, but I think that most people involved in security will find that DoSes based on port guessing should be addressed when possible. 
Regards, Willy From herbert@gondor.apana.org.au Sun Jun 12 01:15:17 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 01:15:23 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5C8FFXq032147 for ; Sun, 12 Jun 2005 01:15:16 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DhNb6-0002BM-00; Sun, 12 Jun 2005 18:13:36 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DhNax-0006M7-00; Sun, 12 Jun 2005 18:13:27 +1000 Date: Sun, 12 Jun 2005 18:13:27 +1000 To: Willy Tarreau Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) Message-ID: <20050612081327.GA24384@gondor.apana.org.au> References: <20050611074350.GD28759@alpha.home.local> <20050611195144.GF28759@alpha.home.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050611195144.GF28759@alpha.home.local> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2370 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 850 Lines: 21 On Sat, Jun 11, 2005 at 09:51:44PM +0200, Willy Tarreau wrote: > > Please note that if I only called it "small DoS", it's clearly because > I don't consider this critical, but I think that most people involved > in security will find that DoSes based on port guessing should be > addressed when possible. Sorry but this patch is pointless. 
If I wanted to prevent you from connecting to www.kernel.org 80 and I knew your source port number I'd be directly sending you fake SYN-ACK packets which will kill your connection immediately. If you want reliability and security you really should be using IPsec. There is no other way. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From willy@w.ods.org Sun Jun 12 01:35:31 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 01:35:35 -0700 (PDT) Received: from willy.net1.nerim.net (willy.net1.nerim.net [62.212.114.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5C8ZUXq000760 for ; Sun, 12 Jun 2005 01:35:30 -0700 Date: Sun, 12 Jun 2005 10:34:09 +0200 From: Willy Tarreau To: Herbert Xu Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Message-ID: <20050612083409.GA8220@alpha.home.local> References: <20050611074350.GD28759@alpha.home.local> <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612081327.GA24384@gondor.apana.org.au> User-Agent: Mutt/1.4i X-archive-position: 2371 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: willy@w.ods.org Precedence: bulk X-list: netdev Content-Length: 1575 Lines: 35 On Sun, Jun 12, 2005 at 06:13:27PM +1000, Herbert Xu wrote: > On Sat, Jun 11, 2005 at 09:51:44PM +0200, Willy Tarreau wrote: > > > > Please note that if I only called it "small DoS", it's clearly because > > I don't consider this critical, but I think that most people involved > > in security will find that DoSes based on port guessing should be > > addressed when possible. > > Sorry but this patch is pointless. If I wanted to prevent you from > connecting to www.kernel.org 80 and I knew your source port number > I'd be directly sending you fake SYN-ACK packets which will kill > your connection immediately. Only if your ACK was within my SEQ window, which adds about 20 bits of random when my initial window is 5840. You would then need to send one million times more packets to achieve the same goal. > If you want reliability and security you really should be using IPsec. > There is no other way. I agree with you on the fact that people who need security must use secure protocols. I had the same words last year when people discovered that a TCP RST could kill a BGP session, and the end of the internet was announced. Hey, if someone needs secure BGP, he must use MD5 sums from the start. 
I'm not meaning to make TCP as secure as IPsec, but I think that when supporting a feature (simultaneous connect) that nobody uses and many OSes do not even support introduces a weakness, we could at least make it optional. It could also rely on a #if CONFIG_TCP_SIMULT which will slightly reduce kernel size for people who know they don't want it. Cheers, Willy From herbert@gondor.apana.org.au Sun Jun 12 03:32:07 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 03:32:11 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CAW3Xq004992 for ; Sun, 12 Jun 2005 03:32:06 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DhPjV-0002mF-00; Sun, 12 Jun 2005 20:30:25 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DhPjQ-0006XO-00; Sun, 12 Jun 2005 20:30:20 +1000 Date: Sun, 12 Jun 2005 20:30:20 +1000 To: Willy Tarreau Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Message-ID: <20050612103020.GA25111@gondor.apana.org.au> References: <20050611074350.GD28759@alpha.home.local> <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612083409.GA8220@alpha.home.local> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2372 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 782 Lines: 17 On Sun, Jun 12, 2005 at 10:34:09AM +0200, Willy Tarreau wrote: > > > Sorry but this patch is pointless. If I wanted to prevent you from > > connecting to www.kernel.org 80 and I knew your source port number > > I'd be directly sending you fake SYN-ACK packets which will kill > > your connection immediately. > > Only if your ACK was within my SEQ window, which adds about 20 bits of > random when my initial window is 5840. You would then need to send one > million times more packets to achieve the same goal. Nope, no sequence validity check is made on the SYN-ACK. 
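For reference, the arithmetic behind the quoted "about 20 bits" figure can be checked with a few lines of C. This is a sketch only, and it takes the quoted premise at face value (a forged ACK would have to land inside a 5840-byte window of the 32-bit sequence space), which is exactly the premise the reply above disputes. The function names are hypothetical.

```c
#include <stdint.h>

/*
 * Number of window-sized slots in the 32-bit TCP sequence space,
 * rounded up. With one blind guess per slot, this is roughly how many
 * packets would be needed to hit the window. 'window' must be nonzero.
 */
uint64_t seq_window_slots(uint32_t window)
{
    const uint64_t seq_space = (uint64_t)1 << 32;
    return (seq_space + window - 1) / window;
}

/* floor(log2(n)) for n >= 1: how many whole bits of guessing that is. */
unsigned floor_log2(uint64_t n)
{
    unsigned bits = 0;
    while (n >>= 1)
        bits++;
    return bits;
}
```

For window = 5840 this gives 735440 slots, which lies between 2^19 and 2^20, so "about 20 bits" and "one million times more packets" are round figures for roughly 0.7 million.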
-- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From tgraf@suug.ch Sun Jun 12 03:47:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 03:47:25 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CAlMXq005953 for ; Sun, 12 Jun 2005 03:47:22 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id E3BFF1C0ED; Sun, 12 Jun 2005 12:46:28 +0200 (CEST) Date: Sun, 12 Jun 2005 12:46:28 +0200 From: Thomas Graf To: rahul.hari@cse06.itbhu.org Cc: netdev@oss.sgi.com, netdev@vger.kernel.org, lartc-request@mailman.ds9a.nl, diffserv-general@lists.sourceforge.net, linux.kernel@googlegroups.com Subject: Re: testing techniques to confirm the effectiveness of changes made to sch_gred.c Message-ID: <20050612104628.GA22463@postel.suug.ch> References: <4532f3170506101739702e31ad@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4532f3170506101739702e31ad@mail.gmail.com> X-archive-position: 2373 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 1517 Lines: 38 * Rahul Hari <4532f3170506101739702e31ad@mail.gmail.com> 2005-06-11 06:09 > I have made some changes to the file sch_gred.c to modify the GRED > queueing discipline to support the following features: > 1) The first virtual queue should get absolute priority while > dequeueing (not caring if the others get starved) > 2) While in equalise mode and with RIO mode enabled, the packets in > the first virtual queue should not be counted for calculating the > qave. 
You do not need to modify gred to achieve this: use a prio qdisc with 2 bands, band 1 covers your "first virtual queue" with a single red attached, band 2 covers the rest and uses a gred. > 1) Since the process deals with dequeueing, i have to make changes to > gred_dequeue only. If t->tab[0] != 0 then we dequeue the packet > otherwise do not dequeue it. What you describe above is: only dequeue when DP 0 is configured, probably not what you want. The only way to prioritize within gred the way you want is to modify dequeue() so that it iterates through sch->q looking for an skb with tcindex==DP0 and uses it instead of the skb at the queue head. > 2) > if (t->eqp && t->grio) { > > for (i=0;i<t->DPs;i++) { > if ((!t->tab[i]) || (i==q->DP) || (i==0)) > continue; > > if ((t->tab[i] != q) && (PSCHED_IS_PASTPERFECT(t->tab[i]->qidlestart))) > qave +=t->tab[i]->qave; > } You no longer consider the priority so it won't be wred anymore, also if (i == q->DP) continue makes the check t->tab[i] != q unnecessary. From willy@w.ods.org Sun Jun 12 04:42:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 04:42:09 -0700 (PDT) Received: from willy.net1.nerim.net (willy.net1.nerim.net [62.212.114.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CBg3Xq012082 for ; Sun, 12 Jun 2005 04:42:04 -0700 Date: Sun, 12 Jun 2005 13:40:39 +0200 From: Willy Tarreau To: Herbert Xu Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Message-ID: <20050612114039.GI28759@alpha.home.local> References: <20050611074350.GD28759@alpha.home.local> <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612103020.GA25111@gondor.apana.org.au> User-Agent: Mutt/1.4i X-archive-position: 2374 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: willy@w.ods.org Precedence: bulk X-list: netdev Content-Length: 1534 Lines: 33 On Sun, Jun 12, 2005 at 08:30:20PM +1000, Herbert Xu wrote: > On Sun, Jun 12, 2005 at 10:34:09AM +0200, Willy Tarreau wrote: > > > > > Sorry but this patch is pointless. If I wanted to prevent you from > > > connecting to www.kernel.org 80 and I knew your source port number > > > I'd be directly sending you fake SYN-ACK packets which will kill > > > your connection immediately. > > > > Only if your ACK was within my SEQ window, which adds about 20 bits of > > random when my initial window is 5840. You would then need to send one > > million times more packets to achieve the same goal. > > Nope, no sequence validity check is made on the SYN-ACK. Sorry Herbert, but both RFC793 page 32 figure 9 and my Linux box disagree with this statement. Look: at line 5, A rejects the SYN-ACK because the ACK is wrong during the session setup. And if you send the SYN-ACK on an established session, either it's in the window in which case the other end will send an RST, or it's outside the window, in which case the other end will resend an ACK to tell you what it expects. So I maintain my statement that the SYN-ACK must be within the window to cause a session reset. 
That's why I considered cisco's approach a total bullshit, because they mangled the TCP implementation to protect against in-window RSTs, but they failed to see that SYN-ACK would do exactly the same. I fail to find a case where both the SEQ and the ACK are ignored. This is why I believe that the simultaneous connect mode introduces a weakness. Cheers, Willy From herbert@gondor.apana.org.au Sun Jun 12 05:08:07 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 05:08:12 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CC86Xq013567 for ; Sun, 12 Jun 2005 05:08:07 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DhREW-0003Q7-00; Sun, 12 Jun 2005 22:06:32 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DhRER-0001Wr-00; Sun, 12 Jun 2005 22:06:27 +1000 Date: Sun, 12 Jun 2005 22:06:27 +1000 To: Willy Tarreau Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Message-ID: <20050612120627.GA5858@gondor.apana.org.au> References: <20050611074350.GD28759@alpha.home.local> <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> <20050612114039.GI28759@alpha.home.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612114039.GI28759@alpha.home.local> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2375 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 540 Lines: 12 On Sun, Jun 12, 2005 at 01:40:39PM +0200, Willy Tarreau wrote: > > Sorry Herbert, but both RFC793 page 32 figure 9 and my Linux box disagree > with this statement. Look: at line 5, A rejects the SYN-ACK because the > ACK is wrong during the session setup. Look at the first check inside th->ack in tcp_rcv_synsent_state_process. -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From tgraf@suug.ch Sun Jun 12 05:23:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 05:23:38 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CCNZXq014429 for ; Sun, 12 Jun 2005 05:23:35 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 22FE41C0ED; Sun, 12 Jun 2005 14:22:47 +0200 (CEST) Date: Sun, 12 Jun 2005 14:22:47 +0200 From: Thomas Graf To: Herbert Xu Cc: Willy Tarreau , davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Message-ID: <20050612122247.GB22463@postel.suug.ch> References: <20050611074350.GD28759@alpha.home.local> <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> <20050612114039.GI28759@alpha.home.local> <20050612120627.GA5858@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612120627.GA5858@gondor.apana.org.au> X-archive-position: 2376 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 623 Lines: 12 * Herbert Xu <20050612120627.GA5858@gondor.apana.org.au> 2005-06-12 22:06 > On Sun, Jun 12, 2005 at 01:40:39PM +0200, Willy Tarreau wrote: > > > > Sorry Herbert, but both RFC793 page 32 figure 9 and my Linux box disagree > > with this statement. Look: at line 5, A rejects the SYN-ACK because the > > ACK is wrong during the session setup. > > Look at the first check inside th->ack in tcp_rcv_synsent_state_process. Usually a continuous flow of ACK+RST is used to prevent a connection from being established; it's more reliable because even if you hit the ISS+rcv_next window the connection attempt will still be reset. From willy@w.ods.org Sun Jun 12 05:34:12 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 05:34:15 -0700 (PDT) Received: from willy.net1.nerim.net (willy.net1.nerim.net [62.212.114.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CCYBXq015207 for ; Sun, 12 Jun 2005 05:34:11 -0700 Date: Sun, 12 Jun 2005 14:32:53 +0200 From: Willy Tarreau To: Herbert Xu Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Message-ID: <20050612123253.GK28759@alpha.home.local> References: <20050611074350.GD28759@alpha.home.local> <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> <20050612114039.GI28759@alpha.home.local> <20050612120627.GA5858@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612120627.GA5858@gondor.apana.org.au> User-Agent: Mutt/1.4i X-archive-position: 2377 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: willy@w.ods.org Precedence: bulk X-list: netdev Content-Length: 1213 Lines: 26 On Sun, Jun 12, 2005 at 10:06:27PM +1000, Herbert Xu wrote: > On Sun, Jun 12, 2005 at 01:40:39PM +0200, Willy Tarreau wrote: > > > > Sorry Herbert, but both RFC793 page 32 figure 9 and my Linux box disagree > > with this statement. Look: at line 5, A rejects the SYN-ACK because the > > ACK is wrong during the session setup. > > Look at the first check inside th->ack in tcp_rcv_synsent_state_process. Herbert, I perfectly agree with this check and it's consistent with what I observe. But as you know, there's a difference between resetting a session and sending an RST to say that we refuse a segment. This check does not kill the session, it sends an RST whose SEQ is equal to the SYN-ACK's ACK. It's possible you thought the "reset_and_undo" label was used to kill the session, but it's not the case (although the naming is not clear). So if the remote end was the one which sent the SYN-ACK, it will clear its session. If it has been spoofed, it will ignore the RST because in turn, the SEQ will not be within its window. Try it yourself if you don't believe me. I've done lots of tests with hping2 and I've never managed to kill a session with both a SEQ and ACK outside the windows. 
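The behaviour described above can be modelled in a few lines of C. This is a minimal sketch based on the RFC 793 SYN-SENT rules as they are discussed in this thread, not the code of tcp_rcv_synsent_state_process; all function names are hypothetical, and the receive-window test is simplified to rcv_nxt <= seq < rcv_nxt + rcv_wnd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a is strictly after b, modulo 2^32. */
bool seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* SYN-SENT: an ACK is unacceptable if SEG.ACK <= ISS or SEG.ACK > SND.NXT. */
bool synsent_ack_unacceptable(uint32_t iss, uint32_t snd_nxt, uint32_t seg_ack)
{
    return !seq_after(seg_ack, iss) || seq_after(seg_ack, snd_nxt);
}

/* The RST sent in reply to an unacceptable ACK carries SEQ = SEG.ACK. */
uint32_t rst_seq_for_bad_ack(uint32_t seg_ack)
{
    return seg_ack;
}

/* The peer honours an RST only if its SEQ falls inside the receive window. */
bool rst_in_window(uint32_t rcv_nxt, uint32_t rcv_wnd, uint32_t seq)
{
    return (uint32_t)(seq - rcv_nxt) < rcv_wnd;
}

/* End to end: does a spoofed SYN-ACK carrying this ACK value make B reset? */
bool spoofed_synack_resets_peer(uint32_t a_iss, uint32_t b_rcv_wnd,
                                uint32_t spoofed_ack)
{
    uint32_t a_snd_nxt = a_iss + 1;   /* A has sent only its SYN */
    uint32_t b_rcv_nxt = a_iss + 1;   /* B expects the byte after A's SYN */

    assert(b_rcv_wnd > 0);            /* a zero window would accept nothing */

    if (!synsent_ack_unacceptable(a_iss, a_snd_nxt, spoofed_ack))
        return false;                 /* A treats the ACK as valid; no RST goes out */
    return rst_in_window(b_rcv_nxt, b_rcv_wnd, rst_seq_for_bad_ack(spoofed_ack));
}
```

In this model a spoofed ACK far from A's ISS produces an RST that B ignores, while an ACK that happens to fall inside B's window produces an RST that B accepts, which is the distinction being argued here.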
Regards, Willy From herbert@gondor.apana.org.au Sun Jun 12 06:15:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 06:15:24 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CDFEXq016840 for ; Sun, 12 Jun 2005 06:15:15 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DhSHK-0003sA-00; Sun, 12 Jun 2005 23:13:30 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DhSHD-0002ey-00; Sun, 12 Jun 2005 23:13:23 +1000 Date: Sun, 12 Jun 2005 23:13:23 +1000 To: Willy Tarreau Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) Message-ID: <20050612131323.GA10188@gondor.apana.org.au> References: <20050611074350.GD28759@alpha.home.local> <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> <20050612114039.GI28759@alpha.home.local> <20050612120627.GA5858@gondor.apana.org.au> <20050612123253.GK28759@alpha.home.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612123253.GK28759@alpha.home.local> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2378 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 874 Lines: 24 On Sun, Jun 12, 2005 at 02:32:53PM +0200, Willy Tarreau wrote: > > but it's not the case (although the naming is not clear). So if the remote > end was the one which sent the SYN-ACK, it will clear its session. 
If it has > been spoofed, it will ignore the RST because in turn, the SEQ will not be > within its window. This is what should happen: 1) client A sends SYN to server B. 2) attacker C sends spoofed SYN-ACK to client A purporting to be server B. 3) client A sends RST to server B. The RST packet is sent by client A using its sequence numbers. Therefore it will pass the sequence number check on server B. 4) server B resets the connection. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Sun Jun 12 06:18:08 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 06:18:10 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CDI6Xq017209 for ; Sun, 12 Jun 2005 06:18:07 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DhSKA-0003tG-00; Sun, 12 Jun 2005 23:16:26 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DhSK8-0002fg-00; Sun, 12 Jun 2005 23:16:24 +1000 Date: Sun, 12 Jun 2005 23:16:24 +1000 To: Thomas Graf Cc: Willy Tarreau , davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Message-ID: <20050612131624.GB10188@gondor.apana.org.au> References: <20050611074350.GD28759@alpha.home.local> <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> <20050612114039.GI28759@alpha.home.local> <20050612120627.GA5858@gondor.apana.org.au> <20050612122247.GB22463@postel.suug.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612122247.GB22463@postel.suug.ch> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2379 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 768 Lines: 18 On Sun, Jun 12, 2005 at 02:22:47PM +0200, Thomas Graf wrote: > > > Look at the first check inside th->ack in tcp_rcv_synsent_state_process. > > Usually a continuous flow of ACK+RST is used to prevent a connection > from being established, it's more reliable because even if you hit the > ISS+rcv_next window the connection attempt will still be reset. Sure. My point is that there are a hundred and one ways to attack a TCP connection in a manner similar to the original method that started this thread. So fixes like this are pretty pointless. 
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From herbert@gondor.apana.org.au Sun Jun 12 06:35:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 06:35:29 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CDZMXq018435 for ; Sun, 12 Jun 2005 06:35:23 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DhSb1-00044p-00; Sun, 12 Jun 2005 23:33:51 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DhSaz-0001dc-00; Sun, 12 Jun 2005 23:33:49 +1000 Date: Sun, 12 Jun 2005 23:33:49 +1000 To: Willy Tarreau Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Message-ID: <20050612133349.GA6279@gondor.apana.org.au> References: <20050611074350.GD28759@alpha.home.local> <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> <20050612114039.GI28759@alpha.home.local> <20050612120627.GA5858@gondor.apana.org.au> <20050612123253.GK28759@alpha.home.local> <20050612131323.GA10188@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612131323.GA10188@gondor.apana.org.au> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2380 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 956 Lines: 24 On Sun, Jun 12, 2005 at 11:13:23PM +1000, herbert wrote: > On Sun, Jun 12, 2005 at 02:32:53PM +0200, Willy Tarreau wrote: > > > > but it's not the case (although the naming is not clear). So if the remote > > end was the one which sent the SYN-ACK, it will clear its session. If it has > > been spoofed, it will ignore the RST because in turn, the SEQ will not be > > within its window. > > This is what should happen: Sorry, you're right. The SEQ check should catch this. However, a few lines down in that same function there is a th->rst check which will kill the connection just as effectively. My point is that there are many ways to kill TCP connections in ways similar to what you proposed initially so it isn't that special. 
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From willy@w.ods.org Sun Jun 12 06:38:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 06:38:18 -0700 (PDT) Received: from willy.net1.nerim.net (willy.net1.nerim.net [62.212.114.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CDcEXq018992 for ; Sun, 12 Jun 2005 06:38:15 -0700 Date: Sun, 12 Jun 2005 15:36:54 +0200 From: Willy Tarreau To: Herbert Xu Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) Message-ID: <20050612133654.GA8951@alpha.home.local> References: <20050611074350.GD28759@alpha.home.local> <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> <20050612114039.GI28759@alpha.home.local> <20050612120627.GA5858@gondor.apana.org.au> <20050612123253.GK28759@alpha.home.local> <20050612131323.GA10188@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612131323.GA10188@gondor.apana.org.au> User-Agent: Mutt/1.4i X-archive-position: 2381 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: willy@w.ods.org Precedence: bulk X-list: netdev Content-Length: 953 Lines: 28 On Sun, Jun 12, 2005 at 11:13:23PM +1000, Herbert Xu wrote: > On Sun, Jun 12, 2005 at 02:32:53PM +0200, Willy Tarreau wrote: > > > > but it's not the case (although the naming is not clear). So if the remote > > end was the one which sent the SYN-ACK, it will clear its session. 
If it has > > been spoofed, it will ignore the RST because in turn, the SEQ will not be > > within its window. > > This is what should happen: > > 1) client A sends SYN to server B. > 2) attacker C sends spoofed SYN-ACK to client A purporting to be server B. > 3) client A sends RST to server B. Agreed till here. > The RST packet is sent by client A using its sequence numbers. Therefore > it will pass the sequence number check on server B. > > 4) server B resets the connection. No, precisely the RST sent by A will take its SEQ from C's ACK number. This is why B will *not* reset the connection (again, tested) if C's ACK was not within B's window. Cheers, Willy From willy@w.ods.org Sun Jun 12 06:48:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 06:48:49 -0700 (PDT) Received: from willy.net1.nerim.net (willy.net1.nerim.net [62.212.114.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CDmhXq019730 for ; Sun, 12 Jun 2005 06:48:44 -0700 Date: Sun, 12 Jun 2005 15:47:25 +0200 From: Willy Tarreau To: Herbert Xu Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Message-ID: <20050612134725.GB8951@alpha.home.local> References: <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> <20050612114039.GI28759@alpha.home.local> <20050612120627.GA5858@gondor.apana.org.au> <20050612123253.GK28759@alpha.home.local> <20050612131323.GA10188@gondor.apana.org.au> <20050612133349.GA6279@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612133349.GA6279@gondor.apana.org.au> User-Agent: Mutt/1.4i X-archive-position: 2382 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: willy@w.ods.org Precedence: bulk X-list: netdev Content-Length: 1628 Lines: 38 On Sun, Jun 12, 2005 at 11:33:49PM +1000, Herbert Xu wrote: > On Sun, Jun 12, 2005 at 11:13:23PM +1000, herbert wrote: > > On Sun, Jun 12, 2005 at 02:32:53PM +0200, Willy Tarreau wrote: > > > > > > but it's not the case (although the naming is not clear). So if the remote > > > end was the one which sent the SYN-ACK, it will clear its session. If it has > > > been spoofed, it will ignore the RST because in turn, the SEQ will not be > > > within its window. > > > > This is what should happen: > > Sorry, you're right. The SEQ check should catch this. No problem. Fortunately, this part of the code is *very well* documented :-) > However, a few lines down in that same function there is a th->rst > check which will kill the connection just as effectively. Yes, but only if there's an ACK and the ACK is exactly equal to snd_next, so the connection will survive. > My point is that there are many ways to kill TCP connections in ways > similar to what you proposed initially so it isn't that special. No, there are plenty of ways to kill TCP connections when you can guess the window (which is more and more easy thanks to window scaling). 
But I have yet found no way to kill a TCP session without this info, except by exploiting the simultaneous connect feature. My point was that it would not be too difficult to remotely prevent an anti-virus or IDS from downloading its updates when you know the update site's address and you know that by default it uses source ports 1024-4999 to connect outside. I don't really care for BGP however because people should use MD5 or they get what they deserve. Cheers, Willy From herbert@gondor.apana.org.au Sun Jun 12 06:51:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 06:51:58 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CDppXq020320 for ; Sun, 12 Jun 2005 06:51:52 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DhSqy-0004By-00; Sun, 12 Jun 2005 23:50:20 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DhSqw-0002qG-00; Sun, 12 Jun 2005 23:50:18 +1000 Date: Sun, 12 Jun 2005 23:50:18 +1000 To: Willy Tarreau Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Message-ID: <20050612135018.GA10910@gondor.apana.org.au> References: <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> <20050612114039.GI28759@alpha.home.local> <20050612120627.GA5858@gondor.apana.org.au> <20050612123253.GK28759@alpha.home.local> <20050612131323.GA10188@gondor.apana.org.au> <20050612133349.GA6279@gondor.apana.org.au> <20050612134725.GB8951@alpha.home.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612134725.GB8951@alpha.home.local> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2383 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 1107 Lines: 27 On Sun, Jun 12, 2005 at 03:47:25PM +0200, Willy Tarreau wrote: > > Yes, but only if there's an ACK and the ACK is exactly equal to snd_next, > so the connection will survive. Sorry I wasn't thinking straight. > > > My point is that there are many ways to kill TCP connections in ways > > similar to what you proposed initially so it isn't that special. > > No, there are plenty of ways to kill TCP connections when you can guess > the window (which is more and more easy thanks to window scaling). But > I have yet found no way to kill a TCP session without this info, except > by exploiting the simultaneous connect feature. I still stand by this point though. The most obvious thing I can think of right now is to change your attack to simply connect to kernel.org's webserver first from source port 10000. That will cause the real SYN packet to fail the sequence number check. 
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From yoshfuji@linux-ipv6.org Sun Jun 12 07:05:00 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 07:05:04 -0700 (PDT) Received: from yue.st-paulia.net (yue.linux-ipv6.org [203.178.140.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CE50Xq021215 for ; Sun, 12 Jun 2005 07:05:00 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.st-paulia.net (Postfix) with ESMTP id 4D54B33CC2; Sun, 12 Jun 2005 23:03:59 +0900 (JST) Date: Sun, 12 Jun 2005 23:03:58 +0900 (JST) Message-Id: <20050612.230358.119812619.yoshfuji@linux-ipv6.org> To: davem@davemloft.net Cc: yoshfuji@linux-ipv6.org, netdev@oss.sgi.com Subject: [PATCH] Ensure to use icmpv6_socket in non-preemptive context. From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2384 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Content-Length: 1954 Lines: 70 Hello. [IPV6] Make sure to use icmpv6_socket in non-preemptive context. 
We saw following trace several times: |BUG: using smp_processor_id() in preemptible [00000001] code: httpd/30137 |caller is icmpv6_send+0x23/0x540 | [] smp_processor_id+0x9b/0xb8 | [] icmpv6_send+0x23/0x540 This is because of icmpv6_socket, which is the only one user of smp_processor_id() in icmpv6_send(), AFAIK. Since it should be used in non-preemptive context, let's defer the dereference after disabling preemption (by icmpv6_xmit_lock()). Signed-off-by: YOSHIFUJI Hideaki diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -277,8 +277,8 @@ void icmpv6_send(struct sk_buff *skb, in { struct inet6_dev *idev = NULL; struct ipv6hdr *hdr = skb->nh.ipv6h; - struct sock *sk = icmpv6_socket->sk; - struct ipv6_pinfo *np = inet6_sk(sk); + struct sock *sk; + struct ipv6_pinfo *np; struct in6_addr *saddr = NULL; struct dst_entry *dst; struct icmp6hdr tmp_hdr; @@ -358,6 +358,9 @@ void icmpv6_send(struct sk_buff *skb, in if (icmpv6_xmit_lock()) return; + sk = icmpv6_socket->sk; + np = inet6_sk(sk); + if (!icmpv6_xrlim_allow(sk, type, &fl)) goto out; @@ -423,9 +426,9 @@ out: static void icmpv6_echo_reply(struct sk_buff *skb) { - struct sock *sk = icmpv6_socket->sk; + struct sock *sk; struct inet6_dev *idev; - struct ipv6_pinfo *np = inet6_sk(sk); + struct ipv6_pinfo *np; struct in6_addr *saddr = NULL; struct icmp6hdr *icmph = (struct icmp6hdr *) skb->h.raw; struct icmp6hdr tmp_hdr; @@ -454,6 +457,9 @@ static void icmpv6_echo_reply(struct sk_ if (icmpv6_xmit_lock()) return; + sk = icmpv6_socket->sk; + np = inet6_sk(sk); + if (!fl.oif && ipv6_addr_is_multicast(&fl.fl6_dst)) fl.oif = np->mcast_oif; -- YOSHIFUJI Hideaki @ USAGI Project GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA From willy@w.ods.org Sun Jun 12 07:25:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 07:25:22 -0700 (PDT) Received: from willy.net1.nerim.net (willy.net1.nerim.net [62.212.114.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) 
with ESMTP id j5CEPIXq022271 for ; Sun, 12 Jun 2005 07:25:19 -0700 Date: Sun, 12 Jun 2005 16:24:01 +0200 From: Willy Tarreau To: Herbert Xu Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) Message-ID: <20050612142401.GA10772@alpha.home.local> References: <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> <20050612114039.GI28759@alpha.home.local> <20050612120627.GA5858@gondor.apana.org.au> <20050612123253.GK28759@alpha.home.local> <20050612131323.GA10188@gondor.apana.org.au> <20050612133349.GA6279@gondor.apana.org.au> <20050612134725.GB8951@alpha.home.local> <20050612135018.GA10910@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612135018.GA10910@gondor.apana.org.au> User-Agent: Mutt/1.4i X-archive-position: 2385 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: willy@w.ods.org Precedence: bulk X-list: netdev Content-Length: 1697 Lines: 40 On Sun, Jun 12, 2005 at 11:50:18PM +1000, Herbert Xu wrote: > On Sun, Jun 12, 2005 at 03:47:25PM +0200, Willy Tarreau wrote: > > > > Yes, but only if there's an ACK and the ACK is exactly equal to snd_next, > > so the connection will survive. > > Sorry I wasn't thinking straight. > > > > > > My point is that there are many ways to kill TCP connections in ways > > > similar to what you proposed initially so it isn't that special. > > > > No, there are plenty of ways to kill TCP connections when you can guess > > the window (which is more and more easy thanks to window scaling). But > > I have yet found no way to kill a TCP session without this info, except > > by exploiting the simultaneous connect feature. 
>
> I still stand by this point though. The most obvious thing I can think
> of right now is to change your attack to simply connect to kernel.org's
> webserver first from source port 10000. That will cause the real SYN
> packet to fail the sequence number check.

This case is interesting, but it will be resolved in two possible ways :

1) no firewall in front of A
   - C spoofs A and sends a fake SYN to B
   - B responds to A with a SYN-ACK
   - A sends an RST to B, which clears the session
   - A wants to connect and sends its SYN to B which accepts it.

2) A behind a firewall
   - C spoofs A and sends a fake SYN to B
   - B responds to A with a SYN-ACK, which does not reach A (firewall, etc...)
   - A tries to connect to B and sends its SYN with a different SEQ
   - B responds to A with only an ACK (no SYN) indicating the expected SEQ.
   - A responds to B's ACK with an RST and B flushes its session too.
   - A resends its SYN to B which accepts it.

Cheers,
Willy

From tgraf@suug.ch Sun Jun 12 07:45:16 2005
Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 07:45:20 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CEjFXq023431 for ; Sun, 12 Jun 2005 07:45:15 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id 5D6B31C0ED; Sun, 12 Jun 2005 16:44:26 +0200 (CEST)
Date: Sun, 12 Jun 2005 16:44:26 +0200
From: Thomas Graf
To: Willy Tarreau
Cc: Herbert Xu , davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com
Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.)
Message-ID: <20050612144426.GC22463@postel.suug.ch>
References: <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> <20050612114039.GI28759@alpha.home.local> <20050612120627.GA5858@gondor.apana.org.au> <20050612123253.GK28759@alpha.home.local> <20050612131323.GA10188@gondor.apana.org.au> <20050612133654.GA8951@alpha.home.local>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050612133654.GA8951@alpha.home.local>
X-archive-position: 2386
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev
Content-Length: 886
Lines: 18

* Willy Tarreau <20050612133654.GA8951@alpha.home.local> 2005-06-12 15:36
> > The RST packet is sent by client A using its sequence numbers. Therefore
> > it will pass the sequence number check on server B.
> >
> > 4) server B resets the connection.
>
> No, precisely the RST sent by A will take its SEQ from C's ACK number.
> This is why B will *not* reset the connection (again, tested) if C's ACK
> was not within B's window.

Absolutely but it relies on the other stack being correctly implemented.
The attack would work perfectly fine if there wasn't the rule that a RST
must not be sent in response to another RST.

The attack has been successful and still is because some firewalls
are configured to send RSTs without respecting this rule.

I like your patch and the idea behind it, it can successfully defeat the
most simple method of preventing connections being established.
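
As a rough illustration of the window check discussed above (an RST is
only honoured when its SEQ falls inside the receiver's window), here is
a minimal C sketch. The names seq_before() and seq_in_window() are made
up for this example, and this is not the kernel's actual code path; it
only assumes 32-bit sequence numbers that wrap around, as in TCP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if sequence number a comes strictly before b,
 * taking 32-bit wraparound into account. */
bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical helper: true if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd).
 * The unsigned subtraction makes the test wraparound-safe. */
bool seq_in_window(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
	return (uint32_t)(seq - rcv_nxt) < rcv_wnd;
}
```

With this kind of test, an RST whose SEQ was derived from a wrongly
guessed ACK number is simply ignored unless the guess happens to land
inside the window.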
From willy@w.ods.org Sun Jun 12 08:04:02 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 08:04:05 -0700 (PDT) Received: from willy.net1.nerim.net (willy.net1.nerim.net [62.212.114.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CF40Xq024400 for ; Sun, 12 Jun 2005 08:04:01 -0700 Date: Sun, 12 Jun 2005 17:02:39 +0200 From: Willy Tarreau To: Thomas Graf Cc: Herbert Xu , davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) Message-ID: <20050612150239.GA10865@alpha.home.local> References: <20050611195144.GF28759@alpha.home.local> <20050612081327.GA24384@gondor.apana.org.au> <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> <20050612114039.GI28759@alpha.home.local> <20050612120627.GA5858@gondor.apana.org.au> <20050612123253.GK28759@alpha.home.local> <20050612131323.GA10188@gondor.apana.org.au> <20050612133654.GA8951@alpha.home.local> <20050612144426.GC22463@postel.suug.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612144426.GC22463@postel.suug.ch> User-Agent: Mutt/1.4i X-archive-position: 2387 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: willy@w.ods.org Precedence: bulk X-list: netdev Content-Length: 1692 Lines: 38 On Sun, Jun 12, 2005 at 04:44:26PM +0200, Thomas Graf wrote: > * Willy Tarreau <20050612133654.GA8951@alpha.home.local> 2005-06-12 15:36 > > > The RST packet is sent by client A using its sequence numbers. Therefore > > > it will pass the sequence number check on server B. > > > > > > 4) server B resets the connection. > > > > No, precisely the RST sent by A will take its SEQ from C's ACK number. 
> > This is why B will *not* reset the connection (again, tested) if C's ACK > > was not within B's window. > > Absolutely but it relies on the other stack being correctly implemented. > The attack would work perfectly fine if there wasn't the rule that a RST > must not be sent in response to another RST. Of course, if you target a buggy stack, you can expect anything. > The attack has been successful and still is because some firewalls > are configured to send RSTs without respecting this rule. In fact, I voluntarily did not speak about firewalls because almost all of them are very sensible to TCP DoSes. First of all, all those which don't check sequence numbers will blindly kill a session when they receive an RST. And some of the other ones will not let certain ACK packets pass through, which will make other DoS described in this thread effective while it should not be, by not letting the server tell the client to reset its session when really needed. > I like your patch and the idea behind it, it can successfully defeat the > most simple method of preventing connections being established. That's what I thought, too. I have another one for 2.4.31 which only adds a CONFIG option to remove the associated code, which reduces the image by 400 bytes. Cheers, Willy From vda@ilport.com.ua Sun Jun 12 10:11:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 10:12:01 -0700 (PDT) Received: from port.imtp.ilyichevsk.odessa.ua (167.imtp.Ilyichevsk.Odessa.UA [195.66.192.167] (may be forged)) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j5CHBuXq032137 for ; Sun, 12 Jun 2005 10:11:57 -0700 Received: (qmail 29506 invoked by alias); 12 Jun 2005 17:10:46 -0000 Received: from unknown (172.17.13.22) by 0 (195.66.192.170) with ESMTP; 12 Jun 2005 17:10:37 -0000 From: Denis Vlasenko To: Willy Tarreau , "David S. Miller" Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Date: Sun, 12 Jun 2005 20:10:33 +0300 User-Agent: KMail/1.5.4 Cc: xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com References: <42A9C607.4030209@unixtrix.com> <20050611062413.GA1324@pcw.home.local> <20050611074350.GD28759@alpha.home.local> In-Reply-To: <20050611074350.GD28759@alpha.home.local> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506122010.33075.vda@ilport.com.ua> X-archive-position: 2388 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vda@ilport.com.ua Precedence: bulk X-list: netdev Content-Length: 187 Lines: 7 > Does it seem appropriate for mainline ? In this case, I would also backport > it to 2.4 and send it to you for inclusion. It does not contain a comment why it is configurable. -- vda From willy@w.ods.org Sun Jun 12 10:37:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 10:37:43 -0700 (PDT) Received: from willy.net1.nerim.net (willy.net1.nerim.net [62.212.114.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CHbcXq000796 for ; Sun, 12 Jun 2005 10:37:39 -0700 Date: Sun, 12 Jun 2005 19:36:14 +0200 From: Willy Tarreau To: Denis Vlasenko Cc: "David S. Miller" , xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Message-ID: <20050612173614.GA11157@alpha.home.local>
References: <42A9C607.4030209@unixtrix.com> <20050611062413.GA1324@pcw.home.local> <20050611074350.GD28759@alpha.home.local> <200506122010.33075.vda@ilport.com.ua>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200506122010.33075.vda@ilport.com.ua>
User-Agent: Mutt/1.4i
X-archive-position: 2389
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: willy@w.ods.org
Precedence: bulk
X-list: netdev
Content-Length: 4502
Lines: 109

On Sun, Jun 12, 2005 at 08:10:33PM +0300, Denis Vlasenko wrote:
> > Does it seem appropriate for mainline ? In this case, I would also backport
> > it to 2.4 and send it to you for inclusion.
>
> It does not contain a comment why it is configurable.

You're right. Better with this ?

Willy

--

diff -pruNX dontdiff linux-2.6.11.11/Documentation/networking/ip-sysctl.txt linux-2.6.11.11-tcp/Documentation/networking/ip-sysctl.txt
--- linux-2.6.11.11/Documentation/networking/ip-sysctl.txt	Sun Mar 6 13:08:46 2005
+++ linux-2.6.11.11-tcp/Documentation/networking/ip-sysctl.txt	Sun Jun 12 19:28:50 2005
@@ -368,6 +368,27 @@ tcp_frto - BOOLEAN
 	where packet loss is typically due to random radio interference
 	rather than intermediate router congestion.
 
+tcp_simult_connect - BOOLEAN
+	Enables TCP simultaneous connect feature conforming to RFC793.
+	Strict implementation of RFC793 (TCP) requires support for a feature
+	called "simultaneous connect", which allows two clients to connect to
+	each other without anyone entering a listening state. While almost
+	never used, and supported by few OSes, Linux supports this feature.
+
+	However, it introduces a weakness in the protocol which makes it very
+	easy for an attacker to prevent a client from connecting to a known
+	server. The attacker only has to guess the source port to shut down
+	the client connection during its establishment. The impact is limited,
+	but it may be used to prevent an antivirus or IPS from fetching updates
+	and not detecting an attack, or to prevent an SSL gateway from fetching
+	a CRL for example.
+
+	If you want backwards compatibility with every possible application,
+	you should set it to 1. If you prefer to enhance security on your
+	systems at the risk of breaking very rare specific applications, you'd
+	better let it to 0.
+	Default: 0
+
 somaxconn - INTEGER
 	Limit of socket listen() backlog, known in userspace as SOMAXCONN.
 	Defaults to 128. See also tcp_max_syn_backlog for additional tuning
diff -pruNX dontdiff linux-2.6.11.11/include/linux/sysctl.h linux-2.6.11.11-tcp/include/linux/sysctl.h
--- linux-2.6.11.11/include/linux/sysctl.h	Sun Jun 12 10:44:01 2005
+++ linux-2.6.11.11-tcp/include/linux/sysctl.h	Sat Jun 11 09:00:22 2005
@@ -345,6 +345,7 @@ enum
 	NET_TCP_MODERATE_RCVBUF=106,
 	NET_TCP_TSO_WIN_DIVISOR=107,
 	NET_TCP_BIC_BETA=108,
+	NET_TCP_SIMULT_CONNECT=109,
 };
 
 enum {
diff -pruNX dontdiff linux-2.6.11.11/include/net/tcp.h linux-2.6.11.11-tcp/include/net/tcp.h
--- linux-2.6.11.11/include/net/tcp.h	Sun Jun 12 10:44:01 2005
+++ linux-2.6.11.11-tcp/include/net/tcp.h	Sat Jun 11 08:56:16 2005
@@ -608,6 +608,7 @@ extern int sysctl_tcp_bic_low_window;
 extern int sysctl_tcp_bic_beta;
 extern int sysctl_tcp_moderate_rcvbuf;
 extern int sysctl_tcp_tso_win_divisor;
+extern int sysctl_tcp_simult_connect;
 
 extern atomic_t tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
diff -pruNX dontdiff linux-2.6.11.11/net/ipv4/sysctl_net_ipv4.c linux-2.6.11.11-tcp/net/ipv4/sysctl_net_ipv4.c
--- linux-2.6.11.11/net/ipv4/sysctl_net_ipv4.c	Sun Jun 12 10:44:01 2005
+++ linux-2.6.11.11-tcp/net/ipv4/sysctl_net_ipv4.c	Sat Jun 11 08:55:27 2005
@@ -690,6 +690,14 @@ ctl_table ipv4_table[] = {
 		.mode = 0644,
 		.proc_handler = &proc_dointvec,
 	},
+	{
+		.ctl_name = NET_TCP_SIMULT_CONNECT,
+		.procname = "tcp_simult_connect",
+		.data = &sysctl_tcp_simult_connect,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = &proc_dointvec,
+	},
 	{ .ctl_name = 0 }
 };
 
diff -pruNX dontdiff linux-2.6.11.11/net/ipv4/tcp_input.c linux-2.6.11.11-tcp/net/ipv4/tcp_input.c
--- linux-2.6.11.11/net/ipv4/tcp_input.c	Sun Jun 12 10:44:01 2005
+++ linux-2.6.11.11-tcp/net/ipv4/tcp_input.c	Sun Jun 12 19:33:56 2005
@@ -84,6 +84,7 @@ int sysctl_tcp_adv_win_scale = 2;
 
 int sysctl_tcp_stdurg;
 int sysctl_tcp_rfc1337;
+int sysctl_tcp_simult_connect;
 int sysctl_tcp_max_orphans = NR_FILE;
 int sysctl_tcp_frto;
 int sysctl_tcp_nometrics_save;
@@ -4620,10 +4621,12 @@ discard:
 		if (tp->rx_opt.ts_recent_stamp && tp->rx_opt.saw_tstamp &&
 		    tcp_paws_check(&tp->rx_opt, 0))
 			goto discard_and_undo;
 
-		if (th->syn) {
+		if (th->syn && sysctl_tcp_simult_connect) {
 			/* We see SYN without ACK. It is attempt of
 			 * simultaneous connect with crossed SYNs.
 			 * Particularly, it can be connect to self.
+			 * This feature is disabled by default as it introduces a
+			 * weakness in the protocol. It can be enabled by a sysctl.
 			 */
 			tcp_set_state(sk, TCP_SYN_RECV);

From vda@ilport.com.ua Sun Jun 12 10:48:35 2005
Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 10:48:38 -0700 (PDT)
Received: from port.imtp.ilyichevsk.odessa.ua (167.imtp.Ilyichevsk.Odessa.UA [195.66.192.167] (may be forged)) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j5CHmUXq001566 for ; Sun, 12 Jun 2005 10:48:33 -0700
Received: (qmail 31069 invoked by alias); 12 Jun 2005 17:47:17 -0000
Received: from unknown (172.17.13.22) by 0 (195.66.192.170) with ESMTP; 12 Jun 2005 17:47:11 -0000
From: Denis Vlasenko
To: Willy Tarreau
Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.)
Date: Sun, 12 Jun 2005 20:47:07 +0300
User-Agent: KMail/1.5.4
Cc: "David S.
Miller" , xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com References: <42A9C607.4030209@unixtrix.com> <200506122010.33075.vda@ilport.com.ua> <20050612173614.GA11157@alpha.home.local> In-Reply-To: <20050612173614.GA11157@alpha.home.local> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506122047.07257.vda@ilport.com.ua> X-archive-position: 2390 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vda@ilport.com.ua Precedence: bulk X-list: netdev Content-Length: 741 Lines: 22 On Sunday 12 June 2005 20:36, Willy Tarreau wrote: > On Sun, Jun 12, 2005 at 08:10:33PM +0300, Denis Vlasenko wrote: > > > Does it seem appropriate for mainline ? In this case, I would also backport > > > it to 2.4 and send it to you for inclusion. > > > > It does not contain a comment why it is configurable. > > You're right. Better with this ? Very nice. BTW, is there any real world applications which ever used this? > + If you want backwards compatibility with every possible application, > + you should set it to 1. If you prefer to enhance security on your > + systems at the risk of breaking very rare specific applications, you'd > + better let it to 0. > + Default: 0 This text leaves an impression that they exist. -- vda From willy@w.ods.org Sun Jun 12 11:15:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 11:15:51 -0700 (PDT) Received: from willy.net1.nerim.net (willy.net1.nerim.net [62.212.114.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CIFkXq002953 for ; Sun, 12 Jun 2005 11:15:47 -0700 Date: Sun, 12 Jun 2005 20:14:25 +0200 From: Willy Tarreau To: Denis Vlasenko Cc: "David S. 
Miller" , xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) Message-ID: <20050612181425.GA11284@alpha.home.local> References: <42A9C607.4030209@unixtrix.com> <200506122010.33075.vda@ilport.com.ua> <20050612173614.GA11157@alpha.home.local> <200506122047.07257.vda@ilport.com.ua> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200506122047.07257.vda@ilport.com.ua> User-Agent: Mutt/1.4i X-archive-position: 2391 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: willy@w.ods.org Precedence: bulk X-list: netdev Content-Length: 1323 Lines: 32 On Sun, Jun 12, 2005 at 08:47:07PM +0300, Denis Vlasenko wrote: > On Sunday 12 June 2005 20:36, Willy Tarreau wrote: > > On Sun, Jun 12, 2005 at 08:10:33PM +0300, Denis Vlasenko wrote: > > > > Does it seem appropriate for mainline ? In this case, I would also backport > > > > it to 2.4 and send it to you for inclusion. > > > > > > It does not contain a comment why it is configurable. > > > > You're right. Better with this ? > > Very nice. BTW, is there any real world applications which > ever used this? Not that I'm aware of, but that does not mean they don't exist. Until yesterday, I even thought that it was never implemented. As most other systems don't implement it, the applications, if they exist, are Linux or BSD-dependant. It's even difficult to use because the two ends must loop around the connect() call until it succeeds, because as long as they're not both trying to connect, they get RSTs back. > > + If you want backwards compatibility with every possible application, > > + you should set it to 1. If you prefer to enhance security on your > > + systems at the risk of breaking very rare specific applications, you'd > > + better let it to 0. 
> > + Default: 0 > > This text leaves an impression that they exist. In doubt, I consider that they might exist. It's just like martians :-) Willy From rahulhsaxena@gmail.com Sun Jun 12 13:06:12 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 13:06:15 -0700 (PDT) Received: from zproxy.gmail.com (zproxy.gmail.com [64.233.162.198]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CK6BXq009371 for ; Sun, 12 Jun 2005 13:06:11 -0700 Received: by zproxy.gmail.com with SMTP id 34so809535nzf for ; Sun, 12 Jun 2005 13:05:01 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=NX1fsT9D+9c9239sWhUB8uKg2thFgoxj3f0gdqrNZIaDdD+/BkV0e9BYxg1/vHdws+QV4svOSEHkagTIb6n7ySIv9SHDvHiXXRDLPe6UYKIzlrVWcNaGXHJHUN7ECvlA1q+8l8c/mq9VkOxSsJdptk/FXHNnlmzMwtYSCfu5Q8g= Received: by 10.36.222.70 with SMTP id u70mr2470953nzg; Sun, 12 Jun 2005 13:05:01 -0700 (PDT) Received: by 10.36.4.6 with HTTP; Sun, 12 Jun 2005 13:05:01 -0700 (PDT) Message-ID: <4532f3170506121305327ad0f6@mail.gmail.com> Date: Mon, 13 Jun 2005 01:35:01 +0530 From: Rahul Hari Reply-To: rahul.hari@cse06.itbhu.org To: Thomas Graf Subject: Re: testing techniques to confirm the effectiveness of changes made to sch_gred.c Cc: netdev@oss.sgi.com, netdev@vger.kernel.org, lartc-request@mailman.ds9a.nl, diffserv-general@lists.sourceforge.net In-Reply-To: <20050612104628.GA22463@postel.suug.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline References: <4532f3170506101739702e31ad@mail.gmail.com> <20050612104628.GA22463@postel.suug.ch> Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j5CK6BXq009371 X-archive-position: 2392 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com 
X-original-sender: rahulhsaxena@gmail.com Precedence: bulk X-list: netdev Content-Length: 1200 Lines: 33 > > 1) Since the process deals with dequeueing, i have to make changes to > > gred_dequeue only. If t->tab[0] != 0 then we dequeue the packet > > otherwise do not dequeue it. > > What you describe above is: only dequeue when DP 0 is configured, > probably not what you want. The only way to prioritize within gred > the way you want is to modify dequeue() that it iterates through > sch->q looking for a skb with tcindex==DP0 and use it instead of > the skb at the queue head. > Thanks for the reply Thomas, by checking t->tab[0]!=0, the approach I wanted to follow was that "if I have a packet in the virtual queue with DP 0, then I should not be dequeuing any packets from the other virtual queues", ie, take no action at all. with best regards, Rahul -- ---------------------- "The fear you let build up in your mind is worse than the situation that actually exists" from "who moved my cheese" --------------------------------------------------------------------------------- Rahul Hari Senior Under Grad. Student, Department of CSE, ITBHU, Varanasi. 
Ph: +91-9845347020 rahul.hari@cse06.itbhu.org ------------------------------------------------------------------------------------------ From juhl-lkml@dif.dk Sun Jun 12 15:01:31 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 15:01:34 -0700 (PDT) Received: from saerimmer.dif.dk (mail.dif.dk [193.138.115.101]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5CM1UXq012946 for ; Sun, 12 Jun 2005 15:01:30 -0700 Received: from localhost (localhost [127.0.0.1]) by saerimmer.dif.dk (Postfix) with ESMTP id 9C96FFFC6A for ; Mon, 13 Jun 2005 00:08:05 +0200 (CEST) Received: from saerimmer.dif.dk ([127.0.0.1]) by localhost (saerimmer [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 08822-08 for ; Mon, 13 Jun 2005 00:08:03 +0200 (CEST) Received: from diftmgw2.backbone.dif.dk (diftmgw2.backbone.dif.dk [10.227.136.246]) by saerimmer.dif.dk (Postfix) with ESMTP id DA7C5FFC7C for ; Mon, 13 Jun 2005 00:08:01 +0200 (CEST) Received: from DIFPST1A.backbone.dif.dk ([10.227.136.220]) by diftmgw2.backbone.dif.dk with InterScan Messaging Security Suite; Sun, 12 Jun 2005 23:59:15 +0200 Received: from [172.16.2.11] (10.227.136.29 [10.227.136.29]) by DIFPST1A.backbone.dif.dk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id LWFYPSJA; Mon, 13 Jun 2005 00:00:12 +0200 Date: Mon, 13 Jun 2005 00:05:33 +0200 (CEST) From: Jesper Juhl To: LKML Cc: "David S. 
Miller" , Ross Biro , netdev@oss.sgi.com
Subject: [PATCH] net: fix sparse warning (plain int as NULL)
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 2393
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: juhl-lkml@dif.dk
Precedence: bulk
X-list: netdev
Content-Length: 784
Lines: 30

Here's a patch to fix a small sparse warning in net/ipv4/tcp_input.c :
net/ipv4/tcp_input.c:4179:29: warning: Using plain integer as NULL pointer

Signed-off-by: Jesper Juhl
---

 net/ipv4/tcp_input.c | 2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.12-rc6-mm1-orig/net/ipv4/tcp_input.c	2005-06-12 15:58:58.000000000 +0200
+++ linux-2.6.12-rc6-mm1/net/ipv4/tcp_input.c	2005-06-12 23:58:41.000000000 +0200
@@ -4176,7 +4176,7 @@ int tcp_rcv_state_process(struct sock *s
 				 */
 				if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
 				    !tp->srtt)
-					tcp_ack_saw_tstamp(tp, 0, 0);
+					tcp_ack_saw_tstamp(tp, NULL, 0);
 
 				if (tp->rx_opt.tstamp_ok)
 					tp->advmss -= TCPOLEN_TSTAMP_ALIGNED;


Please CC me on replies.

From Valdis.Kletnieks@vt.edu Sun Jun 12 19:06:19 2005
Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 19:06:23 -0700 (PDT)
Received: from h80ad2736.async.vt.edu (h80ad2736.async.vt.edu [128.173.39.54]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5D26EXq024175 for ; Sun, 12 Jun 2005 19:06:17 -0700
Received: from turing-police.cc.vt.edu (localhost [127.0.0.1]) by turing-police.cc.vt.edu (8.13.4/8.13.4) with ESMTP id j5D24EOE005565; Sun, 12 Jun 2005 22:04:16 -0400
Message-Id: <200506130204.j5D24EOE005565@turing-police.cc.vt.edu>
X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.1-RC3
To: Willy Tarreau
Cc: Denis Vlasenko , "David S. Miller" , xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com
Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.)
In-Reply-To: Your message of "Sun, 12 Jun 2005 20:14:25 +0200." <20050612181425.GA11284@alpha.home.local> From: Valdis.Kletnieks@vt.edu References: <42A9C607.4030209@unixtrix.com> <200506122010.33075.vda@ilport.com.ua> <20050612173614.GA11157@alpha.home.local> <200506122047.07257.vda@ilport.com.ua> <20050612181425.GA11284@alpha.home.local> Mime-Version: 1.0 Content-Type: multipart/signed; boundary="==_Exmh_1118628251_24934P"; micalg=pgp-sha1; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit Date: Sun, 12 Jun 2005 22:04:12 -0400 X-archive-position: 2394 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Valdis.Kletnieks@vt.edu Precedence: bulk X-list: netdev Content-Length: 1034 Lines: 31 --==_Exmh_1118628251_24934P Content-Type: text/plain; charset=us-ascii On Sun, 12 Jun 2005 20:14:25 +0200, Willy Tarreau said: > On Sun, Jun 12, 2005 at 08:47:07PM +0300, Denis Vlasenko wrote: > > Very nice. BTW, is there any real world applications which > > ever used this? > > Not that I'm aware of, but that does not mean they don't exist. Until > yesterday, I even thought that it was never implemented. As most other > systems don't implement it, the applications, if they exist, are Linux > or BSD-dependant. A more likely explanation is that there existed TOPS-20 or Multics code that actually used that for something. Remember that BSD and Linux both came along long after RFC793 came out.... 
--==_Exmh_1118628251_24934P Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Exmh version 2.5 07/13/2001 iD8DBQFCrOmbcC3lWbTT17ARAh6UAKDllrvzzA/u7DLz7U465OhXZSqJLACg8e8B SVOuvr28yBxqHicG3qQptUg= =Bwr2 -----END PGP SIGNATURE----- --==_Exmh_1118628251_24934P-- From jgarzik@pobox.com Sun Jun 12 20:32:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 20:32:50 -0700 (PDT) Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5D3WkXq030525 for ; Sun, 12 Jun 2005 20:32:47 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1Dhffb-0005HQ-4E; Mon, 13 Jun 2005 03:31:28 +0000 Message-ID: <42ACFE0C.1080604@pobox.com> Date: Sun, 12 Jun 2005 23:31:24 -0400 From: Jeff Garzik User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: "John W. Linville" CC: netdev@oss.sgi.com, linux-kernel@vger.kernel.org, akpm@osdl.org Subject: Re: [patch 2.6.12-rc6] 3c59x: remove superfluous vortex_debug test from boomerang_start_xmit References: <20050610142702.GC10449@tuxdriver.com> In-Reply-To: <20050610142702.GC10449@tuxdriver.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2395 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 270 Lines: 9 John W. Linville wrote: > Remove the superfluous test of "if (vortex_debug > 3)" inside the > "if (vortex_debug > 6)" clause early in boomerang_start_xmit. > > Signed-off-by: John W. 
Linville ACK (I presume akpm will send this one upstream) From herbert@gondor.apana.org.au Sun Jun 12 21:49:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 21:50:01 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5D4ntXq000868 for ; Sun, 12 Jun 2005 21:49:56 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1Dhgry-0001dY-00; Mon, 13 Jun 2005 14:48:18 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1Dhgrq-0008NE-00; Mon, 13 Jun 2005 14:48:10 +1000 Date: Mon, 13 Jun 2005 14:48:10 +1000 To: Willy Tarreau Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) Message-ID: <20050613044810.GA32103@gondor.apana.org.au> References: <20050612083409.GA8220@alpha.home.local> <20050612103020.GA25111@gondor.apana.org.au> <20050612114039.GI28759@alpha.home.local> <20050612120627.GA5858@gondor.apana.org.au> <20050612123253.GK28759@alpha.home.local> <20050612131323.GA10188@gondor.apana.org.au> <20050612133349.GA6279@gondor.apana.org.au> <20050612134725.GB8951@alpha.home.local> <20050612135018.GA10910@gondor.apana.org.au> <20050612142401.GA10772@alpha.home.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050612142401.GA10772@alpha.home.local> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2396 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 1013 Lines: 25 On Sun, Jun 12, 2005 at 04:24:01PM +0200, Willy Tarreau wrote: > > 1) no firewall in front of A 
> - C spoofs A and sends a fake SYN to B > - B responds to A with a SYN-ACK > - A sends an RST to B, which clears the session > - A wants to connect and sends its SYN to B which accepts it. Well, the attacker simply has to keep sending the same SYN packet over and over again until A runs out of SYN retries. What I really don't like about your patch is the fact that it is trying to impose a policy decision (that of forbidding all simultaneous connection initiations) inside the TCP stack. A much better place to do that is netfilter. If you do it there then not only will you protect all Linux machines from this attack, but you'll also protect all the other BSD-derived TCP stacks. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From willy@w.ods.org Sun Jun 12 23:19:10 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 12 Jun 2005 23:19:13 -0700 (PDT) Received: from willy.net1.nerim.net (willy.net1.nerim.net [62.212.114.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5D6J9Xq005609 for ; Sun, 12 Jun 2005 23:19:10 -0700 Date: Mon, 13 Jun 2005 08:17:48 +0200 From: Willy Tarreau To: Herbert Xu Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) 
Message-ID: <20050613061748.GA13144@alpha.home.local> References: <20050612120627.GA5858@gondor.apana.org.au> <20050612123253.GK28759@alpha.home.local> <20050612131323.GA10188@gondor.apana.org.au> <20050612133349.GA6279@gondor.apana.org.au> <20050612134725.GB8951@alpha.home.local> <20050612135018.GA10910@gondor.apana.org.au> <20050612142401.GA10772@alpha.home.local> <20050613044810.GA32103@gondor.apana.org.au> <20050613052148.GF8907@alpha.home.local> <20050613052404.GA7611@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050613052404.GA7611@gondor.apana.org.au> User-Agent: Mutt/1.4i X-archive-position: 2399 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: willy@w.ods.org Precedence: bulk X-list: netdev Content-Length: 1110 Lines: 26 On Mon, Jun 13, 2005 at 03:24:04PM +1000, Herbert Xu wrote: > On Mon, Jun 13, 2005 at 07:21:48AM +0200, Willy Tarreau wrote: > > > > > A much better place to do that is netfilter. If you do it there > > > then not only will your protect all Linux machines from this attack, > > > but you'll also protect all the other BSD-derived TCP stacks. > > > > Netfilter already blocks simultaneous connection. A SYN in return to > > a SYN produces an INVALID state. > > Any reason why that isn't enough? I don't think there are a lot of people who load ip_conntrack and insert a single DROP rule on their servers just to workaround weaknesses in the TCP stack. If they did, they would not be more confident into netfilter either because it would be logical to expect the same reasoning (eg: let's not fix XX here, TCP will catch it). What's the problem with the sysctl ? If you prefer, I can change the patch to keep the feature enabled by default so that only people aware of the problem have to fix it by hand. But I found it better the other way : people who need the feature enable it by hand. 
Cheers, willy From herbert@gondor.apana.org.au Mon Jun 13 00:41:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 13 Jun 2005 00:41:43 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5D7fYXq019822 for ; Mon, 13 Jun 2005 00:41:35 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DhjY6-0002TL-00; Mon, 13 Jun 2005 17:39:58 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DhjY3-0005d0-00; Mon, 13 Jun 2005 17:39:55 +1000 Date: Mon, 13 Jun 2005 17:39:55 +1000 To: "David S. Miller" , James Morris , Patrick McHardy , YOSHIFUJI Hideaki , netdev@oss.sgi.com Subject: [net-2.6.13 3/3] [IPSEC] Add XFRM_STATE_NOPMTUDISC flag Message-ID: <20050613073955.GC21545@gondor.apana.org.au> References: <20050613073353.GA21454@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="pAwQNkOnpTn9IO2O" Content-Disposition: inline In-Reply-To: <20050613073353.GA21454@gondor.apana.org.au> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2403 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 4240 Lines: 134 --pAwQNkOnpTn9IO2O Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi: This patch adds the flag XFRM_STATE_NOPMTUDISC for xfrm states. It is similar to the nopmtudisc on IPIP/GRE tunnels. It only has an effect on IPv4 tunnel mode states. For these states, it will ensure that the DF flag is always cleared. This is primarily useful to work around ICMP blackholes. In future this flag could also allow a larger MTU to be set within the tunnel just like IPIP/GRE tunnels. 
This could be useful for short haul tunnels where temporary fragmentation outside the tunnel is desired over smaller fragments inside the tunnel. Signed-off-by: Herbert Xu -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --pAwQNkOnpTn9IO2O Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="p3.patch" diff --git a/include/linux/pfkeyv2.h b/include/linux/pfkeyv2.h --- a/include/linux/pfkeyv2.h +++ b/include/linux/pfkeyv2.h @@ -245,6 +245,7 @@ struct sadb_x_nat_t_port { /* Security Association flags */ #define SADB_SAFLAGS_PFS 1 +#define SADB_SAFLAGS_NOPMTUDISC 0x20000000 #define SADB_SAFLAGS_DECAP_DSCP 0x40000000 #define SADB_SAFLAGS_NOECN 0x80000000 diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h --- a/include/linux/xfrm.h +++ b/include/linux/xfrm.h @@ -196,6 +196,7 @@ struct xfrm_usersa_info { __u8 flags; #define XFRM_STATE_NOECN 1 #define XFRM_STATE_DECAP_DSCP 2 +#define XFRM_STATE_NOPMTUDISC 4 }; struct xfrm_usersa_id { diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c --- a/net/ipv4/xfrm4_output.c +++ b/net/ipv4/xfrm4_output.c @@ -33,6 +33,7 @@ static void xfrm4_encap(struct sk_buff * struct dst_entry *dst = skb->dst; struct xfrm_state *x = dst->xfrm; struct iphdr *iph, *top_iph; + int flags; iph = skb->nh.iph; skb->h.ipiph = iph; @@ -51,10 +52,13 @@ static void xfrm4_encap(struct sk_buff * /* DS disclosed */ top_iph->tos = INET_ECN_encapsulate(iph->tos, iph->tos); - if (x->props.flags & XFRM_STATE_NOECN) + + flags = x->props.flags; + if (flags & XFRM_STATE_NOECN) IP_ECN_clear(top_iph); - top_iph->frag_off = iph->frag_off & htons(IP_DF); + top_iph->frag_off = (flags & XFRM_STATE_NOPMTUDISC) ? 
+ 0 : (iph->frag_off & htons(IP_DF)); if (!top_iph->frag_off) __ip_select_ident(top_iph, dst, 0); diff --git a/net/ipv4/xfrm4_state.c b/net/ipv4/xfrm4_state.c --- a/net/ipv4/xfrm4_state.c +++ b/net/ipv4/xfrm4_state.c @@ -7,12 +7,20 @@ * */ +#include #include #include #include static struct xfrm_state_afinfo xfrm4_state_afinfo; +static int xfrm4_init_flags(struct xfrm_state *x) +{ + if (ipv4_config.no_pmtu_disc) + x->props.flags |= XFRM_STATE_NOPMTUDISC; + return 0; +} + static void __xfrm4_init_tempsel(struct xfrm_state *x, struct flowi *fl, struct xfrm_tmpl *tmpl, @@ -109,6 +117,7 @@ __xfrm4_find_acq(u8 mode, u32 reqid, u8 static struct xfrm_state_afinfo xfrm4_state_afinfo = { .family = AF_INET, .lock = RW_LOCK_UNLOCKED, + .init_flags = xfrm4_init_flags, .init_tempsel = __xfrm4_init_tempsel, .state_lookup = __xfrm4_state_lookup, .find_acq = __xfrm4_find_acq, diff --git a/net/key/af_key.c b/net/key/af_key.c --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -690,6 +690,8 @@ static struct sk_buff * pfkey_xfrm_state sa->sadb_sa_flags |= SADB_SAFLAGS_NOECN; if (x->props.flags & XFRM_STATE_DECAP_DSCP) sa->sadb_sa_flags |= SADB_SAFLAGS_DECAP_DSCP; + if (x->props.flags & XFRM_STATE_NOPMTUDISC) + sa->sadb_sa_flags |= SADB_SAFLAGS_NOPMTUDISC; /* hard time */ if (hsc & 2) { @@ -974,6 +976,8 @@ static struct xfrm_state * pfkey_msg2xfr x->props.flags |= XFRM_STATE_NOECN; if (sa->sadb_sa_flags & SADB_SAFLAGS_DECAP_DSCP) x->props.flags |= XFRM_STATE_DECAP_DSCP; + if (sa->sadb_sa_flags & SADB_SAFLAGS_NOPMTUDISC) + x->props.flags |= XFRM_STATE_NOPMTUDISC; lifetime = (struct sadb_lifetime*) ext_hdrs[SADB_EXT_LIFETIME_HARD-1]; if (lifetime != NULL) { --pAwQNkOnpTn9IO2O-- From herbert@gondor.apana.org.au Mon Jun 13 00:47:05 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 13 Jun 2005 00:47:10 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5D7l3Xq020813 for ; Mon, 13 Jun 2005 
00:47:04 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DhjdO-0002UK-00; Mon, 13 Jun 2005 17:45:26 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DhjdJ-0005e5-00; Mon, 13 Jun 2005 17:45:21 +1000 Date: Mon, 13 Jun 2005 17:45:21 +1000 To: Willy Tarreau Cc: davem@davemloft.net, xschmi00@stud.feec.vutbr.cz, alastair@unixtrix.com, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] fix small DoS on connect() (was Re: BUG: Unusual TCP Connect() results.) Message-ID: <20050613074521.GA21661@gondor.apana.org.au> References: <20050612123253.GK28759@alpha.home.local> <20050612131323.GA10188@gondor.apana.org.au> <20050612133349.GA6279@gondor.apana.org.au> <20050612134725.GB8951@alpha.home.local> <20050612135018.GA10910@gondor.apana.org.au> <20050612142401.GA10772@alpha.home.local> <20050613044810.GA32103@gondor.apana.org.au> <20050613052148.GF8907@alpha.home.local> <20050613052404.GA7611@gondor.apana.org.au> <20050613061748.GA13144@alpha.home.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050613061748.GA13144@alpha.home.local> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2404 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 1073 Lines: 27 On Mon, Jun 13, 2005 at 08:17:48AM +0200, Willy Tarreau wrote: > > What's the problem with the sysctl ? If you prefer, I can change the patch > to keep the feature enabled by default so that only people aware of the > problem have to fix it by hand. But I found it better the other way : people > who need the feature enable it by hand. 
Well that's exactly my problem :) I reckon it should be off by default because the threat posed by this problem is IMHO small compared to some of the other standard threats that are applicable to TCP. Plus this is a well-documented feature so we can't be sure that someone somewhere isn't depending on it. However, if it were off by default then there is very little value in providing it at all since the same thing can be achieved easily through netfilter. Anyway, let's leave it to Dave to make the decision. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From grundler@cup.hp.com Tue Jun 14 17:11:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 14 Jun 2005 17:11:57 -0700 (PDT) Received: from palrel13.hp.com (palrel13.hp.com [156.153.255.238]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5F0BsBK005666 for ; Tue, 14 Jun 2005 17:11:54 -0700 Received: from esmail.cup.hp.com (esmail.cup.hp.com [15.0.65.164]) by palrel13.hp.com (Postfix) with ESMTP id AF1681C03F2D; Tue, 14 Jun 2005 08:43:46 -0700 (PDT) Received: from localhost.localdomain (postfix@debian.cup.hp.com [15.244.57.47]) by esmail.cup.hp.com (8.9.3 (PHNE_29774)/8.8.6) with ESMTP id IAA04370; Tue, 14 Jun 2005 08:37:46 -0700 (PDT) Received: by localhost.localdomain (Postfix, from userid 1000) id 0F3DD8FBD3; Tue, 14 Jun 2005 08:46:25 -0700 (PDT) Date: Tue, 14 Jun 2005 08:46:25 -0700 From: Grant Grundler To: Michael Chan Cc: "David S. 
Miller" , iod00d@hp.com, netdev@oss.sgi.com Subject: Re: [PATCH] tg3_msi() and weakly ordered memory Message-ID: <20050614154625.GB24371@esmail.cup.hp.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.9i X-archive-position: 2419 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: iod00d@hp.com Precedence: bulk X-list: netdev Content-Length: 801 Lines: 20 On Mon, Jun 13, 2005 at 11:54:23PM -0700, Michael Chan wrote: > > Once you write "0x1" to the mailbox register, the device stops > > updating the status block and stops generating interrupts. > > > > That is what makes a lot of things safe. > > Only interrupts are stopped, status block will still be updated subject to > during-ints coalescing. Will setting during-ints to a very high threshold essentially allow us to "indefinitely" process stuff without taking any interrupts? Would the threshold counter get reset every time we write back the status tag WITHOUT re-enabling interrupts? If not, I suspect the CPU will circulate in tg3_poll until during-ints is exhausted and DMA will stop until the CPU re-enables interrupts, i.e. not until it's done processing outstanding packets. 
thanks, grant From grundler@cup.hp.com Tue Jun 14 17:11:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 14 Jun 2005 17:12:00 -0700 (PDT) Received: from palrel13.hp.com (palrel13.hp.com [156.153.255.238]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5F0BuBK005672 for ; Tue, 14 Jun 2005 17:11:56 -0700 Received: from esmail.cup.hp.com (esmail.cup.hp.com [15.0.65.164]) by palrel13.hp.com (Postfix) with ESMTP id A8DB21C031DC; Tue, 14 Jun 2005 08:37:43 -0700 (PDT) Received: from localhost.localdomain (postfix@debian.cup.hp.com [15.244.57.47]) by esmail.cup.hp.com (8.9.3 (PHNE_29774)/8.8.6) with ESMTP id IAA03952; Tue, 14 Jun 2005 08:31:42 -0700 (PDT) Received: by localhost.localdomain (Postfix, from userid 1000) id B2F1B8FBD3; Tue, 14 Jun 2005 08:40:21 -0700 (PDT) Date: Tue, 14 Jun 2005 08:40:21 -0700 From: Grant Grundler To: Michael Chan Cc: Grant Grundler , "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] tg3_msi() and weakly ordered memory Message-ID: <20050614154021.GA24371@esmail.cup.hp.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.9i X-archive-position: 2420 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: iod00d@hp.com Precedence: bulk X-list: netdev Content-Length: 2418 Lines: 54 On Mon, Jun 13, 2005 at 11:46:47PM -0700, Michael Chan wrote: > Yes, you're right. rmb() is needed between reading the tag and tg3_has_work() > to guarantee strict ordering. Otherwise, tg3_has_work() may get ahead and > read stale information that may be older than the tag. Ok - thanks for confirming. I just wasn't sure any more. It's been a year or so since I looked at this last time. > But the clearing of > the SD_STATUS_UPDATED bit does not need any additional barriers. ... > You're right again. The SD_STATUS_UPDATED bit should be cleared right before > checking for new work. 
Clearing the SD_STATUS_UPDATED bit tells the non-msi > irq handler that all work up to the last status block update has been > processed. ... > The clearing of the SD_STATUS_UPDATED bit does not have to follow very strict > ordering for the following reasons: > > 1. It has no hardware significance. It is purely to tell the irq handler that > the current status block has been processed. For MSI, since we don't even > check that bit in tg3_msi(), we can skip clearing that bit. But I think it is > safer to clear it because tg3_cond_int() is checking it. Ok - I thought the NIC was reading that back for some reason. If we can ignore SD_STATUS_UPDATED and use a flag not related to sblk, I think it would be cacheline friendlier. But it's a minor issue. > 2. We only clear it when interrupt from the NIC is disabled, either in irq > handler or tg3_poll(). So there is no potential contention. > > So the current sequence is fine. ok - thank you for the clarification. > It is important to read the actual status block with the latest indices to > determine whether there is new work, especially in the non-tagged case where > you may have race condition between software and hardware. Certainly...my point was we should not read them on every iteration of the RX or TX loop that processes packets - but rather outside that loop. And I'd think we want to read the three values (status tag, tx_consumer, and rx_producer) as a set - ie read them and process stuff until we've exhausted the "work quota" or the driver has caught up to the HW (status tag stops changing). I don't see a problem with exiting tg3_poll despite more work pending if we know we are going to catch it on the next round. After all, we are polling - latency is going to be semi random depending on where we are in the sequence anyway. 
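[Archive annotation, not part of Grant's message.] The suggestion above, reading status tag, tx_consumer and rx_producer as a set and then working from that cached copy until the quota is spent or the tag stops changing, might look roughly like the following. This is a minimal single-threaded userspace sketch, not tg3 code: the struct names, field layout and poll_from_snapshot() are hypothetical, "processing" a packet is reduced to bumping a counter, and the memory barriers and ring-size masking a real driver would need are left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for the status block fields discussed in this thread. */
struct sblk_sim {
    uint32_t status_tag;
    uint16_t tx_consumer;
    uint16_t rx_producer;
};

/* Hypothetical driver-side progress markers. */
struct ring_progress {
    uint32_t last_tag;  /* last tag for which all work was completed */
    uint16_t tx_done;   /* TX completions handled so far */
    uint16_t rx_done;   /* RX packets handled so far */
};

/*
 * Handle at most `quota` items. Returns the number handled and sets
 * *caught_up to 1 only if everything in the snapshot was processed
 * and the tag did not move in the meantime.
 */
static int poll_from_snapshot(const struct sblk_sim *sblk,
                              struct ring_progress *rp,
                              int quota, int *caught_up)
{
    int done = 0;

    *caught_up = 0;
    while (done < quota) {
        /* Read the three values once, as a set. */
        uint32_t tag = sblk->status_tag;
        uint16_t txc = sblk->tx_consumer;
        uint16_t rxp = sblk->rx_producer;

        /* Work only from the cached copy inside the loops. */
        while (rp->tx_done != txc && done < quota) {
            rp->tx_done++;
            done++;
        }
        while (rp->rx_done != rxp && done < quota) {
            rp->rx_done++;
            done++;
        }

        if (rp->tx_done == txc && rp->rx_done == rxp) {
            rp->last_tag = tag;
            if (sblk->status_tag == tag) {
                *caught_up = 1;  /* tag stopped changing */
                break;
            }
        }
    }
    return done;
}
```

When the quota runs out first, the function returns with *caught_up left at 0 and the remaining work is picked up on the next call, which matches the "catch it on the next round" remark above.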
thanks, grant From grundler@cup.hp.com Tue Jun 14 17:28:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 14 Jun 2005 17:28:59 -0700 (PDT) Received: from palrel11.hp.com (palrel11.hp.com [156.153.255.246]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5F0SrBK007175 for ; Tue, 14 Jun 2005 17:28:53 -0700 Received: from esmail.cup.hp.com (esmail.cup.hp.com [15.0.65.164]) by palrel11.hp.com (Postfix) with ESMTP id A4D2338DF; Mon, 13 Jun 2005 20:34:42 -0700 (PDT) Received: from localhost.localdomain (postfix@debian.cup.hp.com [15.244.57.47]) by esmail.cup.hp.com (8.9.3 (PHNE_29774)/8.8.6) with ESMTP id UAA26531; Mon, 13 Jun 2005 20:28:37 -0700 (PDT) Received: by localhost.localdomain (Postfix, from userid 1000) id 2C4FC8FBD3; Mon, 13 Jun 2005 20:37:15 -0700 (PDT) Date: Mon, 13 Jun 2005 20:37:15 -0700 From: Grant Grundler To: "David S. Miller" Cc: mchan@broadcom.com, netdev@oss.sgi.com, iod00d@hp.com Subject: [PATCH] tg3_msi() and weakly ordered memory Message-ID: <20050614033715.GA22376@esmail.cup.hp.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.9i X-archive-position: 2421 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: iod00d@hp.com Precedence: bulk X-list: netdev Content-Length: 2436 Lines: 64 Dave, I reviewed the "new" (to me) use of tags and MSI in tg3_msi() and tg3_poll() and I like the new scheme. It's pretty clean. But I did come up with four potential "issues" - mostly revolving around enforcing order of memory access on weakly ordered platforms: 1) tg3_poll() and tg3_msi() are not consistent with use of rmb(). tg3_poll has an rmb() between reading status_tag and tg3_has_work(). The patch (against tg3 v3.29) below adds a similar rmb() to tg3_msi(). Does tg3_msi() need a "rmb()" like in the attached patch? Or rather a mb() to deal with clearing SD_STATUS_UPDATED bit? 
2) tg3_poll() and tg3_msi() are not consistent on how they clear the SD_STATUS_UPDATED bit. tg3_poll() does not clear the SD_STATUS_UPDATED bit after reading status_tag. I think every time the driver discovers the status_tag changed, it should clear SD_STATUS_UPDATED. Michael, can you confirm/deny that offhand? I'm not sure anymore what order the sblk fields (status_tag, tx_consumer, and rx_producer) should be read in before clearing the SD_STATUS_UPDATED bit. I expect a recommended order exists. ISTR something like: read status_tag; rmb(); read tx_consumer and rx_producer; mb(); clear SD_STATUS_UPDATED. 3) Based on the above sequence, tg3 might need one more rmb() between reading sblk status_tag and the inline code for tg3_has_work(). 4) I'd also prefer if tg3 would read tx_consumer/rx_producer fields *only* in tg3_msi() and tg3_poll() when sblk status_tag is read. All other references (e.g. tg3_has_work(), tg3_rx(), etc) would use a cached copy of those fields. My goal would be to reduce the competition for access to the sblk cacheline and get the memory ordering issues right. My fear is that regularly reading the cacheline by the CPU will take away exclusive (write) access from the IO subsystem and ping-pong the cacheline more often than necessary. Would you entertain a patch for this? 
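[Archive annotation, not part of Grant's message.] The read sequence recalled above (status_tag, read barrier, the two indices, full barrier, clear SD_STATUS_UPDATED) can be modelled in userspace with C11 fences. This is only a sketch under stated assumptions: struct sblk_sim, struct sblk_snapshot and sblk_read_ordered() are hypothetical names, the SD_STATUS_UPDATED value is assumed, and C11 thread fences merely stand in for the kernel's rmb()/mb(); they say nothing about ordering against device DMA. Later replies in this thread also suggest the clear does not need strict ordering, so treat this as an illustration of the recalled sequence rather than a recommendation.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SD_STATUS_UPDATED 0x00000001u  /* assumed value, for illustration */

/* Hypothetical userspace stand-in for the status block written by the NIC. */
struct sblk_sim {
    _Atomic uint32_t status;
    _Atomic uint32_t status_tag;
    _Atomic uint16_t tx_consumer;
    _Atomic uint16_t rx_producer;
};

/* Cached copy handed to the rest of the (hypothetical) driver. */
struct sblk_snapshot {
    uint32_t tag;
    uint16_t tx_consumer;
    uint16_t rx_producer;
};

static struct sblk_snapshot sblk_read_ordered(struct sblk_sim *sblk)
{
    struct sblk_snapshot snap;

    /* 1. read status_tag */
    snap.tag = atomic_load_explicit(&sblk->status_tag, memory_order_relaxed);

    /* 2. stands in for rmb(): index reads may not move before the tag read */
    atomic_thread_fence(memory_order_acquire);

    /* 3. read tx_consumer and rx_producer */
    snap.tx_consumer = atomic_load_explicit(&sblk->tx_consumer,
                                            memory_order_relaxed);
    snap.rx_producer = atomic_load_explicit(&sblk->rx_producer,
                                            memory_order_relaxed);

    /* 4. stands in for mb(): order the reads above against the store below */
    atomic_thread_fence(memory_order_seq_cst);

    /* 5. clear SD_STATUS_UPDATED, leaving the other status bits alone */
    atomic_fetch_and_explicit(&sblk->status, ~SD_STATUS_UPDATED,
                              memory_order_relaxed);
    return snap;
}
```

Returning the snapshot by value also fits point 4 above: callers can work from the cached tag and indices instead of touching the status block again.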
thanks, grant Signed-off-by: Grant Grundler --- a/drivers/net/tg3.c 25 May 2005 17:12:47 -0000 1.35 +++ b/drivers/net/tg3.c 14 Jun 2005 01:37:43 -0000 @@ -2946,6 +2946,7 @@ static irqreturn_t tg3_msi(int irq, void */ tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x00000001); tp->last_tag = sblk->status_tag; + rmb(); sblk->status &= ~SD_STATUS_UPDATED; if (likely(tg3_has_work(tp))) netif_rx_schedule(dev); /* schedule NAPI poll */ From yoshfuji@linux-ipv6.org Tue Jun 14 17:54:39 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 14 Jun 2005 17:54:46 -0700 (PDT) Received: from yue.st-paulia.net ([203.178.140.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5F0scBK008287 for ; Tue, 14 Jun 2005 17:54:38 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.st-paulia.net (Postfix) with ESMTP id 7107E33CC2; Wed, 15 Jun 2005 09:53:31 +0900 (JST) Date: Wed, 15 Jun 2005 09:53:30 +0900 (JST) Message-Id: <20050615.095330.94972569.yoshfuji@linux-ipv6.org> To: davem@davemloft.net Cc: netdev@vger.kernel.org, netdev@oss.sgi.com Subject: Netdev List (Re: [PATCH] Ensure to use icmpv6_socket in non-preemptive context.) 
From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= In-Reply-To: <20050613.150037.70219957.davem@davemloft.net> References: <20050612.230358.119812619.yoshfuji@linux-ipv6.org> <20050613.150037.70219957.davem@davemloft.net> Organization: USAGI Project X-URL: http://www.yoshifuji.org/%7Ehideaki/ X-Fingerprint: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA X-PGP-Key-URL: http://www.yoshifuji.org/%7Ehideaki/hideaki@yoshifuji.org.asc X-Face: "5$Al-.M>NJ%a'@hhZdQm:."qn~PA^gq4o*>iCFToq*bAi#4FRtx}enhuQKz7fNqQz\BYU] $~O_5m-9'}MIs`XGwIEscw;e5b>n"B_?j/AkL~i/MEaZBLP X-Mailer: Mew version 2.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2422 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: yoshfuji@linux-ipv6.org Precedence: bulk X-list: netdev Content-Length: 359 Lines: 11 In article <20050613.150037.70219957.davem@davemloft.net> (at Mon, 13 Jun 2005 15:00:37 -0700 (PDT)), "David S. Miller" says: > Please update your address book, netdev has moved to vger.kernel.org > :-) Oh, I didn't notice that; sorry, I may have missed the announcement. Do we have to subscribe to the new list ourselves? 
--yoshfuji From grundler@cup.hp.com Wed Jun 15 12:05:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 12:05:25 -0700 (PDT) Received: from palrel13.hp.com (palrel13.hp.com [156.153.255.238]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FJ5MH9010536 for ; Wed, 15 Jun 2005 12:05:22 -0700 Received: from esmail.cup.hp.com (esmail.cup.hp.com [15.0.65.164]) by palrel13.hp.com (Postfix) with ESMTP id 3FE6D1C04D3E; Tue, 14 Jun 2005 11:01:47 -0700 (PDT) Received: from localhost.localdomain (postfix@debian.cup.hp.com [15.244.57.47]) by esmail.cup.hp.com (8.9.3 (PHNE_29774)/8.8.6) with ESMTP id KAA13700; Tue, 14 Jun 2005 10:55:46 -0700 (PDT) Received: by localhost.localdomain (Postfix, from userid 1000) id 964AF8FBD3; Tue, 14 Jun 2005 11:04:25 -0700 (PDT) Date: Tue, 14 Jun 2005 11:04:25 -0700 From: Grant Grundler To: Michael Chan Cc: Grant Grundler , "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] tg3_msi() and weakly ordered memory Message-ID: <20050614180425.GE24371@esmail.cup.hp.com> References: <20050614154021.GA24371@esmail.cup.hp.com> <1118767397.7059.19.camel@rh4> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1118767397.7059.19.camel@rh4> User-Agent: Mutt/1.5.9i X-archive-position: 2424 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: iod00d@hp.com Precedence: bulk X-list: netdev Content-Length: 960 Lines: 32 On Tue, Jun 14, 2005 at 09:43:17AM -0700, Michael Chan wrote: ... > Something like: > > if (sblk->status_tag != tp->last_tag) > clear_interrupt(); > netif_rx_schedule(); > > This way we don't have to clear the SD_STATUS_UPDATED bit. I will > experiment with this and see if it works well. that sounds good - thanks. > I don't think we are reading the index on every iteration. 
In tg3_rx(), > we read it at the beginning before the loop, and one more time if we > have caught up with the hw index before exiting the loop. oh - sorry - my bad. Same is true for tg3_tx(). And I just noticed I'm smoking crack on "nested locks" too... one is "lock" and the other is "tx_lock". *sigh* - need more sleep. > I mildly disagree. I think we should maximize the amount of work done in > tg3_poll(). For example, reading the rx_producer index one more time > when we have caught up with hw index before exiting the loop is a good > thing IMO. ok. thanks, grant From juhl-lkml@dif.dk Wed Jun 15 12:09:52 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 12:10:06 -0700 (PDT) Received: from saerimmer.dif.dk (mail.dif.dk [193.138.115.101]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FJ9pH9011302 for ; Wed, 15 Jun 2005 12:09:52 -0700 Received: from localhost (localhost [127.0.0.1]) by saerimmer.dif.dk (Postfix) with ESMTP id A2BADFFC9C for ; Wed, 15 Jun 2005 21:16:28 +0200 (CEST) Received: from saerimmer.dif.dk ([127.0.0.1]) by localhost (saerimmer [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 09213-02 for ; Wed, 15 Jun 2005 21:16:28 +0200 (CEST) Received: from diftmgw2.backbone.dif.dk (diftmgw2.backbone.dif.dk [10.227.136.246]) by saerimmer.dif.dk (Postfix) with ESMTP id D24DBFFCA6 for ; Wed, 15 Jun 2005 21:16:25 +0200 (CEST) Received: from DIFPST1A.backbone.dif.dk ([10.227.136.220]) by diftmgw2.backbone.dif.dk with InterScan Messaging Security Suite; Wed, 15 Jun 2005 21:07:27 +0200 Received: from [172.16.2.11] (10.227.136.29 [10.227.136.29]) by DIFPST1A.backbone.dif.dk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id LWFYQMN9; Wed, 15 Jun 2005 21:08:24 +0200 Date: Wed, 15 Jun 2005 21:13:52 +0200 (CEST) From: Jesper Juhl To: "David S. Miller" Cc: Hideaki YOSHIFUJI , Alexey Kuznetsov , James Morris , "Fred N. 
van Kempen" , Ross Biro , netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: [-mm PATCH] signed vs unsigned cleanup in net/ipv4/raw.c Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2425 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: juhl-lkml@dif.dk Precedence: bulk X-list: netdev Content-Length: 4597 Lines: 140 This patch cleans up some signed versus unsigned variable use in net/ipv4/raw.c Before this patch, building net/ipv4/raw.c from 2.6.12-rc6-mm1 with gcc -W produces a bunch of warnings : net/ipv4/raw.c: In function `raw_send_hdrinc': net/ipv4/raw.c:272: warning: comparison between signed and unsigned net/ipv4/raw.c:301: warning: comparison between signed and unsigned net/ipv4/raw.c: In function `raw_probe_proto_opt': net/ipv4/raw.c:340: warning: comparison between signed and unsigned net/ipv4/raw.c: In function `raw_sendmsg': net/ipv4/raw.c:387: warning: comparison of unsigned expression < 0 is always false net/ipv4/raw.c:405: warning: comparison between signed and unsigned net/ipv4/raw.c:517: warning: signed and unsigned type in conditional expression net/ipv4/raw.c:374: warning: unused parameter `iocb' net/ipv4/raw.c: In function `raw_close': net/ipv4/raw.c:527: warning: unused parameter `timeout' net/ipv4/raw.c: In function `raw_bind': net/ipv4/raw.c:545: warning: comparison between signed and unsigned net/ipv4/raw.c: In function `raw_recvmsg': net/ipv4/raw.c:613: warning: signed and unsigned type in conditional expression net/ipv4/raw.c:565: warning: unused parameter `iocb' net/ipv4/raw.c: In function `raw_seticmpfilter': net/ipv4/raw.c:627: warning: comparison between signed and unsigned net/ipv4/raw.c: In function `raw_geticmpfilter': net/ipv4/raw.c:643: warning: comparison between signed and unsigned net/ipv4/raw.c: In function `raw_seq_stop': net/ipv4/raw.c:799: warning: unused parameter `seq' net/ipv4/raw.c:799: warning: 
unused parameter `v' net/ipv4/raw.c: In function `raw_seq_open': net/ipv4/raw.c:847: warning: unused parameter `inode' 10 of which are related to signedness issues. With this patch we are down to just 4 signed vs unsigned warnings - cleaning up the last 4 didn't really seem feasible. net/ipv4/raw.c: In function `raw_sendmsg': net/ipv4/raw.c:405: warning: comparison between signed and unsigned net/ipv4/raw.c:374: warning: unused parameter `iocb' net/ipv4/raw.c: In function `raw_close': net/ipv4/raw.c:530: warning: unused parameter `timeout' net/ipv4/raw.c: In function `raw_bind': net/ipv4/raw.c:548: warning: comparison between signed and unsigned net/ipv4/raw.c: In function `raw_recvmsg': net/ipv4/raw.c:568: warning: unused parameter `iocb' net/ipv4/raw.c: In function `raw_seticmpfilter': net/ipv4/raw.c:633: warning: comparison between signed and unsigned net/ipv4/raw.c: In function `raw_geticmpfilter': net/ipv4/raw.c:649: warning: comparison between signed and unsigned net/ipv4/raw.c: In function `raw_seq_stop': net/ipv4/raw.c:805: warning: unused parameter `seq' net/ipv4/raw.c:805: warning: unused parameter `v' net/ipv4/raw.c: In function `raw_seq_open': net/ipv4/raw.c:853: warning: unused parameter `inode' So, here's the patch. 
I hope you like it and want to merge it :-) Signed-off-by: Jesper Juhl --- net/ipv4/raw.c | 18 ++++++++++++------ 1 files changed, 12 insertions(+), 6 deletions(-) --- linux-2.6.12-rc6-mm1-orig/net/ipv4/raw.c 2005-06-12 15:58:58.000000000 +0200 +++ linux-2.6.12-rc6-mm1/net/ipv4/raw.c 2005-06-15 20:55:06.000000000 +0200 @@ -259,7 +259,7 @@ int raw_rcv(struct sock *sk, struct sk_b return 0; } -static int raw_send_hdrinc(struct sock *sk, void *from, int length, +static int raw_send_hdrinc(struct sock *sk, void *from, size_t length, struct rtable *rt, unsigned int flags) { @@ -298,7 +298,7 @@ static int raw_send_hdrinc(struct sock * goto error_fault; /* We don't modify invalid header */ - if (length >= sizeof(*iph) && iph->ihl * 4 <= length) { + if (length >= sizeof(*iph) && (size_t)(iph->ihl * 4) <= length) { if (!iph->saddr) iph->saddr = rt->rt_src; iph->check = 0; @@ -332,7 +332,7 @@ static void raw_probe_proto_opt(struct f u8 __user *type = NULL; u8 __user *code = NULL; int probed = 0; - int i; + unsigned int i; if (!msg->msg_iov) return; @@ -384,7 +384,7 @@ static int raw_sendmsg(struct kiocb *ioc int err; err = -EMSGSIZE; - if (len < 0 || len > 0xFFFF) + if (len > 0xFFFF) goto out; /* @@ -514,7 +514,10 @@ done: kfree(ipc.opt); ip_rt_put(rt); -out: return err < 0 ? err : len; +out: + if (err < 0) + return err; + return len; do_confirm: dst_confirm(&rt->u.dst); @@ -610,7 +613,10 @@ static int raw_recvmsg(struct kiocb *ioc copied = skb->len; done: skb_free_datagram(sk, skb); -out: return err ? err : copied; +out: + if (err) + return err; + return copied; } static int raw_init(struct sock *sk) Please CC me on replies. 
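[Archive annotation, not part of Jesper's message.] For readers wondering why gcc -W flags "comparison between signed and unsigned" at all: when an int meets a size_t in a comparison, the int is converted to the unsigned type first, so a negative length compares as a very large value. The sketch below spells that conversion out with an explicit cast; both helper names are hypothetical and neither function appears in net/ipv4/raw.c. It does not claim the old code was reachable with a negative length, only that the types allowed it.

```c
#include <stddef.h>

/*
 * Hypothetical helper with a signed length, as in the pre-patch prototype.
 * The cast makes explicit the conversion the compiler would otherwise
 * apply implicitly when comparing int against size_t.
 */
static int hdr_fits_int(int length, size_t hdr_size)
{
    return (size_t)length >= hdr_size;
}

/* Same check with the length typed as size_t, as in the patched prototype. */
static int hdr_fits_size_t(size_t length, size_t hdr_size)
{
    return length >= hdr_size;
}
```

With the signed variant, a length of -1 passes the check against a 20-byte header because it is compared as SIZE_MAX; with size_t throughout, a negative length cannot be expressed in the first place, which is also why the patch can drop the "len < 0" test in raw_sendmsg.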
-- Jesper Juhl From grundler@cup.hp.com Wed Jun 15 12:05:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 12:05:22 -0700 (PDT) Received: from palrel13.hp.com (palrel13.hp.com [156.153.255.238]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FJ5JH9010530 for ; Wed, 15 Jun 2005 12:05:19 -0700 Received: from esmail.cup.hp.com (esmail.cup.hp.com [15.0.65.164]) by palrel13.hp.com (Postfix) with ESMTP id E9BF01C0377E; Tue, 14 Jun 2005 10:53:02 -0700 (PDT) Received: from localhost.localdomain (postfix@debian.cup.hp.com [15.244.57.47]) by esmail.cup.hp.com (8.9.3 (PHNE_29774)/8.8.6) with ESMTP id KAA13001; Tue, 14 Jun 2005 10:47:02 -0700 (PDT) Received: by localhost.localdomain (Postfix, from userid 1000) id 4C7EF8FBD3; Tue, 14 Jun 2005 10:55:41 -0700 (PDT) Date: Tue, 14 Jun 2005 10:55:41 -0700 From: Grant Grundler To: Michael Chan Cc: Grant Grundler , "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] tg3_msi() and weakly ordered memory Message-ID: <20050614175541.GD24371@esmail.cup.hp.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.9i X-archive-position: 2423 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: iod00d@hp.com Precedence: bulk X-list: netdev Content-Length: 2495 Lines: 56 On Mon, Jun 13, 2005 at 11:46:47PM -0700, Michael Chan wrote: > rmb() is needed between reading the tag and tg3_has_work() > to guarantee strict ordering. Thinking about this more... tg3_has_work() could be reduced to comparing status tag with last_tag (vs each of the TX/RX indices). That assumes all the tg3 NICs support status tags...if not, then we have to keep checking indices. [ BTW, I noticed spin_lock*(&tp->lock) calls are nested in tg3_poll. That's a bug, right? I'm still looking at v3.29 ] The current implementation of tg3_poll() processes TX and then RX. 
The status tag we read afterwards and the TX/RX indices checked could be newer than the TX/RX indices used during processing. Is tg3 then roughly rate limited to the TX and RX queue depth per poll interval? (I'm still thinking during-ints limits how much DMA can occur) Given TG3_TX_RING_SIZE is 512, then I would max out at ~500Kpps if there is any RX traffic that causes tg3_has_work() to come back true. While this might be normally ok, I'm looking to maximize pktgen output w/o disabling/enabling interrupts for each "batch" of TX packets. > > 2) tg3_poll() and tg3_msi() are not consistent on how they clear > > the SD_STATUS_UPDATED bit. tg3_poll() does not clear SD_STATUS_UPDATED > > bit after reading status_tag. I think every time the driver discovers > > the status_tag changed, it should clear SD_STATUS_UPDATED. > > Michael, can you confirm/deny that offhand? > > You're right again. The SD_STATUS_UPDATED bit should be cleared right before > checking for new work. Clearing the SD_STATUS_UPDATED bit tells the non-msi > irq handler that all work up to the last status block update has been > processed. If I understood this correctly, tg3 may already have new work pending when tg3_has_work() is called from tg3_poll(). tg3_poll() does not tell the card anything but promises to pick up where it left off the next time tg3_poll() is called. If we don't tell the card anything, it means at some point it's going to stop doing DMA... this might be one of the things preventing tg3 from doing link rate with pktgen pushing 64-byte packets. ... > It is important to read the actual status block with the latest indices to > determine whether there is new work, especially in the non-tagged case where > you may have a race condition between software and hardware. Yes - I think I understand where several of the races can occur. Probably not seeing all of them though. 
thanks again, grant From davem@davemloft.net Wed Jun 15 12:18:05 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 12:18:08 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FJI5H9012389 for ; Wed, 15 Jun 2005 12:18:05 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DidNE-0001Nj-Av; Wed, 15 Jun 2005 12:16:28 -0700 Date: Wed, 15 Jun 2005 12:16:28 -0700 (PDT) Message-Id: <20050615.121628.112622743.davem@davemloft.net> To: juhl-lkml@dif.dk Cc: yoshfuji@linux-ipv6.org, kuznet@ms2.inr.ac.ru, jmorris@redhat.com, waltje@uWalt.NL.Mugnet.ORG.sgi.com, ross.biro@gmail.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [-mm PATCH] signed vs unsigned cleanup in net/ipv4/raw.c From: "David S. Miller" In-Reply-To: References: X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2426 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 332 Lines: 9 I'm not merging this thing, at least not all at once. "size_t" vs. "unsigned int" vs. "int" length comparisons are where all the security problems come from in the protocol stack. Therefore you should make a separate patch for each type change you make and explain why it doesn't add some regression in terms of signedness issues. 
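The concern about mixed "int" and "size_t" length comparisons can be illustrated with a minimal sketch. The helper names below are hypothetical and are not taken from net/ipv4/raw.c; the explicit cast only spells out the conversion the compiler would otherwise apply silently when a signed length meets an unsigned one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical length checks, not taken from net/ipv4/raw.c. */

/* Buggy: a negative `len` is converted to a huge unsigned value, so a
 * "long enough" test passes for it. The cast makes visible what the
 * usual arithmetic conversions do in an int vs. size_t comparison. */
bool len_ok_buggy(int len, size_t min_len)
{
	return (size_t)len >= min_len;
}

/* Safer: reject negative values before any unsigned comparison. */
bool len_ok_checked(int len, size_t min_len)
{
	if (len < 0)
		return false;
	return (size_t)len >= min_len;
}

/* Small self-check of the interesting cases. */
void len_ok_selfcheck(void)
{
	assert(len_ok_buggy(-1, 20));		/* negative length slips through */
	assert(!len_ok_checked(-1, 20));	/* and is rejected here */
	assert(len_ok_checked(20, 20));
}
```

This is why each type change is worth reviewing on its own: silencing a warning by changing a type or adding a cast can also remove the only place where a negative value would have been caught.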
From juhl-lkml@dif.dk Wed Jun 15 12:24:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 12:24:18 -0700 (PDT) Received: from saerimmer.dif.dk (mail.dif.dk [193.138.115.101]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FJOEH9016826 for ; Wed, 15 Jun 2005 12:24:15 -0700 Received: from localhost (localhost [127.0.0.1]) by saerimmer.dif.dk (Postfix) with ESMTP id 6D95CFFC61 for ; Wed, 15 Jun 2005 21:30:58 +0200 (CEST) Received: from saerimmer.dif.dk ([127.0.0.1]) by localhost (saerimmer [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 09553-04 for ; Wed, 15 Jun 2005 21:30:58 +0200 (CEST) Received: from diftmgw2.backbone.dif.dk (diftmgw2.backbone.dif.dk [10.227.136.246]) by saerimmer.dif.dk (Postfix) with ESMTP id 1F6EFFFC76 for ; Wed, 15 Jun 2005 21:30:56 +0200 (CEST) Received: from DIFPST1A.backbone.dif.dk ([10.227.136.220]) by diftmgw2.backbone.dif.dk with InterScan Messaging Security Suite; Wed, 15 Jun 2005 21:21:58 +0200 Received: from [172.16.2.11] (10.227.136.29 [10.227.136.29]) by DIFPST1A.backbone.dif.dk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id LWFYQMPV; Wed, 15 Jun 2005 21:22:55 +0200 Date: Wed, 15 Jun 2005 21:28:23 +0200 (CEST) From: Jesper Juhl To: "David S. Miller" Cc: juhl-lkml@dif.dk, yoshfuji@linux-ipv6.org, kuznet@ms2.inr.ac.ru, jmorris@redhat.com, waltje@uWalt.NL.Mugnet.ORG.sgi.com, ross.biro@gmail.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [-mm PATCH] signed vs unsigned cleanup in net/ipv4/raw.c In-Reply-To: <20050615.121628.112622743.davem@davemloft.net> Message-ID: References: <20050615.121628.112622743.davem@davemloft.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2427 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: juhl-lkml@dif.dk Precedence: bulk X-list: netdev Content-Length: 531 Lines: 20 On Wed, 15 Jun 2005, David S. 
Miller wrote: > > I'm not merging this thing, at least not all at once. > > "size_t" vs. "unsigned int" vs. "int" length comparisons are where all > the security problems come from in the protocol stack. > > Therefore you should make a separate patch for each type > change you make and explain why it doesn't add some regression > in terms of signedness issues. > Fair enough, I'll split it into little bits and submit them one by one with explanations. Not a problem at all. -- Jesper Juhl From davem@davemloft.net Wed Jun 15 12:59:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 12:59:46 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FJxfH9018190 for ; Wed, 15 Jun 2005 12:59:41 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Die1U-0000jL-Vg; Wed, 15 Jun 2005 12:58:05 -0700 Date: Wed, 15 Jun 2005 12:58:04 -0700 (PDT) Message-Id: <20050615.125804.126575159.davem@davemloft.net> To: juhl-lkml@dif.dk Cc: yoshfuji@linux-ipv6.org, kuznet@ms2.inr.ac.ru, jmorris@redhat.com, waltje@uWalt.NL.Mugnet.ORG.sgi.com, ross.biro@gmail.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [-mm PATCH] signed vs unsigned cleanup in net/ipv4/raw.c From: "David S. Miller" In-Reply-To: References: <20050615.121628.112622743.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2428 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 222 Lines: 7 From: Jesper Juhl Date: Wed, 15 Jun 2005 21:28:23 +0200 (CEST) > Fair enough, I'll split it into little bits and submit them one by one > with explanations. 
Not a problem at all. Thanks a lot Jesper. From juhl-lkml@dif.dk Wed Jun 15 14:25:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 14:26:00 -0700 (PDT) Received: from saerimmer.dif.dk (mail.dif.dk [193.138.115.101]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FLPuH9021264 for ; Wed, 15 Jun 2005 14:25:57 -0700 Received: from localhost (localhost [127.0.0.1]) by saerimmer.dif.dk (Postfix) with ESMTP id 25E0DFFC9A for ; Wed, 15 Jun 2005 23:32:43 +0200 (CEST) Received: from saerimmer.dif.dk ([127.0.0.1]) by localhost (saerimmer [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 12426-06 for ; Wed, 15 Jun 2005 23:32:42 +0200 (CEST) Received: from diftmgw2.backbone.dif.dk (diftmgw2.backbone.dif.dk [10.227.136.246]) by saerimmer.dif.dk (Postfix) with ESMTP id 89608FF476 for ; Wed, 15 Jun 2005 23:32:40 +0200 (CEST) Received: from DIFPST1A.backbone.dif.dk ([10.227.136.220]) by diftmgw2.backbone.dif.dk with InterScan Messaging Security Suite; Wed, 15 Jun 2005 23:23:41 +0200 Received: from [172.16.2.11] (10.227.136.29 [10.227.136.29]) by DIFPST1A.backbone.dif.dk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id LWFYQM92; Wed, 15 Jun 2005 23:24:38 +0200 Date: Wed, 15 Jun 2005 23:30:06 +0200 (CEST) From: Jesper Juhl To: "David S. 
Miller" Cc: Hideaki YOSHIFUJI , Alexey Kuznetsov , James Morris , Ross Biro , netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: [-mm PATCH][1/4] net: signed vs unsigned cleanup in net/ipv4/raw.c Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2430 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: juhl-lkml@dif.dk Precedence: bulk X-list: netdev Content-Length: 1116 Lines: 39 This patch silences these two gcc -W warnings in net/ipv4/raw.c : net/ipv4/raw.c:517: warning: signed and unsigned type in conditional expression net/ipv4/raw.c:613: warning: signed and unsigned type in conditional expression It doesn't change the behaviour of the code; it simply writes the conditional expression with plain 'if()' syntax instead of '? :'. Since this breaks it into separate statements, gcc no longer complains about having both a signed and an unsigned value in the same conditional expression. --- linux-2.6.12-rc6-mm1-orig/net/ipv4/raw.c 2005-06-12 15:58:58.000000000 +0200 +++ linux-2.6.12-rc6-mm1/net/ipv4/raw.c 2005-06-15 22:22:44.000000000 +0200 @@ -514,7 +514,10 @@ done: kfree(ipc.opt); ip_rt_put(rt); -out: return err < 0 ? err : len; +out: + if (err < 0) + return err; + return len; do_confirm: dst_confirm(&rt->u.dst); @@ -610,7 +613,10 @@ static int raw_recvmsg(struct kiocb *ioc copied = skb->len; done: skb_free_datagram(sk, skb); -out: return err ? 
err : copied; +out: + if (err) + return err; + return copied; } static int raw_init(struct sock *sk) From juhl-lkml@dif.dk Wed Jun 15 14:28:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 14:28:14 -0700 (PDT) Received: from saerimmer.dif.dk (mail.dif.dk [193.138.115.101]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FLSBH9022161 for ; Wed, 15 Jun 2005 14:28:11 -0700 Received: from localhost (localhost [127.0.0.1]) by saerimmer.dif.dk (Postfix) with ESMTP id 92474FFC63 for ; Wed, 15 Jun 2005 23:34:57 +0200 (CEST) Received: from saerimmer.dif.dk ([127.0.0.1]) by localhost (saerimmer [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 12585-10 for ; Wed, 15 Jun 2005 23:34:57 +0200 (CEST) Received: from diftmgw2.backbone.dif.dk (diftmgw2.backbone.dif.dk [10.227.136.246]) by saerimmer.dif.dk (Postfix) with ESMTP id 5D82BFFC92 for ; Wed, 15 Jun 2005 23:34:56 +0200 (CEST) Received: from DIFPST1A.backbone.dif.dk ([10.227.136.220]) by diftmgw2.backbone.dif.dk with InterScan Messaging Security Suite; Wed, 15 Jun 2005 23:25:57 +0200 Received: from [172.16.2.11] (10.227.136.29 [10.227.136.29]) by DIFPST1A.backbone.dif.dk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id LWFYQM9Z; Wed, 15 Jun 2005 23:26:54 +0200 Date: Wed, 15 Jun 2005 23:32:22 +0200 (CEST) From: Jesper Juhl To: "David S. Miller" Cc: Hideaki YOSHIFUJI , Alexey Kuznetsov , James Morris , Ross Biro , netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: [-mm PATCH][4/4] net: signed vs unsigned cleanup in net/ipv4/raw.c Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2433 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: juhl-lkml@dif.dk Precedence: bulk X-list: netdev Content-Length: 1779 Lines: 44 This patch changes the type of the third parameter 'length' of the raw_send_hdrinc() function from 'int' to 'size_t'. 
This makes sense since this function is only ever called from one location, and the value passed as the third parameter in that location is itself of type size_t, so this makes the receiving function's parameter type match. Also, inside raw_send_hdrinc() the 'length' variable is used in comparisons with unsigned values and passed as a parameter to functions expecting unsigned values (it's used in a single comparison with a signed value, but that one can never actually be negative so the patch also casts that one to size_t to stop gcc worrying, and it is passed in a single instance to memcpy_fromiovecend() which expects a signed int, but as far as I can see that's not a problem since the value of 'length' shouldn't ever exceed the maximum value of a signed int). Signed-off-by: Jesper Juhl --- net/ipv4/raw.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) --- linux-2.6.12-rc6-mm1/net/ipv4/raw.c.with_patch-3 2005-06-15 23:17:23.000000000 +0200 +++ linux-2.6.12-rc6-mm1/net/ipv4/raw.c 2005-06-15 23:26:48.000000000 +0200 @@ -259,7 +259,7 @@ int raw_rcv(struct sock *sk, struct sk_b return 0; } -static int raw_send_hdrinc(struct sock *sk, void *from, int length, +static int raw_send_hdrinc(struct sock *sk, void *from, size_t length, struct rtable *rt, unsigned int flags) { @@ -298,7 +298,7 @@ static int raw_send_hdrinc(struct sock * goto error_fault; /* We don't modify invalid header */ - if (length >= sizeof(*iph) && iph->ihl * 4 <= length) { + if (length >= sizeof(*iph) && (size_t)(iph->ihl * 4) <= length) { if (!iph->saddr) iph->saddr = rt->rt_src; iph->check = 0; From juhl-lkml@dif.dk Wed Jun 15 14:25:25 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 14:25:29 -0700 (PDT) Received: from saerimmer.dif.dk (mail.dif.dk [193.138.115.101]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FLPOH9021201 for ; Wed, 15 Jun 2005 14:25:25 -0700 Received: from localhost (localhost [127.0.0.1]) by saerimmer.dif.dk (Postfix) with ESMTP id 045A7FF476 
for ; Wed, 15 Jun 2005 23:32:08 +0200 (CEST) Received: from saerimmer.dif.dk ([127.0.0.1]) by localhost (saerimmer [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 12348-10 for ; Wed, 15 Jun 2005 23:32:07 +0200 (CEST) Received: from diftmgw2.backbone.dif.dk (diftmgw2.backbone.dif.dk [10.227.136.246]) by saerimmer.dif.dk (Postfix) with ESMTP id E5DD6FFC92 for ; Wed, 15 Jun 2005 23:32:04 +0200 (CEST) Received: from DIFPST1A.backbone.dif.dk ([10.227.136.220]) by diftmgw2.backbone.dif.dk with InterScan Messaging Security Suite; Wed, 15 Jun 2005 23:23:07 +0200 Received: from [172.16.2.11] (10.227.136.29 [10.227.136.29]) by DIFPST1A.backbone.dif.dk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id LWFYQM91; Wed, 15 Jun 2005 23:24:04 +0200 Date: Wed, 15 Jun 2005 23:29:32 +0200 (CEST) From: Jesper Juhl To: "David S. Miller" Cc: Hideaki YOSHIFUJI , Alexey Kuznetsov , James Morris , Ross Biro , netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: [-mm PATCH][0/4] net: signed vs unsigned cleanup in net/ipv4/raw.c Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2429 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: juhl-lkml@dif.dk Precedence: bulk X-list: netdev Content-Length: 477 Lines: 18 David, As promised, here is my previous net/ipv4/raw.c signed/unsigned cleanup patch split into little bits with explanations. This series of patches cleans up signed versus unsigned variable use in net/ipv4/raw.c. The patches are created incrementally on top of each other but will probably each apply (with a little fuzz) on their own out of order. Please keep me on CC if you reply since I'm not subscribed to both lists the patches are sent to (only to lkml). 
-- Jesper Juhl From juhl-lkml@dif.dk Wed Jun 15 14:27:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 14:27:38 -0700 (PDT) Received: from saerimmer.dif.dk (mail.dif.dk [193.138.115.101]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FLRYH9021865 for ; Wed, 15 Jun 2005 14:27:34 -0700 Received: from localhost (localhost [127.0.0.1]) by saerimmer.dif.dk (Postfix) with ESMTP id A0522FFC92 for ; Wed, 15 Jun 2005 23:34:20 +0200 (CEST) Received: from saerimmer.dif.dk ([127.0.0.1]) by localhost (saerimmer [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 12585-08 for ; Wed, 15 Jun 2005 23:34:20 +0200 (CEST) Received: from diftmgw2.backbone.dif.dk (diftmgw2.backbone.dif.dk [10.227.136.246]) by saerimmer.dif.dk (Postfix) with ESMTP id B18A6FFC9B for ; Wed, 15 Jun 2005 23:34:17 +0200 (CEST) Received: from DIFPST1A.backbone.dif.dk ([10.227.136.220]) by diftmgw2.backbone.dif.dk with InterScan Messaging Security Suite; Wed, 15 Jun 2005 23:25:19 +0200 Received: from [172.16.2.11] (10.227.136.29 [10.227.136.29]) by DIFPST1A.backbone.dif.dk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id LWFYQM9V; Wed, 15 Jun 2005 23:26:16 +0200 Date: Wed, 15 Jun 2005 23:31:44 +0200 (CEST) From: Jesper Juhl To: "David S. Miller" Cc: Hideaki YOSHIFUJI , Alexey Kuznetsov , James Morris , Ross Biro , netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: [-mm PATCH][3/4] net: signed vs unsigned cleanup in net/ipv4/raw.c Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2432 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: juhl-lkml@dif.dk Precedence: bulk X-list: netdev Content-Length: 1150 Lines: 35 This patch changes the type of the local variable 'i' in raw_probe_proto_opt() from 'int' to 'unsigned int'. 
The only use of 'i' in this function is as a counter in a for() loop and subsequent index into the msg->msg_iov[] array. Since 'i' is compared in a loop to the unsigned variable msg->msg_iovlen gcc -W generates this warning : net/ipv4/raw.c:340: warning: comparison between signed and unsigned Changing 'i' to unsigned silences this warning and is safe since the array index can never be negative anyway, so unsigned int is the logical type to use for 'i' and also enables a larger msg_iov[] array (but I don't know if that will ever matter). Signed-off-by: Jesper Juhl --- net/ipv4/raw.c | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) --- linux-2.6.12-rc6-mm1/net/ipv4/raw.c.with_patch-2 2005-06-15 23:04:40.000000000 +0200 +++ linux-2.6.12-rc6-mm1/net/ipv4/raw.c 2005-06-15 23:09:42.000000000 +0200 @@ -332,7 +332,7 @@ static void raw_probe_proto_opt(struct f u8 __user *type = NULL; u8 __user *code = NULL; int probed = 0; - int i; + unsigned int i; if (!msg->msg_iov) return; From juhl-lkml@dif.dk Wed Jun 15 14:26:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 14:26:53 -0700 (PDT) Received: from saerimmer.dif.dk (mail.dif.dk [193.138.115.101]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FLQlH9021461 for ; Wed, 15 Jun 2005 14:26:48 -0700 Received: from localhost (localhost [127.0.0.1]) by saerimmer.dif.dk (Postfix) with ESMTP id 19F2DFFCAB for ; Wed, 15 Jun 2005 23:33:34 +0200 (CEST) Received: from saerimmer.dif.dk ([127.0.0.1]) by localhost (saerimmer [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 12476-10 for ; Wed, 15 Jun 2005 23:33:33 +0200 (CEST) Received: from diftmgw2.backbone.dif.dk (diftmgw2.backbone.dif.dk [10.227.136.246]) by saerimmer.dif.dk (Postfix) with ESMTP id EB76CFFCA1 for ; Wed, 15 Jun 2005 23:33:30 +0200 (CEST) Received: from DIFPST1A.backbone.dif.dk ([10.227.136.220]) by diftmgw2.backbone.dif.dk with InterScan Messaging Security Suite; Wed, 15 Jun 2005 23:24:33 +0200 Received: from [172.16.2.11] 
(10.227.136.29 [10.227.136.29]) by DIFPST1A.backbone.dif.dk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id LWFYQM93; Wed, 15 Jun 2005 23:25:29 +0200 Date: Wed, 15 Jun 2005 23:30:58 +0200 (CEST) From: Jesper Juhl To: "David S. Miller" Cc: Hideaki YOSHIFUJI , Alexey Kuznetsov , James Morris , Ross Biro , netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: [-mm PATCH][2/4] net: signed vs unsigned cleanup in net/ipv4/raw.c Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2431 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: juhl-lkml@dif.dk Precedence: bulk X-list: netdev Content-Length: 825 Lines: 28 This patch gets rid of the following gcc -W warning in net/ipv4/raw.c : net/ipv4/raw.c:387: warning: comparison of unsigned expression < 0 is always false Since 'len' is of type size_t it is unsigned and can thus never be <0, and since this is obvious from the function declaration just a few lines above I think it's ok to remove the pointless check for len<0. 
Signed-off-by: Jesper Juhl --- net/ipv4/raw.c | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) --- linux-2.6.12-rc6-mm1/net/ipv4/raw.c.with_patch-1 2005-06-15 22:39:17.000000000 +0200 +++ linux-2.6.12-rc6-mm1/net/ipv4/raw.c 2005-06-15 22:39:36.000000000 +0200 @@ -384,7 +384,7 @@ static int raw_sendmsg(struct kiocb *ioc int err; err = -EMSGSIZE; - if (len < 0 || len > 0xFFFF) + if (len > 0xFFFF) goto out; /* From davem@davemloft.net Wed Jun 15 14:31:17 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 14:31:22 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FLVHH9023977 for ; Wed, 15 Jun 2005 14:31:17 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DifSL-0000eC-F3; Wed, 15 Jun 2005 14:29:53 -0700 Date: Wed, 15 Jun 2005 14:29:53 -0700 (PDT) Message-Id: <20050615.142953.59469324.davem@davemloft.net> To: juhl-lkml@dif.dk Cc: yoshfuji@linux-ipv6.org, kuznet@ms2.inr.ac.ru, jmorris@redhat.com, ross.biro@gmail.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [-mm PATCH][4/4] net: signed vs unsigned cleanup in net/ipv4/raw.c From: "David S. Miller" In-Reply-To: References: X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2434 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 306 Lines: 9 From: Jesper Juhl Date: Wed, 15 Jun 2005 23:32:22 +0200 (CEST) > - if (length >= sizeof(*iph) && iph->ihl * 4 <= length) { > + if (length >= sizeof(*iph) && (size_t)(iph->ihl * 4) <= length) { Would changing the "4" into "4U" kill this warning just the same? I think I'd prefer that. 
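A small sketch of why "4U" is enough to quiet the comparison warning, assuming a simplified stand-in for the IP header. struct mini_iphdr and MINI_IPHDR_MIN are hypothetical names introduced only for illustration; the real code uses struct iphdr and sizeof(*iph).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct iphdr: only the 4-bit header length
 * field is modelled, and MINI_IPHDR_MIN takes the place of sizeof(*iph). */
struct mini_iphdr {
	unsigned int ihl:4;	/* header length in 32-bit words */
};

#define MINI_IPHDR_MIN 20U

/* A 4-bit field promotes to int, so "iph->ihl * 4" has type int and
 * comparing it with a size_t mixes signed and unsigned operands. With
 * "4U" the product is unsigned int, so both sides of the comparison are
 * unsigned and no cast is needed. */
int header_fits(const struct mini_iphdr *iph, size_t length)
{
	return length >= MINI_IPHDR_MIN && iph->ihl * 4U <= length;
}

/* Small self-check of the boundary cases. */
void header_fits_selfcheck(void)
{
	struct mini_iphdr iph = { .ihl = 5 };	/* 5 words = 20 bytes */

	assert(header_fits(&iph, 20));
	assert(!header_fits(&iph, 19));
}
```

Because ihl is at most 15, the product is at most 60 in either form, so the two spellings compute the same value; the suffix only changes the type the compiler sees.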
From juhl-lkml@dif.dk Wed Jun 15 14:36:07 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 14:36:14 -0700 (PDT) Received: from saerimmer.dif.dk (mail.dif.dk [193.138.115.101]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FLa6H9024597 for ; Wed, 15 Jun 2005 14:36:07 -0700 Received: from localhost (localhost [127.0.0.1]) by saerimmer.dif.dk (Postfix) with ESMTP id 0DD1EFFC8B for ; Wed, 15 Jun 2005 23:42:48 +0200 (CEST) Received: from saerimmer.dif.dk ([127.0.0.1]) by localhost (saerimmer [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 12667-08 for ; Wed, 15 Jun 2005 23:42:47 +0200 (CEST) Received: from diftmgw2.backbone.dif.dk (diftmgw2.backbone.dif.dk [10.227.136.246]) by saerimmer.dif.dk (Postfix) with ESMTP id A50FFFFC9C for ; Wed, 15 Jun 2005 23:42:45 +0200 (CEST) Received: from DIFPST1A.backbone.dif.dk ([10.227.136.220]) by diftmgw2.backbone.dif.dk with InterScan Messaging Security Suite; Wed, 15 Jun 2005 23:33:47 +0200 Received: from [172.16.2.11] (10.227.136.29 [10.227.136.29]) by DIFPST1A.backbone.dif.dk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id LWFYQM0K; Wed, 15 Jun 2005 23:34:44 +0200 Date: Wed, 15 Jun 2005 23:40:12 +0200 (CEST) From: Jesper Juhl To: "David S. Miller" Cc: juhl-lkml@dif.dk, yoshfuji@linux-ipv6.org, kuznet@ms2.inr.ac.ru, jmorris@redhat.com, ross.biro@gmail.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [-mm PATCH][4/4] net: signed vs unsigned cleanup in net/ipv4/raw.c In-Reply-To: <20050615.142953.59469324.davem@davemloft.net> Message-ID: References: <20050615.142953.59469324.davem@davemloft.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2435 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: juhl-lkml@dif.dk Precedence: bulk X-list: netdev Content-Length: 1310 Lines: 45 On Wed, 15 Jun 2005, David S. 
Miller wrote: > From: Jesper Juhl > Date: Wed, 15 Jun 2005 23:32:22 +0200 (CEST) > > > - if (length >= sizeof(*iph) && iph->ihl * 4 <= length) { > > + if (length >= sizeof(*iph) && (size_t)(iph->ihl * 4) <= length) { > > Would changing the "4" into "4U" kill this warning just the same? > It would. > I think I'd prefer that. > No problem. Here's a replacement patch nr. 4 : Signed-off-by: Jesper Juhl --- net/ipv4/raw.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) --- linux-2.6.12-rc6-mm1/net/ipv4/raw.c.with_patch-3 2005-06-15 23:17:23.000000000 +0200 +++ linux-2.6.12-rc6-mm1/net/ipv4/raw.c 2005-06-15 23:37:11.000000000 +0200 @@ -259,7 +259,7 @@ int raw_rcv(struct sock *sk, struct sk_b return 0; } -static int raw_send_hdrinc(struct sock *sk, void *from, int length, +static int raw_send_hdrinc(struct sock *sk, void *from, size_t length, struct rtable *rt, unsigned int flags) { @@ -298,7 +298,7 @@ static int raw_send_hdrinc(struct sock * goto error_fault; /* We don't modify invalid header */ - if (length >= sizeof(*iph) && iph->ihl * 4 <= length) { + if (length >= sizeof(*iph) && iph->ihl * 4U <= length) { if (!iph->saddr) iph->saddr = rt->rt_src; iph->check = 0; From davem@davemloft.net Wed Jun 15 14:42:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 14:42:43 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FLgcH9025167 for ; Wed, 15 Jun 2005 14:42:38 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DifdM-0000np-Kp; Wed, 15 Jun 2005 14:41:16 -0700 Date: Wed, 15 Jun 2005 14:41:16 -0700 (PDT) Message-Id: <20050615.144116.41632938.davem@davemloft.net> To: juhl-lkml@dif.dk Cc: yoshfuji@linux-ipv6.org, kuznet@ms2.inr.ac.ru, jmorris@redhat.com, ross.biro@gmail.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [-mm PATCH][4/4] net: signed vs 
unsigned cleanup in net/ipv4/raw.c From: "David S. Miller" In-Reply-To: References: <20050615.142953.59469324.davem@davemloft.net> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2436 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 281 Lines: 10 From: Jesper Juhl Date: Wed, 15 Jun 2005 23:40:12 +0200 (CEST) > On Wed, 15 Jun 2005, David S. Miller wrote: > > > I think I'd prefer that. > > > No problem. Here's a replacement patch nr. 4 : Thanks a lot. All 4 patches applied to my 2.6.13-pending tree. From juhl-lkml@dif.dk Wed Jun 15 14:44:09 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 15 Jun 2005 14:44:12 -0700 (PDT) Received: from saerimmer.dif.dk (mail.dif.dk [193.138.115.101]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5FLi8H9025682 for ; Wed, 15 Jun 2005 14:44:09 -0700 Received: from localhost (localhost [127.0.0.1]) by saerimmer.dif.dk (Postfix) with ESMTP id 5B49CFFC61 for ; Wed, 15 Jun 2005 23:50:55 +0200 (CEST) Received: from saerimmer.dif.dk ([127.0.0.1]) by localhost (saerimmer [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 12914-07 for ; Wed, 15 Jun 2005 23:50:55 +0200 (CEST) Received: from diftmgw2.backbone.dif.dk (diftmgw2.backbone.dif.dk [10.227.136.246]) by saerimmer.dif.dk (Postfix) with ESMTP id 9FD7BFFC9B for ; Wed, 15 Jun 2005 23:50:52 +0200 (CEST) Received: from DIFPST1A.backbone.dif.dk ([10.227.136.220]) by diftmgw2.backbone.dif.dk with InterScan Messaging Security Suite; Wed, 15 Jun 2005 23:41:54 +0200 Received: from [172.16.2.11] (10.227.136.29 [10.227.136.29]) by DIFPST1A.backbone.dif.dk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2657.72) id LWFYQNA1; Wed, 15 Jun 2005 23:42:51 +0200 Date: Wed, 15 Jun 2005 23:48:19 +0200 
(CEST) From: Jesper Juhl To: "David S. Miller" Cc: yoshfuji@linux-ipv6.org, kuznet@ms2.inr.ac.ru, jmorris@redhat.com, ross.biro@gmail.com, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [-mm PATCH][4/4] net: signed vs unsigned cleanup in net/ipv4/raw.c In-Reply-To: <20050615.144116.41632938.davem@davemloft.net> Message-ID: References: <20050615.142953.59469324.davem@davemloft.net> <20050615.144116.41632938.davem@davemloft.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2437 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: juhl-lkml@dif.dk Precedence: bulk X-list: netdev Content-Length: 415 Lines: 19 On Wed, 15 Jun 2005, David S. Miller wrote: > From: Jesper Juhl > Date: Wed, 15 Jun 2005 23:40:12 +0200 (CEST) > > > On Wed, 15 Jun 2005, David S. Miller wrote: > > > > > I think I'd prefer that. > > > > > No problem. Here's a replacement patch nr. 4 : > > Thanks a lot. All 4 patches applied to my 2.6.13-pending tree. > Great, thanks, 'twas a pleasure working with you :) -- Jesper From herbert@gondor.apana.org.au Thu Jun 16 04:39:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 16 Jun 2005 04:39:07 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5GBcxH9025436 for ; Thu, 16 Jun 2005 04:39:00 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1Disgh-0002X5-00; Thu, 16 Jun 2005 21:37:35 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1Disge-0005pJ-00; Thu, 16 Jun 2005 21:37:32 +1000 Date: Thu, 16 Jun 2005 21:37:32 +1000 To: "David S. 
Miller" Cc: netdev@oss.sgi.com, mchan@broadcom.com Subject: Re: [PATCH]: Tigon3 new NAPI locking v2 Message-ID: <20050616113732.GA22367@gondor.apana.org.au> References: <20050603.122558.88474819.davem@davemloft.net> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="5vNYLRcllDrimb99" Content-Disposition: inline In-Reply-To: <20050603.122558.88474819.davem@davemloft.net> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2438 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 5473 Lines: 200 --5vNYLRcllDrimb99 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Fri, Jun 03, 2005 at 07:25:58PM +0000, David S. Miller wrote: > > This version incorporates two bug fixes from Michael. > > 1) Check the mailbox register for 0x1 while polling on the COMPLETE > state bit. > > 2) Remove the BUG_ON() check in tg3_restart_ints(), it can legally and > harmlessly occur. > > Point #2 may want some refinements, but this patch below is good > enough for testing. Nice work Dave. I was thinking of how we could avoid waiting for the interrupt to occur after setting SYNC. Here is one way which is essentially a hand-coded spin lock. In fact with a bit of work we could convert it back to a real spin lock with spin_trylock. The advantage of this is that we won't have to rely on the interrupt to occur after setting SYNC. The disadvantage is that on certain architectures (sparc64 obviously :) we're now doing the relatively expensive bit operations on each IRQ. 
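For anyone who wants to play with the idea outside the driver, here is a minimal user-space sketch of the BUSY/SYNC handshake described above. It is not the tg3 code: the struct and function names are made up for illustration, and C11 sequentially consistent atomics stand in for the kernel's set_bit()/test_bit()/clear_bit(), on the assumption that both sides need a full store-then-load ordering for the handshake to be safe.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two state bits in struct tg3. */
struct irq_gate {
	atomic_bool sync;	/* set by the side that wants the handler quiesced */
	atomic_bool busy;	/* set for as long as a handler invocation is running */
};

static inline void irq_gate_init(struct irq_gate *g)
{
	atomic_init(&g->sync, false);
	atomic_init(&g->busy, false);
}

/* Handler entry: announce that we are running, then check for SYNC.
 * Returns true if the handler should back off without doing any work.
 * The default seq_cst ordering keeps the store ahead of the load. */
static inline bool irq_enter(struct irq_gate *g)
{
	atomic_store(&g->busy, true);
	return atomic_load(&g->sync);
}

/* Handler exit: must be called on every path, including the back-off path. */
static inline void irq_exit(struct irq_gate *g)
{
	atomic_store(&g->busy, false);
}

/* Quiesce: set SYNC, then spin until no handler is in flight. Any handler
 * that starts afterwards sees SYNC in irq_enter() and backs off. */
static inline void irq_quiesce(struct irq_gate *g)
{
	atomic_store(&g->sync, true);
	while (atomic_load(&g->busy))
		;	/* the kernel version would call cpu_relax() here */
}

/* Let handlers run again. */
static inline void irq_resume(struct irq_gate *g)
{
	atomic_store(&g->sync, false);
}
```

The cost mentioned above is visible here: every handler invocation pays for two atomic stores and one atomic load, even when nobody is trying to quiesce it.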
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --5vNYLRcllDrimb99 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=p --- linux-2.6/drivers/net/tg3.h.orig 2005-06-16 21:10:30.000000000 +1000 +++ linux-2.6/drivers/net/tg3.h 2005-06-16 21:12:00.000000000 +1000 @@ -2009,21 +2009,22 @@ /* If the IRQ handler (which runs lockless) needs to be * quiesced, the following bitmask state is used. The * SYNC bit is set by non-IRQ context code to initiate - * the quiescence. The setter of this bit also forces - * an interrupt to run via the GRC misc host control - * register. - * - * The IRQ handler notes this, disables interrupts, and - * sets the COMPLETE bit. At this point the SYNC bit - * setter can be assured that interrupts will no longer - * get run. + * the quiescence. + * + * The IRQ sets the BUSY bit whenever it runs. When it + * notices that SYNC is set, it disables interrupts, + * clears the BUSY bit and returns. + * + * When the BUSY bit is cleared after the SYNC bit has + * been set, the setter can be assured that interrupts + * will no longer get run. * * In this way all SMP driver locks are never acquired * in hw IRQ context, only sw IRQ context or lower. */ unsigned long irq_state; #define TG3_IRQSTATE_SYNC 0 -#define TG3_IRQSTATE_COMPLETE 1 +#define TG3_IRQSTATE_BUSY 1 /* SMP locking strategy: * --- linux-2.6/drivers/net/tg3.c.orig 2005-06-16 21:10:27.000000000 +1000 +++ linux-2.6/drivers/net/tg3.c 2005-06-16 21:24:54.000000000 +1000 @@ -2931,32 +2931,26 @@ return (done ? 
0 : 1); } -static void tg3_irq_quiesce(struct tg3 *tp) +static inline void tg3_irq_quiesce(struct tg3 *tp) { BUG_ON(test_bit(TG3_IRQSTATE_SYNC, &tp->irq_state)); set_bit(TG3_IRQSTATE_SYNC, &tp->irq_state); - smp_mb(); - tw32(GRC_LOCAL_CTRL, - tp->grc_local_ctrl | GRC_LCLCTRL_SETINT); - - while (!test_bit(TG3_IRQSTATE_COMPLETE, &tp->irq_state)) { - u32 val = tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); - - if (val == 0x00000001) - break; + while (test_bit(TG3_IRQSTATE_BUSY, &tp->irq_state)) cpu_relax(); - } } -static inline int tg3_irq_sync(struct tg3 *tp) +static inline int tg3_irq_enter(struct tg3 *tp) { - if (test_bit(TG3_IRQSTATE_SYNC, &tp->irq_state)) { - set_bit(TG3_IRQSTATE_COMPLETE, &tp->irq_state); - return 1; - } - return 0; + set_bit(TG3_IRQSTATE_BUSY, &tp->irq_state); + return test_bit(TG3_IRQSTATE_SYNC, &tp->irq_state); +} + +static inline void tg3_irq_exit(struct tg3 *tp) +{ + smp_mb__before_clear_bit(); + clear_bit(TG3_IRQSTATE_BUSY, &tp->irq_state); } /* Fully shutdown all tg3 driver activity elsewhere in the system. 
@@ -2997,8 +2991,10 @@ */ tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x00000001); tp->last_tag = sblk->status_tag; - if (tg3_irq_sync(tp)) + + if (tg3_irq_enter(tp)) goto out; + sblk->status &= ~SD_STATUS_UPDATED; if (likely(tg3_has_work(tp))) netif_rx_schedule(dev); /* schedule NAPI poll */ @@ -3007,7 +3003,10 @@ tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, tp->last_tag << 24); } + out: + tg3_irq_exit(tp); + return IRQ_RETVAL(1); } @@ -3034,8 +3033,10 @@ */ tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x00000001); - if (tg3_irq_sync(tp)) + + if (tg3_irq_enter(tp)) goto out; + sblk->status &= ~SD_STATUS_UPDATED; if (likely(tg3_has_work(tp))) netif_rx_schedule(dev); /* schedule NAPI poll */ @@ -3047,10 +3048,13 @@ 0x00000000); tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); } + +out: + tg3_irq_exit(tp); } else { /* shared interrupt */ handled = 0; } -out: + return IRQ_RETVAL(handled); } @@ -3078,8 +3082,10 @@ tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x00000001); tp->last_tag = sblk->status_tag; - if (tg3_irq_sync(tp)) + + if (tg3_irq_enter(tp)) goto out; + sblk->status &= ~SD_STATUS_UPDATED; if (likely(tg3_has_work(tp))) netif_rx_schedule(dev); /* schedule NAPI poll */ @@ -3091,10 +3097,13 @@ tp->last_tag << 24); tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); } + +out: + tg3_irq_exit(tp); } else { /* shared interrupt */ handled = 0; } -out: + return IRQ_RETVAL(handled); } --5vNYLRcllDrimb99-- From herbert@gondor.apana.org.au Thu Jun 16 05:01:24 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 16 Jun 2005 05:01:29 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5GC1MH9002673 for ; Thu, 16 Jun 2005 05:01:23 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1Dit2F-0002eK-00; Thu, 16 Jun 2005 21:59:51 +1000 Received: from herbert by 
gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1Dit2A-00061f-00; Thu, 16 Jun 2005 21:59:46 +1000 Date: Thu, 16 Jun 2005 21:59:46 +1000 To: "David S. Miller" Cc: netdev@oss.sgi.com, mchan@broadcom.com Subject: Re: [PATCH]: Tigon3 new NAPI locking v2 Message-ID: <20050616115945.GA23064@gondor.apana.org.au> References: <20050603.122558.88474819.davem@davemloft.net> <20050616113732.GA22367@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="bp/iNruPH9dso1Pn" Content-Disposition: inline In-Reply-To: <20050616113732.GA22367@gondor.apana.org.au> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2439 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 3367 Lines: 110 --bp/iNruPH9dso1Pn Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Thu, Jun 16, 2005 at 09:37:32PM +1000, herbert wrote: > > The advantage of this is that we won't have to rely on the interrupt > to occur after setting SYNC. The disadvantage is that on certain > architectures (sparc64 obviously :) we're now doing the relatively > expensive bit operations on each IRQ. Actually, why don't we utilise the existing synchronize_irq mechanism? Here is what we could do. -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --bp/iNruPH9dso1Pn Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=p --- linux-2.6/drivers/net/tg3.h.orig 2005-06-16 21:52:01.000000000 +1000 +++ linux-2.6/drivers/net/tg3.h 2005-06-16 21:58:02.000000000 +1000 @@ -2008,22 +2008,20 @@ /* If the IRQ handler (which runs lockless) needs to be * quiesced, the following bitmask state is used. The - * SYNC bit is set by non-IRQ context code to initiate - * the quiescence. 
The setter of this bit also forces - * an interrupt to run via the GRC misc host control - * register. - * - * The IRQ handler notes this, disables interrupts, and - * sets the COMPLETE bit. At this point the SYNC bit - * setter can be assured that interrupts will no longer - * get run. + * SYNC flag is set by non-IRQ context code to initiate + * the quiescence. + * + * When the IRQ handler notices that SYNC is set, it + * disables interrupts and returns. + * + * When all outstanding IRQ handlers have returned after + * the SYNC flag has been set, the setter can be assured + * that interrupts will no longer get run. * * In this way all SMP driver locks are never acquired * in hw IRQ context, only sw IRQ context or lower. */ - unsigned long irq_state; -#define TG3_IRQSTATE_SYNC 0 -#define TG3_IRQSTATE_COMPLETE 1 + unsigned int irq_sync; /* SMP locking strategy: * --- linux-2.6/drivers/net/tg3.c.orig 2005-06-16 21:52:04.000000000 +1000 +++ linux-2.6/drivers/net/tg3.c 2005-06-16 21:58:36.000000000 +1000 @@ -435,7 +435,7 @@ tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, (tp->last_tag << 24)); tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); - tp->irq_state = 0; + tp->irq_sync = 0; tg3_cond_int(tp); } @@ -2931,32 +2931,18 @@ return (done ? 
0 : 1); } -static void tg3_irq_quiesce(struct tg3 *tp) +static inline void tg3_irq_quiesce(struct tg3 *tp) { - BUG_ON(test_bit(TG3_IRQSTATE_SYNC, &tp->irq_state)); + BUG_ON(tp->irq_sync); - set_bit(TG3_IRQSTATE_SYNC, &tp->irq_state); - smp_mb(); - tw32(GRC_LOCAL_CTRL, - tp->grc_local_ctrl | GRC_LCLCTRL_SETINT); + tp->irq_sync = 1; - while (!test_bit(TG3_IRQSTATE_COMPLETE, &tp->irq_state)) { - u32 val = tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); - - if (val == 0x00000001) - break; - - cpu_relax(); - } + synchronize_irq(tp->pdev->irq); } static inline int tg3_irq_sync(struct tg3 *tp) { - if (test_bit(TG3_IRQSTATE_SYNC, &tp->irq_state)) { - set_bit(TG3_IRQSTATE_COMPLETE, &tp->irq_state); - return 1; - } - return 0; + return tp->irq_sync; } /* Fully shutdown all tg3 driver activity elsewhere in the system. --bp/iNruPH9dso1Pn-- From herbert@gondor.apana.org.au Thu Jun 16 06:06:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 16 Jun 2005 06:06:35 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5GD6SH9005502 for ; Thu, 16 Jun 2005 06:06:29 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1Diu3F-00031A-00; Thu, 16 Jun 2005 23:04:57 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1Diu3C-0006Go-00; Thu, 16 Jun 2005 23:04:54 +1000 Date: Thu, 16 Jun 2005 23:04:53 +1000 To: "David S. 
Miller" Cc: netdev@oss.sgi.com, mchan@broadcom.com Subject: Re: [PATCH]: Tigon3 new NAPI locking v2 Message-ID: <20050616130453.GA23682@gondor.apana.org.au> References: <20050603.122558.88474819.davem@davemloft.net> <20050616113732.GA22367@gondor.apana.org.au> <20050616115945.GA23064@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="MGYHOYXEY6WxJCY8" Content-Disposition: inline In-Reply-To: <20050616115945.GA23064@gondor.apana.org.au> User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2440 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 3144 Lines: 105 --MGYHOYXEY6WxJCY8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Thu, Jun 16, 2005 at 09:59:45PM +1000, herbert wrote: > > Actually, why don't we utilise the existing synchronize_irq mechanism? > Here is what we could do. Oops, I should've left the smp_mb() in tg3_irq_quiesce since synchronize_irq isn't a memory barrier. -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --MGYHOYXEY6WxJCY8 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=p --- linux-2.6/drivers/net/tg3.h.orig 2005-06-16 21:52:01.000000000 +1000 +++ linux-2.6/drivers/net/tg3.h 2005-06-16 21:58:02.000000000 +1000 @@ -2008,22 +2008,20 @@ /* If the IRQ handler (which runs lockless) needs to be * quiesced, the following bitmask state is used. The - * SYNC bit is set by non-IRQ context code to initiate - * the quiescence. The setter of this bit also forces - * an interrupt to run via the GRC misc host control - * register. - * - * The IRQ handler notes this, disables interrupts, and - * sets the COMPLETE bit. 
At this point the SYNC bit - * setter can be assured that interrupts will no longer - * get run. + * SYNC flag is set by non-IRQ context code to initiate + * the quiescence. + * + * When the IRQ handler notices that SYNC is set, it + * disables interrupts and returns. + * + * When all outstanding IRQ handlers have returned after + * the SYNC flag has been set, the setter can be assured + * that interrupts will no longer get run. * * In this way all SMP driver locks are never acquired * in hw IRQ context, only sw IRQ context or lower. */ - unsigned long irq_state; -#define TG3_IRQSTATE_SYNC 0 -#define TG3_IRQSTATE_COMPLETE 1 + unsigned int irq_sync; /* SMP locking strategy: * --- linux-2.6/drivers/net/tg3.c.orig 2005-06-16 21:52:04.000000000 +1000 +++ linux-2.6/drivers/net/tg3.c 2005-06-16 23:02:22.000000000 +1000 @@ -435,7 +435,7 @@ tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, (tp->last_tag << 24)); tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); - tp->irq_state = 0; + tp->irq_sync = 0; tg3_cond_int(tp); } @@ -2933,30 +2933,17 @@ static void tg3_irq_quiesce(struct tg3 *tp) { - BUG_ON(test_bit(TG3_IRQSTATE_SYNC, &tp->irq_state)); + BUG_ON(tp->irq_sync); - set_bit(TG3_IRQSTATE_SYNC, &tp->irq_state); + tp->irq_sync = 1; smp_mb(); - tw32(GRC_LOCAL_CTRL, - tp->grc_local_ctrl | GRC_LCLCTRL_SETINT); - while (!test_bit(TG3_IRQSTATE_COMPLETE, &tp->irq_state)) { - u32 val = tr32(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW); - - if (val == 0x00000001) - break; - - cpu_relax(); - } + synchronize_irq(tp->pdev->irq); } static inline int tg3_irq_sync(struct tg3 *tp) { - if (test_bit(TG3_IRQSTATE_SYNC, &tp->irq_state)) { - set_bit(TG3_IRQSTATE_COMPLETE, &tp->irq_state); - return 1; - } - return 0; + return tp->irq_sync; } /* Fully shutdown all tg3 driver activity elsewhere in the system. 
--MGYHOYXEY6WxJCY8-- From shemminger@osdl.org Thu Jun 16 11:02:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 16 Jun 2005 11:02:46 -0700 (PDT) Received: from smtp.osdl.org (smtp.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5GI2bH9023511 for ; Thu, 16 Jun 2005 11:02:38 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j5GI1MjA001638 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 16 Jun 2005 11:01:22 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j5GI1MOn020848; Thu, 16 Jun 2005 11:01:22 -0700 Date: Thu, 16 Jun 2005 11:01:09 -0700 From: Stephen Hemminger To: Jeff Garzik Cc: netdev@oss.sgi.com Subject: [PATCH] sk98lin: fix ethtool stats Message-ID: <20050616110109.728d315a@dxpl.pdx.osdl.net> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.110 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 2441 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 4336 Lines: 102 The ethtool stats code in the sk98lin driver doesn't correctly match stats with names. Also it always reports stats for port 0, and doesn't update stats before reporting. This patch fixes that and adds statistics for the number of pause frames sent/received. 
Signed-off-by: Stephen Hemminger Index: work/drivers/net/sk98lin/skethtool.c =================================================================== --- work.orig/drivers/net/sk98lin/skethtool.c +++ work/drivers/net/sk98lin/skethtool.c @@ -268,6 +268,7 @@ static const char StringsStats[][ETH_GST "rx_bytes", "tx_bytes", "rx_errors", "tx_errors", "rx_dropped", "tx_dropped", + "rx_pause", "tx_pause", "multicasts", "collisions", "rx_length_errors", "rx_buffer_overflow_errors", "rx_crc_errors", "rx_frame_errors", @@ -297,37 +298,49 @@ static void getStrings(struct net_device static void getEthtoolStats(struct net_device *dev, struct ethtool_stats *stats, u64 *data) { - const DEV_NET *pNet = netdev_priv(dev); - const SK_AC *pAC = pNet->pAC; - const SK_PNMI_STRUCT_DATA *pPnmiStruct = &pAC->PnmiStruct; - - *data++ = pPnmiStruct->Stat[0].StatRxOkCts; - *data++ = pPnmiStruct->Stat[0].StatTxOkCts; - *data++ = pPnmiStruct->Stat[0].StatRxOctetsOkCts; - *data++ = pPnmiStruct->Stat[0].StatTxOctetsOkCts; - *data++ = pPnmiStruct->InErrorsCts; - *data++ = pPnmiStruct->Stat[0].StatTxSingleCollisionCts; - *data++ = pPnmiStruct->RxNoBufCts; - *data++ = pPnmiStruct->TxNoBufCts; - *data++ = pPnmiStruct->Stat[0].StatRxMulticastOkCts; - *data++ = pPnmiStruct->Stat[0].StatTxSingleCollisionCts; - *data++ = pPnmiStruct->Stat[0].StatRxRuntCts; - *data++ = pPnmiStruct->Stat[0].StatRxFifoOverflowCts; - *data++ = pPnmiStruct->Stat[0].StatRxFcsCts; - *data++ = pPnmiStruct->Stat[0].StatRxFramingCts; - *data++ = pPnmiStruct->Stat[0].StatRxShortsCts; - *data++ = pPnmiStruct->Stat[0].StatRxTooLongCts; - *data++ = pPnmiStruct->Stat[0].StatRxCextCts; - *data++ = pPnmiStruct->Stat[0].StatRxSymbolCts; - *data++ = pPnmiStruct->Stat[0].StatRxIRLengthCts; - *data++ = pPnmiStruct->Stat[0].StatRxCarrierCts; - *data++ = pPnmiStruct->Stat[0].StatRxJabberCts; - *data++ = pPnmiStruct->Stat[0].StatRxMissedCts; - *data++ = pAC->stats.tx_aborted_errors; - *data++ = pPnmiStruct->Stat[0].StatTxCarrierCts; - *data++ = 
pPnmiStruct->Stat[0].StatTxFifoUnderrunCts; - *data++ = pPnmiStruct->Stat[0].StatTxCarrierCts; - *data++ = pAC->stats.tx_window_errors; + DEV_NET *pNet = netdev_priv(dev); + SK_AC *pAC = pNet->pAC; + SK_PNMI_STRUCT_DATA *pPnmiStruct = &pAC->PnmiStruct; + u32 size = sizeof(*pPnmiStruct); + int port = pNet->NetNr; + int i = 0; + + if (netif_running(dev)) + SkPnmiGetStruct(pAC, pAC->IoBase, pPnmiStruct, + &size, port); + + i = 0; + data[i++] = pPnmiStruct->Stat[port].StatRxOkCts; + data[i++] = pPnmiStruct->Stat[port].StatTxOkCts; + data[i++] = pPnmiStruct->Stat[port].StatRxOctetsOkCts; + data[i++] = pPnmiStruct->Stat[port].StatTxOctetsOkCts; + data[i++] = pPnmiStruct->InErrorsCts; + data[i++] = pPnmiStruct->OutErrorsCts; + data[i++] = pPnmiStruct->RxNoBufCts; + data[i++] = pPnmiStruct->TxNoBufCts; + data[i++] = pPnmiStruct->Stat[port].StatRxPauseMacCtrlCts; + data[i++] = pPnmiStruct->Stat[port].StatTxPauseMacCtrlCts; + data[i++] = pPnmiStruct->Stat[port].StatRxMulticastOkCts; + data[i++] = pPnmiStruct->Stat[port].StatTxSingleCollisionCts; + data[i++] = pPnmiStruct->Stat[port].StatRxRuntCts; + data[i++] = pPnmiStruct->Stat[port].StatRxFifoOverflowCts; + data[i++] = pPnmiStruct->Stat[port].StatRxFcsCts; + data[i++] = pPnmiStruct->Stat[port].StatRxFramingCts; + data[i++] = pPnmiStruct->Stat[port].StatRxShortsCts; + data[i++] = pPnmiStruct->Stat[port].StatRxTooLongCts; + data[i++] = pPnmiStruct->Stat[port].StatRxCextCts; + data[i++] = pPnmiStruct->Stat[port].StatRxSymbolCts; + data[i++] = pPnmiStruct->Stat[port].StatRxIRLengthCts; + data[i++] = pPnmiStruct->Stat[port].StatRxCarrierCts; + data[i++] = pPnmiStruct->Stat[port].StatRxJabberCts; + data[i++] = pPnmiStruct->Stat[port].StatRxMissedCts; + data[i++] = pAC->stats.tx_aborted_errors; + data[i++] = pPnmiStruct->Stat[port].StatTxCarrierCts; + data[i++] = pPnmiStruct->Stat[port].StatTxFifoUnderrunCts; + data[i++] = pPnmiStruct->Stat[port].StatTxCarrierCts; + data[i++] = pAC->stats.tx_window_errors; + + BUG_ON(i != 
ARRAY_SIZE(StringsStats)); } From davem@davemloft.net Thu Jun 16 13:05:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 16 Jun 2005 13:05:42 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5GK5cH9031610 for ; Thu, 16 Jun 2005 13:05:38 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dj0b4-0002Dt-4B; Thu, 16 Jun 2005 13:04:18 -0700 Date: Thu, 16 Jun 2005 13:04:17 -0700 (PDT) Message-Id: <20050616.130417.78707215.davem@davemloft.net> To: herbert@gondor.apana.org.au Cc: netdev@oss.sgi.com, mchan@broadcom.com Subject: Re: [PATCH]: Tigon3 new NAPI locking v2 From: "David S. Miller" In-Reply-To: <20050616130453.GA23682@gondor.apana.org.au> References: <20050616113732.GA22367@gondor.apana.org.au> <20050616115945.GA23064@gondor.apana.org.au> <20050616130453.GA23682@gondor.apana.org.au> X-Mailer: Mew version 3.3 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2442 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 744 Lines: 20 From: Herbert Xu Subject: Re: [PATCH]: Tigon3 new NAPI locking v2 Date: Thu, 16 Jun 2005 23:04:53 +1000 > On Thu, Jun 16, 2005 at 09:59:45PM +1000, herbert wrote: > > > > Actually, why don't we utilise the existing synchronize_irq mechanism? > > Here is what we could do. > > Oops, I should've left the smp_mb() in tg3_irq_quiesce since > synchronize_irq isn't a memory barrier. Wow, that's a very cool idea. :-) In fact, I think it will eliminate some (but definitely not all) of the races that the new locking code has. 
When you posted the patch with the atomic bitop added to the interrupt handler, I was going to tell you that the whole idea was to make it near zero cost to the interrupt fast path. :) From mcgrof@gmail.com Thu Jun 16 16:53:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 16 Jun 2005 16:53:26 -0700 (PDT) Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.192]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5GNrLH9018720 for ; Thu, 16 Jun 2005 16:53:22 -0700 Received: by wproxy.gmail.com with SMTP id 68so776192wri for ; Thu, 16 Jun 2005 16:52:05 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=MWGLVFYPYzXkCMx5FuYadX+ktfo82/ehUHL2Z2V8cwH0GoTenu90xuDi6/fwgaYG3rvNAHcymZ1nS1bLk1CRCaOgMyw5nsTfmbOtTSgX7kbKpQo0UlyzRRD6X0LjpqcPSYDfgUEEBkmaQgUqdQhbhhq1e0j+rpagQ1Gpo0XCHP8= Received: by 10.54.26.72 with SMTP id 72mr929648wrz; Thu, 16 Jun 2005 16:52:04 -0700 (PDT) Received: by 10.54.13.22 with HTTP; Thu, 16 Jun 2005 16:52:04 -0700 (PDT) Message-ID: <43e72e8905061616527858ebd6@mail.gmail.com> Date: Thu, 16 Jun 2005 19:52:04 -0400 From: "Luis R. Rodriguez" Reply-To: "Luis R. 
Rodriguez" To: Olaf Hering Subject: Re: [PATCH] uninitialized variable in prism54 isl38xx_trigger_device Cc: Jeff Garzik , netdev@oss.sgi.com, prism54-private@prism54.org In-Reply-To: <20050525231651.GA21816@suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline References: <20050525231651.GA21816@suse.de> Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j5GNrLH9018720 X-archive-position: 2443 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mcgrof@gmail.com Precedence: bulk X-list: netdev Content-Length: 2238 Lines: 52 Sure, why not. This has been applied to prism54 svn tree. On 5/25/05, Olaf Hering wrote: > > drivers/net/wireless/prism54/isl_38xx.c:131: warning: 'current_time.tv_sec' is used uninitialized in this function > drivers/net/wireless/prism54/isl_38xx.c:131: warning: 'current_time.tv_usec' is used uninitialized in this function > > Signed-off-by: Olaf Hering > Index: linux-2.6.12-rc5-olh/drivers/net/wireless/prism54/isl_38xx.c > =================================================================== > --- linux-2.6.12-rc5-olh.orig/drivers/net/wireless/prism54/isl_38xx.c > +++ linux-2.6.12-rc5-olh/drivers/net/wireless/prism54/isl_38xx.c > @@ -112,10 +112,10 @@ isl38xx_handle_wakeup(isl38xx_control_bl > void > isl38xx_trigger_device(int asleep, void __iomem *device_base) > { > - struct timeval current_time; > u32 reg, counter = 0; > > #if VERBOSE > SHOW_ERROR_MESSAGES > + struct timeval current_time; > DEBUG(SHOW_FUNCTION_CALLS, "isl38xx trigger device\n"); > #endif > > @@ -126,11 +126,11 @@ isl38xx_trigger_device(int asleep, void > do_gettimeofday(&current_time); > DEBUG(SHOW_TRACING, "%08li.%08li Device wakeup triggered\n", > current_time.tv_sec, (long)current_time.tv_usec); > -#endif > > DEBUG(SHOW_TRACING, "%08li.%08li Device register read %08x\n", > current_time.tv_sec, 
(long)current_time.tv_usec, > readl(device_base + ISL38XX_CTRL_STAT_REG)); > +#endif > udelay(ISL38XX_WRITEIO_DELAY); > > reg = readl(device_base + ISL38XX_INT_IDENT_REG); > @@ -148,10 +148,12 @@ isl38xx_trigger_device(int asleep, void > counter++; > } > > +#if VERBOSE > SHOW_ERROR_MESSAGES > DEBUG(SHOW_TRACING, > "%08li.%08li Device register read %08x\n", > current_time.tv_sec, (long)current_time.tv_usec, > readl(device_base + ISL38XX_CTRL_STAT_REG)); > +#endif > udelay(ISL38XX_WRITEIO_DELAY); > > #if VERBOSE > SHOW_ERROR_MESSAGES > From jesse.brandeburg@intel.com Thu Jun 16 17:50:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 16 Jun 2005 17:50:27 -0700 (PDT) Received: from orsfmr005.jf.intel.com (fmr20.intel.com [134.134.136.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5H0oMH9005566 for ; Thu, 16 Jun 2005 17:50:23 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr005.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j5H0mQiF032057; Fri, 17 Jun 2005 00:48:26 GMT Received: from [134.134.3.122] (jbrandeb-desk.amr.corp.intel.com [134.134.3.122]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with ESMTP id j5H0mNq1019180; Fri, 17 Jun 2005 00:48:26 GMT Message-ID: <42B21DD7.3@intel.com> Date: Thu, 16 Jun 2005 17:48:23 -0700 From: Jesse Brandeburg Organization: Intel Corporation User-Agent: Mozilla Thunderbird 1.0.2 (Windows/20050317) X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. 
Miller" CC: netdev@vger.kernel.org, shemminger@osdl.org, jheffner@psc.edu, netdev@oss.sgi.com Subject: Re: [ipv4, e1000] multi client throughput testing References: <20050610.171127.59653238.davem@davemloft.net> In-Reply-To: <20050610.171127.59653238.davem@davemloft.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2444 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jesse.brandeburg@intel.com Precedence: bulk X-list: netdev Content-Length: 1333 Lines: 30 Ick, I get to be the bearer of my own bad news. I seem to mostly have a client misconfiguration problem. David S. Miller wrote: > From: Jesse Brandeburg > Date: Fri, 10 Jun 2005 16:56:50 -0700 (Pacific Daylight Time) > > > What did i miss? > > Thanks for all of the data Jesse. I'll try to sift through it this > weekend. Well, as it turns out I was sort of right all along, when i was thinking that the client's tcp windows were not being serviced quickly enough. First, I figured out that the windows client machines have a good "out of the box" behavior when receiving tcp data from linux. Second, the clients sending data to the server were maxing out their tcp window at 64k and did *not* have rfc1323 enabled. After enabling rfc1323 and upping the max window size to 128k, each client's throughput went up quite a bit (there may be more headroom i didn't test yet). Total throughput for us in this case is around 1560Mb/s now. I'd like to see it at 1700-1800 but I don't think it will do it. We're still running almost entirely in interrupt mode (with NAPI enabled) at about 7-8000 ints/s Now I will go back and run with the netfilter enabled kernel and take a look again at the faster replenish/fairness patches I've been working on. 
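As a rough sanity check on the window numbers: without RFC 1323 window scaling the advertised window is a 16-bit field, so at most 65535 bytes can be in flight per connection, and per-connection throughput is bounded by window / RTT. The helper below is only that arithmetic. The function name and the 1 ms round-trip time in the note after it are illustrative assumptions, not measurements from this test setup.

```c
#include <stdint.h>

/* Upper bound on a single TCP connection's throughput, in bits per second,
 * when at most window_bytes can be unacknowledged per round trip of rtt_us
 * microseconds. Returns 0 for a zero RTT rather than dividing by zero. */
static inline uint64_t tcp_window_limit_bps(uint64_t window_bytes, uint64_t rtt_us)
{
	if (rtt_us == 0)
		return 0;
	return window_bytes * 8ULL * 1000000ULL / rtt_us;
}
```

With an assumed RTT of 1000 us, a 65535-byte window caps a connection at 524,280,000 bit/s and a 131072-byte window at 1,048,576,000 bit/s, which is the direction of the improvement seen after enabling rfc1323.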
Thanks for your attention, Jesse From maca02@atlas.cz Fri Jun 17 06:52:57 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 17 Jun 2005 06:53:01 -0700 (PDT) Received: from localhost.localdomain (maca.fortech.cz [213.250.192.50]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5HDqsH9023925 for ; Fri, 17 Jun 2005 06:52:56 -0700 Received: from localhost (localhost [127.0.0.1]) by localhost.localdomain (8.12.11/8.12.8) with ESMTP id j5HDpSJ3020653 for ; Fri, 17 Jun 2005 14:51:28 +0100 Date: Fri, 17 Jun 2005 14:51:28 +0100 (BST) From: =?ISO-8859-2?Q?Tom=E1=B9_Macek?= X-X-Sender: root@localhost.localdomain To: netdev@oss.sgi.com Subject: receive only one record from the routing table Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2445 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: maca02@atlas.cz Precedence: bulk X-list: netdev Content-Length: 7548 Lines: 249 Hi, I have this program (see below), and I want it to find a certain route record in the kernel routing table. I copied it from somewhere on the internet and added the NetlinkAddAttr() function, which should add a request attribute for the destination address. But the program always prints the WHOLE routing table. I would like a program that RECEIVES only one route from the kernel routing table, for a certain destination address. Is this possible with rtnetlink? I wasn't able to find the answer on Google, and all my attempts failed. Any help would be much appreciated! 
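For reference, a hedged sketch of one way to ask for a single route. As far as I know it is the NLM_F_DUMP flag that makes the kernel return the whole table; "ip route get" sends RTM_GETROUTE with only NLM_F_REQUEST, a /32 rtm_dst_len and an RTA_DST attribute, and gets back the one route the kernel would use for that destination. The function below only builds such a request in a caller-supplied buffer. Its name and signature are hypothetical, and sending it and parsing the reply are left out.

```c
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>

/* Build an RTM_GETROUTE lookup request for one IPv4 destination.
 * buf must be suitably aligned for struct nlmsghdr.
 * Returns the total message length, or -1 if the buffer is too small
 * or dst is not a valid dotted-quad address. */
static inline int build_route_get(void *buf, size_t buflen,
				  const char *dst, unsigned int seq)
{
	struct nlmsghdr *nlh = buf;
	struct rtmsg *rtm;
	struct rtattr *rta;
	struct in_addr addr;
	size_t need = NLMSG_SPACE(sizeof(struct rtmsg)) +
		      RTA_SPACE(sizeof(addr));

	if (buflen < need || inet_pton(AF_INET, dst, &addr) != 1)
		return -1;

	memset(buf, 0, need);

	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
	nlh->nlmsg_type = RTM_GETROUTE;
	nlh->nlmsg_flags = NLM_F_REQUEST;	/* note: no NLM_F_DUMP */
	nlh->nlmsg_seq = seq;

	rtm = NLMSG_DATA(nlh);
	rtm->rtm_family = AF_INET;
	rtm->rtm_dst_len = 32;			/* look up a single host address */

	/* Append the destination address as an RTA_DST attribute. */
	rta = (struct rtattr *)((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
	rta->rta_type = RTA_DST;
	rta->rta_len = RTA_LENGTH(sizeof(addr));
	memcpy(RTA_DATA(rta), &addr, sizeof(addr));
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + rta->rta_len;

	return (int)nlh->nlmsg_len;
}
```

The returned length is what would be passed to send() on a NETLINK_ROUTE socket; the reply should then be a single RTM_NEWROUTE message (or an NLMSG_ERROR) rather than a multi-part dump.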
Tomas

==================================================================================
/* The header names were lost in the archive; the list below is what this code needs. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#define BUFSIZE 8192

struct route_info {
    u_int dstAddr;
    u_int dstMask;
    u_int srcAddr;
    u_int gateWay;
    char ifName[IF_NAMESIZE];
};

/* Append one attribute of the given type to the netlink message n. */
int NetlinkAddAttr(struct nlmsghdr *n, int maxlen, int type, void *data, int alen)
{
    int len = RTA_LENGTH(alen);
    struct rtattr *rta;

    if (NLMSG_ALIGN(n->nlmsg_len) + len > maxlen)
        return -1;
    rta = (struct rtattr *)(((char *)n) + NLMSG_ALIGN(n->nlmsg_len));
    rta->rta_type = type;
    rta->rta_len = len;
    memcpy(RTA_DATA(rta), data, alen);
    n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + len;
    return 0;
}

int readNlSock(int sockFd, char *bufPtr, int seqNum, int pId)
{
    struct nlmsghdr *nlHdr;
    int readLen = 0, msgLen = 0;

    do {
        /* Receive response from the kernel */
        if ((readLen = recv(sockFd, bufPtr, BUFSIZE - msgLen, 0)) < 0) {
            perror("SOCK READ: ");
            return -1;
        }
        nlHdr = (struct nlmsghdr *)bufPtr;

        /* Check if the header is valid */
        if ((NLMSG_OK(nlHdr, readLen) == 0) || (nlHdr->nlmsg_type == NLMSG_ERROR)) {
            fprintf(stderr, "Error in received packet\n");
            return -1;
        }

        /* Check if it is the last message */
        if (nlHdr->nlmsg_type == NLMSG_DONE) {
            break;
        } else {
            /* Else move the pointer to buffer appropriately */
            bufPtr += readLen;
            msgLen += readLen;
        }

        /* Check if it is a multi-part message */
        if ((nlHdr->nlmsg_flags & NLM_F_MULTI) == 0) {
            /* return if it is not */
            break;
        }
    } while ((nlHdr->nlmsg_seq != seqNum) || (nlHdr->nlmsg_pid != pId));
    return msgLen;
}

unsigned long netmask_from_bitcount(unsigned int bits)
{
    return (0xffffffffUL << (32 - bits)) & 0xffffffffUL;
}

/* For printing the routes. */
void printRoute(struct route_info *rtInfo)
{
    char tempBuf[512];
    struct in_addr ia;

    /* Print Destination address */
    if (rtInfo->dstAddr != 0) {
        ia.s_addr = rtInfo->dstAddr;
        strcpy(tempBuf, inet_ntoa(ia));
    } else {
        strcpy(tempBuf, "*.*.*.*");
    }
    fprintf(stdout, "%s\t\t", tempBuf);

    /* Print Gateway address */
    if (rtInfo->gateWay != 0) {
        ia.s_addr = rtInfo->gateWay;
        strcpy(tempBuf, inet_ntoa(ia));
    } else {
        strcpy(tempBuf, "*.*.*.*");
    }
    fprintf(stdout, "%s\t\t", tempBuf);

    /* Print Interface Name */
    fprintf(stdout, "%s\t\t", rtInfo->ifName);

    /* Print Source address (the original computed it but never printed it) */
    if (rtInfo->srcAddr != 0) {
        ia.s_addr = rtInfo->srcAddr;
        strcpy(tempBuf, inet_ntoa(ia));
    } else {
        strcpy(tempBuf, "*.*.*.*");
    }
    fprintf(stdout, "%s\t\t", tempBuf);

    /* Print Netmask */
    if (rtInfo->dstMask != 0) {
        ia.s_addr = htonl(netmask_from_bitcount(rtInfo->dstMask));
        strcpy(tempBuf, inet_ntoa(ia));
    } else {
        strcpy(tempBuf, "0.0.0.0");
    }
    fprintf(stdout, "%s\n", tempBuf);
}

void parseRoutes(struct nlmsghdr *nlHdr, struct route_info *rtInfo, char *find)
{
    struct rtmsg *rtMsg;
    struct rtattr *rtAttr;
    int rtLen;
    struct in_addr ai;

    if (!inet_aton(find, &ai)) {
        return;
    }

    rtMsg = (struct rtmsg *)NLMSG_DATA(nlHdr);

    /* If the route is not for AF_INET or does not belong to main routing table then return. */
    if ((rtMsg->rtm_family != AF_INET) || (rtMsg->rtm_table != RT_TABLE_MAIN))
        return;

    /* get the rtattr field */
    rtAttr = (struct rtattr *)RTM_RTA(rtMsg);
    rtLen = RTM_PAYLOAD(nlHdr);
    rtInfo->dstMask = rtMsg->rtm_dst_len; /* Netmask */

    for ( ; RTA_OK(rtAttr, rtLen); rtAttr = RTA_NEXT(rtAttr, rtLen)) {
        switch (rtAttr->rta_type) {
        case RTA_OIF:
            if_indextoname(*(int *)RTA_DATA(rtAttr), rtInfo->ifName);
            break;
        case RTA_GATEWAY:
            rtInfo->gateWay = *(u_int *)RTA_DATA(rtAttr);
            break;
        case RTA_PREFSRC:
            rtInfo->srcAddr = *(u_int *)RTA_DATA(rtAttr);
            break;
        case RTA_DST:
            rtInfo->dstAddr = *(u_int *)RTA_DATA(rtAttr);
            break;
        }
    }

    /*
    if (rtInfo->dstAddr == ai.s_addr) {
        printf("match %s\n", inet_ntoa(ai));
        return;
    }
    */
    printRoute(rtInfo);
    return;
}

int main(int argc, char *argv[])
{
    struct nlmsghdr *nlMsg;
    struct rtmsg *rtMsg;
    struct route_info *rtInfo;
    struct in_addr dst;
    char msgBuf[BUFSIZE];
    int sock, len, msgSeq = 0;

    /* The destination address is taken from the first argument */
    if (argc < 2 || !inet_aton(argv[1], &dst)) {
        fprintf(stderr, "usage: %s <destination address>\n", argv[0]);
        return -1;
    }

    /* Create Socket */
    if ((sock = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE)) < 0) {
        perror("Socket Creation: ");
        return -1;
    }

    /* Initialize the buffer */
    memset(msgBuf, 0, BUFSIZE);

    /* point the header and the msg structure pointers into the buffer */
    nlMsg = (struct nlmsghdr *)msgBuf;
    rtMsg = (struct rtmsg *)NLMSG_DATA(nlMsg);
    (void)rtMsg; /* the rtmsg is left zeroed, as in the original */

    /* Fill in the nlmsg header */
    nlMsg->nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)); // Length of message.
    nlMsg->nlmsg_type = RTM_GETROUTE; // Get the routes from kernel routing table .

    nlMsg->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP; // The message is a request for dump.
    nlMsg->nlmsg_seq = msgSeq++; // Sequence of the message packet.
    nlMsg->nlmsg_pid = getpid(); // PID of process sending the request.

    /*
     * Add the destination address as an RTA_DST attribute. The limit is the
     * size of the buffer; sizeof(nlMsg) is only the size of a pointer, which
     * made the original call fail without adding anything.
     */
    if (NetlinkAddAttr(nlMsg, sizeof(msgBuf), RTA_DST, &dst.s_addr, 4) < 0) {
        printf("Adding RTA_DST Failed...\n");
        return -1;
    }

    /* Send the request */
    if (send(sock, nlMsg, nlMsg->nlmsg_len, 0) < 0) {
        printf("Write To Socket Failed...\n");
        return -1;
    }

    /* Read the response */
    if ((len = readNlSock(sock, msgBuf, msgSeq, getpid())) < 0) {
        printf("Read From Socket Failed...\n");
        return -1;
    }

    /* Parse and print the response */
    rtInfo = (struct route_info *)malloc(sizeof(struct route_info));
    fprintf(stdout, "Destination\t\tGateway\t\tInterface\t\tSource\t\tNetmask\n");
    for ( ; NLMSG_OK(nlMsg, len); nlMsg = NLMSG_NEXT(nlMsg, len)) {
        memset(rtInfo, 0, sizeof(struct route_info));
        parseRoutes(nlMsg, rtInfo, argv[1]);
    }
    free(rtInfo);
    close(sock);
    return 0;
}

From tgraf@suug.ch Fri Jun 17 07:16:28 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 17 Jun 2005 07:16:30 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5HEGRH9026429 for ; Fri, 17 Jun 2005 07:16:28 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id DD24A1C0EB; Fri, 17 Jun 2005 16:15:27 +0200 (CEST) Date: Fri, 17 Jun 2005 16:15:27 +0200 From: Thomas Graf To: =?iso-8859-1?B?VG9t4T8=?= Macek Cc: netdev@oss.sgi.com Subject: Re: receive only one record from the routing table Message-ID: <20050617141527.GN22463@postel.suug.ch> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-archive-position: 2446 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 411 Lines: 7

* Tomáš Macek 2005-06-17 14:51
> nlMsg->nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)); // Length of message.
> nlMsg->nlmsg_type = RTM_GETROUTE; // Get the routes from kernel routing table .
>
> nlMsg->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP; // The message is a request for dump.

Omit NLM_F_DUMP and you'll be fine, see rfc3549. From jaegert@us.ibm.com Fri Jun 17 07:18:19 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 17 Jun 2005 07:18:25 -0700 (PDT) Received: from igw2.watson.ibm.com (igw2.watson.ibm.com [129.34.20.6]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5HEIHH9027060 for ; Fri, 17 Jun 2005 07:18:18 -0700 Received: from sp1n294en1.watson.ibm.com (sp1n294en1.watson.ibm.com [129.34.20.40]) by igw2.watson.ibm.com (8.13.1/8.13.1/8.13.1-2005-04-25 igw) with ESMTP id j5HEHGu8012113; Fri, 17 Jun 2005 10:17:21 -0400 Received: from sp1n294en1.watson.ibm.com (localhost [127.0.0.1]) by sp1n294en1.watson.ibm.com (8.11.7-20030924/8.11.7/01-14-2004_2) with ESMTP id j5HEFsp516480; Fri, 17 Jun 2005 10:15:54 -0400 Received: from [9.2.18.177] (dyn9002018177.watson.ibm.com [9.2.18.177]) by sp1n294en1.watson.ibm.com (8.11.7-20030924/8.11.7/01-14-2004_1) with ESMTP id j5HEFm3575466; Fri, 17 Jun 2005 10:15:49 -0400 Subject: [PATCH 1/2] Update: LSM-IPSec Networking Hooks From: jaegert To: jmorris@redhat.com, davem@davemloft.net, herbert@gondor.apana.org.au, netdev@oss.sgi.com, chrisw@osdl.org Cc: jaegert@us.ibm.com, sergeh@us.ibm.com, latten@austin.ibm.com.sds.sgi.com Content-Type: text/plain Message-Id: <1119013877.30404.2720.camel@dyn9002018177.watson.ibm.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 (1.4.6-2) Date: Fri, 17 Jun 2005 09:11:17 -0400 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j5HEIHH9027060 X-archive-position: 2447 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jaegert@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 43197 Lines: 1429 Hi, Just a followup on the patch sent on 6/14. 
I checked the code paths that check the length of the user-provided security contexts via pfkey and xfrm_user, and found that these interfaces ensure that the length refers only to user data. I am resending the patch because of a couple of minor modifications; e.g., I needed to apply one of James's suggestions in a second place. This patch subsumes the previous one. The 2/2 patch is unchanged.

Regards, Trent.

===================================================

This patch series implements per-packet access control by extending the Linux Security Modules (LSM) interface with hooks in the XFRM and pfkey subsystems that leverage IPSec security associations to label packets. Extensions to the SELinux LSM are included that leverage the patch for this purpose.

This patch implements the changes necessary to the XFRM subsystem, pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a socket to use only authorized security associations (or no security association) to send/receive network packets.

Patch purpose:

The patch is designed to enable per-packet access control based on the strongly authenticated IPSec security association. Such access controls augment the existing ones based on network interface and IP address. The former are very coarse-grained, and the latter can be spoofed. By using IPSec, the system can control access to remote hosts based on cryptographic keys generated using the IPSec mechanism. This enables access control on a per-machine basis, or per-application if the remote machine is running the same mechanism and is trusted to enforce the access control policy.

Patch design approach:

The overall approach is that policy (xfrm_policy) entries set by user-level programs (e.g., setkey for ipsec-tools) are extended with a security context that is used at policy selection time in the XFRM subsystem to restrict the sockets that can send/receive packets via security associations (xfrm_states) that are built from those policies. A presentation from the SELinux symposium, available at www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf, describes the overall approach.

Patch implementation details:

On output, the policy retrieved (via xfrm_policy_lookup or xfrm_sk_policy_lookup) must be authorized for the security context of the socket, and the same security context is required for the resultant security association (retrieved, or negotiated via racoon in ipsec-tools). This is enforced in xfrm_state_find.

On input, the policy retrieved must also be authorized for the socket (at __xfrm_policy_check), and the security context of the policy must also match the security association being used.

The patch has virtually no impact on packets that do not use IPSec. The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as before.

Also, if IPSec is used without security contexts, the impact is minimal. The LSM must allow such policies to be selected for the combination of socket and remote machine, but subsequent IPSec processing proceeds as in the original case.

Testing:

The pfkey interface is tested using ipsec-tools. ipsec-tools has been modified (a separate ipsec-tools patch is available for version 0.5) to support assigning security contexts to xfrm_policy entries and security associations via setkey, and negotiating with those security contexts via racoon.

The xfrm_user interface is tested via ad hoc programs that set security contexts. These programs are also available from me, and include programs for setting, getting, and deleting policy to test this interface. Testing of SA functions was done by tracing kernel behavior.

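As a rough illustration of the matching rule in the implementation details above (the security context of a policy must match that of the security association being used), here is a minimal user-space sketch. The names struct sec_ctx and sec_ctx_match are stand-ins invented for this example; they mirror the sid/doi/alg comparison that the patch's xfrm_sec_ctx_match() performs, but this is not the kernel code itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the patch's struct xfrm_sec_ctx. */
struct sec_ctx {
    uint8_t  doi;   /* domain of interpretation, e.g. 1 for LSM */
    uint8_t  alg;   /* labeling algorithm, e.g. 1 for SELinux */
    uint32_t sid;   /* security identifier assigned by the LSM */
};

/*
 * Two contexts match when neither side is labeled, or when both are
 * labeled and agree on sid, doi and alg. A labeled context never
 * matches an unlabeled one.
 */
static int sec_ctx_match(const struct sec_ctx *a, const struct sec_ctx *b)
{
    if (a == NULL && b == NULL)
        return 1;
    if (a == NULL || b == NULL)
        return 0;
    return a->sid == b->sid && a->doi == b->doi && a->alg == b->alg;
}
```

Under this rule an unlabeled policy pairs only with an unlabeled security association, which fits the statement above that IPSec without security contexts proceeds as in the original case.
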
--- include/linux/pfkeyv2.h | 13 +++ include/linux/security.h | 119 +++++++++++++++++++++++++++++++++++ include/linux/xfrm.h | 36 ++++++++++ include/net/flow.h | 5 - include/net/xfrm.h | 21 ++++++ net/core/flow.c | 4 - net/ipv4/xfrm4_policy.c | 2 net/ipv6/xfrm6_policy.c | 2 net/key/af_key.c | 150 +++++++++++++++++++++++++++++++++++++++++++- net/xfrm/xfrm_policy.c | 66 ++++++++++++------- net/xfrm/xfrm_state.c | 16 +++- net/xfrm/xfrm_user.c | 158 +++++++++++++++++++++++++++++++++++++++++++++-- security/Kconfig | 13 +++ security/dummy.c | 37 +++++++++++ 14 files changed, 599 insertions(+), 43 deletions(-) diff -puN include/linux/pfkeyv2.h~lsm-xfrm-nethooks include/linux/pfkeyv2.h --- linux-2.6.12-rc6-xfrm/include/linux/pfkeyv2.h~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/include/linux/pfkeyv2.h 2005-06-13 13:22:59.000000000 -0400 @@ -216,6 +216,16 @@ struct sadb_x_nat_t_port { } __attribute__((packed)); /* sizeof(struct sadb_x_nat_t_port) == 8 */ +/* Generic LSM security context */ +struct sadb_x_sec_ctx { + uint16_t sadb_x_sec_len; + uint16_t sadb_x_sec_exttype; + uint8_t sadb_x_ctx_alg; /* LSMs: e.g., selinux == 1 */ + uint8_t sadb_x_ctx_doi; + uint16_t sadb_x_ctx_len; +} __attribute__((packed)); +/* sizeof(struct sadb_sec_ctx) = 8 */ + /* Message types */ #define SADB_RESERVED 0 #define SADB_GETSPI 1 @@ -324,7 +334,8 @@ struct sadb_x_nat_t_port { #define SADB_X_EXT_NAT_T_SPORT 21 #define SADB_X_EXT_NAT_T_DPORT 22 #define SADB_X_EXT_NAT_T_OA 23 -#define SADB_EXT_MAX 23 +#define SADB_X_EXT_SEC_CTX 24 +#define SADB_EXT_MAX 24 /* Identity Extension values */ #define SADB_IDENTTYPE_RESERVED 0 diff -puN include/linux/security.h~lsm-xfrm-nethooks include/linux/security.h --- linux-2.6.12-rc6-xfrm/include/linux/security.h~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/include/linux/security.h 2005-06-13 13:22:59.000000000 -0400 @@ -58,6 +58,12 @@ struct sk_buff; struct sock; struct sockaddr; 
struct socket; +struct flowi; +struct dst_entry; +struct xfrm_selector; +struct xfrm_policy; +struct xfrm_state; +struct xfrm_user_sec_ctx; extern int cap_netlink_send(struct sock *sk, struct sk_buff *skb); extern int cap_netlink_recv(struct sk_buff *skb); @@ -802,6 +808,50 @@ struct swap_info_struct; * @sk_free_security: * Deallocate security structure. * + * Security hooks for XFRM operations. + * + * @xfrm_policy_alloc_security: + * @xp contains the xfrm_policy being added to Security Policy Database + * used by the XFRM system. + * @sec_ctx contains the security context information being provided by + * the user-level policy update program (e.g., setkey). + * Allocate a security structure to the xp->selector.security field. + * The security field is initialized to NULL when the xfrm_policy is + * allocated. + * Return 0 if operation was successful (memory to allocate, legal context) + * @xfrm_policy_clone_security: + * @old contains an existing xfrm_policy in the SPD. + * @new contains a new xfrm_policy being cloned from old. + * Allocate a security structure to the new->selector.security field + * that contains the information from the old->selector.security field. + * Return 0 if operation was successful (memory to allocate). + * @xfrm_policy_free_security: + * @xp contains the xfrm_policy + * Deallocate xp->selector.security. + * @xfrm_state_alloc_security: + * @x contains the xfrm_state being added to the Security Association + * Database by the XFRM system. + * @sec_ctx contains the security context information being provided by + * the user-level SA generation program (e.g., setkey or racoon). + * Allocate a security structure to the x->sel.security field. The + * security field is initialized to NULL when the xfrm_state is + * allocated. + * Return 0 if operation was successful (memory to allocate, legal context). + * @xfrm_state_free_security: + * @x contains the xfrm_state. + * Deallocate x>sel.security. 
+ * @xfrm_policy_lookup: + * @sk contains the sock that is requesting to either send or receive a + * network communication. + * @sel contains the selector that matches the communication end points of + * the network communication (source, destination, and ports). + * @fl contains the flowi that indicates the communication protocol. + * @dir contains the direction of the flow (input or output). + * Check permission when a sock selects a xfrm_policy for processing + * XFRMs on a packet. The hook is called when selecting either a + * per-socket policy or a generic xfrm policy. + * Return 0 if permission is granted. + * * Security hooks affecting all System V IPC operations. * * @ipc_permission: @@ -1243,6 +1293,15 @@ struct security_operations { int (*sk_alloc_security) (struct sock *sk, int family, int priority); void (*sk_free_security) (struct sock *sk); #endif /* CONFIG_SECURITY_NETWORK */ + +#ifdef CONFIG_SECURITY_NETWORK_XFRM + int (*xfrm_policy_alloc_security) (struct xfrm_policy *xp, struct xfrm_user_sec_ctx *sec_ctx); + int (*xfrm_policy_clone_security) (struct xfrm_policy *old, struct xfrm_policy *new); + void (*xfrm_policy_free_security) (struct xfrm_policy *xp); + int (*xfrm_state_alloc_security) (struct xfrm_state *x, struct xfrm_user_sec_ctx *sec_ctx); + void (*xfrm_state_free_security) (struct xfrm_state *x); + int (*xfrm_policy_lookup)(struct sock *sk, struct xfrm_selector *sel, struct flowi *fl, u8 dir); +#endif /* CONFIG_SECURITY_NETWORK_XFRM */ }; /* global variables */ @@ -2854,5 +2913,65 @@ static inline void security_sk_free(stru } #endif /* CONFIG_SECURITY_NETWORK */ +#ifdef CONFIG_SECURITY_NETWORK_XFRM +static inline int security_xfrm_policy_alloc(struct xfrm_policy *xp, struct xfrm_user_sec_ctx *sec_ctx) +{ + return security_ops->xfrm_policy_alloc_security(xp, sec_ctx); +} + +static inline int security_xfrm_policy_clone(struct xfrm_policy *old, struct xfrm_policy *new) +{ + return security_ops->xfrm_policy_clone_security(old, new); +} + 
+static inline void security_xfrm_policy_free(struct xfrm_policy *xp) +{ + security_ops->xfrm_policy_free_security(xp); +} + +static inline int security_xfrm_state_alloc(struct xfrm_state *x, struct xfrm_user_sec_ctx *sec_ctx) +{ + return security_ops->xfrm_state_alloc_security(x, sec_ctx); +} + +static inline void security_xfrm_state_free(struct xfrm_state *x) +{ + security_ops->xfrm_state_free_security(x); +} + +static inline int security_xfrm_policy_lookup(struct sock *sk, struct xfrm_selector *sel, struct flowi *fl, u8 dir) +{ + return security_ops->xfrm_policy_lookup(sk, sel, fl, dir); +} +#else /* CONFIG_SECURITY_NETWORK_XFRM */ +static inline int security_xfrm_policy_alloc(struct xfrm_policy *xp, struct xfrm_user_sec_ctx *sec_ctx) +{ + return 0; +} + +static inline int security_xfrm_policy_clone(struct xfrm_policy *old, struct xfrm_policy *new) +{ + return 0; +} + +static inline void security_xfrm_policy_free(struct xfrm_policy *xp) +{ +} + +static inline int security_xfrm_state_alloc(struct xfrm_state *x, struct xfrm_user_sec_ctx *sec_ctx) +{ + return 0; +} + +static inline void security_xfrm_state_free(struct xfrm_state *x) +{ +} + +static inline int security_xfrm_policy_lookup(struct sock *sk, struct xfrm_selector *sel, struct flowi *fl, u8 dir) +{ + return 0; +} +#endif /* CONFIG_SECURITY_NETWORK_XFRM */ + #endif /* ! 
__LINUX_SECURITY_H */ diff -puN include/linux/xfrm.h~lsm-xfrm-nethooks include/linux/xfrm.h --- linux-2.6.12-rc6-xfrm/include/linux/xfrm.h~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/include/linux/xfrm.h 2005-06-13 13:22:59.000000000 -0400 @@ -27,6 +27,22 @@ struct xfrm_id __u8 proto; }; +struct xfrm_sec_ctx { + __u8 ctx_doi; + __u8 ctx_alg; + __u16 ctx_len; + __u32 ctx_sid; + char ctx_str[0]; +}; + +/* Security Context Domains of Interpretation */ +#define XFRM_SC_DOI_RESERVED 0 +#define XFRM_SC_DOI_LSM 1 + +/* Security Context Algorithms */ +#define XFRM_SC_ALG_RESERVED 0 +#define XFRM_SC_ALG_SELINUX 1 + /* Selector, used as selector both on policy rules (SPD) and SAs. */ struct xfrm_selector @@ -43,8 +59,15 @@ struct xfrm_selector __u8 proto; int ifindex; uid_t user; + struct xfrm_sec_ctx *security; }; +/* All but the security field */ +static inline int xfrm_selector_base_size(void) +{ + return sizeof(struct xfrm_selector) - sizeof(struct xfrm_sec_ctx *); +} + #define XFRM_INF (~(__u64)0) struct xfrm_lifetime_cfg @@ -146,6 +169,18 @@ enum { #define XFRM_NR_MSGTYPES (XFRM_MSG_MAX + 1 - XFRM_MSG_BASE) +/* + * Generic LSM security context for comunicating to user space + * NOTE: Same format as sadb_x_sec_ctx + */ +struct xfrm_user_sec_ctx { + __u16 len; + __u16 exttype; + __u8 ctx_alg; /* LSMs: e.g., selinux == 1 */ + __u8 ctx_doi; + __u16 ctx_len; +}; + struct xfrm_user_tmpl { struct xfrm_id id; __u16 family; @@ -173,6 +208,7 @@ enum xfrm_attr_type_t { XFRMA_ALG_CRYPT, /* struct xfrm_algo */ XFRMA_ALG_COMP, /* struct xfrm_algo */ XFRMA_ENCAP, /* struct xfrm_algo + struct xfrm_encap_tmpl */ + XFRMA_SEC_CTX, /* struct xfrm_sec_ctx */ XFRMA_TMPL, /* 1 or more struct xfrm_user_tmpl */ __XFRMA_MAX diff -puN include/net/flow.h~lsm-xfrm-nethooks include/net/flow.h --- linux-2.6.12-rc6-xfrm/include/net/flow.h~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/include/net/flow.h 2005-06-13 
13:22:59.000000000 -0400 @@ -84,10 +84,11 @@ struct flowi { #define FLOW_DIR_OUT 1 #define FLOW_DIR_FWD 2 -typedef void (*flow_resolve_t)(struct flowi *key, u16 family, u8 dir, +struct sock; +typedef void (*flow_resolve_t)(struct flowi *key, struct sock *sk, u16 family, u8 dir, void **objp, atomic_t **obj_refp); -extern void *flow_cache_lookup(struct flowi *key, u16 family, u8 dir, +extern void *flow_cache_lookup(struct flowi *key, struct sock *sk, u16 family, u8 dir, flow_resolve_t resolver); extern void flow_cache_flush(void); extern atomic_t flow_cache_genid; diff -puN include/net/xfrm.h~lsm-xfrm-nethooks include/net/xfrm.h --- linux-2.6.12-rc6-xfrm/include/net/xfrm.h~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/include/net/xfrm.h 2005-06-13 13:22:59.000000000 -0400 @@ -493,6 +493,27 @@ xfrm_selector_match(struct xfrm_selector return 0; } +/* If neither has a context --> match + Otherwise, both must have a context and the sids, doi, alg must match */ +static inline int xfrm_sec_ctx_match(struct xfrm_sec_ctx *s1, struct xfrm_sec_ctx *s2) +{ + return ((!s1 && !s2) || + (s1 && s2 && + (s1->ctx_sid == s2->ctx_sid) && + (s1->ctx_doi == s2->ctx_doi) && + (s1->ctx_alg == s2->ctx_alg))); +} + +static inline struct xfrm_sec_ctx *xfrm_policy_security(struct xfrm_policy *xp) +{ + return (xp ? xp->selector.security : NULL); +} + +static inline struct xfrm_sec_ctx *xfrm_state_security(struct xfrm_state *x) +{ + return (x ? x->sel.security : NULL); +} + /* A struct encoding bundle of transformations to apply to some set of flow. * * dst->child points to the next element of bundle. 
diff -puN net/core/flow.c~lsm-xfrm-nethooks net/core/flow.c --- linux-2.6.12-rc6-xfrm/net/core/flow.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/core/flow.c 2005-06-13 13:22:59.000000000 -0400 @@ -162,7 +162,7 @@ static int flow_key_compare(struct flowi return 0; } -void *flow_cache_lookup(struct flowi *key, u16 family, u8 dir, +void *flow_cache_lookup(struct flowi *key, struct sock *sk, u16 family, u8 dir, flow_resolve_t resolver) { struct flow_cache_entry *fle, **head; @@ -221,7 +221,7 @@ nocache: void *obj; atomic_t *obj_ref; - resolver(key, family, dir, &obj, &obj_ref); + resolver(key, sk, family, dir, &obj, &obj_ref); if (fle) { fle->genid = atomic_read(&flow_cache_genid); diff -puN net/ipv4/xfrm4_policy.c~lsm-xfrm-nethooks net/ipv4/xfrm4_policy.c --- linux-2.6.12-rc6-xfrm/net/ipv4/xfrm4_policy.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/ipv4/xfrm4_policy.c 2005-06-13 13:22:59.000000000 -0400 @@ -36,6 +36,8 @@ __xfrm4_find_bundle(struct flowi *fl, st if (xdst->u.rt.fl.oif == fl->oif && /*XXX*/ xdst->u.rt.fl.fl4_dst == fl->fl4_dst && xdst->u.rt.fl.fl4_src == fl->fl4_src && + xfrm_sec_ctx_match(xfrm_policy_security(policy), + xfrm_state_security(dst->xfrm)) && xfrm_bundle_ok(xdst, fl, AF_INET)) { dst_clone(dst); break; diff -puN net/ipv6/xfrm6_policy.c~lsm-xfrm-nethooks net/ipv6/xfrm6_policy.c --- linux-2.6.12-rc6-xfrm/net/ipv6/xfrm6_policy.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/ipv6/xfrm6_policy.c 2005-06-13 13:22:59.000000000 -0400 @@ -54,6 +54,8 @@ __xfrm6_find_bundle(struct flowi *fl, st xdst->u.rt6.rt6i_src.plen); if (ipv6_addr_equal(&xdst->u.rt6.rt6i_dst.addr, &fl_dst_prefix) && ipv6_addr_equal(&xdst->u.rt6.rt6i_src.addr, &fl_src_prefix) && + xfrm_sec_ctx_match(xfrm_policy_security(policy), + xfrm_state_security(dst->xfrm)) && xfrm_bundle_ok(xdst, fl, AF_INET6)) { dst_clone(dst); break; diff -puN 
net/key/af_key.c~lsm-xfrm-nethooks net/key/af_key.c --- linux-2.6.12-rc6-xfrm/net/key/af_key.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/key/af_key.c 2005-06-16 14:48:27.000000000 -0400 @@ -336,6 +336,7 @@ static u8 sadb_ext_min_len[] = { [SADB_X_EXT_NAT_T_SPORT] = (u8) sizeof(struct sadb_x_nat_t_port), [SADB_X_EXT_NAT_T_DPORT] = (u8) sizeof(struct sadb_x_nat_t_port), [SADB_X_EXT_NAT_T_OA] = (u8) sizeof(struct sadb_address), + [SADB_X_EXT_SEC_CTX] = (u8) sizeof(struct sadb_x_sec_ctx), }; /* Verify sadb_address_{len,prefixlen} against sa_family. */ @@ -383,6 +384,40 @@ static int verify_address_len(void *p) return 0; } +static inline int verify_sec_ctx_len(void *p) +{ + struct sadb_x_sec_ctx *sec_ctx = (struct sadb_x_sec_ctx *)p; + int len = 0; + + len += sizeof(struct sadb_x_sec_ctx); + len += sec_ctx->sadb_x_ctx_len; + len += sizeof(uint64_t) - 1; + len /= sizeof(uint64_t); + + if (sec_ctx->sadb_x_sec_len != len) + return -EINVAL; + + return 0; +} + +static inline struct xfrm_user_sec_ctx *pfkey_sadb2xfrm_user_ctx(struct sadb_x_sec_ctx *sec_ctx) +{ + struct xfrm_user_sec_ctx *uctx = NULL; + + if (sec_ctx) { + int ctx_size = sec_ctx->sadb_x_ctx_len; + uctx = kmalloc((sizeof(*uctx)+ctx_size), GFP_KERNEL); + uctx->len = sec_ctx->sadb_x_sec_len; + uctx->exttype = sec_ctx->sadb_x_sec_exttype; + uctx->ctx_doi = sec_ctx->sadb_x_ctx_doi; + uctx->ctx_alg = sec_ctx->sadb_x_ctx_alg; + uctx->ctx_len = sec_ctx->sadb_x_ctx_len; + memcpy(uctx + 1, sec_ctx + 1, + uctx->ctx_len); + } + return uctx; +} + static int present_and_same_family(struct sadb_address *src, struct sadb_address *dst) { @@ -438,6 +473,10 @@ static int parse_exthdrs(struct sk_buff if (verify_address_len(p)) return -EINVAL; } + if (ext_type == SADB_X_EXT_SEC_CTX) { + if (verify_sec_ctx_len(p)) + return -EINVAL; + } ext_hdrs[ext_type-1] = p; } p += ext_len; @@ -586,6 +625,9 @@ static struct sk_buff * pfkey_xfrm_state struct sadb_key *key; struct sadb_x_sa2 *sa2; 
struct sockaddr_in *sin; + struct sadb_x_sec_ctx *sec_ctx; + struct xfrm_sec_ctx *xfrm_ctx; + int ctx_size = 0; #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) struct sockaddr_in6 *sin6; #endif @@ -609,6 +651,12 @@ static struct sk_buff * pfkey_xfrm_state sizeof(struct sadb_address)*2 + sockaddr_size*2 + sizeof(struct sadb_x_sa2); + + if ((xfrm_ctx = xfrm_state_security(x))) { + ctx_size = PFKEY_ALIGN8(xfrm_ctx->ctx_len); + size += sizeof(struct sadb_x_sec_ctx) + ctx_size; + } + /* identity & sensitivity */ if ((x->props.family == AF_INET && @@ -892,6 +940,20 @@ static struct sk_buff * pfkey_xfrm_state n_port->sadb_x_nat_t_port_reserved = 0; } + /* security context */ + if (xfrm_ctx) { + sec_ctx = (struct sadb_x_sec_ctx *) skb_put(skb, + sizeof(struct sadb_x_sec_ctx) + ctx_size); + sec_ctx->sadb_x_sec_len = + (sizeof(struct sadb_x_sec_ctx) + ctx_size) / sizeof(uint64_t); + sec_ctx->sadb_x_sec_exttype = SADB_X_EXT_SEC_CTX; + sec_ctx->sadb_x_ctx_doi = xfrm_ctx->ctx_doi; + sec_ctx->sadb_x_ctx_alg = xfrm_ctx->ctx_alg; + sec_ctx->sadb_x_ctx_len = xfrm_ctx->ctx_len; + memcpy(sec_ctx + 1, xfrm_ctx->ctx_str, + xfrm_ctx->ctx_len); + } + return skb; } @@ -902,6 +964,7 @@ static struct xfrm_state * pfkey_msg2xfr struct sadb_lifetime *lifetime; struct sadb_sa *sa; struct sadb_key *key; + struct sadb_x_sec_ctx *sec_ctx; uint16_t proto; int err; @@ -984,6 +1047,17 @@ static struct xfrm_state * pfkey_msg2xfr x->lft.soft_add_expires_seconds = lifetime->sadb_lifetime_addtime; x->lft.soft_use_expires_seconds = lifetime->sadb_lifetime_usetime; } + + sec_ctx = (struct sadb_x_sec_ctx *) ext_hdrs[SADB_X_EXT_SEC_CTX-1]; + if (sec_ctx != NULL) { + struct xfrm_user_sec_ctx *uctx = pfkey_sadb2xfrm_user_ctx(sec_ctx); + + err = security_xfrm_state_alloc(x, uctx); + kfree(uctx); + if (err) + goto out; + } + key = (struct sadb_key*) ext_hdrs[SADB_EXT_KEY_AUTH-1]; if (sa->sadb_sa_auth) { int keysize = 0; @@ -1634,6 +1708,18 @@ parse_ipsecrequests(struct xfrm_policy * return 0; } +static 
inline int pfkey_xfrm_policy2sec_ctx_size(struct xfrm_policy *xp) +{ + struct xfrm_sec_ctx *xfrm_ctx = xfrm_policy_security(xp); + + if (xfrm_ctx) { + int len = sizeof(struct sadb_x_sec_ctx); + len += xfrm_ctx->ctx_len; + return PFKEY_ALIGN8(len); + } + return 0; +} + static int pfkey_xfrm_policy2msg_size(struct xfrm_policy *xp) { int sockaddr_size = pfkey_sockaddr_size(xp->family); @@ -1647,7 +1733,8 @@ static int pfkey_xfrm_policy2msg_size(st (sockaddr_size * 2) + sizeof(struct sadb_x_policy) + (xp->xfrm_nr * (sizeof(struct sadb_x_ipsecrequest) + - (socklen * 2))); + (socklen * 2))) + + pfkey_xfrm_policy2sec_ctx_size(xp); } static struct sk_buff * pfkey_xfrm_policy2msg_prep(struct xfrm_policy *xp) @@ -1671,6 +1758,8 @@ static void pfkey_xfrm_policy2msg(struct struct sadb_lifetime *lifetime; struct sadb_x_policy *pol; struct sockaddr_in *sin; + struct sadb_x_sec_ctx *sec_ctx; + struct xfrm_sec_ctx *xfrm_ctx; #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) struct sockaddr_in6 *sin6; #endif @@ -1855,19 +1944,35 @@ static void pfkey_xfrm_policy2msg(struct } } } + + /* security context */ + if ((xfrm_ctx = xfrm_policy_security(xp))) { + int ctx_size = pfkey_xfrm_policy2sec_ctx_size(xp); + + sec_ctx = (struct sadb_x_sec_ctx *) skb_put(skb, ctx_size); + sec_ctx->sadb_x_sec_len = ctx_size / sizeof(uint64_t); + sec_ctx->sadb_x_sec_exttype = SADB_X_EXT_SEC_CTX; + sec_ctx->sadb_x_ctx_doi = xfrm_ctx->ctx_doi; + sec_ctx->sadb_x_ctx_alg = xfrm_ctx->ctx_alg; + sec_ctx->sadb_x_ctx_len = xfrm_ctx->ctx_len; + memcpy(sec_ctx + 1, xfrm_ctx->ctx_str, + xfrm_ctx->ctx_len); + } + hdr->sadb_msg_len = size / sizeof(uint64_t); hdr->sadb_msg_reserved = atomic_read(&xp->refcnt); } static int pfkey_spdadd(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { - int err; + int err = 0; struct sadb_lifetime *lifetime; struct sadb_address *sa; struct sadb_x_policy *pol; struct xfrm_policy *xp; struct sk_buff *out_skb; struct sadb_msg *out_hdr; + struct 
sadb_x_sec_ctx *sec_ctx; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -1914,6 +2019,18 @@ static int pfkey_spdadd(struct sock *sk, if (xp->selector.dport) xp->selector.dport_mask = ~0; + sec_ctx = (struct sadb_x_sec_ctx *) ext_hdrs[SADB_X_EXT_SEC_CTX-1]; + if (sec_ctx != NULL) { + struct xfrm_user_sec_ctx *uctx = pfkey_sadb2xfrm_user_ctx(sec_ctx); + + err = security_xfrm_policy_alloc(xp, uctx); + kfree(uctx); + if (err) { + err = -EINVAL; + goto out; + } + } + xp->lft.soft_byte_limit = XFRM_INF; xp->lft.hard_byte_limit = XFRM_INF; xp->lft.soft_packet_limit = XFRM_INF; @@ -1963,6 +2080,7 @@ static int pfkey_spdadd(struct sock *sk, return 0; out: + security_xfrm_policy_free(xp); kfree(xp); return err; } @@ -1972,10 +2090,11 @@ static int pfkey_spddelete(struct sock * int err; struct sadb_address *sa; struct sadb_x_policy *pol; - struct xfrm_policy *xp; + struct xfrm_policy *xp, tmp; struct sk_buff *out_skb; struct sadb_msg *out_hdr; struct xfrm_selector sel; + struct sadb_x_sec_ctx *sec_ctx; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -2004,7 +2123,17 @@ static int pfkey_spddelete(struct sock * if (sel.dport) sel.dport_mask = ~0; - xp = xfrm_policy_bysel(pol->sadb_x_policy_dir-1, &sel, 1); + sec_ctx = (struct sadb_x_sec_ctx *) ext_hdrs[SADB_X_EXT_SEC_CTX-1]; + memcpy(&tmp.selector, &sel, sizeof(struct xfrm_selector)); + if (sec_ctx != NULL) { + err = security_xfrm_policy_alloc( + &tmp, (struct xfrm_user_sec_ctx *)sec_ctx); + if (err) + return err; + } + + xp = xfrm_policy_bysel(pol->sadb_x_policy_dir-1, &tmp.selector, 1); + security_xfrm_policy_free(&tmp); if (xp == NULL) return -ENOENT; @@ -2482,6 +2611,7 @@ static struct xfrm_policy *pfkey_compile { struct xfrm_policy *xp; struct sadb_x_policy *pol = (struct sadb_x_policy*)data; + struct sadb_x_sec_ctx *sec_ctx; switch (family) { case AF_INET: @@ -2531,10 +2661,22 @@ static struct xfrm_policy 
*pfkey_compile (*dir = parse_ipsecrequests(xp, pol)) < 0) goto out; + /* security context too */ + if (len >= (pol->sadb_x_policy_len*8 + + sizeof(struct sadb_x_sec_ctx))) { + char *p = (char *) pol; + p += pol->sadb_x_policy_len*8; + sec_ctx = (struct sadb_x_sec_ctx *) p; + if (security_xfrm_policy_alloc( + xp, (struct xfrm_user_sec_ctx *)sec_ctx)) + goto out; + } + *dir = pol->sadb_x_policy_dir-1; return xp; out: + security_xfrm_policy_free(xp); kfree(xp); return NULL; } diff -puN net/xfrm/xfrm_policy.c~lsm-xfrm-nethooks net/xfrm/xfrm_policy.c --- linux-2.6.12-rc6-xfrm/net/xfrm/xfrm_policy.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/xfrm/xfrm_policy.c 2005-06-13 13:22:59.000000000 -0400 @@ -10,7 +10,7 @@ * YOSHIFUJI Hideaki * Split up af-specific portion * Derek Atkins Add the post_input processor - * + * */ #include @@ -257,6 +257,7 @@ void __xfrm_policy_destroy(struct xfrm_p if (del_timer(&policy->timer)) BUG(); + security_xfrm_policy_free(policy); kfree(policy); } EXPORT_SYMBOL(__xfrm_policy_destroy); @@ -396,7 +397,8 @@ struct xfrm_policy *xfrm_policy_bysel(in write_lock_bh(&xfrm_policy_lock); for (p = &xfrm_policy_list[dir]; (pol=*p)!=NULL; p = &pol->next) { - if (memcmp(sel, &pol->selector, sizeof(*sel)) == 0) { + if ((memcmp(sel, &pol->selector, xfrm_selector_base_size()) == 0) && + (xfrm_sec_ctx_match(sel->security, xfrm_policy_security(pol)))) { xfrm_pol_hold(pol); if (delete) *p = pol->next; @@ -492,7 +494,7 @@ EXPORT_SYMBOL(xfrm_policy_walk); /* Find policy to apply to this flow. 
*/ -static void xfrm_policy_lookup(struct flowi *fl, u16 family, u8 dir, +static void xfrm_policy_lookup(struct flowi *fl, struct sock *sk, u16 family, u8 dir, void **objp, atomic_t **obj_refp) { struct xfrm_policy *pol; @@ -506,9 +508,12 @@ static void xfrm_policy_lookup(struct fl continue; match = xfrm_selector_match(sel, fl, family); + if (match) { - xfrm_pol_hold(pol); - break; + if (!security_xfrm_policy_lookup(sk, sel, fl, dir)) { + xfrm_pol_hold(pol); + break; + } } } read_unlock_bh(&xfrm_policy_lock); @@ -516,15 +521,38 @@ static void xfrm_policy_lookup(struct fl *obj_refp = &pol->refcnt; } +static inline int policy_to_flow_dir(int dir) +{ + if (XFRM_POLICY_IN == FLOW_DIR_IN && + XFRM_POLICY_OUT == FLOW_DIR_OUT && + XFRM_POLICY_FWD == FLOW_DIR_FWD) + return dir; + switch (dir) { + default: + case XFRM_POLICY_IN: + return FLOW_DIR_IN; + case XFRM_POLICY_OUT: + return FLOW_DIR_OUT; + case XFRM_POLICY_FWD: + return FLOW_DIR_FWD; + }; +} + static struct xfrm_policy *xfrm_sk_policy_lookup(struct sock *sk, int dir, struct flowi *fl) { struct xfrm_policy *pol; read_lock_bh(&xfrm_policy_lock); if ((pol = sk->sk_policy[dir]) != NULL) { - int match = xfrm_selector_match(&pol->selector, fl, + struct xfrm_selector *sel = &pol->selector; + int match = xfrm_selector_match(sel, fl, sk->sk_family); + int err = 0; + if (match) + err = security_xfrm_policy_lookup(sk, sel, fl, policy_to_flow_dir(dir)); + + if (match && !err) xfrm_pol_hold(pol); else pol = NULL; @@ -595,6 +623,10 @@ static struct xfrm_policy *clone_policy( if (newp) { newp->selector = old->selector; + if (security_xfrm_policy_clone(old, newp)) { + kfree(newp); + return NULL; /* ENOMEM */ + } newp->lft = old->lft; newp->curlft = old->curlft; newp->action = old->action; @@ -706,22 +738,6 @@ xfrm_bundle_create(struct xfrm_policy *p return err; } -static inline int policy_to_flow_dir(int dir) -{ - if (XFRM_POLICY_IN == FLOW_DIR_IN && - XFRM_POLICY_OUT == FLOW_DIR_OUT && - XFRM_POLICY_FWD == FLOW_DIR_FWD) - return 
dir; - switch (dir) { - default: - case XFRM_POLICY_IN: - return FLOW_DIR_IN; - case XFRM_POLICY_OUT: - return FLOW_DIR_OUT; - case XFRM_POLICY_FWD: - return FLOW_DIR_FWD; - }; -} static int stale_bundle(struct dst_entry *dst); @@ -751,7 +767,7 @@ restart: if ((dst_orig->flags & DST_NOXFRM) || !xfrm_policy_list[XFRM_POLICY_OUT]) return 0; - policy = flow_cache_lookup(fl, family, + policy = flow_cache_lookup(fl, sk, family, policy_to_flow_dir(XFRM_POLICY_OUT), xfrm_policy_lookup); } @@ -942,7 +958,7 @@ int __xfrm_policy_check(struct sock *sk, int i; for (i=skb->sp->len-1; i>=0; i--) { - struct sec_decap_state *xvec = &(skb->sp->x[i]); + struct sec_decap_state *xvec = &(skb->sp->x[i]); if (!xfrm_selector_match(&xvec->xvec->sel, &fl, family)) return 0; @@ -960,7 +976,7 @@ int __xfrm_policy_check(struct sock *sk, pol = xfrm_sk_policy_lookup(sk, dir, &fl); if (!pol) - pol = flow_cache_lookup(&fl, family, + pol = flow_cache_lookup(&fl, sk, family, policy_to_flow_dir(dir), xfrm_policy_lookup); diff -puN net/xfrm/xfrm_state.c~lsm-xfrm-nethooks net/xfrm/xfrm_state.c --- linux-2.6.12-rc6-xfrm/net/xfrm/xfrm_state.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/xfrm/xfrm_state.c 2005-06-13 13:22:59.000000000 -0400 @@ -10,7 +10,7 @@ * Split up af-specific functions * Derek Atkins * Add UDP Encapsulation - * + * */ #include @@ -74,6 +74,7 @@ static void xfrm_state_gc_destroy(struct x->type->destructor(x); xfrm_put_type(x->type); } + security_xfrm_state_free(x); kfree(x); } @@ -338,7 +339,8 @@ xfrm_state_find(xfrm_address_t *daddr, x selector. 
*/ if (x->km.state == XFRM_STATE_VALID) { - if (!xfrm_selector_match(&x->sel, fl, family)) + if (!xfrm_selector_match(&x->sel, fl, family) || + !xfrm_sec_ctx_match(xfrm_policy_security(pol), xfrm_state_security(x))) continue; if (!best || best->km.dying > x->km.dying || @@ -349,7 +351,8 @@ xfrm_state_find(xfrm_address_t *daddr, x acquire_in_progress = 1; } else if (x->km.state == XFRM_STATE_ERROR || x->km.state == XFRM_STATE_EXPIRED) { - if (xfrm_selector_match(&x->sel, fl, family)) + if (xfrm_selector_match(&x->sel, fl, family) && + xfrm_sec_ctx_match(xfrm_policy_security(pol), xfrm_state_security(x))) error = -ESRCH; } } @@ -374,6 +377,13 @@ xfrm_state_find(xfrm_address_t *daddr, x xfrm_init_tempsel(x, fl, tmpl, daddr, saddr, family); if (km_query(x, tmpl, pol) == 0) { + if (!xfrm_sec_ctx_match(xfrm_policy_security(pol), xfrm_state_security(x))) { + x->km.state = XFRM_STATE_DEAD; + xfrm_state_put(x); + x = NULL; + error = -EPERM; + goto out; + } x->km.state = XFRM_STATE_ACQ; list_add_tail(&x->bydst, xfrm_state_bydst+h); xfrm_state_hold(x); diff -puN net/xfrm/xfrm_user.c~lsm-xfrm-nethooks net/xfrm/xfrm_user.c --- linux-2.6.12-rc6-xfrm/net/xfrm/xfrm_user.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/xfrm/xfrm_user.c 2005-06-16 14:39:56.000000000 -0400 @@ -7,7 +7,7 @@ * Kazunori MIYAZAWA @USAGI * Kunihiro Ishiguro * IPv6 support - * + * */ #include @@ -209,6 +209,30 @@ static int attach_encap_tmpl(struct xfrm return 0; } + +static inline int xfrm_user_sec_ctx_size(struct xfrm_policy *xp) +{ + struct xfrm_sec_ctx *xfrm_ctx = xfrm_policy_security(xp); + int len = 0; + + if (xfrm_ctx) { + len += sizeof(struct xfrm_user_sec_ctx); + len += xfrm_ctx->ctx_len; + } + return len; +} + +static int attach_sec_ctx(struct xfrm_state *x, struct rtattr *u_arg) +{ + struct xfrm_user_sec_ctx *uxsc = RTA_DATA(u_arg); + + if (uxsc) { + return security_xfrm_state_alloc(x, uxsc); + } + + return 0; +} + static void copy_from_user_state(struct 
xfrm_state *x, struct xfrm_usersa_info *p) { memcpy(&x->id, &p->id, sizeof(x->id)); @@ -258,6 +282,9 @@ static struct xfrm_state *xfrm_state_con if (err) goto error; + if ((err = attach_sec_ctx(x, xfrma[XFRMA_SEC_CTX-1]))) + goto error; + x->curlft.add_time = (unsigned long) xtime.tv_sec; x->km.state = XFRM_STATE_VALID; x->km.seq = p->seq; @@ -344,6 +371,27 @@ struct xfrm_dump_info { int this_idx; }; +static int dump_one_sec_ctx(struct xfrm_sec_ctx *ctx, struct xfrm_user_sec_ctx *uctx, struct sk_buff *skb, int ctx_size) +{ + if (!ctx) + return -1; + + uctx->exttype = XFRMA_SEC_CTX; + uctx->len = ctx_size; + uctx->ctx_doi = ctx->ctx_doi; + uctx->ctx_alg = ctx->ctx_alg; + uctx->ctx_len = ctx->ctx_len; + + memcpy(uctx + 1, ctx->ctx_str, ctx->ctx_len); + + RTA_PUT(skb, XFRMA_SEC_CTX, ctx_size, uctx); + + return 0; + +rtattr_failure: + return -1; +} + static int dump_one_state(struct xfrm_state *x, int count, void *ptr) { struct xfrm_dump_info *sp = ptr; @@ -352,6 +400,7 @@ static int dump_one_state(struct xfrm_st struct xfrm_usersa_info *p; struct nlmsghdr *nlh; unsigned char *b = skb->tail; + struct xfrm_sec_ctx *xfrm_ctx; if (sp->this_idx < sp->start_idx) goto out; @@ -376,6 +425,18 @@ static int dump_one_state(struct xfrm_st if (x->encap) RTA_PUT(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap); + if ((xfrm_ctx = xfrm_state_security(x))) { + int ctx_size = sizeof(struct xfrm_user_sec_ctx) + + xfrm_ctx->ctx_len + 1; + struct xfrm_user_sec_ctx *uctx = kmalloc(ctx_size, GFP_KERNEL); + int err; + + err = dump_one_sec_ctx(xfrm_ctx, uctx, skb, ctx_size); + kfree(uctx); + + if (err < 0) + goto rtattr_failure; + } nlh->nlmsg_len = skb->tail - b; out: sp->this_idx++; @@ -589,6 +650,25 @@ static int verify_newpolicy_info(struct return verify_policy_dir(p->dir); } +static int copy_sec_ctx(struct xfrm_policy *pol, struct xfrm_user_sec_ctx *uctx) +{ + int err = 0; + + if (uctx) { + err = security_xfrm_policy_alloc(pol, uctx); + } + + return err; +} + +static int 
copy_from_user_sec_ctx(struct xfrm_policy *pol, struct rtattr **xfrma) +{ + struct rtattr *rt = xfrma[XFRMA_SEC_CTX-1]; + struct xfrm_user_sec_ctx *uctx = RTA_DATA(rt); + + return copy_sec_ctx(pol, uctx); +} + static void copy_templates(struct xfrm_policy *xp, struct xfrm_user_tmpl *ut, int nr) { @@ -667,7 +747,10 @@ static struct xfrm_policy *xfrm_policy_c } copy_from_user_policy(xp, p); - err = copy_from_user_tmpl(xp, xfrma); + + if (!(err = copy_from_user_tmpl(xp, xfrma))) + err = copy_from_user_sec_ctx(xp, xfrma); + if (err) { *errp = err; kfree(xp); @@ -737,6 +820,27 @@ rtattr_failure: return -1; } +static int copy_to_user_sec_ctx(struct xfrm_policy *xp, struct sk_buff *skb, int src) +{ + int err = 0; + struct xfrm_sec_ctx *xfrm_ctx = xfrm_policy_security(xp); + + if (xfrm_ctx) { + int ctx_size = sizeof(struct xfrm_user_sec_ctx) + + xfrm_ctx->ctx_len; + struct xfrm_user_sec_ctx *uctx = kmalloc(ctx_size, GFP_KERNEL); + + if (!uctx) + return -ENOMEM; + + err = dump_one_sec_ctx(xfrm_ctx, uctx, skb, + ctx_size); + kfree(uctx); + } + + return err; +} + static int dump_one_policy(struct xfrm_policy *xp, int dir, int count, void *ptr) { struct xfrm_dump_info *sp = ptr; @@ -758,6 +862,8 @@ static int dump_one_policy(struct xfrm_p copy_to_user_policy(xp, p, dir); if (copy_to_user_tmpl(xp, skb) < 0) goto nlmsg_failure; + if (copy_to_user_sec_ctx(xp, skb, 0) < 0) + goto nlmsg_failure; nlh->nlmsg_len = skb->tail - b; out: @@ -813,7 +919,7 @@ static struct sk_buff *xfrm_policy_netli static int xfrm_get_policy(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { - struct xfrm_policy *xp; + struct xfrm_policy *xp, tmp; struct xfrm_userpolicy_id *p; int err; int delete; @@ -827,8 +933,20 @@ static int xfrm_get_policy(struct sk_buf if (p->index) xp = xfrm_policy_byid(p->dir, p->index, delete); - else - xp = xfrm_policy_bysel(p->dir, &p->sel, delete); + else { + struct rtattr **rtattrs = (struct rtattr **) xfrma; + struct rtattr *rt = rtattrs[XFRMA_SEC_CTX-1]; + + 
memcpy(&tmp.selector, &p->sel, sizeof(struct xfrm_selector)); + if (rt) { + struct xfrm_user_sec_ctx *uxsc = RTA_DATA(rt); + + if ((err = security_xfrm_policy_alloc(&tmp, uxsc))) + return err; + } + xp = xfrm_policy_bysel(p->dir, &tmp.selector, delete); + security_xfrm_policy_free(&tmp); + } if (xp == NULL) return -ENOENT; @@ -1110,6 +1228,8 @@ static int build_acquire(struct sk_buff if (copy_to_user_tmpl(xp, skb) < 0) goto nlmsg_failure; + if (copy_to_user_sec_ctx(xp, skb, 1) < 0) + goto nlmsg_failure; nlh->nlmsg_len = skb->tail - b; return skb->len; @@ -1127,6 +1247,7 @@ static int xfrm_send_acquire(struct xfrm len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr); len += NLMSG_SPACE(sizeof(struct xfrm_user_acquire)); + len += RTA_SPACE(xfrm_user_sec_ctx_size(xp)); skb = alloc_skb(len, GFP_ATOMIC); if (skb == NULL) return -ENOMEM; @@ -1147,8 +1268,9 @@ static struct xfrm_policy *xfrm_compile_ { struct xfrm_userpolicy_info *p = (struct xfrm_userpolicy_info *)data; struct xfrm_user_tmpl *ut = (struct xfrm_user_tmpl *) (p + 1); + struct xfrm_user_sec_ctx *uctx; struct xfrm_policy *xp; - int nr; + int nr = 0; switch (family) { case AF_INET: @@ -1176,9 +1298,26 @@ static struct xfrm_policy *xfrm_compile_ verify_newpolicy_info(p)) return NULL; + if (len > (sizeof(*p) + (XFRM_MAX_DEPTH * + sizeof(struct xfrm_user_tmpl)))) { + struct xfrm_user_tmpl *tmpl; + uctx = (struct xfrm_user_sec_ctx *) (ut + XFRM_MAX_DEPTH); + + if (len != sizeof(*p) + + (XFRM_MAX_DEPTH * sizeof(struct xfrm_user_tmpl)) + + uctx->len) + return NULL; + + /* spi must be zero'd unless real tmpl */ + for (tmpl = ut; tmpl->id.spi != 0; tmpl = tmpl + 1) + nr++; + } + else { + uctx = NULL; nr = ((len - sizeof(*p)) / sizeof(*ut)); if (nr > XFRM_MAX_DEPTH) return NULL; + } xp = xfrm_policy_alloc(GFP_KERNEL); if (xp == NULL) { @@ -1188,6 +1327,10 @@ static struct xfrm_policy *xfrm_compile_ copy_from_user_policy(xp, p); copy_templates(xp, ut, nr); + if (copy_sec_ctx(xp, uctx)) { + *dir = -EPERM; + 
return NULL; + } *dir = p->dir; @@ -1208,6 +1351,8 @@ static int build_polexpire(struct sk_buf copy_to_user_policy(xp, &upe->pol, dir); if (copy_to_user_tmpl(xp, skb) < 0) goto nlmsg_failure; + if (copy_to_user_sec_ctx(xp, skb, 2) < 0) + goto nlmsg_failure; upe->hard = !!hard; nlh->nlmsg_len = skb->tail - b; @@ -1225,6 +1370,7 @@ static int xfrm_send_policy_notify(struc len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr); len += NLMSG_SPACE(sizeof(struct xfrm_user_polexpire)); + len += xfrm_user_sec_ctx_size(xp); skb = alloc_skb(len, GFP_ATOMIC); if (skb == NULL) return -ENOMEM; diff -puN security/dummy.c~lsm-xfrm-nethooks security/dummy.c --- linux-2.6.12-rc6-xfrm/security/dummy.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/security/dummy.c 2005-06-13 13:22:59.000000000 -0400 @@ -811,6 +811,35 @@ static inline void dummy_sk_free_securit } #endif /* CONFIG_SECURITY_NETWORK */ +#ifdef CONFIG_SECURITY_NETWORK_XFRM +static int dummy_xfrm_policy_alloc_security(struct xfrm_policy *xp, struct xfrm_user_sec_ctx *sec_ctx) +{ + return 0; +} + +static inline int dummy_xfrm_policy_clone_security(struct xfrm_policy *old, struct xfrm_policy *new) +{ + return 0; +} + +static void dummy_xfrm_policy_free_security(struct xfrm_policy *xp) +{ +} + +static int dummy_xfrm_state_alloc_security(struct xfrm_state *x, struct xfrm_user_sec_ctx *sec_ctx) +{ + return 0; +} + +static void dummy_xfrm_state_free_security(struct xfrm_state *x) +{ +} + +static int dummy_xfrm_policy_lookup(struct sock *sk, struct xfrm_selector *sel, struct flowi *fl, u8 dir) +{ + return 0; +} +#endif /* CONFIG_SECURITY_NETWORK_XFRM */ static int dummy_register_security (const char *name, struct security_operations *ops) { return -EINVAL; @@ -992,5 +1021,13 @@ void security_fixup_ops (struct security set_to_dummy_if_null(ops, sk_alloc_security); set_to_dummy_if_null(ops, sk_free_security); #endif /* CONFIG_SECURITY_NETWORK */ +#ifdef CONFIG_SECURITY_NETWORK_XFRM + 
set_to_dummy_if_null(ops, xfrm_policy_alloc_security); + set_to_dummy_if_null(ops, xfrm_policy_clone_security); + set_to_dummy_if_null(ops, xfrm_policy_free_security); + set_to_dummy_if_null(ops, xfrm_state_alloc_security); + set_to_dummy_if_null(ops, xfrm_state_free_security); + set_to_dummy_if_null(ops, xfrm_policy_lookup); +#endif /* CONFIG_SECURITY_NETWORK_XFRM */ } diff -puN security/Kconfig~lsm-xfrm-nethooks security/Kconfig --- linux-2.6.12-rc6-xfrm/security/Kconfig~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/security/Kconfig 2005-06-13 13:22:59.000000000 -0400 @@ -53,6 +53,19 @@ config SECURITY_NETWORK implement socket and networking access controls. If you are unsure how to answer this question, answer N. +config SECURITY_NETWORK_XFRM + bool "XFRM (IPSec) Networking Security Hooks" + depends on XFRM && SECURITY_NETWORK + help + This enables the XFRM (IPSec) networking security hooks. + If enabled, a security module can use these hooks to + implement per-packet access controls based on labels + derived from IPSec policy. Non-IPSec communications are + designated as unlabelled, and only sockets authorized + to communicate unlabelled data can send without using + IPSec. + If you are unsure how to answer this question, answer N. 
+ config SECURITY_CAPABILITIES tristate "Default Linux Capabilities" depends on SECURITY _ From jaegert@us.ibm.com Fri Jun 17 08:19:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 17 Jun 2005 08:19:35 -0700 (PDT) Received: from igw2.watson.ibm.com (igw2.watson.ibm.com [129.34.20.6]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5HFJKH9001166 for ; Fri, 17 Jun 2005 08:19:26 -0700 Received: from sp1n294en1.watson.ibm.com (sp1n294en1.watson.ibm.com [129.34.20.40]) by igw2.watson.ibm.com (8.13.1/8.13.1/8.13.1-2005-04-25 igw) with ESMTP id j5HFJDBx019736; Fri, 17 Jun 2005 11:19:13 -0400 Received: from sp1n294en1.watson.ibm.com (localhost [127.0.0.1]) by sp1n294en1.watson.ibm.com (8.11.7-20030924/8.11.7/01-14-2004_2) with ESMTP id j5HFHop642678; Fri, 17 Jun 2005 11:17:50 -0400 Received: from [9.2.18.177] (dyn9002018177.watson.ibm.com [9.2.18.177]) by sp1n294en1.watson.ibm.com (8.11.7-20030924/8.11.7/01-14-2004_1) with ESMTP id j5HFHo3589964; Fri, 17 Jun 2005 11:17:50 -0400 Subject: [PATCH 1/2] Resend (Update): LSM-IPSec Networking Hooks From: jaegert To: netdev@oss.sgi.com, chrisw@osdl.org Content-Type: text/plain Message-Id: <1119017598.30404.2765.camel@dyn9002018177.watson.ibm.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 (1.4.6-2) Date: Fri, 17 Jun 2005 10:13:18 -0400 Content-Transfer-Encoding: 7bit X-archive-position: 2448 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jaegert@us.ibm.com Precedence: bulk X-list: netdev Content-Length: 50083 Lines: 1463 Resend of patch update of this morning. The formatting of the email was incorrect for a patch. I apologize for the oversight. Regards, Trent. 
=============================================================

This patch series implements per-packet access control by extending the
Linux Security Modules (LSM) interface with hooks in the XFRM and pfkey
subsystems that use IPSec security associations to label packets.
Extensions to the SELinux LSM that use the patch for this purpose are
included.

This patch implements the changes needed in the XFRM subsystem, the pfkey
interface, ipv4/ipv6, and the xfrm_user interface to restrict a socket to
using only authorized security associations (or no security association)
to send/receive network packets.

Patch purpose:

The patch is designed to enable per-packet access control based on the
strongly authenticated IPSec security association. Such access controls
augment the existing ones based on network interface and IP address. The
former are very coarse-grained, and the latter can be spoofed. By using
IPSec, the system can control access to remote hosts based on
cryptographic keys generated using the IPSec mechanism. This enables
access control on a per-machine basis, or per-application if the remote
machine is running the same mechanism and is trusted to enforce the
access control policy.

Patch design approach:

The overall approach is that policy (xfrm_policy) entries set by
user-level programs (e.g., setkey for ipsec-tools) are extended with a
security context. The XFRM subsystem uses that context at policy selection
time to restrict which sockets can send/receive packets via the security
associations (xfrm_states) built from those policies.

A presentation from the SELinux symposium describing the overall approach
is available at
www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
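As a rough illustration of the design approach, here is a minimal userspace
sketch, not the kernel code. It assumes a toy policy table in which the
selector is reduced to a destination port and the security context to a
numeric sid. All names (struct policy, authorize_fn, select_policy,
same_sid) are hypothetical; in the patch the authorization step is the
LSM's xfrm_policy_lookup hook, which returns 0 when permission is granted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified policy entry: a one-field "selector" (dport)
 * plus the sid of the security context attached to the policy. */
struct policy {
    uint16_t dport;
    uint32_t ctx_sid;
};

/* Hypothetical authorization callback, standing in for the LSM hook.
 * Returns 0 when the socket may use the policy, nonzero otherwise. */
typedef int (*authorize_fn)(uint32_t sock_sid, uint32_t policy_sid);

/* Return the first policy whose selector matches the flow AND which the
 * callback authorizes for this socket; NULL if there is none. */
static const struct policy *select_policy(const struct policy *tbl, size_t n,
                                          uint16_t dport, uint32_t sock_sid,
                                          authorize_fn authorize)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].dport != dport)
            continue;                   /* selector does not match */
        if (authorize(sock_sid, tbl[i].ctx_sid) == 0)
            return &tbl[i];             /* matched and authorized */
    }
    return NULL;
}

/* Toy rule, assumed for illustration only: grant when the sids are equal.
 * A real LSM would consult its own access control policy instead. */
static int same_sid(uint32_t sock_sid, uint32_t policy_sid)
{
    return sock_sid == policy_sid ? 0 : -1;
}
```

With a table holding two port-80 policies with different sids, a socket
only ever gets the policy whose context it is authorized for, and gets no
policy at all if none is authorized.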
Patch implementation details:

On output, the policy retrieved (via xfrm_policy_lookup or
xfrm_sk_policy_lookup) must be authorized for the security context of the
socket, and the same security context is required for the resulting
security association (retrieved, or negotiated via racoon in ipsec-tools).
This is enforced in xfrm_state_find.

On input, the policy retrieved must also be authorized for the socket (at
__xfrm_policy_check), and the security context of the policy must also
match the security association being used.

The patch has virtually no impact on packets that do not use IPSec. The
existing Netfilter (outgoing) and LSM rcv_skb hooks are used as before.

Also, if IPSec is used without security contexts, the impact is minimal.
The LSM must allow such policies to be selected for the combination of
socket and remote machine, but subsequent IPSec processing proceeds as in
the original case.

Testing:

The pfkey interface is tested using ipsec-tools. ipsec-tools has been
modified (a separate ipsec-tools patch is available for version 0.5) to
support assigning security contexts to xfrm_policy entries and security
associations via setkey, and negotiation using those security contexts
via racoon.

The xfrm_user interface is tested via ad hoc programs that set security
contexts. These programs, which cover setting, getting, and deleting
policy, are also available from me. Testing of SA functions was done by
tracing kernel behavior.
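The context-matching rule between a policy and a security association can
be sketched in userspace as follows. This is a simplified stand-in that
follows the rule stated in the comment on xfrm_sec_ctx_match in the patch
(no context on either side is a match; otherwise both must have a context
with equal sid, doi, and alg). The struct and function names below are
assumptions made for illustration, not the kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the fields of struct xfrm_sec_ctx
 * that take part in the comparison. */
struct sec_ctx {
    uint8_t  doi;   /* domain of interpretation */
    uint8_t  alg;   /* LSM algorithm, e.g. SELinux */
    uint32_t sid;   /* security identifier */
};

/* Returns 1 on match, 0 otherwise.
 *  - neither side has a context: match
 *  - exactly one side has a context: no match
 *  - both have a context: sid, doi, and alg must all be equal */
static int sec_ctx_match(const struct sec_ctx *a, const struct sec_ctx *b)
{
    if (a == NULL && b == NULL)
        return 1;
    if (a == NULL || b == NULL)
        return 0;
    return a->sid == b->sid && a->doi == b->doi && a->alg == b->alg;
}
```

Under this rule a labelled policy never pairs with an unlabelled security
association, and vice versa, which is why IPSec configured without any
security contexts keeps working as before.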
--- include/linux/pfkeyv2.h | 13 +++ include/linux/security.h | 119 +++++++++++++++++++++++++++++++++++ include/linux/xfrm.h | 36 ++++++++++ include/net/flow.h | 5 - include/net/xfrm.h | 21 ++++++ net/core/flow.c | 4 - net/ipv4/xfrm4_policy.c | 2 net/ipv6/xfrm6_policy.c | 2 net/key/af_key.c | 150 +++++++++++++++++++++++++++++++++++++++++++- net/xfrm/xfrm_policy.c | 66 ++++++++++++------- net/xfrm/xfrm_state.c | 16 +++- net/xfrm/xfrm_user.c | 158 +++++++++++++++++++++++++++++++++++++++++++++-- security/Kconfig | 13 +++ security/dummy.c | 37 +++++++++++ 14 files changed, 599 insertions(+), 43 deletions(-) diff -puN include/linux/pfkeyv2.h~lsm-xfrm-nethooks include/linux/pfkeyv2.h --- linux-2.6.12-rc6-xfrm/include/linux/pfkeyv2.h~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/include/linux/pfkeyv2.h 2005-06-13 13:22:59.000000000 -0400 @@ -216,6 +216,16 @@ struct sadb_x_nat_t_port { } __attribute__((packed)); /* sizeof(struct sadb_x_nat_t_port) == 8 */ +/* Generic LSM security context */ +struct sadb_x_sec_ctx { + uint16_t sadb_x_sec_len; + uint16_t sadb_x_sec_exttype; + uint8_t sadb_x_ctx_alg; /* LSMs: e.g., selinux == 1 */ + uint8_t sadb_x_ctx_doi; + uint16_t sadb_x_ctx_len; +} __attribute__((packed)); +/* sizeof(struct sadb_sec_ctx) = 8 */ + /* Message types */ #define SADB_RESERVED 0 #define SADB_GETSPI 1 @@ -324,7 +334,8 @@ struct sadb_x_nat_t_port { #define SADB_X_EXT_NAT_T_SPORT 21 #define SADB_X_EXT_NAT_T_DPORT 22 #define SADB_X_EXT_NAT_T_OA 23 -#define SADB_EXT_MAX 23 +#define SADB_X_EXT_SEC_CTX 24 +#define SADB_EXT_MAX 24 /* Identity Extension values */ #define SADB_IDENTTYPE_RESERVED 0 diff -puN include/linux/security.h~lsm-xfrm-nethooks include/linux/security.h --- linux-2.6.12-rc6-xfrm/include/linux/security.h~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/include/linux/security.h 2005-06-13 13:22:59.000000000 -0400 @@ -58,6 +58,12 @@ struct sk_buff; struct sock; struct sockaddr; 
struct socket; +struct flowi; +struct dst_entry; +struct xfrm_selector; +struct xfrm_policy; +struct xfrm_state; +struct xfrm_user_sec_ctx; extern int cap_netlink_send(struct sock *sk, struct sk_buff *skb); extern int cap_netlink_recv(struct sk_buff *skb); @@ -802,6 +808,50 @@ struct swap_info_struct; * @sk_free_security: * Deallocate security structure. * + * Security hooks for XFRM operations. + * + * @xfrm_policy_alloc_security: + * @xp contains the xfrm_policy being added to Security Policy Database + * used by the XFRM system. + * @sec_ctx contains the security context information being provided by + * the user-level policy update program (e.g., setkey). + * Allocate a security structure to the xp->selector.security field. + * The security field is initialized to NULL when the xfrm_policy is + * allocated. + * Return 0 if operation was successful (memory to allocate, legal context) + * @xfrm_policy_clone_security: + * @old contains an existing xfrm_policy in the SPD. + * @new contains a new xfrm_policy being cloned from old. + * Allocate a security structure to the new->selector.security field + * that contains the information from the old->selector.security field. + * Return 0 if operation was successful (memory to allocate). + * @xfrm_policy_free_security: + * @xp contains the xfrm_policy + * Deallocate xp->selector.security. + * @xfrm_state_alloc_security: + * @x contains the xfrm_state being added to the Security Association + * Database by the XFRM system. + * @sec_ctx contains the security context information being provided by + * the user-level SA generation program (e.g., setkey or racoon). + * Allocate a security structure to the x->sel.security field. The + * security field is initialized to NULL when the xfrm_state is + * allocated. + * Return 0 if operation was successful (memory to allocate, legal context). + * @xfrm_state_free_security: + * @x contains the xfrm_state. + * Deallocate x>sel.security. 
+ * @xfrm_policy_lookup: + * @sk contains the sock that is requesting to either send or receive a + * network communication. + * @sel contains the selector that matches the communication end points of + * the network communication (source, destination, and ports). + * @fl contains the flowi that indicates the communication protocol. + * @dir contains the direction of the flow (input or output). + * Check permission when a sock selects a xfrm_policy for processing + * XFRMs on a packet. The hook is called when selecting either a + * per-socket policy or a generic xfrm policy. + * Return 0 if permission is granted. + * * Security hooks affecting all System V IPC operations. * * @ipc_permission: @@ -1243,6 +1293,15 @@ struct security_operations { int (*sk_alloc_security) (struct sock *sk, int family, int priority); void (*sk_free_security) (struct sock *sk); #endif /* CONFIG_SECURITY_NETWORK */ + +#ifdef CONFIG_SECURITY_NETWORK_XFRM + int (*xfrm_policy_alloc_security) (struct xfrm_policy *xp, struct xfrm_user_sec_ctx *sec_ctx); + int (*xfrm_policy_clone_security) (struct xfrm_policy *old, struct xfrm_policy *new); + void (*xfrm_policy_free_security) (struct xfrm_policy *xp); + int (*xfrm_state_alloc_security) (struct xfrm_state *x, struct xfrm_user_sec_ctx *sec_ctx); + void (*xfrm_state_free_security) (struct xfrm_state *x); + int (*xfrm_policy_lookup)(struct sock *sk, struct xfrm_selector *sel, struct flowi *fl, u8 dir); +#endif /* CONFIG_SECURITY_NETWORK_XFRM */ }; /* global variables */ @@ -2854,5 +2913,65 @@ static inline void security_sk_free(stru } #endif /* CONFIG_SECURITY_NETWORK */ +#ifdef CONFIG_SECURITY_NETWORK_XFRM +static inline int security_xfrm_policy_alloc(struct xfrm_policy *xp, struct xfrm_user_sec_ctx *sec_ctx) +{ + return security_ops->xfrm_policy_alloc_security(xp, sec_ctx); +} + +static inline int security_xfrm_policy_clone(struct xfrm_policy *old, struct xfrm_policy *new) +{ + return security_ops->xfrm_policy_clone_security(old, new); +} + 
+static inline void security_xfrm_policy_free(struct xfrm_policy *xp) +{ + security_ops->xfrm_policy_free_security(xp); +} + +static inline int security_xfrm_state_alloc(struct xfrm_state *x, struct xfrm_user_sec_ctx *sec_ctx) +{ + return security_ops->xfrm_state_alloc_security(x, sec_ctx); +} + +static inline void security_xfrm_state_free(struct xfrm_state *x) +{ + security_ops->xfrm_state_free_security(x); +} + +static inline int security_xfrm_policy_lookup(struct sock *sk, struct xfrm_selector *sel, struct flowi *fl, u8 dir) +{ + return security_ops->xfrm_policy_lookup(sk, sel, fl, dir); +} +#else /* CONFIG_SECURITY_NETWORK_XFRM */ +static inline int security_xfrm_policy_alloc(struct xfrm_policy *xp, struct xfrm_user_sec_ctx *sec_ctx) +{ + return 0; +} + +static inline int security_xfrm_policy_clone(struct xfrm_policy *old, struct xfrm_policy *new) +{ + return 0; +} + +static inline void security_xfrm_policy_free(struct xfrm_policy *xp) +{ +} + +static inline int security_xfrm_state_alloc(struct xfrm_state *x, struct xfrm_user_sec_ctx *sec_ctx) +{ + return 0; +} + +static inline void security_xfrm_state_free(struct xfrm_state *x) +{ +} + +static inline int security_xfrm_policy_lookup(struct sock *sk, struct xfrm_selector *sel, struct flowi *fl, u8 dir) +{ + return 0; +} +#endif /* CONFIG_SECURITY_NETWORK_XFRM */ + #endif /* ! 
__LINUX_SECURITY_H */ diff -puN include/linux/xfrm.h~lsm-xfrm-nethooks include/linux/xfrm.h --- linux-2.6.12-rc6-xfrm/include/linux/xfrm.h~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/include/linux/xfrm.h 2005-06-13 13:22:59.000000000 -0400 @@ -27,6 +27,22 @@ struct xfrm_id __u8 proto; }; +struct xfrm_sec_ctx { + __u8 ctx_doi; + __u8 ctx_alg; + __u16 ctx_len; + __u32 ctx_sid; + char ctx_str[0]; +}; + +/* Security Context Domains of Interpretation */ +#define XFRM_SC_DOI_RESERVED 0 +#define XFRM_SC_DOI_LSM 1 + +/* Security Context Algorithms */ +#define XFRM_SC_ALG_RESERVED 0 +#define XFRM_SC_ALG_SELINUX 1 + /* Selector, used as selector both on policy rules (SPD) and SAs. */ struct xfrm_selector @@ -43,8 +59,15 @@ struct xfrm_selector __u8 proto; int ifindex; uid_t user; + struct xfrm_sec_ctx *security; }; +/* All but the security field */ +static inline int xfrm_selector_base_size(void) +{ + return sizeof(struct xfrm_selector) - sizeof(struct xfrm_sec_ctx *); +} + #define XFRM_INF (~(__u64)0) struct xfrm_lifetime_cfg @@ -146,6 +169,18 @@ enum { #define XFRM_NR_MSGTYPES (XFRM_MSG_MAX + 1 - XFRM_MSG_BASE) +/* + * Generic LSM security context for comunicating to user space + * NOTE: Same format as sadb_x_sec_ctx + */ +struct xfrm_user_sec_ctx { + __u16 len; + __u16 exttype; + __u8 ctx_alg; /* LSMs: e.g., selinux == 1 */ + __u8 ctx_doi; + __u16 ctx_len; +}; + struct xfrm_user_tmpl { struct xfrm_id id; __u16 family; @@ -173,6 +208,7 @@ enum xfrm_attr_type_t { XFRMA_ALG_CRYPT, /* struct xfrm_algo */ XFRMA_ALG_COMP, /* struct xfrm_algo */ XFRMA_ENCAP, /* struct xfrm_algo + struct xfrm_encap_tmpl */ + XFRMA_SEC_CTX, /* struct xfrm_sec_ctx */ XFRMA_TMPL, /* 1 or more struct xfrm_user_tmpl */ __XFRMA_MAX diff -puN include/net/flow.h~lsm-xfrm-nethooks include/net/flow.h --- linux-2.6.12-rc6-xfrm/include/net/flow.h~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/include/net/flow.h 2005-06-13 
13:22:59.000000000 -0400 @@ -84,10 +84,11 @@ struct flowi { #define FLOW_DIR_OUT 1 #define FLOW_DIR_FWD 2 -typedef void (*flow_resolve_t)(struct flowi *key, u16 family, u8 dir, +struct sock; +typedef void (*flow_resolve_t)(struct flowi *key, struct sock *sk, u16 family, u8 dir, void **objp, atomic_t **obj_refp); -extern void *flow_cache_lookup(struct flowi *key, u16 family, u8 dir, +extern void *flow_cache_lookup(struct flowi *key, struct sock *sk, u16 family, u8 dir, flow_resolve_t resolver); extern void flow_cache_flush(void); extern atomic_t flow_cache_genid; diff -puN include/net/xfrm.h~lsm-xfrm-nethooks include/net/xfrm.h --- linux-2.6.12-rc6-xfrm/include/net/xfrm.h~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/include/net/xfrm.h 2005-06-13 13:22:59.000000000 -0400 @@ -493,6 +493,27 @@ xfrm_selector_match(struct xfrm_selector return 0; } +/* If neither has a context --> match + Otherwise, both must have a context and the sids, doi, alg must match */ +static inline int xfrm_sec_ctx_match(struct xfrm_sec_ctx *s1, struct xfrm_sec_ctx *s2) +{ + return ((!s1 && !s2) || + (s1 && s2 && + (s1->ctx_sid == s2->ctx_sid) && + (s1->ctx_doi == s2->ctx_doi) && + (s1->ctx_alg == s2->ctx_alg))); +} + +static inline struct xfrm_sec_ctx *xfrm_policy_security(struct xfrm_policy *xp) +{ + return (xp ? xp->selector.security : NULL); +} + +static inline struct xfrm_sec_ctx *xfrm_state_security(struct xfrm_state *x) +{ + return (x ? x->sel.security : NULL); +} + /* A struct encoding bundle of transformations to apply to some set of flow. * * dst->child points to the next element of bundle. 
diff -puN net/core/flow.c~lsm-xfrm-nethooks net/core/flow.c --- linux-2.6.12-rc6-xfrm/net/core/flow.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/core/flow.c 2005-06-13 13:22:59.000000000 -0400 @@ -162,7 +162,7 @@ static int flow_key_compare(struct flowi return 0; } -void *flow_cache_lookup(struct flowi *key, u16 family, u8 dir, +void *flow_cache_lookup(struct flowi *key, struct sock *sk, u16 family, u8 dir, flow_resolve_t resolver) { struct flow_cache_entry *fle, **head; @@ -221,7 +221,7 @@ nocache: void *obj; atomic_t *obj_ref; - resolver(key, family, dir, &obj, &obj_ref); + resolver(key, sk, family, dir, &obj, &obj_ref); if (fle) { fle->genid = atomic_read(&flow_cache_genid); diff -puN net/ipv4/xfrm4_policy.c~lsm-xfrm-nethooks net/ipv4/xfrm4_policy.c --- linux-2.6.12-rc6-xfrm/net/ipv4/xfrm4_policy.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/ipv4/xfrm4_policy.c 2005-06-13 13:22:59.000000000 -0400 @@ -36,6 +36,8 @@ __xfrm4_find_bundle(struct flowi *fl, st if (xdst->u.rt.fl.oif == fl->oif && /*XXX*/ xdst->u.rt.fl.fl4_dst == fl->fl4_dst && xdst->u.rt.fl.fl4_src == fl->fl4_src && + xfrm_sec_ctx_match(xfrm_policy_security(policy), + xfrm_state_security(dst->xfrm)) && xfrm_bundle_ok(xdst, fl, AF_INET)) { dst_clone(dst); break; diff -puN net/ipv6/xfrm6_policy.c~lsm-xfrm-nethooks net/ipv6/xfrm6_policy.c --- linux-2.6.12-rc6-xfrm/net/ipv6/xfrm6_policy.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/ipv6/xfrm6_policy.c 2005-06-13 13:22:59.000000000 -0400 @@ -54,6 +54,8 @@ __xfrm6_find_bundle(struct flowi *fl, st xdst->u.rt6.rt6i_src.plen); if (ipv6_addr_equal(&xdst->u.rt6.rt6i_dst.addr, &fl_dst_prefix) && ipv6_addr_equal(&xdst->u.rt6.rt6i_src.addr, &fl_src_prefix) && + xfrm_sec_ctx_match(xfrm_policy_security(policy), + xfrm_state_security(dst->xfrm)) && xfrm_bundle_ok(xdst, fl, AF_INET6)) { dst_clone(dst); break; diff -puN 
net/key/af_key.c~lsm-xfrm-nethooks net/key/af_key.c --- linux-2.6.12-rc6-xfrm/net/key/af_key.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/key/af_key.c 2005-06-16 14:48:27.000000000 -0400 @@ -336,6 +336,7 @@ static u8 sadb_ext_min_len[] = { [SADB_X_EXT_NAT_T_SPORT] = (u8) sizeof(struct sadb_x_nat_t_port), [SADB_X_EXT_NAT_T_DPORT] = (u8) sizeof(struct sadb_x_nat_t_port), [SADB_X_EXT_NAT_T_OA] = (u8) sizeof(struct sadb_address), + [SADB_X_EXT_SEC_CTX] = (u8) sizeof(struct sadb_x_sec_ctx), }; /* Verify sadb_address_{len,prefixlen} against sa_family. */ @@ -383,6 +384,40 @@ static int verify_address_len(void *p) return 0; } +static inline int verify_sec_ctx_len(void *p) +{ + struct sadb_x_sec_ctx *sec_ctx = (struct sadb_x_sec_ctx *)p; + int len = 0; + + len += sizeof(struct sadb_x_sec_ctx); + len += sec_ctx->sadb_x_ctx_len; + len += sizeof(uint64_t) - 1; + len /= sizeof(uint64_t); + + if (sec_ctx->sadb_x_sec_len != len) + return -EINVAL; + + return 0; +} + +static inline struct xfrm_user_sec_ctx *pfkey_sadb2xfrm_user_ctx(struct sadb_x_sec_ctx *sec_ctx) +{ + struct xfrm_user_sec_ctx *uctx = NULL; + + if (sec_ctx) { + int ctx_size = sec_ctx->sadb_x_ctx_len; + uctx = kmalloc((sizeof(*uctx)+ctx_size), GFP_KERNEL); + uctx->len = sec_ctx->sadb_x_sec_len; + uctx->exttype = sec_ctx->sadb_x_sec_exttype; + uctx->ctx_doi = sec_ctx->sadb_x_ctx_doi; + uctx->ctx_alg = sec_ctx->sadb_x_ctx_alg; + uctx->ctx_len = sec_ctx->sadb_x_ctx_len; + memcpy(uctx + 1, sec_ctx + 1, + uctx->ctx_len); + } + return uctx; +} + static int present_and_same_family(struct sadb_address *src, struct sadb_address *dst) { @@ -438,6 +473,10 @@ static int parse_exthdrs(struct sk_buff if (verify_address_len(p)) return -EINVAL; } + if (ext_type == SADB_X_EXT_SEC_CTX) { + if (verify_sec_ctx_len(p)) + return -EINVAL; + } ext_hdrs[ext_type-1] = p; } p += ext_len; @@ -586,6 +625,9 @@ static struct sk_buff * pfkey_xfrm_state struct sadb_key *key; struct sadb_x_sa2 *sa2; 
struct sockaddr_in *sin; + struct sadb_x_sec_ctx *sec_ctx; + struct xfrm_sec_ctx *xfrm_ctx; + int ctx_size = 0; #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) struct sockaddr_in6 *sin6; #endif @@ -609,6 +651,12 @@ static struct sk_buff * pfkey_xfrm_state sizeof(struct sadb_address)*2 + sockaddr_size*2 + sizeof(struct sadb_x_sa2); + + if ((xfrm_ctx = xfrm_state_security(x))) { + ctx_size = PFKEY_ALIGN8(xfrm_ctx->ctx_len); + size += sizeof(struct sadb_x_sec_ctx) + ctx_size; + } + /* identity & sensitivity */ if ((x->props.family == AF_INET && @@ -892,6 +940,20 @@ static struct sk_buff * pfkey_xfrm_state n_port->sadb_x_nat_t_port_reserved = 0; } + /* security context */ + if (xfrm_ctx) { + sec_ctx = (struct sadb_x_sec_ctx *) skb_put(skb, + sizeof(struct sadb_x_sec_ctx) + ctx_size); + sec_ctx->sadb_x_sec_len = + (sizeof(struct sadb_x_sec_ctx) + ctx_size) / sizeof(uint64_t); + sec_ctx->sadb_x_sec_exttype = SADB_X_EXT_SEC_CTX; + sec_ctx->sadb_x_ctx_doi = xfrm_ctx->ctx_doi; + sec_ctx->sadb_x_ctx_alg = xfrm_ctx->ctx_alg; + sec_ctx->sadb_x_ctx_len = xfrm_ctx->ctx_len; + memcpy(sec_ctx + 1, xfrm_ctx->ctx_str, + xfrm_ctx->ctx_len); + } + return skb; } @@ -902,6 +964,7 @@ static struct xfrm_state * pfkey_msg2xfr struct sadb_lifetime *lifetime; struct sadb_sa *sa; struct sadb_key *key; + struct sadb_x_sec_ctx *sec_ctx; uint16_t proto; int err; @@ -984,6 +1047,17 @@ static struct xfrm_state * pfkey_msg2xfr x->lft.soft_add_expires_seconds = lifetime->sadb_lifetime_addtime; x->lft.soft_use_expires_seconds = lifetime->sadb_lifetime_usetime; } + + sec_ctx = (struct sadb_x_sec_ctx *) ext_hdrs[SADB_X_EXT_SEC_CTX-1]; + if (sec_ctx != NULL) { + struct xfrm_user_sec_ctx *uctx = pfkey_sadb2xfrm_user_ctx(sec_ctx); + + err = security_xfrm_state_alloc(x, uctx); + kfree(uctx); + if (err) + goto out; + } + key = (struct sadb_key*) ext_hdrs[SADB_EXT_KEY_AUTH-1]; if (sa->sadb_sa_auth) { int keysize = 0; @@ -1634,6 +1708,18 @@ parse_ipsecrequests(struct xfrm_policy * return 0; } +static 
inline int pfkey_xfrm_policy2sec_ctx_size(struct xfrm_policy *xp) +{ + struct xfrm_sec_ctx *xfrm_ctx = xfrm_policy_security(xp); + + if (xfrm_ctx) { + int len = sizeof(struct sadb_x_sec_ctx); + len += xfrm_ctx->ctx_len; + return PFKEY_ALIGN8(len); + } + return 0; +} + static int pfkey_xfrm_policy2msg_size(struct xfrm_policy *xp) { int sockaddr_size = pfkey_sockaddr_size(xp->family); @@ -1647,7 +1733,8 @@ static int pfkey_xfrm_policy2msg_size(st (sockaddr_size * 2) + sizeof(struct sadb_x_policy) + (xp->xfrm_nr * (sizeof(struct sadb_x_ipsecrequest) + - (socklen * 2))); + (socklen * 2))) + + pfkey_xfrm_policy2sec_ctx_size(xp); } static struct sk_buff * pfkey_xfrm_policy2msg_prep(struct xfrm_policy *xp) @@ -1671,6 +1758,8 @@ static void pfkey_xfrm_policy2msg(struct struct sadb_lifetime *lifetime; struct sadb_x_policy *pol; struct sockaddr_in *sin; + struct sadb_x_sec_ctx *sec_ctx; + struct xfrm_sec_ctx *xfrm_ctx; #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) struct sockaddr_in6 *sin6; #endif @@ -1855,19 +1944,35 @@ static void pfkey_xfrm_policy2msg(struct } } } + + /* security context */ + if ((xfrm_ctx = xfrm_policy_security(xp))) { + int ctx_size = pfkey_xfrm_policy2sec_ctx_size(xp); + + sec_ctx = (struct sadb_x_sec_ctx *) skb_put(skb, ctx_size); + sec_ctx->sadb_x_sec_len = ctx_size / sizeof(uint64_t); + sec_ctx->sadb_x_sec_exttype = SADB_X_EXT_SEC_CTX; + sec_ctx->sadb_x_ctx_doi = xfrm_ctx->ctx_doi; + sec_ctx->sadb_x_ctx_alg = xfrm_ctx->ctx_alg; + sec_ctx->sadb_x_ctx_len = xfrm_ctx->ctx_len; + memcpy(sec_ctx + 1, xfrm_ctx->ctx_str, + xfrm_ctx->ctx_len); + } + hdr->sadb_msg_len = size / sizeof(uint64_t); hdr->sadb_msg_reserved = atomic_read(&xp->refcnt); } static int pfkey_spdadd(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs) { - int err; + int err = 0; struct sadb_lifetime *lifetime; struct sadb_address *sa; struct sadb_x_policy *pol; struct xfrm_policy *xp; struct sk_buff *out_skb; struct sadb_msg *out_hdr; + struct 
sadb_x_sec_ctx *sec_ctx; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -1914,6 +2019,18 @@ static int pfkey_spdadd(struct sock *sk, if (xp->selector.dport) xp->selector.dport_mask = ~0; + sec_ctx = (struct sadb_x_sec_ctx *) ext_hdrs[SADB_X_EXT_SEC_CTX-1]; + if (sec_ctx != NULL) { + struct xfrm_user_sec_ctx *uctx = pfkey_sadb2xfrm_user_ctx(sec_ctx); + + err = security_xfrm_policy_alloc(xp, uctx); + kfree(uctx); + if (err) { + err = -EINVAL; + goto out; + } + } + xp->lft.soft_byte_limit = XFRM_INF; xp->lft.hard_byte_limit = XFRM_INF; xp->lft.soft_packet_limit = XFRM_INF; @@ -1963,6 +2080,7 @@ static int pfkey_spdadd(struct sock *sk, return 0; out: + security_xfrm_policy_free(xp); kfree(xp); return err; } @@ -1972,10 +2090,11 @@ static int pfkey_spddelete(struct sock * int err; struct sadb_address *sa; struct sadb_x_policy *pol; - struct xfrm_policy *xp; + struct xfrm_policy *xp, tmp; struct sk_buff *out_skb; struct sadb_msg *out_hdr; struct xfrm_selector sel; + struct sadb_x_sec_ctx *sec_ctx; if (!present_and_same_family(ext_hdrs[SADB_EXT_ADDRESS_SRC-1], ext_hdrs[SADB_EXT_ADDRESS_DST-1]) || @@ -2004,7 +2123,17 @@ static int pfkey_spddelete(struct sock * if (sel.dport) sel.dport_mask = ~0; - xp = xfrm_policy_bysel(pol->sadb_x_policy_dir-1, &sel, 1); + sec_ctx = (struct sadb_x_sec_ctx *) ext_hdrs[SADB_X_EXT_SEC_CTX-1]; + memcpy(&tmp.selector, &sel, sizeof(struct xfrm_selector)); + if (sec_ctx != NULL) { + err = security_xfrm_policy_alloc( + &tmp, (struct xfrm_user_sec_ctx *)sec_ctx); + if (err) + return err; + } + + xp = xfrm_policy_bysel(pol->sadb_x_policy_dir-1, &tmp.selector, 1); + security_xfrm_policy_free(&tmp); if (xp == NULL) return -ENOENT; @@ -2482,6 +2611,7 @@ static struct xfrm_policy *pfkey_compile { struct xfrm_policy *xp; struct sadb_x_policy *pol = (struct sadb_x_policy*)data; + struct sadb_x_sec_ctx *sec_ctx; switch (family) { case AF_INET: @@ -2531,10 +2661,22 @@ static struct xfrm_policy 
*pfkey_compile (*dir = parse_ipsecrequests(xp, pol)) < 0) goto out; + /* security context too */ + if (len >= (pol->sadb_x_policy_len*8 + + sizeof(struct sadb_x_sec_ctx))) { + char *p = (char *) pol; + p += pol->sadb_x_policy_len*8; + sec_ctx = (struct sadb_x_sec_ctx *) p; + if (security_xfrm_policy_alloc( + xp, (struct xfrm_user_sec_ctx *)sec_ctx)) + goto out; + } + *dir = pol->sadb_x_policy_dir-1; return xp; out: + security_xfrm_policy_free(xp); kfree(xp); return NULL; } diff -puN net/xfrm/xfrm_policy.c~lsm-xfrm-nethooks net/xfrm/xfrm_policy.c --- linux-2.6.12-rc6-xfrm/net/xfrm/xfrm_policy.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/xfrm/xfrm_policy.c 2005-06-13 13:22:59.000000000 -0400 @@ -10,7 +10,7 @@ * YOSHIFUJI Hideaki * Split up af-specific portion * Derek Atkins Add the post_input processor - * + * */ #include @@ -257,6 +257,7 @@ void __xfrm_policy_destroy(struct xfrm_p if (del_timer(&policy->timer)) BUG(); + security_xfrm_policy_free(policy); kfree(policy); } EXPORT_SYMBOL(__xfrm_policy_destroy); @@ -396,7 +397,8 @@ struct xfrm_policy *xfrm_policy_bysel(in write_lock_bh(&xfrm_policy_lock); for (p = &xfrm_policy_list[dir]; (pol=*p)!=NULL; p = &pol->next) { - if (memcmp(sel, &pol->selector, sizeof(*sel)) == 0) { + if ((memcmp(sel, &pol->selector, xfrm_selector_base_size()) == 0) && + (xfrm_sec_ctx_match(sel->security, xfrm_policy_security(pol)))) { xfrm_pol_hold(pol); if (delete) *p = pol->next; @@ -492,7 +494,7 @@ EXPORT_SYMBOL(xfrm_policy_walk); /* Find policy to apply to this flow. 
*/ -static void xfrm_policy_lookup(struct flowi *fl, u16 family, u8 dir, +static void xfrm_policy_lookup(struct flowi *fl, struct sock *sk, u16 family, u8 dir, void **objp, atomic_t **obj_refp) { struct xfrm_policy *pol; @@ -506,9 +508,12 @@ static void xfrm_policy_lookup(struct fl continue; match = xfrm_selector_match(sel, fl, family); + if (match) { - xfrm_pol_hold(pol); - break; + if (!security_xfrm_policy_lookup(sk, sel, fl, dir)) { + xfrm_pol_hold(pol); + break; + } } } read_unlock_bh(&xfrm_policy_lock); @@ -516,15 +521,38 @@ static void xfrm_policy_lookup(struct fl *obj_refp = &pol->refcnt; } +static inline int policy_to_flow_dir(int dir) +{ + if (XFRM_POLICY_IN == FLOW_DIR_IN && + XFRM_POLICY_OUT == FLOW_DIR_OUT && + XFRM_POLICY_FWD == FLOW_DIR_FWD) + return dir; + switch (dir) { + default: + case XFRM_POLICY_IN: + return FLOW_DIR_IN; + case XFRM_POLICY_OUT: + return FLOW_DIR_OUT; + case XFRM_POLICY_FWD: + return FLOW_DIR_FWD; + }; +} + static struct xfrm_policy *xfrm_sk_policy_lookup(struct sock *sk, int dir, struct flowi *fl) { struct xfrm_policy *pol; read_lock_bh(&xfrm_policy_lock); if ((pol = sk->sk_policy[dir]) != NULL) { - int match = xfrm_selector_match(&pol->selector, fl, + struct xfrm_selector *sel = &pol->selector; + int match = xfrm_selector_match(sel, fl, sk->sk_family); + int err = 0; + if (match) + err = security_xfrm_policy_lookup(sk, sel, fl, policy_to_flow_dir(dir)); + + if (match && !err) xfrm_pol_hold(pol); else pol = NULL; @@ -595,6 +623,10 @@ static struct xfrm_policy *clone_policy( if (newp) { newp->selector = old->selector; + if (security_xfrm_policy_clone(old, newp)) { + kfree(newp); + return NULL; /* ENOMEM */ + } newp->lft = old->lft; newp->curlft = old->curlft; newp->action = old->action; @@ -706,22 +738,6 @@ xfrm_bundle_create(struct xfrm_policy *p return err; } -static inline int policy_to_flow_dir(int dir) -{ - if (XFRM_POLICY_IN == FLOW_DIR_IN && - XFRM_POLICY_OUT == FLOW_DIR_OUT && - XFRM_POLICY_FWD == FLOW_DIR_FWD) - return 
dir; - switch (dir) { - default: - case XFRM_POLICY_IN: - return FLOW_DIR_IN; - case XFRM_POLICY_OUT: - return FLOW_DIR_OUT; - case XFRM_POLICY_FWD: - return FLOW_DIR_FWD; - }; -} static int stale_bundle(struct dst_entry *dst); @@ -751,7 +767,7 @@ restart: if ((dst_orig->flags & DST_NOXFRM) || !xfrm_policy_list[XFRM_POLICY_OUT]) return 0; - policy = flow_cache_lookup(fl, family, + policy = flow_cache_lookup(fl, sk, family, policy_to_flow_dir(XFRM_POLICY_OUT), xfrm_policy_lookup); } @@ -942,7 +958,7 @@ int __xfrm_policy_check(struct sock *sk, int i; for (i=skb->sp->len-1; i>=0; i--) { - struct sec_decap_state *xvec = &(skb->sp->x[i]); + struct sec_decap_state *xvec = &(skb->sp->x[i]); if (!xfrm_selector_match(&xvec->xvec->sel, &fl, family)) return 0; @@ -960,7 +976,7 @@ int __xfrm_policy_check(struct sock *sk, pol = xfrm_sk_policy_lookup(sk, dir, &fl); if (!pol) - pol = flow_cache_lookup(&fl, family, + pol = flow_cache_lookup(&fl, sk, family, policy_to_flow_dir(dir), xfrm_policy_lookup); diff -puN net/xfrm/xfrm_state.c~lsm-xfrm-nethooks net/xfrm/xfrm_state.c --- linux-2.6.12-rc6-xfrm/net/xfrm/xfrm_state.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/xfrm/xfrm_state.c 2005-06-13 13:22:59.000000000 -0400 @@ -10,7 +10,7 @@ * Split up af-specific functions * Derek Atkins * Add UDP Encapsulation - * + * */ #include @@ -74,6 +74,7 @@ static void xfrm_state_gc_destroy(struct x->type->destructor(x); xfrm_put_type(x->type); } + security_xfrm_state_free(x); kfree(x); } @@ -338,7 +339,8 @@ xfrm_state_find(xfrm_address_t *daddr, x selector. 
*/ if (x->km.state == XFRM_STATE_VALID) { - if (!xfrm_selector_match(&x->sel, fl, family)) + if (!xfrm_selector_match(&x->sel, fl, family) || + !xfrm_sec_ctx_match(xfrm_policy_security(pol), xfrm_state_security(x))) continue; if (!best || best->km.dying > x->km.dying || @@ -349,7 +351,8 @@ xfrm_state_find(xfrm_address_t *daddr, x acquire_in_progress = 1; } else if (x->km.state == XFRM_STATE_ERROR || x->km.state == XFRM_STATE_EXPIRED) { - if (xfrm_selector_match(&x->sel, fl, family)) + if (xfrm_selector_match(&x->sel, fl, family) && + xfrm_sec_ctx_match(xfrm_policy_security(pol), xfrm_state_security(x))) error = -ESRCH; } } @@ -374,6 +377,13 @@ xfrm_state_find(xfrm_address_t *daddr, x xfrm_init_tempsel(x, fl, tmpl, daddr, saddr, family); if (km_query(x, tmpl, pol) == 0) { + if (!xfrm_sec_ctx_match(xfrm_policy_security(pol), xfrm_state_security(x))) { + x->km.state = XFRM_STATE_DEAD; + xfrm_state_put(x); + x = NULL; + error = -EPERM; + goto out; + } x->km.state = XFRM_STATE_ACQ; list_add_tail(&x->bydst, xfrm_state_bydst+h); xfrm_state_hold(x); diff -puN net/xfrm/xfrm_user.c~lsm-xfrm-nethooks net/xfrm/xfrm_user.c --- linux-2.6.12-rc6-xfrm/net/xfrm/xfrm_user.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/net/xfrm/xfrm_user.c 2005-06-16 14:39:56.000000000 -0400 @@ -7,7 +7,7 @@ * Kazunori MIYAZAWA @USAGI * Kunihiro Ishiguro * IPv6 support - * + * */ #include @@ -209,6 +209,30 @@ static int attach_encap_tmpl(struct xfrm return 0; } + +static inline int xfrm_user_sec_ctx_size(struct xfrm_policy *xp) +{ + struct xfrm_sec_ctx *xfrm_ctx = xfrm_policy_security(xp); + int len = 0; + + if (xfrm_ctx) { + len += sizeof(struct xfrm_user_sec_ctx); + len += xfrm_ctx->ctx_len; + } + return len; +} + +static int attach_sec_ctx(struct xfrm_state *x, struct rtattr *u_arg) +{ + struct xfrm_user_sec_ctx *uxsc = RTA_DATA(u_arg); + + if (uxsc) { + return security_xfrm_state_alloc(x, uxsc); + } + + return 0; +} + static void copy_from_user_state(struct 
xfrm_state *x, struct xfrm_usersa_info *p) { memcpy(&x->id, &p->id, sizeof(x->id)); @@ -258,6 +282,9 @@ static struct xfrm_state *xfrm_state_con if (err) goto error; + if ((err = attach_sec_ctx(x, xfrma[XFRMA_SEC_CTX-1]))) + goto error; + x->curlft.add_time = (unsigned long) xtime.tv_sec; x->km.state = XFRM_STATE_VALID; x->km.seq = p->seq; @@ -344,6 +371,27 @@ struct xfrm_dump_info { int this_idx; }; +static int dump_one_sec_ctx(struct xfrm_sec_ctx *ctx, struct xfrm_user_sec_ctx *uctx, struct sk_buff *skb, int ctx_size) +{ + if (!ctx) + return -1; + + uctx->exttype = XFRMA_SEC_CTX; + uctx->len = ctx_size; + uctx->ctx_doi = ctx->ctx_doi; + uctx->ctx_alg = ctx->ctx_alg; + uctx->ctx_len = ctx->ctx_len; + + memcpy(uctx + 1, ctx->ctx_str, ctx->ctx_len); + + RTA_PUT(skb, XFRMA_SEC_CTX, ctx_size, uctx); + + return 0; + +rtattr_failure: + return -1; +} + static int dump_one_state(struct xfrm_state *x, int count, void *ptr) { struct xfrm_dump_info *sp = ptr; @@ -352,6 +400,7 @@ static int dump_one_state(struct xfrm_st struct xfrm_usersa_info *p; struct nlmsghdr *nlh; unsigned char *b = skb->tail; + struct xfrm_sec_ctx *xfrm_ctx; if (sp->this_idx < sp->start_idx) goto out; @@ -376,6 +425,18 @@ static int dump_one_state(struct xfrm_st if (x->encap) RTA_PUT(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap); + if ((xfrm_ctx = xfrm_state_security(x))) { + int ctx_size = sizeof(struct xfrm_user_sec_ctx) + + xfrm_ctx->ctx_len + 1; + struct xfrm_user_sec_ctx *uctx = kmalloc(ctx_size, GFP_KERNEL); + int err; + + err = dump_one_sec_ctx(xfrm_ctx, uctx, skb, ctx_size); + kfree(uctx); + + if (err < 0) + goto rtattr_failure; + } nlh->nlmsg_len = skb->tail - b; out: sp->this_idx++; @@ -589,6 +650,25 @@ static int verify_newpolicy_info(struct return verify_policy_dir(p->dir); } +static int copy_sec_ctx(struct xfrm_policy *pol, struct xfrm_user_sec_ctx *uctx) +{ + int err = 0; + + if (uctx) { + err = security_xfrm_policy_alloc(pol, uctx); + } + + return err; +} + +static int 
copy_from_user_sec_ctx(struct xfrm_policy *pol, struct rtattr **xfrma) +{ + struct rtattr *rt = xfrma[XFRMA_SEC_CTX-1]; + struct xfrm_user_sec_ctx *uctx = RTA_DATA(rt); + + return copy_sec_ctx(pol, uctx); +} + static void copy_templates(struct xfrm_policy *xp, struct xfrm_user_tmpl *ut, int nr) { @@ -667,7 +747,10 @@ static struct xfrm_policy *xfrm_policy_c } copy_from_user_policy(xp, p); - err = copy_from_user_tmpl(xp, xfrma); + + if (!(err = copy_from_user_tmpl(xp, xfrma))) + err = copy_from_user_sec_ctx(xp, xfrma); + if (err) { *errp = err; kfree(xp); @@ -737,6 +820,27 @@ rtattr_failure: return -1; } +static int copy_to_user_sec_ctx(struct xfrm_policy *xp, struct sk_buff *skb, int src) +{ + int err = 0; + struct xfrm_sec_ctx *xfrm_ctx = xfrm_policy_security(xp); + + if (xfrm_ctx) { + int ctx_size = sizeof(struct xfrm_user_sec_ctx) + + xfrm_ctx->ctx_len; + struct xfrm_user_sec_ctx *uctx = kmalloc(ctx_size, GFP_KERNEL); + + if (!uctx) + return -ENOMEM; + + err = dump_one_sec_ctx(xfrm_ctx, uctx, skb, + ctx_size); + kfree(uctx); + } + + return err; +} + static int dump_one_policy(struct xfrm_policy *xp, int dir, int count, void *ptr) { struct xfrm_dump_info *sp = ptr; @@ -758,6 +862,8 @@ static int dump_one_policy(struct xfrm_p copy_to_user_policy(xp, p, dir); if (copy_to_user_tmpl(xp, skb) < 0) goto nlmsg_failure; + if (copy_to_user_sec_ctx(xp, skb, 0) < 0) + goto nlmsg_failure; nlh->nlmsg_len = skb->tail - b; out: @@ -813,7 +919,7 @@ static struct sk_buff *xfrm_policy_netli static int xfrm_get_policy(struct sk_buff *skb, struct nlmsghdr *nlh, void **xfrma) { - struct xfrm_policy *xp; + struct xfrm_policy *xp, tmp; struct xfrm_userpolicy_id *p; int err; int delete; @@ -827,8 +933,20 @@ static int xfrm_get_policy(struct sk_buf if (p->index) xp = xfrm_policy_byid(p->dir, p->index, delete); - else - xp = xfrm_policy_bysel(p->dir, &p->sel, delete); + else { + struct rtattr **rtattrs = (struct rtattr **) xfrma; + struct rtattr *rt = rtattrs[XFRMA_SEC_CTX-1]; + + 
memcpy(&tmp.selector, &p->sel, sizeof(struct xfrm_selector)); + if (rt) { + struct xfrm_user_sec_ctx *uxsc = RTA_DATA(rt); + + if ((err = security_xfrm_policy_alloc(&tmp, uxsc))) + return err; + } + xp = xfrm_policy_bysel(p->dir, &tmp.selector, delete); + security_xfrm_policy_free(&tmp); + } if (xp == NULL) return -ENOENT; @@ -1110,6 +1228,8 @@ static int build_acquire(struct sk_buff if (copy_to_user_tmpl(xp, skb) < 0) goto nlmsg_failure; + if (copy_to_user_sec_ctx(xp, skb, 1) < 0) + goto nlmsg_failure; nlh->nlmsg_len = skb->tail - b; return skb->len; @@ -1127,6 +1247,7 @@ static int xfrm_send_acquire(struct xfrm len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr); len += NLMSG_SPACE(sizeof(struct xfrm_user_acquire)); + len += RTA_SPACE(xfrm_user_sec_ctx_size(xp)); skb = alloc_skb(len, GFP_ATOMIC); if (skb == NULL) return -ENOMEM; @@ -1147,8 +1268,9 @@ static struct xfrm_policy *xfrm_compile_ { struct xfrm_userpolicy_info *p = (struct xfrm_userpolicy_info *)data; struct xfrm_user_tmpl *ut = (struct xfrm_user_tmpl *) (p + 1); + struct xfrm_user_sec_ctx *uctx; struct xfrm_policy *xp; - int nr; + int nr = 0; switch (family) { case AF_INET: @@ -1176,9 +1298,26 @@ static struct xfrm_policy *xfrm_compile_ verify_newpolicy_info(p)) return NULL; + if (len > (sizeof(*p) + (XFRM_MAX_DEPTH * + sizeof(struct xfrm_user_tmpl)))) { + struct xfrm_user_tmpl *tmpl; + uctx = (struct xfrm_user_sec_ctx *) (ut + XFRM_MAX_DEPTH); + + if (len != sizeof(*p) + + (XFRM_MAX_DEPTH * sizeof(struct xfrm_user_tmpl)) + + uctx->len) + return NULL; + + /* spi must be zero'd unless real tmpl */ + for (tmpl = ut; tmpl->id.spi != 0; tmpl = tmpl + 1) + nr++; + } + else { + uctx = NULL; nr = ((len - sizeof(*p)) / sizeof(*ut)); if (nr > XFRM_MAX_DEPTH) return NULL; + } xp = xfrm_policy_alloc(GFP_KERNEL); if (xp == NULL) { @@ -1188,6 +1327,10 @@ static struct xfrm_policy *xfrm_compile_ copy_from_user_policy(xp, p); copy_templates(xp, ut, nr); + if (copy_sec_ctx(xp, uctx)) { + *dir = -EPERM; + 
return NULL; + } *dir = p->dir; @@ -1208,6 +1351,8 @@ static int build_polexpire(struct sk_buf copy_to_user_policy(xp, &upe->pol, dir); if (copy_to_user_tmpl(xp, skb) < 0) goto nlmsg_failure; + if (copy_to_user_sec_ctx(xp, skb, 2) < 0) + goto nlmsg_failure; upe->hard = !!hard; nlh->nlmsg_len = skb->tail - b; @@ -1225,6 +1370,7 @@ static int xfrm_send_policy_notify(struc len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr); len += NLMSG_SPACE(sizeof(struct xfrm_user_polexpire)); + len += xfrm_user_sec_ctx_size(xp); skb = alloc_skb(len, GFP_ATOMIC); if (skb == NULL) return -ENOMEM; diff -puN security/dummy.c~lsm-xfrm-nethooks security/dummy.c --- linux-2.6.12-rc6-xfrm/security/dummy.c~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/security/dummy.c 2005-06-13 13:22:59.000000000 -0400 @@ -811,6 +811,35 @@ static inline void dummy_sk_free_securit } #endif /* CONFIG_SECURITY_NETWORK */ +#ifdef CONFIG_SECURITY_NETWORK_XFRM +static int dummy_xfrm_policy_alloc_security(struct xfrm_policy *xp, struct xfrm_user_sec_ctx *sec_ctx) +{ + return 0; +} + +static inline int dummy_xfrm_policy_clone_security(struct xfrm_policy *old, struct xfrm_policy *new) +{ + return 0; +} + +static void dummy_xfrm_policy_free_security(struct xfrm_policy *xp) +{ +} + +static int dummy_xfrm_state_alloc_security(struct xfrm_state *x, struct xfrm_user_sec_ctx *sec_ctx) +{ + return 0; +} + +static void dummy_xfrm_state_free_security(struct xfrm_state *x) +{ +} + +static int dummy_xfrm_policy_lookup(struct sock *sk, struct xfrm_selector *sel, struct flowi *fl, u8 dir) +{ + return 0; +} +#endif /* CONFIG_SECURITY_NETWORK_XFRM */ static int dummy_register_security (const char *name, struct security_operations *ops) { return -EINVAL; @@ -992,5 +1021,13 @@ void security_fixup_ops (struct security set_to_dummy_if_null(ops, sk_alloc_security); set_to_dummy_if_null(ops, sk_free_security); #endif /* CONFIG_SECURITY_NETWORK */ +#ifdef CONFIG_SECURITY_NETWORK_XFRM + 
set_to_dummy_if_null(ops, xfrm_policy_alloc_security); + set_to_dummy_if_null(ops, xfrm_policy_clone_security); + set_to_dummy_if_null(ops, xfrm_policy_free_security); + set_to_dummy_if_null(ops, xfrm_state_alloc_security); + set_to_dummy_if_null(ops, xfrm_state_free_security); + set_to_dummy_if_null(ops, xfrm_policy_lookup); +#endif /* CONFIG_SECURITY_NETWORK_XFRM */ } diff -puN security/Kconfig~lsm-xfrm-nethooks security/Kconfig --- linux-2.6.12-rc6-xfrm/security/Kconfig~lsm-xfrm-nethooks 2005-06-13 13:22:59.000000000 -0400 +++ linux-2.6.12-rc6-xfrm-root/security/Kconfig 2005-06-13 13:22:59.000000000 -0400 @@ -53,6 +53,19 @@ config SECURITY_NETWORK implement socket and networking access controls. If you are unsure how to answer this question, answer N. +config SECURITY_NETWORK_XFRM + bool "XFRM (IPSec) Networking Security Hooks" + depends on XFRM && SECURITY_NETWORK + help + This enables the XFRM (IPSec) networking security hooks. + If enabled, a security module can use these hooks to + implement per-packet access controls based on labels + derived from IPSec policy. Non-IPSec communications are + designated as unlabelled, and only sockets authorized + to communicate unlabelled data can send without using + IPSec. + If you are unsure how to answer this question, answer N. 
+ config SECURITY_CAPABILITIES tristate "Default Linux Capabilities" depends on SECURITY _
From maca02@atlas.cz Fri Jun 17 11:59:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 17 Jun 2005 11:59:08 -0700 (PDT) Received: from localhost.localdomain (maca.fortech.cz [213.250.192.50]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5HIx4H9013880 for ; Fri, 17 Jun 2005 11:59:06 -0700 Received: from localhost (localhost [127.0.0.1]) by localhost.localdomain (8.12.11/8.12.8) with ESMTP id j5HIvnfd026846 for ; Fri, 17 Jun 2005 19:57:49 +0100 Date: Fri, 17 Jun 2005 20:57:49 +0200 (CEST) From: =?ISO-8859-2?Q?Tom=E1=B9_Macek?= X-X-Sender: root@localhost.localdomain To: netdev@oss.sgi.com Subject: Re: receive only one record from the routing table In-Reply-To: <20050617141527.GN22463@postel.suug.ch> Message-ID: References: <20050617141527.GN22463@postel.suug.ch> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2449 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: maca02@atlas.cz Precedence: bulk X-list: netdev Content-Length: 914 Lines: 27

Part of my routing table is here:

3.3.0.0         *               255.255.0.0     U     0      0        0 eth1
default         meric           0.0.0.0         UG    0      0        0 eth0

Omitting NLM_F_DUMP and typing './a.out 3.3.0.0' gives

Error in recieved packet: Success
Read From Socket Failed...

and I don't see the reason why... I think it should write something with the 3.3.0.0 destination, but it writes the error above instead.

On Fri, 17 Jun 2005, Thomas Graf wrote:

> * Tomáš Macek 2005-06-17 14:51
>> nlMsg->nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)); // Length of message.
>> nlMsg->nlmsg_type = RTM_GETROUTE; // Get the routes from kernel routing table .
>>
>> nlMsg->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP; // The message is a request for dump.
>
> Omit NLM_F_DUMP and you'll be fine, see rfc3549.
>
>
>
>
>
From tgraf@suug.ch Fri Jun 17 12:14:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 17 Jun 2005 12:14:38 -0700 (PDT) Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5HJEYH9014750 for ; Fri, 17 Jun 2005 12:14:36 -0700 Received: by postel.suug.ch (Postfix, from userid 10001) id 66BCC1C0EB; Fri, 17 Jun 2005 21:13:40 +0200 (CEST) Date: Fri, 17 Jun 2005 21:13:40 +0200 From: Thomas Graf To: =?iso-8859-1?B?VG9t4T8=?= Macek Cc: netdev@oss.sgi.com Subject: Re: receive only one record from the routing table Message-ID: <20050617191340.GO22463@postel.suug.ch> References: <20050617141527.GN22463@postel.suug.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-archive-position: 2450 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: tgraf@suug.ch Precedence: bulk X-list: netdev Content-Length: 552 Lines: 14

* Tomáš Macek 2005-06-17 20:57
> Part of my routing table is here:
>
> 3.3.0.0         *               255.255.0.0     U     0      0        0 eth1
> default         meric           0.0.0.0         UG    0      0        0 eth0
>
> Omitting NLM_F_DUMP and typing './a.out 3.3.0.0' gives
>
> Error in recieved packet: Success
> Read From Socket Failed...

Because you set neither rtm_dst_len (to the prefix length, or 32) nor rtm_family (AF_INET).

You could also use libnl, probably easier to use.
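
As a rough illustration of the advice in this reply, here is a minimal sketch of a single-route RTM_GETROUTE request that leaves out NLM_F_DUMP and fills in rtm_family and rtm_dst_len. It is not taken from the thread, and the original program was never posted in full. The struct name route_req, the helper build_getroute_request, the 64-byte attribute area, and the use of an RTA_DST attribute to carry the destination address are all assumptions made for the example.

```c
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical request layout: netlink header, rtmsg, room for attributes. */
struct route_req {
	struct nlmsghdr nlh;
	struct rtmsg rtm;
	char attrs[64];
};

/*
 * Fill *req with a non-dump RTM_GETROUTE query for an IPv4 destination.
 * prefix_len is the value stored in rtm_dst_len (32 for a single address).
 * Returns 0 on success, -1 if dst or prefix_len is invalid.
 */
int build_getroute_request(struct route_req *req, const char *dst,
			   unsigned char prefix_len)
{
	struct in_addr addr;
	struct rtattr *rta;

	if (prefix_len > 32 || inet_pton(AF_INET, dst, &addr) != 1)
		return -1;

	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
	req->nlh.nlmsg_type = RTM_GETROUTE;
	req->nlh.nlmsg_flags = NLM_F_REQUEST;	/* deliberately no NLM_F_DUMP */

	/* The two fields the reply says were left unset. */
	req->rtm.rtm_family = AF_INET;
	req->rtm.rtm_dst_len = prefix_len;

	/* Assumption: the address travels in an RTA_DST attribute. */
	rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nlh.nlmsg_len));
	rta->rta_type = RTA_DST;
	rta->rta_len = RTA_LENGTH(sizeof(addr));
	memcpy(RTA_DATA(rta), &addr, sizeof(addr));

	/* Account for the attribute in the total message length. */
	req->nlh.nlmsg_len = NLMSG_ALIGN(req->nlh.nlmsg_len) + rta->rta_len;
	return 0;
}
```

The filled buffer would then be sent on an AF_NETLINK / NETLINK_ROUTE socket using nlh.nlmsg_len as the length. For the 3.3.0.0 entry shown above, the caller would presumably pass 16 as the prefix length (netmask 255.255.0.0), or 32 to ask about one address.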
From jmoyer@redhat.com Fri Jun 17 12:57:52 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 17 Jun 2005 12:57:56 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5HJvqH9021019 for ; Fri, 17 Jun 2005 12:57:52 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j5HJuaVI013466; Fri, 17 Jun 2005 15:56:36 -0400 Received: from mail.boston.redhat.com (mail.boston.redhat.com [172.16.76.12]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j5HJuau29080; Fri, 17 Jun 2005 15:56:36 -0400 Received: from segfault.boston.redhat.com (segfault.boston.redhat.com [172.16.80.57]) by mail.boston.redhat.com (8.12.8/8.12.8) with ESMTP id j5HJuZCv008021; Fri, 17 Jun 2005 15:56:35 -0400 Received: from segfault.boston.redhat.com (localhost.localdomain [127.0.0.1]) by segfault.boston.redhat.com (8.13.1/8.13.1) with ESMTP id j5HJuZj1029439; Fri, 17 Jun 2005 15:56:35 -0400 Received: (from jmoyer@localhost) by segfault.boston.redhat.com (8.13.1/8.13.1/Submit) id j5HJuZIj029436; Fri, 17 Jun 2005 15:56:35 -0400 From: Jeff Moyer MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17075.10995.498758.773092@segfault.boston.redhat.com> Date: Fri, 17 Jun 2005 15:56:35 -0400 To: mpm@selenic.com CC: netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: netpoll and the bonding driver X-Mailer: VM 7.17 under 21.4 (patch 15) "Security Through Obscurity" XEmacs Lucid Reply-To: jmoyer@redhat.com X-PGP-KeyID: 1F78E1B4 X-PGP-CertKey: F6FE 280D 8293 F72C 65FD 5A58 1FF8 A7CA 1F78 E1B4 X-PCLoadLetter: What the f**k does that mean? 
X-archive-position: 2451 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmoyer@redhat.com Precedence: bulk X-list: netdev Content-Length: 2140 Lines: 48

Hi,

I'm trying to implement a netpoll hook for the bonding driver. In doing so, I ran into the following problem: netpoll_send_skb calls the device's hard_start_xmit routine. In this case, it will be one of the bonding driver's xmit routines. Each of these ends up calling bond_dev_queue_xmit, which in turn calls dev_queue_xmit.

Now, for netconsole, the code disables interrupts before calling netpoll_send_udp:

	local_irq_save(flags);

	for(left = len; left; ) {
		frag = min(left, MAX_PRINT_CHUNK);
		netpoll_send_udp(&np, msg, frag);
		msg += frag;
		left -= frag;
	}

	local_irq_restore(flags);

Note that if you did an alt-sysrq-t, then you would enter this code path in interrupt context as well, and herein lies the problem. It seems that dev_queue_xmit is not supposed to be called with interrupts disabled. The immediate effect of this is that the WARN_ON in local_bh_enable triggers (called at the end of dev_queue_xmit), causing us to loop infinitely printing out stack traces.

So, my question is this: how in the world do we fit the bonding driver into the generic netpoll infrastructure? In the case of every other driver, netpoll simply calls the hard_start_xmit routine[1], and this approach simply doesn't work for the bonding driver, for the reasons I described above.

So, one way to make the bonding driver fit into this model is to modify it to not call dev_queue_xmit when called from netpoll. This can be done, I suppose, by adding another start_xmit routine that is specific to netpoll. This doesn't feel good to me, but I'm not sure how else you would solve the problem (and netpoll already gets its own poll interface, so is one more all that bad?).

The other approach to take is to put bonding specific logic into netpoll.
I think we can all agree that is a bad idea.

-Jeff

[1] Note that netpoll does not perform any of the checks that dev_queue_xmit does. This either means that a) in the netpoll case, this is an okay thing to do (since it's been working for this long), or b) netpoll has a bug.

From mallikarjuna.chilakala@intel.com Fri Jun 17 16:58:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 17 Jun 2005 16:58:49 -0700 (PDT) Received: from fmsfmr001.fm.intel.com (fmr13.intel.com [192.55.52.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5HNwgH9002670 for ; Fri, 17 Jun 2005 16:58:42 -0700 Received: from fmsfmr101.fm.intel.com (fmsfmr101.fm.intel.com [10.253.24.21]) by fmsfmr001.fm.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j5HNvO0j013698; Fri, 17 Jun 2005 23:57:24 GMT Received: from fmsmsxvs042.fm.intel.com (fmsmsxvs042.fm.intel.com [132.233.42.128]) by fmsfmr101.fm.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j5HNvAaf007918; Fri, 17 Jun 2005 23:57:23 GMT Received: from [134.134.3.107] ([134.134.3.107]) by fmsmsxvs042.fm.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005061716572307553 ; Fri, 17 Jun 2005 16:57:23 -0700 Date: Fri, 17 Jun 2005 16:54:36 -0700 (PDT) From: Malli Chilakala To: "jgarzik@pobox.com" cc: netdev Subject: [PATCH net-drivers-2.6 0/9] ixgb: driver update Message-ID: ReplyTo: "Malli Chilakala" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2452 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mallikarjuna.chilakala@intel.com Precedence: bulk X-list: netdev Content-Length: 651 Lines: 15

ixgb: driver update

Signed-off-by: Mallikarjuna R Chilakala
Signed-off-by: Ganesh Venkatesan
Signed-off-by: John Ronciak

1. Set RXDCTL:PTHRESH/HTHRESH to zero
2. Fix unnecessary link state messages
3. Use netdev_priv() instead of netdev->priv
4. Fix Broadcast/Multicast packets received statistics
5. Fix data output by ethtool -d
6. Ethtool cleanup patch from Stephen Hemminger
7. Remove unused functions, render some variable static instead of global
8. Redefined buffer_info-dma to be dma_addr_t instead of uint64
9. Driver version & white space fixes

From mallikarjuna.chilakala@intel.com Fri Jun 17 16:59:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 17 Jun 2005 16:59:17 -0700 (PDT) Received: from orsfmr002.jf.intel.com (fmr17.intel.com [134.134.136.16]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5HNxFH9002749 for ; Fri, 17 Jun 2005 16:59:15 -0700 Received: from orsfmr100.jf.intel.com (orsfmr100.jf.intel.com [10.7.209.16]) by orsfmr002.jf.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j5HNvwt2015116; Fri, 17 Jun 2005 23:57:58 GMT Received: from orsmsxvs040.jf.intel.com (orsmsxvs040.jf.intel.com [192.168.65.206]) by orsfmr100.jf.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j5HNvu9x030678; Fri, 17 Jun 2005 23:57:58 GMT Received: from [134.134.3.107] ([134.134.3.107]) by orsmsxvs040.jf.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005061716575017424 ; Fri, 17 Jun 2005 16:57:50 -0700 Date: Fri, 17 Jun 2005 16:55:04 -0700 (PDT) From: Malli Chilakala To: "jgarzik@pobox.com" cc: netdev Subject: [PATCH net-drivers-2.6 1/9] ixgb: Set RXDCTL:PTHRESH/HTHRESH to zero Message-ID: ReplyTo: "Malli Chilakala" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2453 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mallikarjuna.chilakala@intel.com Precedence: bulk X-list: netdev Content-Length: 1196 Lines: 25

Set RXDCTL:PTHRESH/HTHRESH to zero

Signed-off-by: Mallikarjuna R Chilakala
Signed-off-by: Ganesh Venkatesan
Signed-off-by: John Ronciak diff -up netdev-2.6/drivers/net/ixgb/ixgb_main.c netdev-2.6/drivers/net/ixgb.new/ixgb_main.c --- netdev-2.6/drivers/net/ixgb/ixgb_main.c 2005-05-25 12:26:48.000000000 -0700 +++ netdev-2.6/drivers/net/ixgb.new/ixgb_main.c 2005-05-25 12:27:01.000000000 -0700 @@ -142,10 +142,12 @@ static void ixgb_netpoll(struct net_devi MODULE_LICENSE("GPL"); /* some defines for controlling descriptor fetches in h/w */ -#define RXDCTL_PTHRESH_DEFAULT 128 /* chip considers prefech below this */ -#define RXDCTL_HTHRESH_DEFAULT 16 /* chip will only prefetch if tail is - pushed this many descriptors from head */ #define RXDCTL_WTHRESH_DEFAULT 16 /* chip writes back at this many or RXT0 */ +#define RXDCTL_PTHRESH_DEFAULT 0 /* chip considers prefech below + * this */ +#define RXDCTL_HTHRESH_DEFAULT 0 /* chip will only prefetch if tail + * is pushed this many descriptors + * from head */ /** * ixgb_init_module - Driver Registration Routine From mallikarjuna.chilakala@intel.com Fri Jun 17 17:02:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 17 Jun 2005 17:02:14 -0700 (PDT) Received: from fmsfmr002.fm.intel.com (fmr14.intel.com [192.55.52.68]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5I02BH9003198 for ; Fri, 17 Jun 2005 17:02:11 -0700 Received: from fmsfmr101.fm.intel.com (fmsfmr101.fm.intel.com [10.253.24.21]) by fmsfmr002.fm.intel.com (8.12.10/8.12.10/d: major-outer.mc,v 1.1 2004/09/17 17:50:56 root Exp $) with ESMTP id j5I00iDe010766; Sat, 18 Jun 2005 00:00:48 GMT Received: from fmsmsxvs041.fm.intel.com (fmsmsxvs041.fm.intel.com [132.233.42.126]) by fmsfmr101.fm.intel.com (8.12.10/8.12.10/d: major-inner.mc,v 1.2 2004/09/17 18:05:01 root Exp $) with SMTP id j5I00daX011383; Sat, 18 Jun 2005 00:00:44 GMT Received: from [134.134.3.107] ([134.134.3.107]) by fmsmsxvs041.fm.intel.com (SAVSMTP 3.1.7.47) with SMTP id M2005061717004324313 ; Fri, 17 Jun 2005 17:00:43 -0700 Date: Fri, 17 Jun 2005 16:57:56 -0700 (PDT) From: Malli Chilakala 
To: "jgarzik@pobox.com" cc: netdev Subject: [PATCH net-drivers-2.6 2/9] ixgb: Fix unnecessary link state messages Message-ID: ReplyTo: "Malli Chilakala" MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Scanned-By: MIMEDefang 2.44 X-archive-position: 2454 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mallikarjuna.chilakala@intel.com Precedence: bulk X-list: netdev Content-Length: 1934 Lines: 58 Fix unnecessary link state messages Signed-off-by: Mallikarjuna R Chilakala Signed-off-by: Ganesh Venkatesan Signed-off-by: John Ronciak diff -up netdev-2.6/drivers/net/ixgb/ixgb_ethtool.c netdev-2.6/drivers/net/ixgb.new/ixgb_ethtool.c --- netdev-2.6/drivers/net/ixgb/ixgb_ethtool.c 2005-05-25 12:26:48.000000000 -0700 +++ netdev-2.6/drivers/net/ixgb.new/ixgb_ethtool.c 2005-05-25 12:26:56.000000000 -0700 @@ -130,6 +130,12 @@ ixgb_get_settings(struct net_device *net ixgb_down(adapter, TRUE); ixgb_reset(adapter); ixgb_up(adapter); + /* be optimistic about our link, since we were up before */ + adapter->link_speed = 10000; + adapter->link_duplex = FULL_DUPLEX; + netif_carrier_on(netdev); + netif_wake_queue(netdev); + } else ixgb_reset(adapter); @@ -177,6 +181,11 @@ ixgb_set_pauseparam(struct net_device *n if(netif_running(adapter->netdev)) { ixgb_down(adapter, TRUE); ixgb_up(adapter); + /* be optimistic about our link, since we were up before */ + adapter->link_speed = 10000; + adapter->link_duplex = FULL_DUPLEX; + netif_carrier_on(netdev); + netif_wake_queue(netdev); } else ixgb_reset(adapter); @@ -199,6 +181,11 @@ if(netif_running(netdev)) { ixgb_down(adapter,TRUE); ixgb_up(adapter); + /* be optimistic about our link, since we were up before */ + adapter->link_speed = 10000; + adapter->link_duplex = FULL_DUPLEX; + netif_carrier_on(netdev); + netif_wake_queue(netdev); } else ixgb_reset(adapter); return 0; @@ -573,6 +573,11 @@ ixgb_set_ringparam(struct net_device *ne adapter->tx_ring = 
tx_new;
 		if((err = ixgb_up(adapter)))
 			return err;
+		/* be optimistic about our link, since we were up before */
+		adapter->link_speed = 10000;
+		adapter->link_duplex = FULL_DUPLEX;
+		netif_carrier_on(netdev);
+		netif_wake_queue(netdev);
 	}

 	return 0;

From romieu@fr.zoreil.com Sat Jun 18 03:31:58 2005
Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 03:32:01 -0700 (PDT)
Received: from fr.zoreil.com (electric-eye.fr.zoreil.com [213.41.134.224]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5IAVuH9008816 for ; Sat, 18 Jun 2005 03:31:57 -0700
Received: from electric-eye.fr.zoreil.com (localhost.localdomain [127.0.0.1]) by fr.zoreil.com (8.13.1/8.12.1) with ESMTP id j5IASpgI025542; Sat, 18 Jun 2005 12:28:51 +0200
Received: (from romieu@localhost) by electric-eye.fr.zoreil.com (8.13.1/8.13.1/Submit) id j5IASj8v025541; Sat, 18 Jun 2005 12:28:45 +0200
Date: Sat, 18 Jun 2005 12:28:45 +0200
From: Francois Romieu
To: Malli Chilakala
Cc: "jgarzik@pobox.com" , netdev
Subject: Re: [PATCH net-drivers-2.6 2/9] ixgb: Fix unnecessary link state messages
Message-ID: <20050618102845.GA25471@electric-eye.fr.zoreil.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.1i
X-Organisation: Land of Sunshine Inc.
X-archive-position: 2455
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: romieu@fr.zoreil.com
Precedence: bulk
X-list: netdev
Content-Length: 401
Lines: 12

Malli Chilakala :
> Fix unnecessary link state messages
>
> Signed-off-by: Mallikarjuna R Chilakala
> Signed-off-by: Ganesh Venkatesan
> Signed-off-by: John Ronciak

The patch duplicates a lot of code. Do you really want these parts to be modified independently?
-- Ueimor From manfred@colorfullife.com Sat Jun 18 07:28:55 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 07:28:58 -0700 (PDT) Received: from dbl.q-ag.de (dbl.q-ag.de [213.172.117.3]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5IESrH9022994 for ; Sat, 18 Jun 2005 07:28:55 -0700 Received: from [127.0.0.2] (dbl [127.0.0.1]) by dbl.q-ag.de (8.13.3/8.13.3/Debian-6) with ESMTP id j5IETRii032331; Sat, 18 Jun 2005 16:29:28 +0200 Message-ID: <42B42F47.2090105@colorfullife.com> Date: Sat, 18 Jun 2005 16:27:19 +0200 From: Manfred Spraul User-Agent: Mozilla/5.0 (X11; U; Linux i686; fr-FR; rv:1.7.8) Gecko/20050513 Fedora/1.7.8-1.3.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Jeff Garzik CC: Netdev Subject: [PATCH] forcedeth: Poll for link changes Content-Type: multipart/mixed; boundary="------------080805060801050701020906" X-archive-position: 2456 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: manfred@colorfullife.com Precedence: bulk X-list: netdev Content-Length: 5055 Lines: 127 This is a multi-part message in MIME format. --------------080805060801050701020906 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Hi Jeff, Several users reported that the link speed detection is unreliable with nForce 3 nics. The result is either a completely dead network connection, or the connection dies after a few minutes. The attached patch enables a timer that polls the PHY for link speed changes. The code already exists for nForce 1/2, the patch enables it for all nForce versions. 
Signed-Off-By: Manfred Spraul --------------080805060801050701020906 Content-Type: text/plain; name="patch-forcedeth-linktimer" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="patch-forcedeth-linktimer" --- 2.6/drivers/net/forcedeth.c 2005-06-18 16:21:47.000000000 +0200 +++ build-2.6/drivers/net/forcedeth.c 2005-06-18 15:03:55.000000000 +0200 @@ -82,7 +82,8 @@ * 0.31: 14 Nov 2004: ethtool support for getting/setting link * capabilities. * 0.32: 16 Apr 2005: RX_ERROR4 handling added. - * 0.33: 16 Mai 2005: Support for MCP51 added. + * 0.33: 16 May 2005: Support for MCP51 added. + * 0.34: 18 Jun 2005: Add DEV_NEED_LINKTIMER to all nForce nics. * * Known bugs: * We suspect that on some hardware no TX done interrupts are generated. @@ -94,7 +95,7 @@ * DEV_NEED_TIMERIRQ will not harm you on sane hardware, only generating a few * superfluous timer interrupts from the nic. */ -#define FORCEDETH_VERSION "0.33" +#define FORCEDETH_VERSION "0.34" #define DRV_NAME "forcedeth" #include @@ -2218,70 +2219,70 @@ .device = PCI_DEVICE_ID_NVIDIA_NVENET_4, .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, - .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ, + .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER, }, { /* nForce3 Ethernet Controller */ .vendor = PCI_VENDOR_ID_NVIDIA, .device = PCI_DEVICE_ID_NVIDIA_NVENET_5, .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, - .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ, + .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER, }, { /* nForce3 Ethernet Controller */ .vendor = PCI_VENDOR_ID_NVIDIA, .device = PCI_DEVICE_ID_NVIDIA_NVENET_6, .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, - .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ, + .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER, }, { /* nForce3 Ethernet Controller */ .vendor = 
PCI_VENDOR_ID_NVIDIA, .device = PCI_DEVICE_ID_NVIDIA_NVENET_7, .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, - .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ, + .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER, }, { /* CK804 Ethernet Controller */ .vendor = PCI_VENDOR_ID_NVIDIA, .device = PCI_DEVICE_ID_NVIDIA_NVENET_8, .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, - .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ, + .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER, }, { /* CK804 Ethernet Controller */ .vendor = PCI_VENDOR_ID_NVIDIA, .device = PCI_DEVICE_ID_NVIDIA_NVENET_9, .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, - .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ, + .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER, }, { /* MCP04 Ethernet Controller */ .vendor = PCI_VENDOR_ID_NVIDIA, .device = PCI_DEVICE_ID_NVIDIA_NVENET_10, .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, - .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ, + .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER, }, { /* MCP04 Ethernet Controller */ .vendor = PCI_VENDOR_ID_NVIDIA, .device = PCI_DEVICE_ID_NVIDIA_NVENET_11, .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, - .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ, + .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER, }, { /* MCP51 Ethernet Controller */ .vendor = PCI_VENDOR_ID_NVIDIA, .device = PCI_DEVICE_ID_NVIDIA_NVENET_12, .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, - .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ, + .driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER, }, { /* MCP51 Ethernet Controller */ .vendor = PCI_VENDOR_ID_NVIDIA, .device = PCI_DEVICE_ID_NVIDIA_NVENET_13, 
 		.subvendor = PCI_ANY_ID,
 		.subdevice = PCI_ANY_ID,
-		.driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ,
+		.driver_data = DEV_NEED_LASTPACKET1|DEV_IRQMASK_2|DEV_NEED_TIMERIRQ|DEV_NEED_LINKTIMER,
 	},
 	{0,},
 };

--------------080805060801050701020906--

From maca02@atlas.cz Sat Jun 18 11:57:02 2005
Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 11:57:04 -0700 (PDT)
Received: from localhost.localdomain (maca.fortech.cz [213.250.192.50]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5IIutH9005064 for ; Sat, 18 Jun 2005 11:57:01 -0700
Received: from localhost (localhost [127.0.0.1]) by localhost.localdomain (8.12.11/8.12.8) with ESMTP id j5IItZsS029853 for ; Sat, 18 Jun 2005 19:55:36 +0100
Date: Sat, 18 Jun 2005 20:55:35 +0200 (CEST)
From: =?ISO-8859-2?Q?Tom=E1=B9_Macek?=
X-X-Sender: root@localhost.localdomain
To: netdev@oss.sgi.com
Subject: Re: receive only one record from the routing table
In-Reply-To: <20050617191340.GO22463@postel.suug.ch>
Message-ID:
References: <20050617141527.GN22463@postel.suug.ch> <20050617191340.GO22463@postel.suug.ch>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-archive-position: 2457
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: maca02@atlas.cz
Precedence: bulk
X-list: netdev
Content-Length: 2575
Lines: 75

Thanks for your answers. I just tried to compile libnl, but some errors occurred, so I'm continuing without it. But I've looked at the HTML documentation and it seems to be very good.

I repaired the main() function by adding

	rtMsg->rtm_family = AF_INET;
	rtMsg->rtm_dst_len = 16;

but every request on any dst address in the routing table gives me this output:

Destination Gateway Interface Source Netmask
127.0.0.1 *.*.*.* lo 255.255.255.255

The 'rtm_dst_len = 16' should mean the mask of the route I'm looking for, correct?
The whole code before sending the packet is below:

/* Create Socket */
if((sock = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE)) < 0)
	perror("Socket Creation: ");

/* Initialize the buffer */
memset(msgBuf, 0, BUFSIZE);

/* point the header and the msg structure pointers into the buffer */
nlMsg = (struct nlmsghdr *)msgBuf;
rtMsg = (struct rtmsg *)NLMSG_DATA(nlMsg);
rtMsg->rtm_family = AF_INET;
rtMsg->rtm_dst_len = 16;

/* Fill in the nlmsg header*/
nlMsg->nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)); // Length of message.
nlMsg->nlmsg_type = RTM_GETROUTE; // Get the routes from kernel routing table .
nlMsg->nlmsg_flags = NLM_F_REQUEST; // The message is a request for dump.
nlMsg->nlmsg_seq = msgSeq++; // Sequence of the message packet.
nlMsg->nlmsg_pid = getpid(); // PID of process sending the request.

char *cp;
unsigned int xx[4]; int i = 0;
unsigned char *ap = (unsigned char *)xx;
for (cp = argv[1], i = 0; *cp; cp++) {
	if (*cp <= '9' && *cp >= '0') {
		ap[i] = 10*ap[i] + (*cp-'0');
		continue;
	}
	if (*cp == '.' && ++i <= 3)
		continue;
	return -1;
}

NetlinkAddAttr(nlMsg, sizeof(nlMsg), RTA_DST, &xx, 4);

On Fri, 17 Jun 2005, Thomas Graf wrote:

> * Tomáš Macek 2005-06-17 20:57
>> Part of my routing table is here:
>>
>> 3.3.0.0 * 255.255.0.0 U 0 0 0 eth1
>> default meric 0.0.0.0 UG 0 0 0 eth0
>>
>> Omitting NLM_F_DUMP and typing './a.out 3.3.0.0' gives
>>
>> Error in recieved packet: Success
>> Read From Socket Failed...
>
> Because you don't set rtm_dst_len to the prefix length or 32,
> and rtm_family (AF_INET). You could also use libnl, probably
> easier to use.
>
>
>
>
>

From tgraf@suug.ch Sat Jun 18 13:24:56 2005
Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 13:24:58 -0700 (PDT)
Received: from postel.suug.ch (postel.suug.ch [195.134.158.23]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5IKOtH9011873 for ; Sat, 18 Jun 2005 13:24:55 -0700
Received: by postel.suug.ch (Postfix, from userid 10001) id 5C0781C0EB; Sat, 18 Jun 2005 22:23:59 +0200 (CEST)
Date: Sat, 18 Jun 2005 22:23:59 +0200
From: Thomas Graf
To: =?iso-8859-1?B?VG9t4T8=?= Macek
Cc: netdev@oss.sgi.com
Subject: Re: receive only one record from the routing table
Message-ID: <20050618202359.GP22463@postel.suug.ch>
References: <20050617141527.GN22463@postel.suug.ch> <20050617191340.GO22463@postel.suug.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
X-archive-position: 2458
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: tgraf@suug.ch
Precedence: bulk
X-list: netdev
Content-Length: 2208
Lines: 65

* Tomáš Macek 2005-06-18 20:55
> The 'rtm_dst_len = 16' should mean the mask of the route I'm looking for, correct?

Yes.

> The whole code before sending the packet is below:
>
>
> /* Create Socket */
> if((sock = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE)) < 0)
> 	perror("Socket Creation: ");
>
> /* Initialize the buffer */
> memset(msgBuf, 0, BUFSIZE);
>
> /* point the header and the msg structure pointers into the buffer */
> nlMsg = (struct nlmsghdr *)msgBuf;
> rtMsg = (struct rtmsg *)NLMSG_DATA(nlMsg);
> rtMsg->rtm_family = AF_INET;
> rtMsg->rtm_dst_len = 16;
>
> /* Fill in the nlmsg header*/
> nlMsg->nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)); // Length of message.
> nlMsg->nlmsg_type = RTM_GETROUTE; // Get the routes from kernel routing table .
> nlMsg->nlmsg_flags = NLM_F_REQUEST; // The message is a request for dump.
> nlMsg->nlmsg_seq = msgSeq++; // Sequence of the message packet.
> nlMsg->nlmsg_pid = getpid(); // PID of process sending the request.
>
> char *cp;
> unsigned int xx[4]; int i = 0;
> unsigned char *ap = (unsigned char *)xx;
> for (cp = argv[1], i = 0; *cp; cp++) {
> 	if (*cp <= '9' && *cp >= '0') {
> 		ap[i] = 10*ap[i] + (*cp-'0');
> 		continue;
> 	}
> 	if (*cp == '.' && ++i <= 3)
> 		continue;
> 	return -1;
> }
>
> NetlinkAddAttr(nlMsg, sizeof(nlMsg), RTA_DST, &xx, 4);

This looks good but your NetlinkAddAttr is bogus, it should be something like this:

int nl_msg_append_tlv(struct nlmsghdr *n, int type, void *data, size_t len)
{
	int tlen;
	struct rtattr *rta;

	tlen = NLMSG_ALIGN(n->nlmsg_len) + RTA_LENGTH(NLMSG_ALIGN(len));

	rta = (struct rtattr *) NLMSG_TAIL(n);
	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(NLMSG_ALIGN(len));
	memcpy(RTA_DATA(rta), data, len);
	n->nlmsg_len = tlen;

	return 0;
}

Your code is missing various alignment requirements. I can't tell whether this is the last bug. I recommend you to read ip/iproute.c in the iproute2 source or give libnl a second chance.
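
For readers following this thread, the aligned attribute append discussed above can be sketched as below. This is a minimal illustration under stated assumptions, not the poster's code and not the iproute2 or libnl implementation: the names append_rtattr, parse_dotted_quad and build_getroute are hypothetical, and the buffer-size check is an addition. The sketch only builds the request in memory. It stores the unpadded attribute length in rta_len, advances nlmsg_len by the padded length, and zeroes the buffer before parsing so no address byte is read uninitialized.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: append one attribute to a netlink message.
 * bufsize is the size of the buffer holding the message, not the
 * size of a pointer to it. */
static int append_rtattr(struct nlmsghdr *n, size_t bufsize, int type,
                         const void *data, size_t len)
{
    size_t off = NLMSG_ALIGN(n->nlmsg_len); /* attribute starts aligned */
    size_t rta_len = RTA_LENGTH(len);       /* header + payload, unpadded */
    struct rtattr *rta;

    if (len > 0xffff - RTA_LENGTH(0))       /* rta_len is a 16-bit field */
        return -1;
    if (off + RTA_ALIGN(rta_len) > bufsize) /* would overrun the buffer */
        return -1;

    rta = (struct rtattr *)((char *)n + off);
    rta->rta_type = (unsigned short)type;
    rta->rta_len = (unsigned short)rta_len;
    memcpy(RTA_DATA(rta), data, len);
    n->nlmsg_len = (unsigned int)(off + RTA_ALIGN(rta_len));
    return 0;
}

/* Hypothetical helper: parse "a.b.c.d" into four bytes, network order. */
static int parse_dotted_quad(const char *s, unsigned char out[4])
{
    unsigned int val = 0;
    int idx = 0, digits = 0;

    for (;; s++) {
        if (*s >= '0' && *s <= '9') {
            val = val * 10 + (unsigned int)(*s - '0');
            if (val > 255 || ++digits > 3)
                return -1;
        } else if (*s == '.' || *s == '\0') {
            if (digits == 0 || idx > 3)
                return -1;
            out[idx++] = (unsigned char)val;
            val = 0;
            digits = 0;
            if (*s == '\0')
                break;
        } else {
            return -1;
        }
    }
    return idx == 4 ? 0 : -1;
}

/* Hypothetical helper: build an RTM_GETROUTE request for dst/prefix_len.
 * buf must be aligned for struct nlmsghdr. Returns the message length,
 * or -1 on error. */
static int build_getroute(void *buf, size_t bufsize, const char *dst,
                          unsigned char prefix_len, unsigned int seq)
{
    struct nlmsghdr *nlh = buf;
    struct rtmsg *rtm;
    unsigned char addr[4];

    assert(buf != NULL && dst != NULL);
    if (bufsize < NLMSG_SPACE(sizeof(*rtm)) || prefix_len > 32 ||
        parse_dotted_quad(dst, addr) != 0)
        return -1;

    memset(buf, 0, bufsize);
    nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*rtm));
    nlh->nlmsg_type = RTM_GETROUTE;
    nlh->nlmsg_flags = NLM_F_REQUEST;       /* single lookup, no NLM_F_DUMP */
    nlh->nlmsg_seq = seq;

    rtm = NLMSG_DATA(nlh);
    rtm->rtm_family = AF_INET;
    rtm->rtm_dst_len = prefix_len;

    if (append_rtattr(nlh, bufsize, RTA_DST, addr, sizeof(addr)) != 0)
        return -1;
    return (int)nlh->nlmsg_len;
}
```

A caller would fill a suitably aligned buffer with something like build_getroute(buf, sizeof(buf), "3.3.0.0", 16, seq) and then send the returned number of bytes on a PF_NETLINK/NETLINK_ROUTE socket. Passing the real buffer size matters: sizeof applied to a pointer to the message yields only the pointer size, so a bounds check based on it would not protect the buffer.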
From hch@lst.de Sat Jun 18 16:29:34 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 16:29:38 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5INTWH9023091 for ; Sat, 18 Jun 2005 16:29:33 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id j5INRu6t010195 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 19 Jun 2005 01:27:56 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id j5INRuhI010193; Sun, 19 Jun 2005 01:27:56 +0200 Date: Sun, 19 Jun 2005 01:27:56 +0200 From: Christoph Hellwig To: jgarzik@pobox.com, proski@gnu.org, hermes@gibson.dropbear.id.au Cc: netdev@oss.sgi.com Subject: [PATCH 6/9] orinoco: scanning support Message-ID: <20050618232756.GG9918@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 X-archive-position: 2467 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Content-Length: 22444 Lines: 727 Patch from Pavel Roskin Index: linux-2.6/drivers/net/wireless/orinoco.c =================================================================== --- linux-2.6.orig/drivers/net/wireless/orinoco.c 2005-06-19 01:03:47.000000000 +0200 +++ linux-2.6/drivers/net/wireless/orinoco.c 2005-06-19 01:03:51.000000000 +0200 @@ -514,6 +514,10 @@ /* Internal constants */ /********************************************************************/ +/* 802.2 LLC/SNAP header used for Ethernet encapsulation over 802.11 */ +static const u8 encaps_hdr[] = {0xaa, 0xaa, 0x03, 0x00, 0x00, 0x00}; +#define ENCAPS_OVERHEAD (sizeof(encaps_hdr) + 2) + #define ORINOCO_MIN_MTU 256 #define ORINOCO_MAX_MTU (IEEE802_11_DATA_LEN - ENCAPS_OVERHEAD) @@ -579,25 +583,42 @@ /* 
Data types */ /********************************************************************/ -struct header_struct { - /* 802.3 */ - u8 dest[ETH_ALEN]; - u8 src[ETH_ALEN]; - u16 len; - /* 802.2 */ +/* Used in Event handling. + * We avoid nested structres as they break on ARM -- Moustafa */ +struct hermes_tx_descriptor_802_11 { + /* hermes_tx_descriptor */ + u16 status; + u16 reserved1; + u16 reserved2; + u32 sw_support; + u8 retry_count; + u8 tx_rate; + u16 tx_control; + + /* ieee802_11_hdr */ + u16 frame_ctl; + u16 duration_id; + u8 addr1[ETH_ALEN]; + u8 addr2[ETH_ALEN]; + u8 addr3[ETH_ALEN]; + u16 seq_ctl; + u8 addr4[ETH_ALEN]; + u16 data_len; + + /* ethhdr */ + unsigned char h_dest[ETH_ALEN]; /* destination eth addr */ + unsigned char h_source[ETH_ALEN]; /* source ether addr */ + unsigned short h_proto; /* packet type ID field */ + + /* p8022_hdr */ u8 dsap; u8 ssap; u8 ctrl; - /* SNAP */ u8 oui[3]; + u16 ethertype; } __attribute__ ((packed)); -/* 802.2 LLC/SNAP header used for Ethernet encapsulation over 802.11 */ -u8 encaps_hdr[] = {0xaa, 0xaa, 0x03, 0x00, 0x00, 0x00}; - -#define ENCAPS_OVERHEAD (sizeof(encaps_hdr) + 2) - struct hermes_rx_descriptor { u16 status; u32 time; @@ -958,26 +979,55 @@ struct orinoco_private *priv = netdev_priv(dev); struct net_device_stats *stats = &priv->stats; u16 fid = hermes_read_regn(hw, TXCOMPLFID); - struct hermes_tx_descriptor desc; + struct hermes_tx_descriptor_802_11 hdr; int err = 0; if (fid == DUMMY_FID) return; /* Nothing's really happened */ - err = hermes_bap_pread(hw, IRQ_BAP, &desc, sizeof(desc), fid, 0); + /* Read the frame header */ + err = hermes_bap_pread(hw, IRQ_BAP, &hdr, + sizeof(struct hermes_tx_descriptor) + + sizeof(struct ieee80211_hdr), + fid, 0); + + hermes_write_regn(hw, TXCOMPLFID, DUMMY_FID); + stats->tx_errors++; + if (err) { printk(KERN_WARNING "%s: Unable to read descriptor on Tx error " "(FID=%04X error %d)\n", dev->name, fid, err); - } else { - DEBUG(1, "%s: Tx error, status %d\n", - dev->name, 
le16_to_cpu(desc.status)); + return; } - stats->tx_errors++; + DEBUG(1, "%s: Tx error, err %d (FID=%04X)\n", dev->name, + err, fid); + + /* We produce a TXDROP event only for retry or lifetime + * exceeded, because that's the only status that really mean + * that this particular node went away. + * Other errors means that *we* screwed up. - Jean II */ + hdr.status = le16_to_cpu(hdr.status); + if (hdr.status & (HERMES_TXSTAT_RETRYERR | HERMES_TXSTAT_AGEDERR)) { + union iwreq_data wrqu; + + /* Copy 802.11 dest address. + * We use the 802.11 header because the frame may + * not be 802.3 or may be mangled... + * In Ad-Hoc mode, it will be the node address. + * In managed mode, it will be most likely the AP addr + * User space will figure out how to convert it to + * whatever it needs (IP address or else). + * - Jean II */ + memcpy(wrqu.addr.sa_data, hdr.addr1, ETH_ALEN); + wrqu.addr.sa_family = ARPHRD_ETHER; + + /* Send event to user space */ + wireless_send_event(dev, IWEVTXDROP, &wrqu, NULL); + } netif_wake_queue(dev); - hermes_write_regn(hw, TXCOMPLFID, DUMMY_FID); } static void orinoco_tx_timeout(struct net_device *dev) @@ -1316,6 +1366,30 @@ orinoco_unlock(priv, &flags); } +/* Send new BSSID to userspace */ +static void orinoco_send_wevents(struct net_device *dev) +{ + struct orinoco_private *priv = netdev_priv(dev); + struct hermes *hw = &priv->hw; + union iwreq_data wrqu; + int err; + unsigned long flags; + + if (orinoco_lock(priv, &flags) != 0) + return; + + err = hermes_read_ltv(hw, IRQ_BAP, HERMES_RID_CURRENTBSSID, + ETH_ALEN, NULL, wrqu.ap_addr.sa_data); + if (err != 0) + return; + + wrqu.ap_addr.sa_family = ARPHRD_ETHER; + + /* Send event to user space */ + wireless_send_event(dev, SIOCGIWAP, &wrqu, NULL); + orinoco_unlock(priv, &flags); +} + static void __orinoco_ev_info(struct net_device *dev, hermes_t *hw) { struct orinoco_private *priv = netdev_priv(dev); @@ -1395,6 +1469,15 @@ break; newstatus = le16_to_cpu(linkstatus.linkstatus); + /* Symbol firmware 
uses "out of range" to signal that + * the hostscan frame can be requested. */ + if (newstatus == HERMES_LINKSTATUS_AP_OUT_OF_RANGE && + priv->firmware_type == FIRMWARE_TYPE_SYMBOL && + priv->has_hostscan && priv->scan_inprogress) { + hermes_inquire(hw, HERMES_INQ_HOSTSCAN_SYMBOL); + break; + } + connected = (newstatus == HERMES_LINKSTATUS_CONNECTED) || (newstatus == HERMES_LINKSTATUS_AP_CHANGE) || (newstatus == HERMES_LINKSTATUS_AP_IN_RANGE); @@ -1404,12 +1487,89 @@ else if (!ignore_disconnect) netif_carrier_off(dev); - if (newstatus != priv->last_linkstatus) + if (newstatus != priv->last_linkstatus) { + priv->last_linkstatus = newstatus; print_linkstatus(dev, newstatus); + /* The info frame contains only one word which is the + * status (see hermes.h). The status is pretty boring + * in itself, that's why we export the new BSSID... + * Jean II */ + schedule_work(&priv->wevent_work); + } + } + break; + case HERMES_INQ_SCAN: + if (!priv->scan_inprogress && priv->bssid_fixed && + priv->firmware_type == FIRMWARE_TYPE_INTERSIL) { + schedule_work(&priv->join_work); + break; + } + /* fall through */ + case HERMES_INQ_HOSTSCAN: + case HERMES_INQ_HOSTSCAN_SYMBOL: { + /* Result of a scanning. Contains information about + * cells in the vicinity - Jean II */ + union iwreq_data wrqu; + unsigned char *buf; + + /* Sanity check */ + if (len > 4096) { + printk(KERN_WARNING "%s: Scan results too large (%d bytes)\n", + dev->name, len); + break; + } + + /* We are a strict producer. If the previous scan results + * have not been consumed, we just have to drop this + * frame. We can't remove the previous results ourselves, + * that would be *very* racy... Jean II */ + if (priv->scan_result != NULL) { + printk(KERN_WARNING "%s: Previous scan results not consumed, dropping info frame.\n", dev->name); + break; + } + + /* Allocate buffer for results */ + buf = kmalloc(len, GFP_ATOMIC); + if (buf == NULL) + /* No memory, so can't printk()... 
*/ + break; + + /* Read scan data */ + err = hermes_bap_pread(hw, IRQ_BAP, (void *) buf, len, + infofid, sizeof(info)); + if (err) + break; - priv->last_linkstatus = newstatus; +#ifdef ORINOCO_DEBUG + { + int i; + printk(KERN_DEBUG "Scan result [%02X", buf[0]); + for(i = 1; i < (len * 2); i++) + printk(":%02X", buf[i]); + printk("]\n"); + } +#endif /* ORINOCO_DEBUG */ + + /* Allow the clients to access the results */ + priv->scan_len = len; + priv->scan_result = buf; + + /* Send an empty event to user space. + * We don't send the received data on the event because + * it would require us to do complex transcoding, and + * we want to minimise the work done in the irq handler + * Use a request to extract the data - Jean II */ + wrqu.data.length = 0; + wrqu.data.flags = 0; + wireless_send_event(dev, SIOCGIWSCAN, &wrqu, NULL); } break; + case HERMES_INQ_SEC_STAT_AGERE: + /* Security status (Agere specific) */ + /* Ignore this frame for now */ + if (priv->firmware_type == FIRMWARE_TYPE_AGERE) + break; + /* fall through */ default: printk(KERN_DEBUG "%s: Unknown information frame received: " "type 0x%04x, length %d\n", dev->name, type, len); @@ -2010,6 +2170,11 @@ orinoco_unlock(priv, &flags); + /* Scanning support: Cleanup of driver struct */ + kfree(priv->scan_result); + priv->scan_result = NULL; + priv->scan_inprogress = 0; + if (priv->hard_reset) { err = (*priv->hard_reset)(priv); if (err) { @@ -2248,6 +2413,7 @@ priv->has_mwo = (firmver >= 0x60000); priv->has_pm = (firmver >= 0x40020); /* Don't work in 7.52 ? 
*/ priv->ibss_port = 1; + priv->has_hostscan = (firmver >= 0x8000a); /* Tested with Agere firmware : * 1.16 ; 4.08 ; 4.52 ; 6.04 ; 6.16 ; 7.28 => Jean II @@ -2293,6 +2459,8 @@ priv->ibss_port = 4; priv->broken_disableport = (firmver == 0x25013) || (firmver >= 0x30000 && firmver <= 0x31000); + priv->has_hostscan = (firmver >= 0x31001) || + (firmver >= 0x29057 && firmver < 0x30000); /* Tested with Intel firmware : 0x20015 => Jean II */ /* Tested with 3Com firmware : 0x15012 & 0x22001 => Jean II */ break; @@ -2312,6 +2480,7 @@ priv->has_ibss = (firmver >= 0x000700); /* FIXME */ priv->has_big_wep = priv->has_wep = (firmver >= 0x000800); priv->has_pm = (firmver >= 0x000700); + priv->has_hostscan = (firmver >= 0x010301); if (firmver >= 0x000800) priv->ibss_port = 0; @@ -2539,6 +2708,7 @@ * hardware */ INIT_WORK(&priv->reset_work, (void (*)(void *))orinoco_reset, dev); INIT_WORK(&priv->join_work, (void (*)(void *))orinoco_join_ap, dev); + INIT_WORK(&priv->wevent_work, (void (*)(void *))orinoco_send_wevents, dev); netif_carrier_off(dev); priv->last_linkstatus = 0xffff; @@ -2549,6 +2719,9 @@ void free_orinocodev(struct net_device *dev) { + struct orinoco_private *priv = netdev_priv(dev); + + kfree(priv->scan_result); free_netdev(dev); } @@ -3967,6 +4140,332 @@ return 0; } +/* Trigger a scan (look for other cells in the vicinity */ +static int orinoco_ioctl_setscan(struct net_device *dev, + struct iw_request_info *info, + struct iw_param *srq, + char *extra) +{ + struct orinoco_private *priv = netdev_priv(dev); + hermes_t *hw = &priv->hw; + int err = 0; + unsigned long flags; + + /* Note : you may have realised that, as this is a SET operation, + * this is priviledged and therefore a normal user can't + * perform scanning. + * This is not an error, while the device perform scanning, + * traffic doesn't flow, so it's a perfect DoS... 
+ * Jean II */ + + if (orinoco_lock(priv, &flags) != 0) + return -EBUSY; + + /* Scanning with port 0 disabled would fail */ + if (!netif_running(dev)) { + err = -ENETDOWN; + goto out; + } + + /* In monitor mode, the scan results are always empty. + * Probe responses are passed to the driver as received + * frames and could be processed in software. */ + if (priv->iw_mode == IW_MODE_MONITOR) { + err = -EOPNOTSUPP; + goto out; + } + + /* Note : because we don't lock out the irq handler, the way + * we access scan variables in priv is critical. + * o scan_inprogress : not touched by irq handler + * o scan_mode : not touched by irq handler + * o scan_result : irq is strict producer, non-irq is strict + * consumer. + * o scan_len : synchronised with scan_result + * Before modifying anything on those variables, please think hard ! + * Jean II */ + + /* If there is still some left-over scan results, get rid of it */ + if (priv->scan_result != NULL) { + /* What's likely is that a client did crash or was killed + * between triggering the scan request and reading the + * results, so we need to reset everything. + * Some clients that are too slow may suffer from that... + * Jean II */ + kfree(priv->scan_result); + priv->scan_result = NULL; + } + + /* Save flags */ + priv->scan_mode = srq->flags; + + /* Always trigger scanning, even if it's in progress. 
+ * This way, if the info frame get lost, we will recover somewhat + * gracefully - Jean II */ + + if (priv->has_hostscan) { + switch (priv->firmware_type) { + case FIRMWARE_TYPE_SYMBOL: + err = hermes_write_wordrec(hw, USER_BAP, + HERMES_RID_CNFHOSTSCAN_SYMBOL, + HERMES_HOSTSCAN_SYMBOL_ONCE | + HERMES_HOSTSCAN_SYMBOL_BCAST); + break; + case FIRMWARE_TYPE_INTERSIL: { + u16 req[3]; + + req[0] = cpu_to_le16(0x3fff); /* All channels */ + req[1] = cpu_to_le16(0x0001); /* rate 1 Mbps */ + req[2] = 0; /* Any ESSID */ + err = HERMES_WRITE_RECORD(hw, USER_BAP, + HERMES_RID_CNFHOSTSCAN, &req); + } + break; + case FIRMWARE_TYPE_AGERE: + err = hermes_write_wordrec(hw, USER_BAP, + HERMES_RID_CNFSCANSSID_AGERE, + 0); /* Any ESSID */ + if (err) + break; + + err = hermes_inquire(hw, HERMES_INQ_SCAN); + break; + } + } else + err = hermes_inquire(hw, HERMES_INQ_SCAN); + + /* One more client */ + if (! err) + priv->scan_inprogress = 1; + + out: + orinoco_unlock(priv, &flags); + return err; +} + +/* Translate scan data returned from the card to a card independant + * format that the Wireless Tools will understand - Jean II */ +static inline int orinoco_translate_scan(struct net_device *dev, + char *buffer, + char *scan, + int scan_len) +{ + struct orinoco_private *priv = netdev_priv(dev); + int offset; /* In the scan data */ + union hermes_scan_info *atom; + int atom_len; + u16 capabilities; + u16 channel; + struct iw_event iwe; /* Temporary buffer */ + char * current_ev = buffer; + char * end_buf = buffer + IW_SCAN_MAX_DATA; + + switch (priv->firmware_type) { + case FIRMWARE_TYPE_AGERE: + atom_len = sizeof(struct agere_scan_apinfo); + offset = 0; + break; + case FIRMWARE_TYPE_SYMBOL: + /* Lack of documentation necessitates this hack. + * Different firmwares have 68 or 76 byte long atoms. + * We try modulo first. If the length divides by both, + * we check what would be the channel in the second + * frame for a 68-byte atom. 76-byte atoms have 0 there. + * Valid channel cannot be 0. 
*/ + if (scan_len % 76) + atom_len = 68; + else if (scan_len % 68) + atom_len = 76; + else if (scan_len >= 1292 && scan[68] == 0) + atom_len = 76; + else + atom_len = 68; + offset = 0; + break; + case FIRMWARE_TYPE_INTERSIL: + offset = 4; + if (priv->has_hostscan) + atom_len = scan[0] + (scan[1] << 8); + else + atom_len = offsetof(struct prism2_scan_apinfo, atim); + break; + default: + return 0; + } + + /* Check that we got an whole number of atoms */ + if ((scan_len - offset) % atom_len) { + printk(KERN_ERR "%s: Unexpected scan data length %d, " + "atom_len %d, offset %d\n", dev->name, scan_len, + atom_len, offset); + return 0; + } + + /* Read the entries one by one */ + for (; offset + atom_len <= scan_len; offset += atom_len) { + /* Get next atom */ + atom = (union hermes_scan_info *) (scan + offset); + + /* First entry *MUST* be the AP MAC address */ + iwe.cmd = SIOCGIWAP; + iwe.u.ap_addr.sa_family = ARPHRD_ETHER; + memcpy(iwe.u.ap_addr.sa_data, atom->a.bssid, ETH_ALEN); + current_ev = iwe_stream_add_event(current_ev, end_buf, &iwe, IW_EV_ADDR_LEN); + + /* Other entries will be displayed in the order we give them */ + + /* Add the ESSID */ + iwe.u.data.length = le16_to_cpu(atom->a.essid_len); + if (iwe.u.data.length > 32) + iwe.u.data.length = 32; + iwe.cmd = SIOCGIWESSID; + iwe.u.data.flags = 1; + current_ev = iwe_stream_add_point(current_ev, end_buf, &iwe, atom->a.essid); + + /* Add mode */ + iwe.cmd = SIOCGIWMODE; + capabilities = le16_to_cpu(atom->a.capabilities); + if (capabilities & 0x3) { + if (capabilities & 0x1) + iwe.u.mode = IW_MODE_MASTER; + else + iwe.u.mode = IW_MODE_ADHOC; + current_ev = iwe_stream_add_event(current_ev, end_buf, &iwe, IW_EV_UINT_LEN); + } + + channel = atom->s.channel; + if ( (channel >= 1) && (channel <= NUM_CHANNELS) ) { + /* Add frequency */ + iwe.cmd = SIOCGIWFREQ; + iwe.u.freq.m = channel_frequency[channel-1] * 100000; + iwe.u.freq.e = 1; + current_ev = iwe_stream_add_event(current_ev, end_buf, + &iwe, IW_EV_FREQ_LEN); + } + 
+ /* Add quality statistics */ + iwe.cmd = IWEVQUAL; + iwe.u.qual.updated = 0x10; /* no link quality */ + iwe.u.qual.level = (__u8) le16_to_cpu(atom->a.level) - 0x95; + iwe.u.qual.noise = (__u8) le16_to_cpu(atom->a.noise) - 0x95; + /* Wireless tools prior to 27.pre22 will show link quality + * anyway, so we provide a reasonable value. */ + if (iwe.u.qual.level > iwe.u.qual.noise) + iwe.u.qual.qual = iwe.u.qual.level - iwe.u.qual.noise; + else + iwe.u.qual.qual = 0; + current_ev = iwe_stream_add_event(current_ev, end_buf, &iwe, IW_EV_QUAL_LEN); + + /* Add encryption capability */ + iwe.cmd = SIOCGIWENCODE; + if (capabilities & 0x10) + iwe.u.data.flags = IW_ENCODE_ENABLED | IW_ENCODE_NOKEY; + else + iwe.u.data.flags = IW_ENCODE_DISABLED; + iwe.u.data.length = 0; + current_ev = iwe_stream_add_point(current_ev, end_buf, &iwe, atom->a.essid); + + /* Bit rate is not available in Lucent/Agere firmwares */ + if (priv->firmware_type != FIRMWARE_TYPE_AGERE) { + char * current_val = current_ev + IW_EV_LCP_LEN; + int i; + int step; + + if (priv->firmware_type == FIRMWARE_TYPE_SYMBOL) + step = 2; + else + step = 1; + + iwe.cmd = SIOCGIWRATE; + /* Those two flags are ignored... 
*/ + iwe.u.bitrate.fixed = iwe.u.bitrate.disabled = 0; + /* Max 10 values */ + for (i = 0; i < 10; i += step) { + /* NULL terminated */ + if (atom->p.rates[i] == 0x0) + break; + /* Bit rate given in 500 kb/s units (+ 0x80) */ + iwe.u.bitrate.value = ((atom->p.rates[i] & 0x7f) * 500000); + current_val = iwe_stream_add_value(current_ev, current_val, + end_buf, &iwe, + IW_EV_PARAM_LEN); + } + /* Check if we added any event */ + if ((current_val - current_ev) > IW_EV_LCP_LEN) + current_ev = current_val; + } + + /* The other data in the scan result are not really + * interesting, so for now drop it - Jean II */ + } + return current_ev - buffer; +} + +/* Return results of a scan */ +static int orinoco_ioctl_getscan(struct net_device *dev, + struct iw_request_info *info, + struct iw_point *srq, + char *extra) +{ + struct orinoco_private *priv = netdev_priv(dev); + int err = 0; + unsigned long flags; + + if (orinoco_lock(priv, &flags) != 0) + return -EBUSY; + + /* If no results yet, ask to try again later */ + if (priv->scan_result == NULL) { + if (priv->scan_inprogress) + /* Important note : we don't want to block the caller + * until results are ready for various reasons. + * First, managing wait queues is complex and racy. + * Second, we grab some rtnetlink lock before comming + * here (in dev_ioctl()). + * Third, we generate an Wireless Event, so the + * caller can wait itself on that - Jean II */ + err = -EAGAIN; + else + /* Client error, no scan results... + * The caller need to restart the scan. */ + err = -ENODATA; + } else { + /* We have some results to push back to user space */ + + /* Translate to WE format */ + srq->length = orinoco_translate_scan(dev, extra, + priv->scan_result, + priv->scan_len); + + /* Return flags */ + srq->flags = (__u16) priv->scan_mode; + + /* Results are here, so scan no longer in progress */ + priv->scan_inprogress = 0; + + /* In any case, Scan results will be cleaned up in the + * reset function and when exiting the driver. 
+ * The person triggering the scanning may never come to + * pick the results, so we need to do it in those places. + * Jean II */ + +#ifdef SCAN_SINGLE_READ + /* If you enable this option, only one client (the first + * one) will be able to read the result (and only one + * time). If there is multiple concurent clients that + * want to read scan results, this behavior is not + * advisable - Jean II */ + kfree(priv->scan_result); + priv->scan_result = NULL; +#endif /* SCAN_SINGLE_READ */ + /* Here, if too much time has elapsed since last scan, + * we may want to clean up scan results... - Jean II */ + } + + orinoco_unlock(priv, &flags); + return err; +} + /* Commit handler, called after set operations */ static int orinoco_ioctl_commit(struct net_device *dev, struct iw_request_info *info, @@ -4060,6 +4559,8 @@ [SIOCGIWSPY -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getspy, [SIOCSIWAP -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setwap, [SIOCGIWAP -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getwap, + [SIOCSIWSCAN -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setscan, + [SIOCGIWSCAN -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getscan, [SIOCSIWESSID -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setessid, [SIOCGIWESSID -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getessid, [SIOCSIWNICKN -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setnick, Index: linux-2.6/drivers/net/wireless/orinoco.h =================================================================== --- linux-2.6.orig/drivers/net/wireless/orinoco.h 2005-06-19 01:03:47.000000000 +0200 +++ linux-2.6/drivers/net/wireless/orinoco.h 2005-06-19 01:03:51.000000000 +0200 @@ -32,6 +32,20 @@ char data[ORINOCO_MAX_KEY_SIZE]; } __attribute__ ((packed)); +struct header_struct { + /* 802.3 */ + u8 dest[ETH_ALEN]; + u8 src[ETH_ALEN]; + u16 len; + /* 802.2 */ + u8 dsap; + u8 ssap; + u8 ctrl; + /* SNAP */ + u8 oui[3]; + u16 ethertype; +} __attribute__ ((packed)); + typedef enum { FIRMWARE_TYPE_AGERE, FIRMWARE_TYPE_INTERSIL, @@ -51,6 +65,7 @@ int open; u16 
last_linkstatus; struct work_struct join_work; + struct work_struct wevent_work; /* Net device stuff */ struct net_device *ndev; @@ -77,6 +92,7 @@ unsigned int has_pm:1; unsigned int has_preamble:1; unsigned int has_sensitivity:1; + unsigned int has_hostscan:1; unsigned int broken_disableport:1; /* Configuration paramaters */ @@ -103,6 +119,12 @@ /* Configuration dependent variables */ int port_type, createibss; int promiscuous, mc_count; + + /* Scanning support */ + int scan_inprogress; /* Scan pending... */ + u32 scan_mode; /* Type of scan done */ + char * scan_result; /* Result of previous scan */ + int scan_len; /* Lenght of result */ }; #ifdef ORINOCO_DEBUG From hch@lst.de Sat Jun 18 16:29:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 16:29:08 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5INT3H9022767 for ; Sat, 18 Jun 2005 16:29:04 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id j5INRe6t010166 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 19 Jun 2005 01:27:40 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id j5INReTU010164; Sun, 19 Jun 2005 01:27:40 +0200 Date: Sun, 19 Jun 2005 01:27:40 +0200 From: Christoph Hellwig To: jgarzik@pobox.com, proski@gnu.org, hermes@gibson.dropbear.id.au Cc: netdev@oss.sgi.com Subject: [PATCH 4/9] orinoco: basic ethtool support Message-ID: <20050618232740.GE9918@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 X-archive-position: 2463 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Content-Length: 2239 Lines: 59 I completely reimplemented this based on ethtool_ops, CVS has 
an ioctl-based version. Index: linux-2.6/drivers/net/wireless/orinoco.c =================================================================== --- linux-2.6.orig/drivers/net/wireless/orinoco.c 2005-06-19 01:03:42.000000000 +0200 +++ linux-2.6/drivers/net/wireless/orinoco.c 2005-06-19 01:03:43.000000000 +0200 @@ -462,6 +462,7 @@ #include #include #include +#include #include #include #include @@ -542,6 +543,7 @@ #define MAX_RID_LEN 1024 static const struct iw_handler_def orinoco_handler_def; +static struct ethtool_ops orinoco_ethtool_ops; /********************************************************************/ /* Data tables */ @@ -2412,6 +2414,7 @@ dev->tx_timeout = orinoco_tx_timeout; dev->watchdog_timeo = HZ; /* 1 second timeout */ dev->get_stats = orinoco_get_stats; + dev->ethtool_ops = &orinoco_ethtool_ops; dev->get_wireless_stats = orinoco_get_wireless_stats; dev->wireless_handlers = (struct iw_handler_def *)&orinoco_handler_def; dev->change_mtu = orinoco_change_mtu; @@ -3930,6 +3933,27 @@ .private_args = orinoco_privtab, }; +static void orinoco_get_drvinfo(struct net_device *dev, + struct ethtool_drvinfo *info) +{ + struct orinoco_private *priv = netdev_priv(dev); + + strncpy(info->driver, DRIVER_NAME, sizeof(info->driver) - 1); + strncpy(info->version, DRIVER_VERSION, sizeof(info->version) - 1); + strncpy(info->fw_version, priv->fw_name, sizeof(info->fw_version) - 1); + if (dev->class_dev.dev) + strncpy(info->bus_info, dev->class_dev.dev->bus_id, + sizeof(info->bus_info) - 1); + else + snprintf(info->bus_info, sizeof(info->bus_info) - 1, + "PCMCIA %p", priv->hw.iobase); +} + +static struct ethtool_ops orinoco_ethtool_ops = { + .get_drvinfo = orinoco_get_drvinfo, + .get_link = ethtool_op_get_link, +}; + /********************************************************************/ /* Debugging */ /********************************************************************/ From hch@lst.de Sat Jun 18 16:29:32 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 
16:29:36 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5INTUH9023069 for ; Sat, 18 Jun 2005 16:29:31 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id j5INS66t010223 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 19 Jun 2005 01:28:06 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id j5INS6PH010221; Sun, 19 Jun 2005 01:28:06 +0200 Date: Sun, 19 Jun 2005 01:28:06 +0200 From: Christoph Hellwig To: jgarzik@pobox.com, proski@gnu.org, hermes@gibson.dropbear.id.au Cc: netdev@oss.sgi.com Subject: [PATCH 8/9] orinoco: monitor mode support Message-ID: <20050618232806.GI9918@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 X-archive-position: 2466 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Content-Length: 9036 Lines: 337 Patch from Pavel Roskin Index: linux-2.6/drivers/net/wireless/orinoco.c =================================================================== --- linux-2.6.orig/drivers/net/wireless/orinoco.c 2005-06-19 01:04:28.000000000 +0200 +++ linux-2.6/drivers/net/wireless/orinoco.c 2005-06-19 01:14:43.000000000 +0200 @@ -499,6 +499,10 @@ module_param(ignore_disconnect, int, 0644); MODULE_PARM_DESC(ignore_disconnect, "Don't report lost link to the network layer"); +static int force_monitor; /* = 0 */ +module_param(force_monitor, int, 0644); +MODULE_PARM_DESC(force_monitor, "Allow monitor mode for all firmware versions"); + /********************************************************************/ /* Compile time configuration and compatibility stuff */ /********************************************************************/ @@ -670,6 +674,10 @@ 
priv->createibss = 1; } break; + case IW_MODE_MONITOR: + priv->port_type = 3; + priv->createibss = 0; + break; default: printk(KERN_ERR "%s: Invalid priv->iw_mode in set_port_type()\n", priv->ndev->name); @@ -856,7 +864,7 @@ return 1; } - if (! netif_carrier_ok(dev)) { + if (! netif_carrier_ok(dev) || (priv->iw_mode == IW_MODE_MONITOR)) { /* Oops, the firmware hasn't established a connection, silently drop the packet (this seems to be the safest approach). */ @@ -1118,6 +1126,117 @@ } } +/* + * orinoco_rx_monitor - handle received monitor frames. + * + * Arguments: + * dev network device + * rxfid received FID + * desc rx descriptor of the frame + * + * Call context: interrupt + */ +static void orinoco_rx_monitor(struct net_device *dev, u16 rxfid, + struct hermes_rx_descriptor *desc) +{ + u32 hdrlen = 30; /* return full header by default */ + u32 datalen = 0; + u16 fc; + int err; + int len; + struct sk_buff *skb; + struct orinoco_private *priv = netdev_priv(dev); + struct net_device_stats *stats = &priv->stats; + hermes_t *hw = &priv->hw; + + len = le16_to_cpu(desc->data_len); + + /* Determine the size of the header and the data */ + fc = le16_to_cpu(desc->frame_ctl); + switch (fc & IEEE80211_FCTL_FTYPE) { + case IEEE80211_FTYPE_DATA: + if ((fc & IEEE80211_FCTL_TODS) + && (fc & IEEE80211_FCTL_FROMDS)) + hdrlen = 30; + else + hdrlen = 24; + datalen = len; + break; + case IEEE80211_FTYPE_MGMT: + hdrlen = 24; + datalen = len; + break; + case IEEE80211_FTYPE_CTL: + switch (fc & IEEE80211_FCTL_STYPE) { + case IEEE80211_STYPE_PSPOLL: + case IEEE80211_STYPE_RTS: + case IEEE80211_STYPE_CFEND: + case IEEE80211_STYPE_CFENDACK: + hdrlen = 16; + break; + case IEEE80211_STYPE_CTS: + case IEEE80211_STYPE_ACK: + hdrlen = 10; + break; + } + break; + default: + /* Unknown frame type */ + break; + } + + /* sanity check the length */ + if (datalen > IEEE80211_DATA_LEN + 12) { + printk(KERN_DEBUG "%s: oversized monitor frame, " + "data length = %d\n", dev->name, datalen); + err = 
-EIO; + stats->rx_length_errors++; + goto update_stats; + } + + skb = dev_alloc_skb(hdrlen + datalen); + if (!skb) { + printk(KERN_WARNING "%s: Cannot allocate skb for monitor frame\n", + dev->name); + err = -ENOMEM; + goto drop; + } + + /* Copy the 802.11 header to the skb */ + memcpy(skb_put(skb, hdrlen), &(desc->frame_ctl), hdrlen); + skb->mac.raw = skb->data; + + /* If any, copy the data from the card to the skb */ + if (datalen > 0) { + err = hermes_bap_pread(hw, IRQ_BAP, skb_put(skb, datalen), + ALIGN(datalen, 2), rxfid, + HERMES_802_2_OFFSET); + if (err) { + printk(KERN_ERR "%s: error %d reading monitor frame\n", + dev->name, err); + goto drop; + } + } + + skb->dev = dev; + skb->ip_summed = CHECKSUM_NONE; + skb->pkt_type = PACKET_OTHERHOST; + skb->protocol = __constant_htons(ETH_P_802_2); + + dev->last_rx = jiffies; + stats->rx_packets++; + stats->rx_bytes += skb->len; + + netif_rx(skb); + return; + + drop: + dev_kfree_skb_irq(skb); + update_stats: + stats->rx_errors++; + stats->rx_dropped++; +} + static void __orinoco_ev_rx(struct net_device *dev, hermes_t *hw) { struct orinoco_private *priv = netdev_priv(dev); @@ -1137,24 +1256,29 @@ if (err) { printk(KERN_ERR "%s: error %d reading Rx descriptor. " "Frame dropped.\n", dev->name, err); - stats->rx_errors++; - goto drop; + goto update_stats; } status = le16_to_cpu(desc.status); - if (status & HERMES_RXSTAT_ERR) { - if (status & HERMES_RXSTAT_UNDECRYPTABLE) { - wstats->discard.code++; - DEBUG(1, "%s: Undecryptable frame on Rx. Frame dropped.\n", - dev->name); - } else { - stats->rx_crc_errors++; - DEBUG(1, "%s: Bad CRC on Rx. Frame dropped.\n", dev->name); - } + if (status & HERMES_RXSTAT_BADCRC) { + DEBUG(1, "%s: Bad CRC on Rx. 
Frame dropped.\n", + dev->name); + stats->rx_crc_errors++; + goto update_stats; + } - stats->rx_errors++; - goto drop; + /* Handle frames in monitor mode */ + if (priv->iw_mode == IW_MODE_MONITOR) { + orinoco_rx_monitor(dev, rxfid, &desc); + return; + } + + if (status & HERMES_RXSTAT_UNDECRYPTABLE) { + DEBUG(1, "%s: Undecryptable frame on Rx. Frame dropped.\n", + dev->name); + wstats->discard.code++; + goto update_stats; } length = le16_to_cpu(desc.data_len); @@ -1165,15 +1289,13 @@ /* At least on Symbol firmware with PCF we get quite a lot of these legitimately - Poll frames with no data. */ - stats->rx_dropped++; - goto drop; + return; } if (length > IEEE802_11_DATA_LEN) { printk(KERN_WARNING "%s: Oversized frame received (%d bytes)\n", dev->name, length); stats->rx_length_errors++; - stats->rx_errors++; - goto drop; + goto update_stats; } /* We need space for the packet data itself, plus an ethernet @@ -1185,7 +1307,7 @@ if (!skb) { printk(KERN_WARNING "%s: Can't allocate skb for Rx\n", dev->name); - goto drop; + goto update_stats; } /* We'll prepend the header, so reserve space for it. The worst @@ -1199,7 +1321,6 @@ if (err) { printk(KERN_ERR "%s: error %d reading frame. 
" "Frame dropped.\n", dev->name, err); - stats->rx_errors++; goto drop; } @@ -1245,11 +1366,10 @@ return; drop: + dev_kfree_skb_irq(skb); + update_stats: + stats->rx_errors++; stats->rx_dropped++; - - if (skb) - dev_kfree_skb_irq(skb); - return; } /********************************************************************/ @@ -2065,6 +2185,20 @@ } } + if (priv->iw_mode == IW_MODE_MONITOR) { + /* Enable monitor mode */ + dev->type = ARPHRD_IEEE80211; + err = hermes_docmd_wait(hw, HERMES_CMD_TEST | + HERMES_TEST_MONITOR, 0, NULL); + } else { + /* Disable monitor mode */ + dev->type = ARPHRD_ETHER; + err = hermes_docmd_wait(hw, HERMES_CMD_TEST | + HERMES_TEST_STOP, 0, NULL); + } + if (err) + return err; + /* Set promiscuity / multicast*/ priv->promiscuous = 0; priv->mc_count = 0; @@ -2413,6 +2547,7 @@ priv->has_pm = (firmver >= 0x40020); /* Don't work in 7.52 ? */ priv->ibss_port = 1; priv->has_hostscan = (firmver >= 0x8000a); + priv->broken_monitor = (firmver >= 0x80000); /* Tested with Agere firmware : * 1.16 ; 4.08 ; 4.52 ; 6.04 ; 6.16 ; 7.28 => Jean II @@ -2980,6 +3115,15 @@ case IW_MODE_INFRA: break; + case IW_MODE_MONITOR: + if (priv->broken_monitor && !force_monitor) { + printk(KERN_WARNING "%s: Monitor mode support is " + "buggy in this firmware, not enabling\n", + dev->name); + err = -EOPNOTSUPP; + } + break; + default: err = -EOPNOTSUPP; break; @@ -3355,11 +3499,9 @@ unsigned long flags; int err = -EINPROGRESS; /* Call commit handler */ - /* We can only use this in Ad-Hoc demo mode to set the operating - * frequency, or in IBSS mode to set the frequency where the IBSS - * will be created - Jean II */ - if (priv->iw_mode != IW_MODE_ADHOC) - return -EOPNOTSUPP; + /* In infrastructure mode the AP sets the channel */ + if (priv->iw_mode == IW_MODE_INFRA) + return -EBUSY; if ( (frq->e == 0) && (frq->m <= 1000) ) { /* Setting by channel number */ @@ -3383,7 +3525,15 @@ if (orinoco_lock(priv, &flags) != 0) return -EBUSY; + priv->channel = chan; + if (priv->iw_mode == 
IW_MODE_MONITOR) { + /* Fast channel change - no commit if successful */ + hermes_t *hw = &priv->hw; + err = hermes_docmd_wait(hw, HERMES_CMD_TEST | + HERMES_TEST_SET_CHANNEL, + chan, NULL); + } orinoco_unlock(priv, &flags); return err; Index: linux-2.6/drivers/net/wireless/orinoco.h =================================================================== --- linux-2.6.orig/drivers/net/wireless/orinoco.h 2005-06-19 01:03:51.000000000 +0200 +++ linux-2.6/drivers/net/wireless/orinoco.h 2005-06-19 01:04:43.000000000 +0200 @@ -94,6 +94,7 @@ unsigned int has_sensitivity:1; unsigned int has_hostscan:1; unsigned int broken_disableport:1; + unsigned int broken_monitor:1; /* Configuration paramaters */ u32 iw_mode; From hch@lst.de Sat Jun 18 16:28:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 16:28:56 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5INSqH9022687 for ; Sat, 18 Jun 2005 16:28:52 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id j5INRS6t010142 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 19 Jun 2005 01:27:28 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id j5INRS4p010140; Sun, 19 Jun 2005 01:27:28 +0200 Date: Sun, 19 Jun 2005 01:27:28 +0200 From: Christoph Hellwig To: jgarzik@pobox.com, proski@gnu.org, hermes@gibson.dropbear.id.au Cc: netdev@oss.sgi.com Subject: [PATCH 2/9] orinoco: include Message-ID: <20050618232728.GC9918@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 X-archive-position: 2461 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Content-Length: 526 Lines: 15 We need constants from this header in 
the next patches. Index: linux-2.6/drivers/net/wireless/orinoco.c =================================================================== --- linux-2.6.orig/drivers/net/wireless/orinoco.c 2005-06-19 00:59:34.000000000 +0200 +++ linux-2.6/drivers/net/wireless/orinoco.c 2005-06-19 01:02:24.000000000 +0200 @@ -463,6 +463,7 @@ #include #include #include +#include #include #include From hch@lst.de Sat Jun 18 16:29:17 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 16:29:20 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5INTFH9022973 for ; Sat, 18 Jun 2005 16:29:16 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id j5INRp6t010183 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 19 Jun 2005 01:27:52 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id j5INRppE010180; Sun, 19 Jun 2005 01:27:51 +0200 Date: Sun, 19 Jun 2005 01:27:51 +0200 From: Christoph Hellwig To: jgarzik@pobox.com, proski@gnu.org, hermes@gibson.dropbear.id.au Cc: netdev@oss.sgi.com Subject: [PATCH 5/9] orinoco: manual roaming for Symbol and Intersil firmware Message-ID: <20050618232751.GF9918@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 X-archive-position: 2464 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Content-Length: 7129 Lines: 247 Patch from Pavel Roskin Index: linux-2.6/drivers/net/wireless/orinoco.h =================================================================== --- linux-2.6.orig/drivers/net/wireless/orinoco.h 2005-06-19 01:03:36.000000000 +0200 +++ linux-2.6/drivers/net/wireless/orinoco.h 2005-06-19 01:03:47.000000000 +0200 @@ -22,6 +22,8 @@
#define WIRELESS_SPY // enable iwspy support +#define MAX_SCAN_LEN 4096 + #define ORINOCO_MAX_KEY_SIZE 14 #define ORINOCO_MAX_KEYS 4 @@ -48,6 +50,7 @@ /* driver state */ int open; u16 last_linkstatus; + struct work_struct join_work; /* Net device stuff */ struct net_device *ndev; @@ -84,6 +87,8 @@ int bitratemode; char nick[IW_ESSID_MAX_SIZE+1]; char desired_essid[IW_ESSID_MAX_SIZE+1]; + char desired_bssid[ETH_ALEN]; + int bssid_fixed; u16 frag_thresh, mwo_robust; u16 channel; u16 ap_density, rts_thresh; Index: linux-2.6/drivers/net/wireless/orinoco.c =================================================================== --- linux-2.6.orig/drivers/net/wireless/orinoco.c 2005-06-19 01:03:43.000000000 +0200 +++ linux-2.6/drivers/net/wireless/orinoco.c 2005-06-19 01:03:47.000000000 +0200 @@ -1247,6 +1247,75 @@ dev->name, s, status); } +/* Search scan results for requested BSSID, join it if found */ +static void orinoco_join_ap(struct net_device *dev) +{ + struct orinoco_private *priv = netdev_priv(dev); + struct hermes *hw = &priv->hw; + int err; + unsigned long flags; + struct join_req { + u8 bssid[ETH_ALEN]; + u16 channel; + } __attribute__ ((packed)) req; + const int atom_len = offsetof(struct prism2_scan_apinfo, atim); + struct prism2_scan_apinfo *atom; + int offset = 4; + u8 *buf; + u16 len; + + /* Allocate buffer for scan results */ + buf = kmalloc(MAX_SCAN_LEN, GFP_KERNEL); + if (! buf) + return; + + if (orinoco_lock(priv, &flags) != 0) + goto out; + + /* Sanity checks in case user changed something in the meantime */ + if (! 
priv->bssid_fixed) + goto out; + + if (strlen(priv->desired_essid) == 0) + goto out; + + /* Read scan results from the firmware */ + err = hermes_read_ltv(hw, USER_BAP, + HERMES_RID_SCANRESULTSTABLE, + MAX_SCAN_LEN, &len, buf); + if (err) { + printk(KERN_ERR "%s: Cannot read scan results\n", + dev->name); + goto out; + } + + len = HERMES_RECLEN_TO_BYTES(len); + + /* Go through the scan results looking for the channel of the AP + * we were requested to join */ + for (; offset + atom_len <= len; offset += atom_len) { + atom = (struct prism2_scan_apinfo *) (buf + offset); + if (memcmp(&atom->bssid, priv->desired_bssid, ETH_ALEN) == 0) + goto found; + } + + DEBUG(1, "%s: Requested AP not found in scan results\n", + dev->name); + goto out; + + found: + memcpy(req.bssid, priv->desired_bssid, ETH_ALEN); + req.channel = atom->channel; /* both are little-endian */ + err = HERMES_WRITE_RECORD(hw, USER_BAP, HERMES_RID_CNFJOINREQUEST, + &req); + if (err) + printk(KERN_ERR "%s: Error issuing join request\n", dev->name); + + out: + kfree(buf); + orinoco_unlock(priv, &flags); +} + static void __orinoco_ev_info(struct net_device *dev, hermes_t *hw) { struct orinoco_private *priv = netdev_priv(dev); @@ -1477,6 +1546,36 @@ return err; } +/* Set fixed AP address */ +static int __orinoco_hw_set_wap(struct orinoco_private *priv) +{ + int roaming_flag; + int err = 0; + hermes_t *hw = &priv->hw; + + switch (priv->firmware_type) { + case FIRMWARE_TYPE_AGERE: + /* not supported */ + break; + case FIRMWARE_TYPE_INTERSIL: + if (priv->bssid_fixed) + roaming_flag = 2; + else + roaming_flag = 1; + + err = hermes_write_wordrec(hw, USER_BAP, + HERMES_RID_CNFROAMINGMODE, + roaming_flag); + break; + case FIRMWARE_TYPE_SYMBOL: + err = HERMES_WRITE_RECORD(hw, USER_BAP, + HERMES_RID_CNFMANDATORYBSSID_SYMBOL, + &priv->desired_bssid); + break; + } + return err; +} + /* Change the WEP keys and/or the current keys. 
Can be called * either from __orinoco_hw_setup_wep() or directly from * orinoco_ioctl_setiwencode(). In the later case the association @@ -1662,6 +1761,13 @@ } } + /* Set the desired BSSID */ + err = __orinoco_hw_set_wap(priv); + if (err) { + printk(KERN_ERR "%s: Error %d setting AP address\n", + dev->name, err); + return err; + } /* Set the desired ESSID */ idbuf.len = cpu_to_le16(strlen(priv->desired_essid)); memcpy(&idbuf.val, priv->desired_essid, sizeof(idbuf.val)); @@ -2432,6 +2538,7 @@ * before anything else touches the * hardware */ INIT_WORK(&priv->reset_work, (void (*)(void *))orinoco_reset, dev); + INIT_WORK(&priv->join_work, (void (*)(void *))orinoco_join_ap, dev); netif_carrier_off(dev); priv->last_linkstatus = 0xffff; @@ -2593,6 +2700,67 @@ return 0; } +static int orinoco_ioctl_setwap(struct net_device *dev, + struct iw_request_info *info, + struct sockaddr *ap_addr, + char *extra) +{ + struct orinoco_private *priv = netdev_priv(dev); + int err = -EINPROGRESS; /* Call commit handler */ + unsigned long flags; + static const u8 off_addr[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }; + static const u8 any_addr[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }; + + if (orinoco_lock(priv, &flags) != 0) + return -EBUSY; + + /* Enable automatic roaming - no sanity checks are needed */ + if (memcmp(&ap_addr->sa_data, off_addr, ETH_ALEN) == 0 || + memcmp(&ap_addr->sa_data, any_addr, ETH_ALEN) == 0) { + priv->bssid_fixed = 0; + memset(priv->desired_bssid, 0, ETH_ALEN); + + /* "off" means keep existing connection */ + if (ap_addr->sa_data[0] == 0) { + __orinoco_hw_set_wap(priv); + err = 0; + } + goto out; + } + + if (priv->firmware_type == FIRMWARE_TYPE_AGERE) { + printk(KERN_WARNING "%s: Lucent/Agere firmware doesn't " + "support manual roaming\n", + dev->name); + err = -EOPNOTSUPP; + goto out; + } + + if (priv->iw_mode != IW_MODE_INFRA) { + printk(KERN_WARNING "%s: Manual roaming supported only in " + "managed mode\n", dev->name); + err = -EOPNOTSUPP; + goto out; + } + + 
/* Intersil firmware hangs without Desired ESSID */ + if (priv->firmware_type == FIRMWARE_TYPE_INTERSIL && + strlen(priv->desired_essid) == 0) { + printk(KERN_WARNING "%s: Desired ESSID must be set for " + "manual roaming\n", dev->name); + err = -EOPNOTSUPP; + goto out; + } + + /* Finally, enable manual roaming */ + priv->bssid_fixed = 1; + memcpy(priv->desired_bssid, &ap_addr->sa_data, ETH_ALEN); + + out: + orinoco_unlock(priv, &flags); + return err; +} + static int orinoco_ioctl_getwap(struct net_device *dev, struct iw_request_info *info, struct sockaddr *ap_addr, @@ -3890,6 +4058,7 @@ [SIOCGIWRANGE -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getiwrange, [SIOCSIWSPY -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setspy, [SIOCGIWSPY -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getspy, + [SIOCSIWAP -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setwap, [SIOCGIWAP -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getwap, [SIOCSIWESSID -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setessid, [SIOCGIWESSID -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getessid, From hch@lst.de Sat Jun 18 16:29:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 16:29:06 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5INSxH9022734 for ; Sat, 18 Jun 2005 16:29:00 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id j5INRX6t010154 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 19 Jun 2005 01:27:33 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id j5INRX23010152; Sun, 19 Jun 2005 01:27:33 +0200 Date: Sun, 19 Jun 2005 01:27:33 +0200 From: Christoph Hellwig To: jgarzik@pobox.com, proski@gnu.org, hermes@gibson.dropbear.id.au Cc: netdev@oss.sgi.com Subject: [PATCH 3/9] orinoco: wireless API 15 support Message-ID: <20050618232733.GD9918@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii 
Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 X-archive-position: 2462 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Content-Length: 41192 Lines: 1526 (patch from Moustafa Youssef, updated by Jim Carter and Pavel Roskin). Index: linux-2.6/drivers/net/wireless/orinoco.c =================================================================== --- linux-2.6.orig/drivers/net/wireless/orinoco.c 2005-06-19 01:02:24.000000000 +0200 +++ linux-2.6/drivers/net/wireless/orinoco.c 2005-06-19 01:03:24.000000000 +0200 @@ -463,6 +463,7 @@ #include #include #include +#include #include #include @@ -538,6 +539,10 @@ | HERMES_EV_WTERR | HERMES_EV_INFO \ | HERMES_EV_INFDROP ) +#define MAX_RID_LEN 1024 + +static const struct iw_handler_def orinoco_handler_def; + /********************************************************************/ /* Data tables */ /********************************************************************/ @@ -605,7 +610,6 @@ /* Function prototypes */ /********************************************************************/ -static int orinoco_ioctl(struct net_device *dev, struct ifreq *rq, int cmd); static int __orinoco_program_rids(struct net_device *dev); static void __orinoco_set_multicast_list(struct net_device *dev); @@ -1870,55 +1874,6 @@ dev->flags &= ~IFF_PROMISC; } -static int orinoco_reconfigure(struct net_device *dev) -{ - struct orinoco_private *priv = netdev_priv(dev); - struct hermes *hw = &priv->hw; - unsigned long flags; - int err = 0; - - if (priv->broken_disableport) { - schedule_work(&priv->reset_work); - return 0; - } - - if (orinoco_lock(priv, &flags) != 0) - return -EBUSY; - - err = hermes_disable_port(hw, 0); - if (err) { - printk(KERN_WARNING "%s: Unable to disable port while reconfiguring card\n", - dev->name); - priv->broken_disableport = 1; - goto out; - } - - err = __orinoco_program_rids(dev); - 
if (err) { - printk(KERN_WARNING "%s: Unable to reconfigure card\n", - dev->name); - goto out; - } - - err = hermes_enable_port(hw, 0); - if (err) { - printk(KERN_WARNING "%s: Unable to enable port while reconfiguring card\n", - dev->name); - goto out; - } - - out: - if (err) { - printk(KERN_WARNING "%s: Resetting instead...\n", dev->name); - schedule_work(&priv->reset_work); - err = 0; - } - - orinoco_unlock(priv, &flags); - return err; - -} - /* This must be called from user context, without locks held - use * schedule_work() */ static void orinoco_reset(struct net_device *dev) @@ -2458,7 +2413,7 @@ dev->watchdog_timeo = HZ; /* 1 second timeout */ dev->get_stats = orinoco_get_stats; dev->get_wireless_stats = orinoco_get_wireless_stats; - dev->do_ioctl = orinoco_ioctl; + dev->wireless_handlers = (struct iw_handler_def *)&orinoco_handler_def; dev->change_mtu = orinoco_change_mtu; dev->set_multicast_list = orinoco_set_multicast_list; /* we use the default eth_mac_addr for setting the MAC addr */ @@ -2491,24 +2446,6 @@ /* Wireless extensions */ /********************************************************************/ -static int orinoco_hw_get_bssid(struct orinoco_private *priv, - char buf[ETH_ALEN]) -{ - hermes_t *hw = &priv->hw; - int err = 0; - unsigned long flags; - - if (orinoco_lock(priv, &flags) != 0) - return -EBUSY; - - err = hermes_read_ltv(hw, USER_BAP, HERMES_RID_CURRENTBSSID, - ETH_ALEN, NULL, buf); - - orinoco_unlock(priv, &flags); - - return err; -} - static int orinoco_hw_get_essid(struct orinoco_private *priv, int *active, char buf[IW_ESSID_MAX_SIZE+1]) { @@ -2634,140 +2571,201 @@ return 0; } -static int orinoco_ioctl_getiwrange(struct net_device *dev, struct iw_point *rrq) +static int orinoco_ioctl_getname(struct net_device *dev, + struct iw_request_info *info, + char *name, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); - int err = 0; - int mode; - struct iw_range range; int numrates; - int i, k; + int err; + + err = 
orinoco_hw_get_bitratelist(priv, &numrates, NULL, 0); + + if (!err && (numrates > 2)) + strcpy(name, "IEEE 802.11b"); + else + strcpy(name, "IEEE 802.11-DS"); + + return 0; +} + +static int orinoco_ioctl_getwap(struct net_device *dev, + struct iw_request_info *info, + struct sockaddr *ap_addr, + char *extra) +{ + struct orinoco_private *priv = netdev_priv(dev); + + hermes_t *hw = &priv->hw; + int err = 0; unsigned long flags; - TRACE_ENTER(dev->name); + if (orinoco_lock(priv, &flags) != 0) + return -EBUSY; - if (!access_ok(VERIFY_WRITE, rrq->pointer, sizeof(range))) - return -EFAULT; + ap_addr->sa_family = ARPHRD_ETHER; + err = hermes_read_ltv(hw, USER_BAP, HERMES_RID_CURRENTBSSID, + ETH_ALEN, NULL, ap_addr->sa_data); + + orinoco_unlock(priv, &flags); + + return err; +} + +static int orinoco_ioctl_setmode(struct net_device *dev, + struct iw_request_info *info, + u32 *mode, + char *extra) +{ + struct orinoco_private *priv = netdev_priv(dev); + int err = -EINPROGRESS; /* Call commit handler */ + unsigned long flags; - rrq->length = sizeof(range); + if (priv->iw_mode == *mode) + return 0; if (orinoco_lock(priv, &flags) != 0) return -EBUSY; - mode = priv->iw_mode; + switch (*mode) { + case IW_MODE_ADHOC: + if (!priv->has_ibss && !priv->has_port3) + err = -EOPNOTSUPP; + break; + + case IW_MODE_INFRA: + break; + + default: + err = -EOPNOTSUPP; + break; + } + + if (err == -EINPROGRESS) { + priv->iw_mode = *mode; + set_port_type(priv); + } + orinoco_unlock(priv, &flags); - memset(&range, 0, sizeof(range)); + return err; +} + +static int orinoco_ioctl_getmode(struct net_device *dev, + struct iw_request_info *info, + u32 *mode, + char *extra) +{ + struct orinoco_private *priv = netdev_priv(dev); + + *mode = priv->iw_mode; + return 0; +} + +static int orinoco_ioctl_getiwrange(struct net_device *dev, + struct iw_request_info *info, + struct iw_point *rrq, + char *extra) +{ + struct orinoco_private *priv = netdev_priv(dev); + int err = 0; + struct iw_range *range = (struct 
iw_range *) extra; + int numrates; + int i, k; + + TRACE_ENTER(dev->name); - /* Much of this shamelessly taken from wvlan_cs.c. No idea - * what it all means -dgibson */ - range.we_version_compiled = WIRELESS_EXT; - range.we_version_source = 11; + rrq->length = sizeof(struct iw_range); + memset(range, 0, sizeof(struct iw_range)); - range.min_nwid = range.max_nwid = 0; /* We don't use nwids */ + range->we_version_compiled = WIRELESS_EXT; + range->we_version_source = 14; /* Set available channels/frequencies */ - range.num_channels = NUM_CHANNELS; + range->num_channels = NUM_CHANNELS; k = 0; for (i = 0; i < NUM_CHANNELS; i++) { if (priv->channel_mask & (1 << i)) { - range.freq[k].i = i + 1; - range.freq[k].m = channel_frequency[i] * 100000; - range.freq[k].e = 1; + range->freq[k].i = i + 1; + range->freq[k].m = channel_frequency[i] * 100000; + range->freq[k].e = 1; k++; } if (k >= IW_MAX_FREQUENCIES) break; } - range.num_frequency = k; + range->num_frequency = k; + range->sensitivity = 3; - range.sensitivity = 3; + if (priv->has_wep) { + range->max_encoding_tokens = ORINOCO_MAX_KEYS; + range->encoding_size[0] = SMALL_KEY_SIZE; + range->num_encoding_sizes = 1; + + if (priv->has_big_wep) { + range->encoding_size[1] = LARGE_KEY_SIZE; + range->num_encoding_sizes = 2; + } + } - if ((mode == IW_MODE_ADHOC) && (priv->spy_number == 0)){ + if ((priv->iw_mode == IW_MODE_ADHOC) && (priv->spy_number == 0)){ /* Quality stats meaningless in ad-hoc mode */ - range.max_qual.qual = 0; - range.max_qual.level = 0; - range.max_qual.noise = 0; - range.avg_qual.qual = 0; - range.avg_qual.level = 0; - range.avg_qual.noise = 0; } else { - range.max_qual.qual = 0x8b - 0x2f; - range.max_qual.level = 0x2f - 0x95 - 1; - range.max_qual.noise = 0x2f - 0x95 - 1; + range->max_qual.qual = 0x8b - 0x2f; + range->max_qual.level = 0x2f - 0x95 - 1; + range->max_qual.noise = 0x2f - 0x95 - 1; /* Need to get better values */ - range.avg_qual.qual = 0x24; - range.avg_qual.level = 0xC2; - range.avg_qual.noise 
= 0x9E; + range->avg_qual.qual = 0x24; + range->avg_qual.level = 0xC2; + range->avg_qual.noise = 0x9E; } err = orinoco_hw_get_bitratelist(priv, &numrates, - range.bitrate, IW_MAX_BITRATES); + range->bitrate, IW_MAX_BITRATES); if (err) return err; - range.num_bitrates = numrates; - + range->num_bitrates = numrates; + /* Set an indication of the max TCP throughput in bit/s that we can * expect using this interface. May be use for QoS stuff... * Jean II */ - if(numrates > 2) - range.throughput = 5 * 1000 * 1000; /* ~5 Mb/s */ + if (numrates > 2) + range->throughput = 5 * 1000 * 1000; /* ~5 Mb/s */ else - range.throughput = 1.5 * 1000 * 1000; /* ~1.5 Mb/s */ - - range.min_rts = 0; - range.max_rts = 2347; - range.min_frag = 256; - range.max_frag = 2346; - - if (orinoco_lock(priv, &flags) != 0) - return -EBUSY; - if (priv->has_wep) { - range.max_encoding_tokens = ORINOCO_MAX_KEYS; - - range.encoding_size[0] = SMALL_KEY_SIZE; - range.num_encoding_sizes = 1; - - if (priv->has_big_wep) { - range.encoding_size[1] = LARGE_KEY_SIZE; - range.num_encoding_sizes = 2; - } - } else { - range.num_encoding_sizes = 0; - range.max_encoding_tokens = 0; - } - orinoco_unlock(priv, &flags); - - range.min_pmp = 0; - range.max_pmp = 65535000; - range.min_pmt = 0; - range.max_pmt = 65535 * 1000; /* ??? */ - range.pmp_flags = IW_POWER_PERIOD; - range.pmt_flags = IW_POWER_TIMEOUT; - range.pm_capa = IW_POWER_PERIOD | IW_POWER_TIMEOUT | IW_POWER_UNICAST_R; - - range.num_txpower = 1; - range.txpower[0] = 15; /* 15dBm */ - range.txpower_capa = IW_TXPOW_DBM; - - range.retry_capa = IW_RETRY_LIMIT | IW_RETRY_LIFETIME; - range.retry_flags = IW_RETRY_LIMIT; - range.r_time_flags = IW_RETRY_LIFETIME; - range.min_retry = 0; - range.max_retry = 65535; /* ??? */ - range.min_r_time = 0; - range.max_r_time = 65535 * 1000; /* ??? 
*/ + range->throughput = 1.5 * 1000 * 1000; /* ~1.5 Mb/s */ - if (copy_to_user(rrq->pointer, &range, sizeof(range))) - return -EFAULT; + range->min_rts = 0; + range->max_rts = 2347; + range->min_frag = 256; + range->max_frag = 2346; + + range->min_pmp = 0; + range->max_pmp = 65535000; + range->min_pmt = 0; + range->max_pmt = 65535 * 1000; /* ??? */ + range->pmp_flags = IW_POWER_PERIOD; + range->pmt_flags = IW_POWER_TIMEOUT; + range->pm_capa = IW_POWER_PERIOD | IW_POWER_TIMEOUT | IW_POWER_UNICAST_R; + + range->retry_capa = IW_RETRY_LIMIT | IW_RETRY_LIFETIME; + range->retry_flags = IW_RETRY_LIMIT; + range->r_time_flags = IW_RETRY_LIFETIME; + range->min_retry = 0; + range->max_retry = 65535; /* ??? */ + range->min_r_time = 0; + range->max_r_time = 65535 * 1000; /* ??? */ TRACE_EXIT(dev->name); return 0; } -static int orinoco_ioctl_setiwencode(struct net_device *dev, struct iw_point *erq) +static int orinoco_ioctl_setiwencode(struct net_device *dev, + struct iw_request_info *info, + struct iw_point *erq, + char *keybuf) { struct orinoco_private *priv = netdev_priv(dev); int index = (erq->flags & IW_ENCODE_INDEX) - 1; @@ -2775,8 +2773,7 @@ int enable = priv->wep_on; int restricted = priv->wep_restrict; u16 xlen = 0; - int err = 0; - char keybuf[ORINOCO_MAX_KEY_SIZE]; + int err = -EINPROGRESS; /* Call commit handler */ unsigned long flags; if (! 
priv->has_wep) @@ -2789,9 +2786,6 @@ if ( (erq->length > SMALL_KEY_SIZE) && !priv->has_big_wep ) return -E2BIG; - - if (copy_from_user(keybuf, erq->pointer, erq->length)) - return -EFAULT; } if (orinoco_lock(priv, &flags) != 0) @@ -2865,12 +2859,14 @@ return err; } -static int orinoco_ioctl_getiwencode(struct net_device *dev, struct iw_point *erq) +static int orinoco_ioctl_getiwencode(struct net_device *dev, + struct iw_request_info *info, + struct iw_point *erq, + char *keybuf) { struct orinoco_private *priv = netdev_priv(dev); int index = (erq->flags & IW_ENCODE_INDEX) - 1; u16 xlen = 0; - char keybuf[ORINOCO_MAX_KEY_SIZE]; unsigned long flags; if (! priv->has_wep) @@ -2899,51 +2895,47 @@ memcpy(keybuf, priv->keys[index].data, ORINOCO_MAX_KEY_SIZE); orinoco_unlock(priv, &flags); - - if (erq->pointer) { - if (copy_to_user(erq->pointer, keybuf, xlen)) - return -EFAULT; - } - return 0; } -static int orinoco_ioctl_setessid(struct net_device *dev, struct iw_point *erq) +static int orinoco_ioctl_setessid(struct net_device *dev, + struct iw_request_info *info, + struct iw_point *erq, + char *essidbuf) { struct orinoco_private *priv = netdev_priv(dev); - char essidbuf[IW_ESSID_MAX_SIZE+1]; unsigned long flags; /* Note : ESSID is ignored in Ad-Hoc demo mode, but we can set it * anyway... - Jean II */ - memset(&essidbuf, 0, sizeof(essidbuf)); - - if (erq->flags) { - /* iwconfig includes the NUL in the specified length */ - if (erq->length > IW_ESSID_MAX_SIZE+1) - return -E2BIG; - - if (copy_from_user(&essidbuf, erq->pointer, erq->length)) - return -EFAULT; - - essidbuf[IW_ESSID_MAX_SIZE] = '\0'; - } + /* Hum... Should not use Wireless Extension constant (may change), + * should use our own... 
- Jean II */ + if (erq->length > IW_ESSID_MAX_SIZE) + return -E2BIG; if (orinoco_lock(priv, &flags) != 0) return -EBUSY; - memcpy(priv->desired_essid, essidbuf, sizeof(priv->desired_essid)); + /* NULL the string (for NULL termination & ESSID = ANY) - Jean II */ + memset(priv->desired_essid, 0, sizeof(priv->desired_essid)); + + /* If not ANY, get the new ESSID */ + if (erq->flags) { + memcpy(priv->desired_essid, essidbuf, erq->length); + } orinoco_unlock(priv, &flags); - return 0; + return -EINPROGRESS; /* Call commit handler */ } -static int orinoco_ioctl_getessid(struct net_device *dev, struct iw_point *erq) +static int orinoco_ioctl_getessid(struct net_device *dev, + struct iw_request_info *info, + struct iw_point *erq, + char *essidbuf) { struct orinoco_private *priv = netdev_priv(dev); - char essidbuf[IW_ESSID_MAX_SIZE+1]; int active; int err = 0; unsigned long flags; @@ -2957,51 +2949,46 @@ } else { if (orinoco_lock(priv, &flags) != 0) return -EBUSY; - memcpy(essidbuf, priv->desired_essid, sizeof(essidbuf)); + memcpy(essidbuf, priv->desired_essid, IW_ESSID_MAX_SIZE + 1); orinoco_unlock(priv, &flags); } erq->flags = 1; erq->length = strlen(essidbuf) + 1; - if (erq->pointer) - if (copy_to_user(erq->pointer, essidbuf, erq->length)) - return -EFAULT; TRACE_EXIT(dev->name); return 0; } -static int orinoco_ioctl_setnick(struct net_device *dev, struct iw_point *nrq) +static int orinoco_ioctl_setnick(struct net_device *dev, + struct iw_request_info *info, + struct iw_point *nrq, + char *nickbuf) { struct orinoco_private *priv = netdev_priv(dev); - char nickbuf[IW_ESSID_MAX_SIZE+1]; unsigned long flags; if (nrq->length > IW_ESSID_MAX_SIZE) return -E2BIG; - memset(nickbuf, 0, sizeof(nickbuf)); - - if (copy_from_user(nickbuf, nrq->pointer, nrq->length)) - return -EFAULT; - - nickbuf[nrq->length] = '\0'; - if (orinoco_lock(priv, &flags) != 0) return -EBUSY; - memcpy(priv->nick, nickbuf, sizeof(priv->nick)); + memset(priv->nick, 0, sizeof(priv->nick)); + memcpy(priv->nick, 
nickbuf, nrq->length); orinoco_unlock(priv, &flags); - return 0; + return -EINPROGRESS; /* Call commit handler */ } -static int orinoco_ioctl_getnick(struct net_device *dev, struct iw_point *nrq) +static int orinoco_ioctl_getnick(struct net_device *dev, + struct iw_request_info *info, + struct iw_point *nrq, + char *nickbuf) { struct orinoco_private *priv = netdev_priv(dev); - char nickbuf[IW_ESSID_MAX_SIZE+1]; unsigned long flags; if (orinoco_lock(priv, &flags) != 0) @@ -3012,17 +2999,18 @@ nrq->length = strlen(nickbuf)+1; - if (copy_to_user(nrq->pointer, nickbuf, sizeof(nickbuf))) - return -EFAULT; - return 0; } -static int orinoco_ioctl_setfreq(struct net_device *dev, struct iw_freq *frq) +static int orinoco_ioctl_setfreq(struct net_device *dev, + struct iw_request_info *info, + struct iw_freq *frq, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); int chan = -1; unsigned long flags; + int err = -EINPROGRESS; /* Call commit handler */ /* We can only use this in Ad-Hoc demo mode to set the operating * frequency, or in IBSS mode to set the frequency where the IBSS @@ -3055,10 +3043,33 @@ priv->channel = chan; orinoco_unlock(priv, &flags); + return err; +} + +static int orinoco_ioctl_getfreq(struct net_device *dev, + struct iw_request_info *info, + struct iw_freq *frq, + char *extra) +{ + struct orinoco_private *priv = netdev_priv(dev); + int tmp; + + /* Locking done in there */ + tmp = orinoco_hw_get_freq(priv); + if (tmp < 0) { + return tmp; + } + + frq->m = tmp; + frq->e = 1; + return 0; } -static int orinoco_ioctl_getsens(struct net_device *dev, struct iw_param *srq) +static int orinoco_ioctl_getsens(struct net_device *dev, + struct iw_request_info *info, + struct iw_param *srq, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); hermes_t *hw = &priv->hw; @@ -3084,7 +3095,10 @@ return 0; } -static int orinoco_ioctl_setsens(struct net_device *dev, struct iw_param *srq) +static int orinoco_ioctl_setsens(struct net_device *dev, + 
struct iw_request_info *info, + struct iw_param *srq, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); int val = srq->value; @@ -3101,10 +3115,13 @@ priv->ap_density = val; orinoco_unlock(priv, &flags); - return 0; + return -EINPROGRESS; /* Call commit handler */ } -static int orinoco_ioctl_setrts(struct net_device *dev, struct iw_param *rrq) +static int orinoco_ioctl_setrts(struct net_device *dev, + struct iw_request_info *info, + struct iw_param *rrq, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); int val = rrq->value; @@ -3122,13 +3139,30 @@ priv->rts_thresh = val; orinoco_unlock(priv, &flags); + return -EINPROGRESS; /* Call commit handler */ +} + +static int orinoco_ioctl_getrts(struct net_device *dev, + struct iw_request_info *info, + struct iw_param *rrq, + char *extra) +{ + struct orinoco_private *priv = netdev_priv(dev); + + rrq->value = priv->rts_thresh; + rrq->disabled = (rrq->value == 2347); + rrq->fixed = 1; + return 0; } -static int orinoco_ioctl_setfrag(struct net_device *dev, struct iw_param *frq) +static int orinoco_ioctl_setfrag(struct net_device *dev, + struct iw_request_info *info, + struct iw_param *frq, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); - int err = 0; + int err = -EINPROGRESS; /* Call commit handler */ unsigned long flags; if (orinoco_lock(priv, &flags) != 0) @@ -3160,11 +3194,14 @@ return err; } -static int orinoco_ioctl_getfrag(struct net_device *dev, struct iw_param *frq) +static int orinoco_ioctl_getfrag(struct net_device *dev, + struct iw_request_info *info, + struct iw_param *frq, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); hermes_t *hw = &priv->hw; - int err = 0; + int err; u16 val; unsigned long flags; @@ -3197,10 +3234,12 @@ return err; } -static int orinoco_ioctl_setrate(struct net_device *dev, struct iw_param *rrq) +static int orinoco_ioctl_setrate(struct net_device *dev, + struct iw_request_info *info, + struct iw_param *rrq, + char *extra) { 
struct orinoco_private *priv = netdev_priv(dev); - int err = 0; int ratemode = -1; int bitrate; /* 100s of kilobits */ int i; @@ -3236,10 +3275,13 @@ priv->bitratemode = ratemode; orinoco_unlock(priv, &flags); - return err; + return -EINPROGRESS; } -static int orinoco_ioctl_getrate(struct net_device *dev, struct iw_param *rrq) +static int orinoco_ioctl_getrate(struct net_device *dev, + struct iw_request_info *info, + struct iw_param *rrq, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); hermes_t *hw = &priv->hw; @@ -3304,10 +3346,13 @@ return err; } -static int orinoco_ioctl_setpower(struct net_device *dev, struct iw_param *prq) +static int orinoco_ioctl_setpower(struct net_device *dev, + struct iw_request_info *info, + struct iw_param *prq, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); - int err = 0; + int err = -EINPROGRESS; /* Call commit handler */ unsigned long flags; if (orinoco_lock(priv, &flags) != 0) @@ -3356,7 +3401,10 @@ return err; } -static int orinoco_ioctl_getpower(struct net_device *dev, struct iw_param *prq) +static int orinoco_ioctl_getpower(struct net_device *dev, + struct iw_request_info *info, + struct iw_param *prq, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); hermes_t *hw = &priv->hw; @@ -3404,7 +3452,10 @@ return err; } -static int orinoco_ioctl_getretry(struct net_device *dev, struct iw_param *rrq) +static int orinoco_ioctl_getretry(struct net_device *dev, + struct iw_request_info *info, + struct iw_param *rrq, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); hermes_t *hw = &priv->hw; @@ -3455,10 +3506,38 @@ return err; } -static int orinoco_ioctl_setibssport(struct net_device *dev, struct iwreq *wrq) +static int orinoco_ioctl_reset(struct net_device *dev, + struct iw_request_info *info, + void *wrqu, + char *extra) +{ + struct orinoco_private *priv = netdev_priv(dev); + + if (! 
capable(CAP_NET_ADMIN)) + return -EPERM; + + if (info->cmd == (SIOCIWFIRSTPRIV + 0x1)) { + printk(KERN_DEBUG "%s: Forcing reset!\n", dev->name); + + /* Firmware reset */ + orinoco_reset(dev); + } else { + printk(KERN_DEBUG "%s: Force scheduling reset!\n", dev->name); + + schedule_work(&priv->reset_work); + } + + return 0; +} + +static int orinoco_ioctl_setibssport(struct net_device *dev, + struct iw_request_info *info, + void *wrqu, + char *extra) + { struct orinoco_private *priv = netdev_priv(dev); - int val = *( (int *) wrq->u.name ); + int val = *( (int *) extra ); unsigned long flags; if (orinoco_lock(priv, &flags) != 0) @@ -3470,28 +3549,28 @@ set_port_type(priv); orinoco_unlock(priv, &flags); - return 0; + return -EINPROGRESS; /* Call commit handler */ } -static int orinoco_ioctl_getibssport(struct net_device *dev, struct iwreq *wrq) +static int orinoco_ioctl_getibssport(struct net_device *dev, + struct iw_request_info *info, + void *wrqu, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); - int *val = (int *)wrq->u.name; - unsigned long flags; - - if (orinoco_lock(priv, &flags) != 0) - return -EBUSY; + int *val = (int *) extra; *val = priv->ibss_port; - orinoco_unlock(priv, &flags); - return 0; } -static int orinoco_ioctl_setport3(struct net_device *dev, struct iwreq *wrq) +static int orinoco_ioctl_setport3(struct net_device *dev, + struct iw_request_info *info, + void *wrqu, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); - int val = *( (int *) wrq->u.name ); + int val = *( (int *) extra ); int err = 0; unsigned long flags; @@ -3520,51 +3599,131 @@ err = -EINVAL; } - if (! err) + if (! 
err) { /* Actually update the mode we are using */ set_port_type(priv); + err = -EINPROGRESS; + } orinoco_unlock(priv, &flags); return err; } -static int orinoco_ioctl_getport3(struct net_device *dev, struct iwreq *wrq) +static int orinoco_ioctl_getport3(struct net_device *dev, + struct iw_request_info *info, + void *wrqu, + char *extra) +{ + struct orinoco_private *priv = netdev_priv(dev); + int *val = (int *) extra; + + *val = priv->prefer_port3; + return 0; +} + +static int orinoco_ioctl_setpreamble(struct net_device *dev, + struct iw_request_info *info, + void *wrqu, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); - int *val = (int *)wrq->u.name; unsigned long flags; + int val; + + if (! priv->has_preamble) + return -EOPNOTSUPP; + + /* 802.11b has recently defined some short preamble. + * Basically, the Phy header has been reduced in size. + * This increase performance, especially at high rates + * (the preamble is transmitted at 1Mb/s), unfortunately + * this give compatibility troubles... - Jean II */ + val = *( (int *) extra ); if (orinoco_lock(priv, &flags) != 0) return -EBUSY; - *val = priv->prefer_port3; + if (val) + priv->preamble = 1; + else + priv->preamble = 0; + orinoco_unlock(priv, &flags); + + return -EINPROGRESS; /* Call commit handler */ +} + +static int orinoco_ioctl_getpreamble(struct net_device *dev, + struct iw_request_info *info, + void *wrqu, + char *extra) +{ + struct orinoco_private *priv = netdev_priv(dev); + int *val = (int *) extra; + + if (! priv->has_preamble) + return -EOPNOTSUPP; + + *val = priv->preamble; return 0; } +/* ioctl interface to hermes_read_ltv() + * To use with iwpriv, pass the RID as the token argument, e.g. + * iwpriv get_rid [0xfc00] + * At least Wireless Tools 25 is required to use iwpriv. + * For Wireless Tools 25 and 26 append "dummy" at the end. 
*/ +static int orinoco_ioctl_getrid(struct net_device *dev, + struct iw_request_info *info, + struct iw_point *data, + char *extra) +{ + struct orinoco_private *priv = netdev_priv(dev); + hermes_t *hw = &priv->hw; + int rid = data->flags; + u16 length; + int err; + unsigned long flags; + + /* It's a "get" function, but we don't want users to access the + * WEP key and other raw firmware data */ + if (! capable(CAP_NET_ADMIN)) + return -EPERM; + + if (rid < 0xfc00 || rid > 0xffff) + return -EINVAL; + + if (orinoco_lock(priv, &flags) != 0) + return -EBUSY; + + err = hermes_read_ltv(hw, USER_BAP, rid, MAX_RID_LEN, &length, + extra); + if (err) + goto out; + + data->length = min_t(u16, HERMES_RECLEN_TO_BYTES(length), + MAX_RID_LEN); + + out: + orinoco_unlock(priv, &flags); + return err; +} + /* Spy is used for link quality/strength measurements in Ad-Hoc mode * Jean II */ -static int orinoco_ioctl_setspy(struct net_device *dev, struct iw_point *srq) +static int orinoco_ioctl_setspy(struct net_device *dev, + struct iw_request_info *info, + struct iw_point *srq, + char *extra) + { struct orinoco_private *priv = netdev_priv(dev); - struct sockaddr address[IW_MAX_SPY]; + struct sockaddr *address = (struct sockaddr *) extra; int number = srq->length; int i; - int err = 0; unsigned long flags; - /* Check the number of addresses */ - if (number > IW_MAX_SPY) - return -E2BIG; - - /* Get the data in the driver */ - if (srq->pointer) { - if (copy_from_user(address, srq->pointer, - sizeof(struct sockaddr) * number)) - return -EFAULT; - } - /* Make sure nobody mess with the structure while we do */ if (orinoco_lock(priv, &flags) != 0) return -EBUSY; @@ -3588,14 +3747,17 @@ /* Now, let the others play */ orinoco_unlock(priv, &flags); - return err; + /* Do NOT call commit handler */ + return 0; } -static int orinoco_ioctl_getspy(struct net_device *dev, struct iw_point *srq) +static int orinoco_ioctl_getspy(struct net_device *dev, + struct iw_request_info *info, + struct iw_point 
*srq, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); - struct sockaddr address[IW_MAX_SPY]; - struct iw_quality spy_stat[IW_MAX_SPY]; + struct sockaddr *address = (struct sockaddr *) extra; int number; int i; unsigned long flags; @@ -3604,7 +3766,12 @@ return -EBUSY; number = priv->spy_number; - if ((number > 0) && (srq->pointer)) { + /* Create address struct */ + for (i = 0; i < number; i++) { + memcpy(address[i].sa_data, priv->spy_address[i], ETH_ALEN); + address[i].sa_family = AF_UNIX; + } + if (number > 0) { /* Create address struct */ for (i = 0; i < number; i++) { memcpy(address[i].sa_data, priv->spy_address[i], @@ -3615,344 +3782,153 @@ /* In theory, we should disable irqs while copying the stats * because the rx path might update it in the middle... * Bah, who care ? - Jean II */ - memcpy(&spy_stat, priv->spy_stat, - sizeof(struct iw_quality) * IW_MAX_SPY); - for (i=0; i < number; i++) - priv->spy_stat[i].updated = 0; + memcpy(extra + (sizeof(struct sockaddr) * number), + priv->spy_stat, sizeof(struct iw_quality) * number); } + /* Reset updated flags. 
*/ + for (i = 0; i < number; i++) + priv->spy_stat[i].updated = 0; orinoco_unlock(priv, &flags); - /* Push stuff to user space */ srq->length = number; - if(copy_to_user(srq->pointer, address, - sizeof(struct sockaddr) * number)) - return -EFAULT; - if(copy_to_user(srq->pointer + (sizeof(struct sockaddr)*number), - &spy_stat, sizeof(struct iw_quality) * number)) - return -EFAULT; return 0; } -static int -orinoco_ioctl(struct net_device *dev, struct ifreq *rq, int cmd) +/* Commit handler, called after set operations */ +static int orinoco_ioctl_commit(struct net_device *dev, + struct iw_request_info *info, + void *wrqu, + char *extra) { struct orinoco_private *priv = netdev_priv(dev); - struct iwreq *wrq = (struct iwreq *)rq; - int err = 0; - int tmp; - int changed = 0; + struct hermes *hw = &priv->hw; unsigned long flags; + int err = 0; - TRACE_ENTER(dev->name); - - /* In theory, we could allow most of the the SET stuff to be - * done. In practice, the lapse of time at startup when the - * card is not ready is very short, so why bother... Note - * that netif_device_present is different from up/down - * (ifconfig), when the device is not yet up, it is usually - * already ready... Jean II */ - if (! netif_device_present(dev)) - return -ENODEV; - - switch (cmd) { - case SIOCGIWNAME: - strcpy(wrq->u.name, "IEEE 802.11-DS"); - break; - - case SIOCGIWAP: - wrq->u.ap_addr.sa_family = ARPHRD_ETHER; - err = orinoco_hw_get_bssid(priv, wrq->u.ap_addr.sa_data); - break; - - case SIOCGIWRANGE: - err = orinoco_ioctl_getiwrange(dev, &wrq->u.data); - break; - - case SIOCSIWMODE: - if (orinoco_lock(priv, &flags) != 0) - return -EBUSY; - switch (wrq->u.mode) { - case IW_MODE_ADHOC: - if (! 
(priv->has_ibss || priv->has_port3) ) - err = -EINVAL; - else { - priv->iw_mode = IW_MODE_ADHOC; - changed = 1; - } - break; - - case IW_MODE_INFRA: - priv->iw_mode = IW_MODE_INFRA; - changed = 1; - break; - - default: - err = -EINVAL; - break; - } - set_port_type(priv); - orinoco_unlock(priv, &flags); - break; - - case SIOCGIWMODE: - if (orinoco_lock(priv, &flags) != 0) - return -EBUSY; - wrq->u.mode = priv->iw_mode; - orinoco_unlock(priv, &flags); - break; - - case SIOCSIWENCODE: - err = orinoco_ioctl_setiwencode(dev, &wrq->u.encoding); - if (! err) - changed = 1; - break; - - case SIOCGIWENCODE: - if (! capable(CAP_NET_ADMIN)) { - err = -EPERM; - break; - } - - err = orinoco_ioctl_getiwencode(dev, &wrq->u.encoding); - break; - - case SIOCSIWESSID: - err = orinoco_ioctl_setessid(dev, &wrq->u.essid); - if (! err) - changed = 1; - break; - - case SIOCGIWESSID: - err = orinoco_ioctl_getessid(dev, &wrq->u.essid); - break; - - case SIOCSIWNICKN: - err = orinoco_ioctl_setnick(dev, &wrq->u.data); - if (! err) - changed = 1; - break; - - case SIOCGIWNICKN: - err = orinoco_ioctl_getnick(dev, &wrq->u.data); - break; - - case SIOCGIWFREQ: - tmp = orinoco_hw_get_freq(priv); - if (tmp < 0) { - err = tmp; - } else { - wrq->u.freq.m = tmp; - wrq->u.freq.e = 1; - } - break; - - case SIOCSIWFREQ: - err = orinoco_ioctl_setfreq(dev, &wrq->u.freq); - if (! err) - changed = 1; - break; - - case SIOCGIWSENS: - err = orinoco_ioctl_getsens(dev, &wrq->u.sens); - break; - - case SIOCSIWSENS: - err = orinoco_ioctl_setsens(dev, &wrq->u.sens); - if (! err) - changed = 1; - break; - - case SIOCGIWRTS: - wrq->u.rts.value = priv->rts_thresh; - wrq->u.rts.disabled = (wrq->u.rts.value == 2347); - wrq->u.rts.fixed = 1; - break; - - case SIOCSIWRTS: - err = orinoco_ioctl_setrts(dev, &wrq->u.rts); - if (! err) - changed = 1; - break; - - case SIOCSIWFRAG: - err = orinoco_ioctl_setfrag(dev, &wrq->u.frag); - if (! 
err) - changed = 1; - break; - - case SIOCGIWFRAG: - err = orinoco_ioctl_getfrag(dev, &wrq->u.frag); - break; - - case SIOCSIWRATE: - err = orinoco_ioctl_setrate(dev, &wrq->u.bitrate); - if (! err) - changed = 1; - break; - - case SIOCGIWRATE: - err = orinoco_ioctl_getrate(dev, &wrq->u.bitrate); - break; - - case SIOCSIWPOWER: - err = orinoco_ioctl_setpower(dev, &wrq->u.power); - if (! err) - changed = 1; - break; - - case SIOCGIWPOWER: - err = orinoco_ioctl_getpower(dev, &wrq->u.power); - break; + if (!priv->open) + return 0; - case SIOCGIWTXPOW: - /* The card only supports one tx power, so this is easy */ - wrq->u.txpower.value = 15; /* dBm */ - wrq->u.txpower.fixed = 1; - wrq->u.txpower.disabled = 0; - wrq->u.txpower.flags = IW_TXPOW_DBM; - break; + if (priv->broken_disableport) { + orinoco_reset(dev); + return 0; + } - case SIOCSIWRETRY: - err = -EOPNOTSUPP; - break; + if (orinoco_lock(priv, &flags) != 0) + return err; - case SIOCGIWRETRY: - err = orinoco_ioctl_getretry(dev, &wrq->u.retry); - break; + err = hermes_disable_port(hw, 0); + if (err) { + printk(KERN_WARNING "%s: Unable to disable port " + "while reconfiguring card\n", dev->name); + priv->broken_disableport = 1; + goto out; + } - case SIOCSIWSPY: - err = orinoco_ioctl_setspy(dev, &wrq->u.data); - break; + err = __orinoco_program_rids(dev); + if (err) { + printk(KERN_WARNING "%s: Unable to reconfigure card\n", + dev->name); + goto out; + } - case SIOCGIWSPY: - err = orinoco_ioctl_getspy(dev, &wrq->u.data); - break; - - case SIOCGIWPRIV: - if (wrq->u.data.pointer) { - struct iw_priv_args privtab[] = { - { SIOCIWFIRSTPRIV + 0x0, 0, 0, "force_reset" }, - { SIOCIWFIRSTPRIV + 0x1, 0, 0, "card_reset" }, - { SIOCIWFIRSTPRIV + 0x2, - IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, - 0, "set_port3" }, - { SIOCIWFIRSTPRIV + 0x3, 0, - IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, - "get_port3" }, - { SIOCIWFIRSTPRIV + 0x4, - IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, - 0, "set_preamble" }, - { SIOCIWFIRSTPRIV + 0x5, 0, 
- IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, - "get_preamble" }, - { SIOCIWFIRSTPRIV + 0x6, - IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, - 0, "set_ibssport" }, - { SIOCIWFIRSTPRIV + 0x7, 0, - IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, - "get_ibssport" }, - }; - - wrq->u.data.length = sizeof(privtab) / sizeof(privtab[0]); - if (copy_to_user(wrq->u.data.pointer, privtab, sizeof(privtab))) - err = -EFAULT; - } - break; - - case SIOCIWFIRSTPRIV + 0x0: /* force_reset */ - case SIOCIWFIRSTPRIV + 0x1: /* card_reset */ - if (! capable(CAP_NET_ADMIN)) { - err = -EPERM; - break; - } - - printk(KERN_DEBUG "%s: Force scheduling reset!\n", dev->name); + err = hermes_enable_port(hw, 0); + if (err) { + printk(KERN_WARNING "%s: Unable to enable port while reconfiguring card\n", + dev->name); + goto out; + } + out: + if (err) { + printk(KERN_WARNING "%s: Resetting instead...\n", dev->name); schedule_work(&priv->reset_work); - break; - - case SIOCIWFIRSTPRIV + 0x2: /* set_port3 */ - if (! capable(CAP_NET_ADMIN)) { - err = -EPERM; - break; - } - - err = orinoco_ioctl_setport3(dev, wrq); - if (! err) - changed = 1; - break; - - case SIOCIWFIRSTPRIV + 0x3: /* get_port3 */ - err = orinoco_ioctl_getport3(dev, wrq); - break; - - case SIOCIWFIRSTPRIV + 0x4: /* set_preamble */ - if (! capable(CAP_NET_ADMIN)) { - err = -EPERM; - break; - } - - /* 802.11b has recently defined some short preamble. - * Basically, the Phy header has been reduced in size. - * This increase performance, especially at high rates - * (the preamble is transmitted at 1Mb/s), unfortunately - * this give compatibility troubles... 
- Jean II */ - if(priv->has_preamble) { - int val = *( (int *) wrq->u.name ); - - if (orinoco_lock(priv, &flags) != 0) - return -EBUSY; - if (val) - priv->preamble = 1; - else - priv->preamble = 0; - orinoco_unlock(priv, &flags); - changed = 1; - } else - err = -EOPNOTSUPP; - break; + err = 0; + } - case SIOCIWFIRSTPRIV + 0x5: /* get_preamble */ - if(priv->has_preamble) { - int *val = (int *)wrq->u.name; - - if (orinoco_lock(priv, &flags) != 0) - return -EBUSY; - *val = priv->preamble; - orinoco_unlock(priv, &flags); - } else - err = -EOPNOTSUPP; - break; - case SIOCIWFIRSTPRIV + 0x6: /* set_ibssport */ - if (! capable(CAP_NET_ADMIN)) { - err = -EPERM; - break; - } + orinoco_unlock(priv, &flags); + return err; +} - err = orinoco_ioctl_setibssport(dev, wrq); - if (! err) - changed = 1; - break; +static const struct iw_priv_args orinoco_privtab[] = { + { SIOCIWFIRSTPRIV + 0x0, 0, 0, "force_reset" }, + { SIOCIWFIRSTPRIV + 0x1, 0, 0, "card_reset" }, + { SIOCIWFIRSTPRIV + 0x2, IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, + 0, "set_port3" }, + { SIOCIWFIRSTPRIV + 0x3, 0, IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, + "get_port3" }, + { SIOCIWFIRSTPRIV + 0x4, IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, + 0, "set_preamble" }, + { SIOCIWFIRSTPRIV + 0x5, 0, IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, + "get_preamble" }, + { SIOCIWFIRSTPRIV + 0x6, IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, + 0, "set_ibssport" }, + { SIOCIWFIRSTPRIV + 0x7, 0, IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, + "get_ibssport" }, + { SIOCIWFIRSTPRIV + 0x9, 0, IW_PRIV_TYPE_BYTE | MAX_RID_LEN, + "get_rid" }, +}; - case SIOCIWFIRSTPRIV + 0x7: /* get_ibssport */ - err = orinoco_ioctl_getibssport(dev, wrq); - break; - default: - err = -EOPNOTSUPP; - } - - if (! 
err && changed && netif_running(dev)) { - err = orinoco_reconfigure(dev); - } +/* + * Structures to export the Wireless Handlers + */ + +static const iw_handler orinoco_handler[] = { + [SIOCSIWCOMMIT-SIOCIWFIRST] (iw_handler) orinoco_ioctl_commit, + [SIOCGIWNAME -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getname, + [SIOCSIWFREQ -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setfreq, + [SIOCGIWFREQ -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getfreq, + [SIOCSIWMODE -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setmode, + [SIOCGIWMODE -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getmode, + [SIOCSIWSENS -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setsens, + [SIOCGIWSENS -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getsens, + [SIOCGIWRANGE -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getiwrange, + [SIOCSIWSPY -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setspy, + [SIOCGIWSPY -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getspy, + [SIOCGIWAP -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getwap, + [SIOCSIWESSID -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setessid, + [SIOCGIWESSID -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getessid, + [SIOCSIWNICKN -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setnick, + [SIOCGIWNICKN -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getnick, + [SIOCSIWRATE -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setrate, + [SIOCGIWRATE -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getrate, + [SIOCSIWRTS -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setrts, + [SIOCGIWRTS -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getrts, + [SIOCSIWFRAG -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setfrag, + [SIOCGIWFRAG -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getfrag, + [SIOCGIWRETRY -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getretry, + [SIOCSIWENCODE-SIOCIWFIRST] (iw_handler) orinoco_ioctl_setiwencode, + [SIOCGIWENCODE-SIOCIWFIRST] (iw_handler) orinoco_ioctl_getiwencode, + [SIOCSIWPOWER -SIOCIWFIRST] (iw_handler) orinoco_ioctl_setpower, + [SIOCGIWPOWER -SIOCIWFIRST] (iw_handler) orinoco_ioctl_getpower, +}; - TRACE_EXIT(dev->name); - return err; -} 
+/* + Added typecasting since we no longer use iwreq_data -- Moustafa + */ +static const iw_handler orinoco_private_handler[] = { + [0] (iw_handler) orinoco_ioctl_reset, + [1] (iw_handler) orinoco_ioctl_reset, + [2] (iw_handler) orinoco_ioctl_setport3, + [3] (iw_handler) orinoco_ioctl_getport3, + [4] (iw_handler) orinoco_ioctl_setpreamble, + [5] (iw_handler) orinoco_ioctl_getpreamble, + [6] (iw_handler) orinoco_ioctl_setibssport, + [7] (iw_handler) orinoco_ioctl_getibssport, + [9] (iw_handler) orinoco_ioctl_getrid, +}; +static const struct iw_handler_def orinoco_handler_def = { + .num_standard = ARRAY_SIZE(orinoco_handler), + .num_private = ARRAY_SIZE(orinoco_private_handler), + .num_private_args = ARRAY_SIZE(orinoco_privtab), + .standard = orinoco_handler, + .private = orinoco_private_handler, + .private_args = orinoco_privtab, +}; /********************************************************************/ /* Debugging */ From hch@lst.de Sat Jun 18 16:28:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 16:28:54 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5INSkH9022661 for ; Sat, 18 Jun 2005 16:28:46 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id j5INRK6t010117 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 19 Jun 2005 01:27:20 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id j5INRKqP010115; Sun, 19 Jun 2005 01:27:20 +0200 Date: Sun, 19 Jun 2005 01:27:20 +0200 From: Christoph Hellwig To: jgarzik@pobox.com, proski@gnu.org, hermes@gibson.dropbear.id.au Cc: netdev@oss.sgi.com Subject: [PATCH 1/9] bring over ieee80211.h from mainline Message-ID: <20050618232720.GB9918@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 
X-archive-position: 2460 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Content-Length: 26510 Lines: 891 the prototypes and inlines aren't actually needed, but let's not diverge from -mm too far. Index: linux-2.6/include/net/ieee80211.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux-2.6/include/net/ieee80211.h 2005-06-19 01:01:14.000000000 +0200 @@ -0,0 +1,882 @@ +/* + * Merged with mainline ieee80211.h in Aug 2004. Original ieee802_11 + * remains copyright by the original authors + * + * Portions of the merged code are based on Host AP (software wireless + * LAN access point) driver for Intersil Prism2/2.5/3. + * + * Copyright (c) 2001-2002, SSH Communications Security Corp and Jouni Malinen + * + * Copyright (c) 2002-2003, Jouni Malinen + * + * Adaption to a generic IEEE 802.11 stack by James Ketrenos + * + * Copyright (c) 2004, Intel Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. See README and COPYING for + * more details. + */ +#ifndef IEEE80211_H +#define IEEE80211_H + +#include /* ETH_ALEN */ +#include /* ARRAY_SIZE */ + +#if WIRELESS_EXT < 17 +#define IW_QUAL_QUAL_INVALID 0x10 +#define IW_QUAL_LEVEL_INVALID 0x20 +#define IW_QUAL_NOISE_INVALID 0x40 +#define IW_QUAL_QUAL_UPDATED 0x1 +#define IW_QUAL_LEVEL_UPDATED 0x2 +#define IW_QUAL_NOISE_UPDATED 0x4 +#endif + +#define IEEE80211_DATA_LEN 2304 +/* Maximum size for the MA-UNITDATA primitive, 802.11 standard section + 6.2.1.1.2. + + The figure in section 7.1.2 suggests a body size of up to 2312 + bytes is allowed, which is a bit confusing, I suspect this + represents the 2304 bytes of real data, plus a possible 8 bytes of + WEP IV and ICV. 
(this interpretation suggested by Ramiro Barreiro) */ + + +#define IEEE80211_HLEN 30 +#define IEEE80211_FRAME_LEN (IEEE80211_DATA_LEN + IEEE80211_HLEN) + +struct ieee80211_hdr { + u16 frame_ctl; + u16 duration_id; + u8 addr1[ETH_ALEN]; + u8 addr2[ETH_ALEN]; + u8 addr3[ETH_ALEN]; + u16 seq_ctl; + u8 addr4[ETH_ALEN]; +} __attribute__ ((packed)); + +struct ieee80211_hdr_3addr { + u16 frame_ctl; + u16 duration_id; + u8 addr1[ETH_ALEN]; + u8 addr2[ETH_ALEN]; + u8 addr3[ETH_ALEN]; + u16 seq_ctl; +} __attribute__ ((packed)); + +enum eap_type { + EAP_PACKET = 0, + EAPOL_START, + EAPOL_LOGOFF, + EAPOL_KEY, + EAPOL_ENCAP_ASF_ALERT +}; + +static const char *eap_types[] = { + [EAP_PACKET] = "EAP-Packet", + [EAPOL_START] = "EAPOL-Start", + [EAPOL_LOGOFF] = "EAPOL-Logoff", + [EAPOL_KEY] = "EAPOL-Key", + [EAPOL_ENCAP_ASF_ALERT] = "EAPOL-Encap-ASF-Alert" +}; + +static inline const char *eap_get_type(int type) +{ + return (type >= ARRAY_SIZE(eap_types)) ? "Unknown" : eap_types[type]; +} + +struct eapol { + u8 snap[6]; + u16 ethertype; + u8 version; + u8 type; + u16 length; +} __attribute__ ((packed)); + +#define IEEE80211_3ADDR_LEN 24 +#define IEEE80211_4ADDR_LEN 30 +#define IEEE80211_FCS_LEN 4 + +#define MIN_FRAG_THRESHOLD 256U +#define MAX_FRAG_THRESHOLD 2346U + +/* Frame control field constants */ +#define IEEE80211_FCTL_VERS 0x0002 +#define IEEE80211_FCTL_FTYPE 0x000c +#define IEEE80211_FCTL_STYPE 0x00f0 +#define IEEE80211_FCTL_TODS 0x0100 +#define IEEE80211_FCTL_FROMDS 0x0200 +#define IEEE80211_FCTL_MOREFRAGS 0x0400 +#define IEEE80211_FCTL_RETRY 0x0800 +#define IEEE80211_FCTL_PM 0x1000 +#define IEEE80211_FCTL_MOREDATA 0x2000 +#define IEEE80211_FCTL_WEP 0x4000 +#define IEEE80211_FCTL_ORDER 0x8000 + +#define IEEE80211_FTYPE_MGMT 0x0000 +#define IEEE80211_FTYPE_CTL 0x0004 +#define IEEE80211_FTYPE_DATA 0x0008 + +/* management */ +#define IEEE80211_STYPE_ASSOC_REQ 0x0000 +#define IEEE80211_STYPE_ASSOC_RESP 0x0010 +#define IEEE80211_STYPE_REASSOC_REQ 0x0020 +#define 
IEEE80211_STYPE_REASSOC_RESP 0x0030 +#define IEEE80211_STYPE_PROBE_REQ 0x0040 +#define IEEE80211_STYPE_PROBE_RESP 0x0050 +#define IEEE80211_STYPE_BEACON 0x0080 +#define IEEE80211_STYPE_ATIM 0x0090 +#define IEEE80211_STYPE_DISASSOC 0x00A0 +#define IEEE80211_STYPE_AUTH 0x00B0 +#define IEEE80211_STYPE_DEAUTH 0x00C0 + +/* control */ +#define IEEE80211_STYPE_PSPOLL 0x00A0 +#define IEEE80211_STYPE_RTS 0x00B0 +#define IEEE80211_STYPE_CTS 0x00C0 +#define IEEE80211_STYPE_ACK 0x00D0 +#define IEEE80211_STYPE_CFEND 0x00E0 +#define IEEE80211_STYPE_CFENDACK 0x00F0 + +/* data */ +#define IEEE80211_STYPE_DATA 0x0000 +#define IEEE80211_STYPE_DATA_CFACK 0x0010 +#define IEEE80211_STYPE_DATA_CFPOLL 0x0020 +#define IEEE80211_STYPE_DATA_CFACKPOLL 0x0030 +#define IEEE80211_STYPE_NULLFUNC 0x0040 +#define IEEE80211_STYPE_CFACK 0x0050 +#define IEEE80211_STYPE_CFPOLL 0x0060 +#define IEEE80211_STYPE_CFACKPOLL 0x0070 + +#define IEEE80211_SCTL_FRAG 0x000F +#define IEEE80211_SCTL_SEQ 0xFFF0 + + +/* debug macros */ + +#ifdef CONFIG_IEEE80211_DEBUG +extern u32 ieee80211_debug_level; +#define IEEE80211_DEBUG(level, fmt, args...) \ +do { if (ieee80211_debug_level & (level)) \ + printk(KERN_DEBUG "ieee80211: %c %s " fmt, \ + in_interrupt() ? 'I' : 'U', __FUNCTION__ , ## args); } while (0) +#else +#define IEEE80211_DEBUG(level, fmt, args...) do {} while (0) +#endif /* CONFIG_IEEE80211_DEBUG */ + +/* + * To use the debug system; + * + * If you are defining a new debug classification, simply add it to the #define + * list here in the form of: + * + * #define IEEE80211_DL_xxxx VALUE + * + * shifting value to the left one bit from the previous entry. xxxx should be + * the name of the classification (for example, WEP) + * + * You then need to either add a IEEE80211_xxxx_DEBUG() macro definition for your + * classification, or use IEEE80211_DEBUG(IEEE80211_DL_xxxx, ...) whenever you want + * to send output to that classification. 
+ * + * To add your debug level to the list of levels seen when you perform + * + * % cat /proc/net/ipw/debug_level + * + * you simply need to add your entry to the ipw_debug_levels array. + * + * If you do not see debug_level in /proc/net/ipw then you do not have + * CONFIG_IEEE80211_DEBUG defined in your kernel configuration + * + */ + +#define IEEE80211_DL_INFO (1<<0) +#define IEEE80211_DL_WX (1<<1) +#define IEEE80211_DL_SCAN (1<<2) +#define IEEE80211_DL_STATE (1<<3) +#define IEEE80211_DL_MGMT (1<<4) +#define IEEE80211_DL_FRAG (1<<5) +#define IEEE80211_DL_EAP (1<<6) +#define IEEE80211_DL_DROP (1<<7) + +#define IEEE80211_DL_TX (1<<8) +#define IEEE80211_DL_RX (1<<9) + +#define IEEE80211_ERROR(f, a...) printk(KERN_ERR "ieee80211: " f, ## a) +#define IEEE80211_WARNING(f, a...) printk(KERN_WARNING "ieee80211: " f, ## a) +#define IEEE80211_DEBUG_INFO(f, a...) IEEE80211_DEBUG(IEEE80211_DL_INFO, f, ## a) + +#define IEEE80211_DEBUG_WX(f, a...) IEEE80211_DEBUG(IEEE80211_DL_WX, f, ## a) +#define IEEE80211_DEBUG_SCAN(f, a...) IEEE80211_DEBUG(IEEE80211_DL_SCAN, f, ## a) +#define IEEE80211_DEBUG_STATE(f, a...) IEEE80211_DEBUG(IEEE80211_DL_STATE, f, ## a) +#define IEEE80211_DEBUG_MGMT(f, a...) IEEE80211_DEBUG(IEEE80211_DL_MGMT, f, ## a) +#define IEEE80211_DEBUG_FRAG(f, a...) IEEE80211_DEBUG(IEEE80211_DL_FRAG, f, ## a) +#define IEEE80211_DEBUG_EAP(f, a...) IEEE80211_DEBUG(IEEE80211_DL_EAP, f, ## a) +#define IEEE80211_DEBUG_DROP(f, a...) IEEE80211_DEBUG(IEEE80211_DL_DROP, f, ## a) +#define IEEE80211_DEBUG_TX(f, a...) IEEE80211_DEBUG(IEEE80211_DL_TX, f, ## a) +#define IEEE80211_DEBUG_RX(f, a...) 
IEEE80211_DEBUG(IEEE80211_DL_RX, f, ## a) +#include +#include +#include /* ARPHRD_ETHER */ + +#ifndef WIRELESS_SPY +#define WIRELESS_SPY // enable iwspy support +#endif +#include // new driver API + +#ifndef ETH_P_PAE +#define ETH_P_PAE 0x888E /* Port Access Entity (IEEE 802.1X) */ +#endif /* ETH_P_PAE */ + +#define ETH_P_PREAUTH 0x88C7 /* IEEE 802.11i pre-authentication */ + +#ifndef ETH_P_80211_RAW +#define ETH_P_80211_RAW (ETH_P_ECONET + 1) +#endif + +/* IEEE 802.11 defines */ + +#define P80211_OUI_LEN 3 + +struct ieee80211_snap_hdr { + + u8 dsap; /* always 0xAA */ + u8 ssap; /* always 0xAA */ + u8 ctrl; /* always 0x03 */ + u8 oui[P80211_OUI_LEN]; /* organizational universal id */ + +} __attribute__ ((packed)); + +#define SNAP_SIZE sizeof(struct ieee80211_snap_hdr) + +#define WLAN_FC_GET_TYPE(fc) ((fc) & IEEE80211_FCTL_FTYPE) +#define WLAN_FC_GET_STYPE(fc) ((fc) & IEEE80211_FCTL_STYPE) + +#define WLAN_GET_SEQ_FRAG(seq) ((seq) & IEEE80211_SCTL_FRAG) +#define WLAN_GET_SEQ_SEQ(seq) ((seq) & IEEE80211_SCTL_SEQ) + +/* Authentication algorithms */ +#define WLAN_AUTH_OPEN 0 +#define WLAN_AUTH_SHARED_KEY 1 + +#define WLAN_AUTH_CHALLENGE_LEN 128 + +#define WLAN_CAPABILITY_BSS (1<<0) +#define WLAN_CAPABILITY_IBSS (1<<1) +#define WLAN_CAPABILITY_CF_POLLABLE (1<<2) +#define WLAN_CAPABILITY_CF_POLL_REQUEST (1<<3) +#define WLAN_CAPABILITY_PRIVACY (1<<4) +#define WLAN_CAPABILITY_SHORT_PREAMBLE (1<<5) +#define WLAN_CAPABILITY_PBCC (1<<6) +#define WLAN_CAPABILITY_CHANNEL_AGILITY (1<<7) + +/* Status codes */ +#define WLAN_STATUS_SUCCESS 0 +#define WLAN_STATUS_UNSPECIFIED_FAILURE 1 +#define WLAN_STATUS_CAPS_UNSUPPORTED 10 +#define WLAN_STATUS_REASSOC_NO_ASSOC 11 +#define WLAN_STATUS_ASSOC_DENIED_UNSPEC 12 +#define WLAN_STATUS_NOT_SUPPORTED_AUTH_ALG 13 +#define WLAN_STATUS_UNKNOWN_AUTH_TRANSACTION 14 +#define WLAN_STATUS_CHALLENGE_FAIL 15 +#define WLAN_STATUS_AUTH_TIMEOUT 16 +#define WLAN_STATUS_AP_UNABLE_TO_HANDLE_NEW_STA 17 +#define WLAN_STATUS_ASSOC_DENIED_RATES 18 +/* 802.11b 
*/ +#define WLAN_STATUS_ASSOC_DENIED_NOSHORT 19 +#define WLAN_STATUS_ASSOC_DENIED_NOPBCC 20 +#define WLAN_STATUS_ASSOC_DENIED_NOAGILITY 21 + +/* Reason codes */ +#define WLAN_REASON_UNSPECIFIED 1 +#define WLAN_REASON_PREV_AUTH_NOT_VALID 2 +#define WLAN_REASON_DEAUTH_LEAVING 3 +#define WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY 4 +#define WLAN_REASON_DISASSOC_AP_BUSY 5 +#define WLAN_REASON_CLASS2_FRAME_FROM_NONAUTH_STA 6 +#define WLAN_REASON_CLASS3_FRAME_FROM_NONASSOC_STA 7 +#define WLAN_REASON_DISASSOC_STA_HAS_LEFT 8 +#define WLAN_REASON_STA_REQ_ASSOC_WITHOUT_AUTH 9 + + +/* Information Element IDs */ +#define WLAN_EID_SSID 0 +#define WLAN_EID_SUPP_RATES 1 +#define WLAN_EID_FH_PARAMS 2 +#define WLAN_EID_DS_PARAMS 3 +#define WLAN_EID_CF_PARAMS 4 +#define WLAN_EID_TIM 5 +#define WLAN_EID_IBSS_PARAMS 6 +#define WLAN_EID_CHALLENGE 16 +#define WLAN_EID_RSN 48 +#define WLAN_EID_GENERIC 221 + +#define IEEE80211_MGMT_HDR_LEN 24 +#define IEEE80211_DATA_HDR3_LEN 24 +#define IEEE80211_DATA_HDR4_LEN 30 + + +#define IEEE80211_STATMASK_SIGNAL (1<<0) +#define IEEE80211_STATMASK_RSSI (1<<1) +#define IEEE80211_STATMASK_NOISE (1<<2) +#define IEEE80211_STATMASK_RATE (1<<3) +#define IEEE80211_STATMASK_WEMASK 0x7 + + +#define IEEE80211_CCK_MODULATION (1<<0) +#define IEEE80211_OFDM_MODULATION (1<<1) + +#define IEEE80211_24GHZ_BAND (1<<0) +#define IEEE80211_52GHZ_BAND (1<<1) + +#define IEEE80211_CCK_RATE_1MB 0x02 +#define IEEE80211_CCK_RATE_2MB 0x04 +#define IEEE80211_CCK_RATE_5MB 0x0B +#define IEEE80211_CCK_RATE_11MB 0x16 +#define IEEE80211_OFDM_RATE_6MB 0x0C +#define IEEE80211_OFDM_RATE_9MB 0x12 +#define IEEE80211_OFDM_RATE_12MB 0x18 +#define IEEE80211_OFDM_RATE_18MB 0x24 +#define IEEE80211_OFDM_RATE_24MB 0x30 +#define IEEE80211_OFDM_RATE_36MB 0x48 +#define IEEE80211_OFDM_RATE_48MB 0x60 +#define IEEE80211_OFDM_RATE_54MB 0x6C +#define IEEE80211_BASIC_RATE_MASK 0x80 + +#define IEEE80211_CCK_RATE_1MB_MASK (1<<0) +#define IEEE80211_CCK_RATE_2MB_MASK (1<<1) +#define IEEE80211_CCK_RATE_5MB_MASK 
(1<<2) +#define IEEE80211_CCK_RATE_11MB_MASK (1<<3) +#define IEEE80211_OFDM_RATE_6MB_MASK (1<<4) +#define IEEE80211_OFDM_RATE_9MB_MASK (1<<5) +#define IEEE80211_OFDM_RATE_12MB_MASK (1<<6) +#define IEEE80211_OFDM_RATE_18MB_MASK (1<<7) +#define IEEE80211_OFDM_RATE_24MB_MASK (1<<8) +#define IEEE80211_OFDM_RATE_36MB_MASK (1<<9) +#define IEEE80211_OFDM_RATE_48MB_MASK (1<<10) +#define IEEE80211_OFDM_RATE_54MB_MASK (1<<11) + +#define IEEE80211_CCK_RATES_MASK 0x0000000F +#define IEEE80211_CCK_BASIC_RATES_MASK (IEEE80211_CCK_RATE_1MB_MASK | \ + IEEE80211_CCK_RATE_2MB_MASK) +#define IEEE80211_CCK_DEFAULT_RATES_MASK (IEEE80211_CCK_BASIC_RATES_MASK | \ + IEEE80211_CCK_RATE_5MB_MASK | \ + IEEE80211_CCK_RATE_11MB_MASK) + +#define IEEE80211_OFDM_RATES_MASK 0x00000FF0 +#define IEEE80211_OFDM_BASIC_RATES_MASK (IEEE80211_OFDM_RATE_6MB_MASK | \ + IEEE80211_OFDM_RATE_12MB_MASK | \ + IEEE80211_OFDM_RATE_24MB_MASK) +#define IEEE80211_OFDM_DEFAULT_RATES_MASK (IEEE80211_OFDM_BASIC_RATES_MASK | \ + IEEE80211_OFDM_RATE_9MB_MASK | \ + IEEE80211_OFDM_RATE_18MB_MASK | \ + IEEE80211_OFDM_RATE_36MB_MASK | \ + IEEE80211_OFDM_RATE_48MB_MASK | \ + IEEE80211_OFDM_RATE_54MB_MASK) +#define IEEE80211_DEFAULT_RATES_MASK (IEEE80211_OFDM_DEFAULT_RATES_MASK | \ + IEEE80211_CCK_DEFAULT_RATES_MASK) + +#define IEEE80211_NUM_OFDM_RATES 8 +#define IEEE80211_NUM_CCK_RATES 4 +#define IEEE80211_OFDM_SHIFT_MASK_A 4 + + + + +/* NOTE: This data is for statistical purposes; not all hardware provides this + * information for frames received. Not setting these will not cause + * any adverse affects. */ +struct ieee80211_rx_stats { + u32 mac_time; + s8 rssi; + u8 signal; + u8 noise; + u16 rate; /* in 100 kbps */ + u8 received_channel; + u8 control; + u8 mask; + u8 freq; + u16 len; +}; + +/* IEEE 802.11 requires that STA supports concurrent reception of at least + * three fragmented frames. 
This define can be increased to support more + * concurrent frames, but it should be noted that each entry can consume about + * 2 kB of RAM and increasing cache size will slow down frame reassembly. */ +#define IEEE80211_FRAG_CACHE_LEN 4 + +struct ieee80211_frag_entry { + unsigned long first_frag_time; + unsigned int seq; + unsigned int last_frag; + struct sk_buff *skb; + u8 src_addr[ETH_ALEN]; + u8 dst_addr[ETH_ALEN]; +}; + +struct ieee80211_stats { + unsigned int tx_unicast_frames; + unsigned int tx_multicast_frames; + unsigned int tx_fragments; + unsigned int tx_unicast_octets; + unsigned int tx_multicast_octets; + unsigned int tx_deferred_transmissions; + unsigned int tx_single_retry_frames; + unsigned int tx_multiple_retry_frames; + unsigned int tx_retry_limit_exceeded; + unsigned int tx_discards; + unsigned int rx_unicast_frames; + unsigned int rx_multicast_frames; + unsigned int rx_fragments; + unsigned int rx_unicast_octets; + unsigned int rx_multicast_octets; + unsigned int rx_fcs_errors; + unsigned int rx_discards_no_buffer; + unsigned int tx_discards_wrong_sa; + unsigned int rx_discards_undecryptable; + unsigned int rx_message_in_msg_fragments; + unsigned int rx_message_in_bad_msg_fragments; +}; + +struct ieee80211_device; + +#define SEC_KEY_1 (1<<0) +#define SEC_KEY_2 (1<<1) +#define SEC_KEY_3 (1<<2) +#define SEC_KEY_4 (1<<3) +#define SEC_ACTIVE_KEY (1<<4) +#define SEC_AUTH_MODE (1<<5) +#define SEC_UNICAST_GROUP (1<<6) +#define SEC_LEVEL (1<<7) +#define SEC_ENABLED (1<<8) + +#define SEC_LEVEL_0 0 /* None */ +#define SEC_LEVEL_1 1 /* WEP 40 and 104 bit */ +#define SEC_LEVEL_2 2 /* Level 1 + TKIP */ +#define SEC_LEVEL_2_CKIP 3 /* Level 1 + CKIP */ +#define SEC_LEVEL_3 4 /* Level 2 + CCMP */ + +#define WEP_KEYS 4 +#define WEP_KEY_LEN 13 + +struct ieee80211_security { + u16 active_key:2, + enabled:1, + auth_mode:2, + auth_algo:4, + unicast_uses_group:1; + u8 key_sizes[WEP_KEYS]; + u8 keys[WEP_KEYS][WEP_KEY_LEN]; + u8 level; + u16 flags; +} __attribute__ 
((packed)); + + +/* + + 802.11 data frame from AP + + ,-------------------------------------------------------------------. +Bytes | 2 | 2 | 6 | 6 | 6 | 2 | 0..2312 | 4 | + |------|------|---------|---------|---------|------|---------|------| +Desc. | ctrl | dura | DA/RA | TA | SA | Sequ | frame | fcs | + | | tion | (BSSID) | | | ence | data | | + `-------------------------------------------------------------------' + +Total: 28-2340 bytes + +*/ + +struct ieee80211_header_data { + u16 frame_ctl; + u16 duration_id; + u8 addr1[6]; + u8 addr2[6]; + u8 addr3[6]; + u16 seq_ctrl; +}; + +#define BEACON_PROBE_SSID_ID_POSITION 12 + +/* Management Frame Information Element Types */ +#define MFIE_TYPE_SSID 0 +#define MFIE_TYPE_RATES 1 +#define MFIE_TYPE_FH_SET 2 +#define MFIE_TYPE_DS_SET 3 +#define MFIE_TYPE_CF_SET 4 +#define MFIE_TYPE_TIM 5 +#define MFIE_TYPE_IBSS_SET 6 +#define MFIE_TYPE_CHALLENGE 16 +#define MFIE_TYPE_RSN 48 +#define MFIE_TYPE_RATES_EX 50 +#define MFIE_TYPE_GENERIC 221 + +struct ieee80211_info_element_hdr { + u8 id; + u8 len; +} __attribute__ ((packed)); + +struct ieee80211_info_element { + u8 id; + u8 len; + u8 data[0]; +} __attribute__ ((packed)); + +/* + * These are the data types that can make up management packets + * + u16 auth_algorithm; + u16 auth_sequence; + u16 beacon_interval; + u16 capability; + u8 current_ap[ETH_ALEN]; + u16 listen_interval; + struct { + u16 association_id:14, reserved:2; + } __attribute__ ((packed)); + u32 time_stamp[2]; + u16 reason; + u16 status; +*/ + +struct ieee80211_authentication { + struct ieee80211_header_data header; + u16 algorithm; + u16 transaction; + u16 status; + struct ieee80211_info_element info_element; +} __attribute__ ((packed)); + + +struct ieee80211_probe_response { + struct ieee80211_header_data header; + u32 time_stamp[2]; + u16 beacon_interval; + u16 capability; + struct ieee80211_info_element info_element; +} __attribute__ ((packed)); + +struct ieee80211_assoc_request_frame { + u16 capability; + u16 
listen_interval; + u8 current_ap[ETH_ALEN]; + struct ieee80211_info_element info_element; +} __attribute__ ((packed)); + +struct ieee80211_assoc_response_frame { + struct ieee80211_hdr_3addr header; + u16 capability; + u16 status; + u16 aid; + struct ieee80211_info_element info_element; /* supported rates */ +} __attribute__ ((packed)); + + +struct ieee80211_txb { + u8 nr_frags; + u8 encrypted; + u16 reserved; + u16 frag_size; + u16 payload_size; + struct sk_buff *fragments[0]; +}; + + +/* SWEEP TABLE ENTRIES NUMBER*/ +#define MAX_SWEEP_TAB_ENTRIES 42 +#define MAX_SWEEP_TAB_ENTRIES_PER_PACKET 7 +/* MAX_RATES_LENGTH needs to be 12. The spec says 8, and many APs + * only use 8, and then use extended rates for the remaining supported + * rates. Other APs, however, stick all of their supported rates on the + * main rates information element... */ +#define MAX_RATES_LENGTH ((u8)12) +#define MAX_RATES_EX_LENGTH ((u8)16) +#define MAX_NETWORK_COUNT 128 + +#define CRC_LENGTH 4U + +#define MAX_WPA_IE_LEN 64 + +#define NETWORK_EMPTY_ESSID (1<<0) +#define NETWORK_HAS_OFDM (1<<1) +#define NETWORK_HAS_CCK (1<<2) + +struct ieee80211_network { + /* These entries are used to identify a unique network */ + u8 bssid[ETH_ALEN]; + u8 channel; + /* Ensure null-terminated for any debug msgs */ + u8 ssid[IW_ESSID_MAX_SIZE + 1]; + u8 ssid_len; + + /* These are network statistics */ + struct ieee80211_rx_stats stats; + u16 capability; + u8 rates[MAX_RATES_LENGTH]; + u8 rates_len; + u8 rates_ex[MAX_RATES_EX_LENGTH]; + u8 rates_ex_len; + unsigned long last_scanned; + u8 mode; + u8 flags; + u32 last_associate; + u32 time_stamp[2]; + u16 beacon_interval; + u16 listen_interval; + u16 atim_window; + u8 wpa_ie[MAX_WPA_IE_LEN]; + size_t wpa_ie_len; + u8 rsn_ie[MAX_WPA_IE_LEN]; + size_t rsn_ie_len; + struct list_head list; +}; + +enum ieee80211_state { + IEEE80211_UNINITIALIZED = 0, + IEEE80211_INITIALIZED, + IEEE80211_ASSOCIATING, + IEEE80211_ASSOCIATED, + IEEE80211_AUTHENTICATING, + 
IEEE80211_AUTHENTICATED, + IEEE80211_SHUTDOWN +}; + +#define DEFAULT_MAX_SCAN_AGE (15 * HZ) +#define DEFAULT_FTS 2346 +#define MAC_FMT "%02x:%02x:%02x:%02x:%02x:%02x" +#define MAC_ARG(x) ((u8*)(x))[0],((u8*)(x))[1],((u8*)(x))[2],((u8*)(x))[3],((u8*)(x))[4],((u8*)(x))[5] + + +extern inline int is_broadcast_ether_addr(const u8 *addr) +{ + return ((addr[0] == 0xff) && (addr[1] == 0xff) && (addr[2] == 0xff) && \ + (addr[3] == 0xff) && (addr[4] == 0xff) && (addr[5] == 0xff)); +} + +#define CFG_IEEE80211_RESERVE_FCS (1<<0) +#define CFG_IEEE80211_COMPUTE_FCS (1<<1) + +struct ieee80211_device { + struct net_device *dev; + + /* Bookkeeping structures */ + struct net_device_stats stats; + struct ieee80211_stats ieee_stats; + + /* Probe / Beacon management */ + struct list_head network_free_list; + struct list_head network_list; + struct ieee80211_network *networks; + int scans; + int scan_age; + + int iw_mode; /* operating mode (IW_MODE_*) */ + + spinlock_t lock; + + int tx_headroom; /* Set to size of any additional room needed at front + * of allocated Tx SKBs */ + u32 config; + + /* WEP and other encryption related settings at the device level */ + int open_wep; /* Set to 1 to allow unencrypted frames */ + + int reset_on_keychange; /* Set to 1 if the HW needs to be reset on + * WEP key changes */ + + /* If the host performs {en,de}cryption, then set to 1 */ + int host_encrypt; + int host_decrypt; + int ieee802_1x; /* is IEEE 802.1X used */ + + /* WPA data */ + int wpa_enabled; + int drop_unencrypted; + int tkip_countermeasures; + int privacy_invoked; + size_t wpa_ie_len; + u8 *wpa_ie; + + struct list_head crypt_deinit_list; + struct ieee80211_crypt_data *crypt[WEP_KEYS]; + int tx_keyidx; /* default TX key index (crypt[tx_keyidx]) */ + struct timer_list crypt_deinit_timer; + + int bcrx_sta_key; /* use individual keys to override default keys even + * with RX of broad/multicast frames */ + + /* Fragmentation structures */ + struct ieee80211_frag_entry 
frag_cache[IEEE80211_FRAG_CACHE_LEN]; + unsigned int frag_next_idx; + u16 fts; /* Fragmentation Threshold */ + + /* Association info */ + u8 bssid[ETH_ALEN]; + + enum ieee80211_state state; + + int mode; /* A, B, G */ + int modulation; /* CCK, OFDM */ + int freq_band; /* 2.4Ghz, 5.2Ghz, Mixed */ + int abg_ture; /* ABG flag */ + + /* Callback functions */ + void (*set_security)(struct net_device *dev, + struct ieee80211_security *sec); + int (*hard_start_xmit)(struct ieee80211_txb *txb, + struct net_device *dev); + int (*reset_port)(struct net_device *dev); + + /* This must be the last item so that it points to the data + * allocated beyond this structure by alloc_ieee80211 */ + u8 priv[0]; +}; + +#define IEEE_A (1<<0) +#define IEEE_B (1<<1) +#define IEEE_G (1<<2) +#define IEEE_MODE_MASK (IEEE_A|IEEE_B|IEEE_G) + +extern inline void *ieee80211_priv(struct net_device *dev) +{ + return ((struct ieee80211_device *)netdev_priv(dev))->priv; +} + +extern inline int ieee80211_is_empty_essid(const char *essid, int essid_len) +{ + /* Single white space is for Linksys APs */ + if (essid_len == 1 && essid[0] == ' ') + return 1; + + /* Otherwise, if the entire essid is 0, we assume it is hidden */ + while (essid_len) { + essid_len--; + if (essid[essid_len] != '\0') + return 0; + } + + return 1; +} + +extern inline int ieee80211_is_valid_mode(struct ieee80211_device *ieee, int mode) +{ + /* + * It is possible for both access points and our device to support + * combinations of modes, so as long as there is one valid combination + * of ap/device supported modes, then return success + * + */ + if ((mode & IEEE_A) && + (ieee->modulation & IEEE80211_OFDM_MODULATION) && + (ieee->freq_band & IEEE80211_52GHZ_BAND)) + return 1; + + if ((mode & IEEE_G) && + (ieee->modulation & IEEE80211_OFDM_MODULATION) && + (ieee->freq_band & IEEE80211_24GHZ_BAND)) + return 1; + + if ((mode & IEEE_B) && + (ieee->modulation & IEEE80211_CCK_MODULATION) && + (ieee->freq_band & IEEE80211_24GHZ_BAND)) + 
return 1; + + return 0; +} + +extern inline int ieee80211_get_hdrlen(u16 fc) +{ + int hdrlen = 24; + + switch (WLAN_FC_GET_TYPE(fc)) { + case IEEE80211_FTYPE_DATA: + if ((fc & IEEE80211_FCTL_FROMDS) && (fc & IEEE80211_FCTL_TODS)) + hdrlen = 30; /* Addr4 */ + break; + case IEEE80211_FTYPE_CTL: + switch (WLAN_FC_GET_STYPE(fc)) { + case IEEE80211_STYPE_CTS: + case IEEE80211_STYPE_ACK: + hdrlen = 10; + break; + default: + hdrlen = 16; + break; + } + break; + } + + return hdrlen; +} + + + +/* ieee80211.c */ +extern void free_ieee80211(struct net_device *dev); +extern struct net_device *alloc_ieee80211(int sizeof_priv); + +extern int ieee80211_set_encryption(struct ieee80211_device *ieee); + +/* ieee80211_tx.c */ + + +extern int ieee80211_xmit(struct sk_buff *skb, + struct net_device *dev); +extern void ieee80211_txb_free(struct ieee80211_txb *); + + +/* ieee80211_rx.c */ +extern int ieee80211_rx(struct ieee80211_device *ieee, struct sk_buff *skb, + struct ieee80211_rx_stats *rx_stats); +extern void ieee80211_rx_mgt(struct ieee80211_device *ieee, + struct ieee80211_hdr *header, + struct ieee80211_rx_stats *stats); + +/* iee80211_wx.c */ +extern int ieee80211_wx_get_scan(struct ieee80211_device *ieee, + struct iw_request_info *info, + union iwreq_data *wrqu, char *key); +extern int ieee80211_wx_set_encode(struct ieee80211_device *ieee, + struct iw_request_info *info, + union iwreq_data *wrqu, char *key); +extern int ieee80211_wx_get_encode(struct ieee80211_device *ieee, + struct iw_request_info *info, + union iwreq_data *wrqu, char *key); + + +extern inline void ieee80211_increment_scans(struct ieee80211_device *ieee) +{ + ieee->scans++; +} + +extern inline int ieee80211_get_scans(struct ieee80211_device *ieee) +{ + return ieee->scans; +} + +static inline const char *escape_essid(const char *essid, u8 essid_len) { + static char escaped[IW_ESSID_MAX_SIZE * 2 + 1]; + const char *s = essid; + char *d = escaped; + + if (ieee80211_is_empty_essid(essid, essid_len)) { + 
memcpy(escaped, "", sizeof("")); + return escaped; + } + + essid_len = min(essid_len, (u8)IW_ESSID_MAX_SIZE); + while (essid_len--) { + if (*s == '\0') { + *d++ = '\\'; + *d++ = '0'; + s++; + } else { + *d++ = *s++; + } + } + *d = '\0'; + return escaped; +} + +#endif /* IEEE80211_H */ From hch@lst.de Sat Jun 18 16:28:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 16:28:46 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5INSeH9022655 for ; Sat, 18 Jun 2005 16:28:42 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id j5INRC6t010103 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 19 Jun 2005 01:27:12 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id j5INRA6H010101; Sun, 19 Jun 2005 01:27:10 +0200 Date: Sun, 19 Jun 2005 01:27:10 +0200 From: Christoph Hellwig To: jgarzik@pobox.com, proski@gnu.org, hermes@gibson.dropbear.id.au Cc: netdev@oss.sgi.com Subject: [PATCH 0/9] orinoco updates Message-ID: <20050618232710.GA9918@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 X-archive-position: 2459 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Content-Length: 366 Lines: 9 This is a twelve-patch series to bring orinoco in mainline up to date with CVS. I've also used the new ieee80211.h constants from Jeff's wireless tree, so the first patch brings over a copy of that one. I'm mentioning the original authors wherever I could find them, but most patches actually consist of various CVS commits that belong together logically. 
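Editorial aside on the ieee80211.h copy quoted above: escape_essid() returns a pointer to a static buffer, so it is not reentrant, and two calls inside one printk() would see the same storage. A minimal user-space sketch of the same escaping with a caller-supplied buffer might look like the following. ESSID_MAX, is_empty_essid() and escape_essid_r() are hypothetical names used only for illustration (ESSID_MAX stands in for IW_ESSID_MAX_SIZE and is assumed to be 32 here); none of this is part of the posted series.

```c
#include <assert.h>
#include <stddef.h>

#define ESSID_MAX 32	/* assumption: stand-in for IW_ESSID_MAX_SIZE */

/* Return 1 for a single-space ESSID or one made up only of NUL bytes. */
static int is_empty_essid(const char *essid, size_t len)
{
	if (len == 1 && essid[0] == ' ')
		return 1;

	while (len) {
		len--;
		if (essid[len] != '\0')
			return 0;
	}
	return 1;
}

/*
 * Copy essid into out, writing every NUL byte as the two characters
 * backslash and '0'.  out must hold at least 2 * ESSID_MAX + 1 bytes.
 * An "empty" ESSID yields an empty string, as in the archived text.
 */
static const char *escape_essid_r(const char *essid, size_t len, char *out)
{
	char *d = out;

	assert(essid != NULL && out != NULL);

	if (is_empty_essid(essid, len)) {
		out[0] = '\0';
		return out;
	}

	if (len > ESSID_MAX)
		len = ESSID_MAX;

	while (len--) {
		if (*essid == '\0') {
			*d++ = '\\';
			*d++ = '0';
			essid++;
		} else {
			*d++ = *essid++;
		}
	}
	*d = '\0';
	return out;
}
```

With a buffer declared as char buf[2 * ESSID_MAX + 1], each caller owns its own result, which avoids the shared-static problem at the cost of one extra argument.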
From hch@lst.de Sat Jun 18 16:29:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 16:29:40 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5INTZH9023117 for ; Sat, 18 Jun 2005 16:29:36 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id j5INSC6t010236 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 19 Jun 2005 01:28:12 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id j5INSCvD010234; Sun, 19 Jun 2005 01:28:12 +0200 Date: Sun, 19 Jun 2005 01:28:12 +0200 From: Christoph Hellwig To: jgarzik@pobox.com, proski@gnu.org, hermes@gibson.dropbear.id.au Cc: netdev@oss.sgi.com Subject: [PATCH 9/9] orinoco: update changelog and version Message-ID: <20050618232812.GJ9918@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 X-archive-position: 2468 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Content-Length: 3230 Lines: 70 diff -urp -X /home/hch/dontdiff linux-2.6/drivers/net/wireless/orinoco.c orinoco/orinoco.c --- linux-2.6/drivers/net/wireless/orinoco.c 2005-05-14 13:49:02.000000000 +0200 +++ orinoco/orinoco.c 2005-05-14 13:05:46.000000000 +0200 @@ -416,12 +416,50 @@ * port. This is supposed to allow 802.1x to work sanely, but * doesn't seem to yet. * + * v0.14alpha2 -> v0.15rc1 - 19 Apr 2004 - Pavel Roskin & David Gibson + * o Fix bug which prevented setting 32 character ESSIDs from + * iwconfig (Thomas Schulz). + * o Fix for incorrect CIS access in orinoco_plx (Pavel Roskin). + * o Fix setting WEP key if __orinoco_fastkeychange() is not + * supported (Pavel Roskin). 
+ * o New wireless extensions API and scanning support (patch from + * Moustafa Youssef, updated by Jim Carter and Pavel Roskin). + * o Add minimal ethtool support (Pavel Roskin). + * o Replace CardServices() calls for compatibility with Linux + * 2.6.2 and above (Pavel Roskin). + * o Fix recognition of Intersil x.x.1 firmware (Pavel Roskin). + * o Replace dump_recs with more flexible get_rid ioctl (Pavel + * Roskin). + * o RF monitor mode support (Pavel Roskin). + * o Lots of bugfixes. + * + * v0.15rc1 -> v0.15rc2 - 28 Jul 2004 - Pavel Roskin & David Gibson + * o orinoco_pci saves PCI registers on suspend (Simon Huggins). + * o Monitor mode disabled on Agere 8.xx firmware - it's broken. + * o BAP timeout increased - needed for Intersil firmware. + * o Tx power is no longer reported - it's unreliable. + * o Use 802.11 header in rx path. Hide packets with ToDS flag + * from programs that don't need promiscous mode (John Denker). + * o Manual roaming implemented for Symbol and Intersil firmware. + * o Use netdev_priv() instead of directly dereferencing dev->priv. + * o Some simplification of pcmcia init code in orinoco_cs and + * spectrum_cs. + * o Numerous trivial cleanups, mainly arising from long-overdue + * merge with mainline. + * + * v0.15rc2 -> ???? - ???? - David Gibson + * o Use ssleep() or msleep() instead of hardcoded + * schedule_timeout()s (Nishanth Aravamudan via kernel-janitors + * list). + * o Several cleanups and bugfixes in pci/plx/tmd/nortel drivers. + * o Fix memory leak in orinoco_join_ap(). + * o Change io handling to avoid sparse and gcc warnings. + * o Use C99 array initializers and ARRAY_SIZE() for iw_handler + * tables. + * * TODO - * o New wireless extensions API (patch from Moustafa - * Youssef, updated by Jim Carter and Pavel Roskin). * o Handle de-encapsulation within network layer, provide 802.11 * headers (patch from Thomas 'Dent' Mirlacher) - * o RF monitor mode support * o Fix possible races in SPY handling. 
* o Disconnect wireless extensions from fundamental configuration. * o (maybe) Software WEP support (patch from Stano Meduna). diff -urp -X /home/hch/dontdiff linux-2.6/drivers/net/wireless/orinoco.h orinoco/orinoco.h --- linux-2.6/drivers/net/wireless/orinoco.h 2005-05-14 13:48:44.000000000 +0200 +++ orinoco/orinoco.h 2005-03-18 16:03:24.000000000 +0100 @@ -7,7 +7,7 @@ #ifndef _ORINOCO_H #define _ORINOCO_H -#define DRIVER_VERSION "0.14alpha2" +#define DRIVER_VERSION "0.15rc2" #include #include From hch@lst.de Sat Jun 18 16:29:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 18 Jun 2005 16:29:31 -0700 (PDT) Received: from mail.lst.de (verein.lst.de [213.95.11.210]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5INTQH9023054 for ; Sat, 18 Jun 2005 16:29:26 -0700 Received: from verein.lst.de (localhost [127.0.0.1]) by mail.lst.de (8.12.3/8.12.3/Debian-7.1) with ESMTP id j5INS26t010210 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Sun, 19 Jun 2005 01:28:02 +0200 Received: (from hch@localhost) by verein.lst.de (8.12.3/8.12.3/Debian-6.6) id j5INS2oJ010208; Sun, 19 Jun 2005 01:28:02 +0200 Date: Sun, 19 Jun 2005 01:28:02 +0200 From: Christoph Hellwig To: jgarzik@pobox.com, proski@gnu.org, hermes@gibson.dropbear.id.au Cc: netdev@oss.sgi.com Subject: [PATCH 7/9] orinoco: always use 802.11 header for rx processing Message-ID: <20050618232802.GH9918@lst.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Scanned-By: MIMEDefang 2.39 X-archive-position: 2465 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hch@lst.de Precedence: bulk X-list: netdev Content-Length: 5678 Lines: 185 If the frame has the ToDS flag set, mark it by setting skb->pkt_type to PACKET_OTHERHOST, so that applications unaware of promiscuous mode won't get uplink (STA->AP) packets for STA->STA transmissions relayed by the 
AP. Thanks to John Denker and David Gibson for finding the problem and the solution. Patch from Pavel Roskin Index: linux-2.6/drivers/net/wireless/orinoco.c =================================================================== --- linux-2.6.orig/drivers/net/wireless/orinoco.c 2005-06-19 01:03:51.000000000 +0200 +++ linux-2.6/drivers/net/wireless/orinoco.c 2005-06-19 01:04:28.000000000 +0200 @@ -619,7 +619,9 @@ u16 ethertype; } __attribute__ ((packed)); +/* Rx frame header except compatibility 802.3 header */ struct hermes_rx_descriptor { + /* Control */ u16 status; u32 time; u8 silence; @@ -627,6 +629,18 @@ u8 rate; u8 rxflow; u32 reserved; + + /* 802.11 header */ + u16 frame_ctl; + u16 duration_id; + u8 addr1[ETH_ALEN]; + u8 addr2[ETH_ALEN]; + u8 addr3[ETH_ALEN]; + u16 seq_ctl; + u8 addr4[ETH_ALEN]; + + /* Data length */ + u16 data_len; } __attribute__ ((packed)); /********************************************************************/ @@ -1110,12 +1124,10 @@ struct net_device_stats *stats = &priv->stats; struct iw_statistics *wstats = &priv->wstats; struct sk_buff *skb = NULL; - u16 rxfid, status; - int length, data_len, data_off; - char *p; + u16 rxfid, status, fc; + int length; struct hermes_rx_descriptor desc; - struct header_struct hdr; - struct ethhdr *eh; + struct ethhdr *hdr; int err; rxfid = hermes_read_regn(hw, RXFID); @@ -1140,24 +1152,14 @@ stats->rx_crc_errors++; DEBUG(1, "%s: Bad CRC on Rx. Frame dropped.\n", dev->name); } - stats->rx_errors++; - goto drop; - } - - /* For now we ignore the 802.11 header completely, assuming - that the card's firmware has handled anything vital */ - err = hermes_bap_pread(hw, IRQ_BAP, &hdr, sizeof(hdr), - rxfid, HERMES_802_3_OFFSET); - if (err) { - printk(KERN_ERR "%s: error %d reading frame header. 
" - "Frame dropped.\n", dev->name, err); stats->rx_errors++; goto drop; } - length = ntohs(hdr.len); - + length = le16_to_cpu(desc.data_len); + fc = le16_to_cpu(desc.frame_ctl); + /* Sanity checks */ if (length < 3) { /* No for even an 802.2 LLC header */ /* At least on Symbol firmware with PCF we get quite a @@ -1186,57 +1188,51 @@ goto drop; } - skb_reserve(skb, 2); /* This way the IP header is aligned */ + /* We'll prepend the header, so reserve space for it. The worst + case is no decapsulation, when 802.3 header is prepended and + nothing is removed. 2 is for aligning the IP header. */ + skb_reserve(skb, ETH_HLEN + 2); + + err = hermes_bap_pread(hw, IRQ_BAP, skb_put(skb, length), + ALIGN(length, 2), rxfid, + HERMES_802_2_OFFSET); + if (err) { + printk(KERN_ERR "%s: error %d reading frame. " + "Frame dropped.\n", dev->name, err); + stats->rx_errors++; + goto drop; + } /* Handle decapsulation * In most cases, the firmware tell us about SNAP frames. * For some reason, the SNAP frames sent by LinkSys APs * are not properly recognised by most firmwares. * So, check ourselves */ - if (((status & HERMES_RXSTAT_MSGTYPE) == HERMES_RXSTAT_1042) || - ((status & HERMES_RXSTAT_MSGTYPE) == HERMES_RXSTAT_TUNNEL) || - is_ethersnap(&hdr)) { + if (length >= ENCAPS_OVERHEAD && + (((status & HERMES_RXSTAT_MSGTYPE) == HERMES_RXSTAT_1042) || + ((status & HERMES_RXSTAT_MSGTYPE) == HERMES_RXSTAT_TUNNEL) || + is_ethersnap(skb->data))) { /* These indicate a SNAP within 802.2 LLC within 802.11 frame which we'll need to de-encapsulate to the original EthernetII frame. 
*/ - - if (length < ENCAPS_OVERHEAD) { /* No room for full LLC+SNAP */ - stats->rx_length_errors++; - goto drop; - } - - /* Remove SNAP header, reconstruct EthernetII frame */ - data_len = length - ENCAPS_OVERHEAD; - data_off = HERMES_802_3_OFFSET + sizeof(hdr); - - eh = (struct ethhdr *)skb_put(skb, ETH_HLEN); - - memcpy(eh, &hdr, 2 * ETH_ALEN); - eh->h_proto = hdr.ethertype; + hdr = (struct ethhdr *)skb_push(skb, ETH_HLEN - ENCAPS_OVERHEAD); } else { - /* All other cases indicate a genuine 802.3 frame. No - decapsulation needed. We just throw the whole - thing in, and hope the protocol layer can deal with - it as 802.3 */ - data_len = length; - data_off = HERMES_802_3_OFFSET; - /* FIXME: we re-read from the card data we already read here */ - } - - p = skb_put(skb, data_len); - err = hermes_bap_pread(hw, IRQ_BAP, p, ALIGN(data_len, 2), - rxfid, data_off); - if (err) { - printk(KERN_ERR "%s: error %d reading frame. " - "Frame dropped.\n", dev->name, err); - stats->rx_errors++; - goto drop; - } + /* 802.3 frame - prepend 802.3 header as is */ + hdr = (struct ethhdr *)skb_push(skb, ETH_HLEN); + hdr->h_proto = htons(length); + } + memcpy(hdr->h_dest, desc.addr1, ETH_ALEN); + if (fc & IEEE80211_FCTL_FROMDS) + memcpy(hdr->h_source, desc.addr3, ETH_ALEN); + else + memcpy(hdr->h_source, desc.addr2, ETH_ALEN); dev->last_rx = jiffies; skb->dev = dev; skb->protocol = eth_type_trans(skb, dev); skb->ip_summed = CHECKSUM_NONE; + if (fc & IEEE80211_FCTL_TODS) + skb->pkt_type = PACKET_OTHERHOST; /* Process the wireless stats if needed */ orinoco_stat_gather(dev, skb, &desc); @@ -1457,6 +1453,9 @@ u16 newstatus; int connected; + if (priv->iw_mode == IW_MODE_MONITOR) + break; + if (len != sizeof(linkstatus)) { printk(KERN_WARNING "%s: Unexpected size for linkstatus frame (%d bytes)\n", dev->name, len); From oxymoron@waste.org Sun Jun 19 11:15:54 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 19 Jun 2005 11:15:57 -0700 (PDT) Received: from waste.org (waste.org 
[216.27.176.166]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5JIFsH9023109 for ; Sun, 19 Jun 2005 11:15:54 -0700 Received: from waste.org (localhost [127.0.0.1]) by waste.org (8.13.4/8.13.4/Debian-3) with ESMTP id j5JIEaXp001672 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Sun, 19 Jun 2005 13:14:36 -0500 Received: (from oxymoron@localhost) by waste.org (8.13.4/8.13.4/Submit) id j5JIEasj001669; Sun, 19 Jun 2005 13:14:36 -0500 Date: Sun, 19 Jun 2005 11:14:36 -0700 From: Matt Mackall To: Jeff Moyer Cc: netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: netpoll and the bonding driver Message-ID: <20050619181436.GX27572@waste.org> References: <17075.10995.498758.773092@segfault.boston.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <17075.10995.498758.773092@segfault.boston.redhat.com> User-Agent: Mutt/1.5.9i X-archive-position: 2470 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mpm@selenic.com Precedence: bulk X-list: netdev Content-Length: 313 Lines: 10 On Fri, Jun 17, 2005 at 03:56:35PM -0400, Jeff Moyer wrote: > Hi, > > I'm trying to implement a netpoll hook for the bonding driver. My first question would be: does this really make sense to do? Why not just bind netpoll to one of the underlying devices? -- Mathematics is the supreme nostalgia of our time. 
From linville@bilbo.tuxdriver.com Sun Jun 19 17:22:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 19 Jun 2005 17:22:46 -0700 (PDT) Received: from apollo.tuxdriver.com (apollo.tuxdriver.com [24.172.12.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5K0MfH9011164 for ; Sun, 19 Jun 2005 17:22:42 -0700 Received: from bilbo.tuxdriver.com (azure.tuxdriver.com [24.172.12.5]) by apollo.tuxdriver.com (8.12.11/8.12.11) with ESMTP id j5JNI8gk001255; Sun, 19 Jun 2005 19:18:08 -0400 Received: from bilbo.tuxdriver.com (localhost.localdomain [127.0.0.1]) by bilbo.tuxdriver.com (8.13.1/8.13.1) with ESMTP id j5K0LMac019257; Sun, 19 Jun 2005 20:21:22 -0400 Received: (from linville@localhost) by bilbo.tuxdriver.com (8.13.1/8.13.1/Submit) id j5K0LK6W019256; Sun, 19 Jun 2005 20:21:20 -0400 Date: Sun, 19 Jun 2005 20:21:20 -0400 From: "John W. Linville" To: Matt Mackall Cc: Jeff Moyer , netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: netpoll and the bonding driver Message-ID: <20050620002118.GA16859@tuxdriver.com> Mail-Followup-To: Matt Mackall , Jeff Moyer , netdev@oss.sgi.com, linux-kernel@vger.kernel.org References: <17075.10995.498758.773092@segfault.boston.redhat.com> <20050619181436.GX27572@waste.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050619181436.GX27572@waste.org> User-Agent: Mutt/1.4.1i X-archive-position: 2471 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: linville@tuxdriver.com Precedence: bulk X-list: netdev Content-Length: 547 Lines: 16 On Sun, Jun 19, 2005 at 11:14:36AM -0700, Matt Mackall wrote: > On Fri, Jun 17, 2005 at 03:56:35PM -0400, Jeff Moyer wrote: > > I'm trying to implement a netpoll hook for the bonding driver. > > My first question would be: does this really make sense to do? Why not > just bind netpoll to one of the underlying devices? 
Depending on the bonding mode, this would be very unlikely to work. The other side of the link will still be expecting to talk to the bond rather than to an individual link. John -- John W. Linville linville@tuxdriver.com From linville@bilbo.tuxdriver.com Sun Jun 19 17:24:09 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 19 Jun 2005 17:24:12 -0700 (PDT) Received: from apollo.tuxdriver.com (apollo.tuxdriver.com [24.172.12.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5K0O9H9011653 for ; Sun, 19 Jun 2005 17:24:09 -0700 Received: from bilbo.tuxdriver.com (azure.tuxdriver.com [24.172.12.5]) by apollo.tuxdriver.com (8.12.11/8.12.11) with ESMTP id j5JNJVHk001258; Sun, 19 Jun 2005 19:19:31 -0400 Received: from bilbo.tuxdriver.com (localhost.localdomain [127.0.0.1]) by bilbo.tuxdriver.com (8.13.1/8.13.1) with ESMTP id j5K0MjRj019263; Sun, 19 Jun 2005 20:22:45 -0400 Received: (from linville@localhost) by bilbo.tuxdriver.com (8.13.1/8.13.1/Submit) id j5K0Mjnk019262; Sun, 19 Jun 2005 20:22:45 -0400 Date: Sun, 19 Jun 2005 20:22:45 -0400 From: "John W. Linville" To: Malli Chilakala Cc: "jgarzik@pobox.com" , netdev Subject: Re: [PATCH net-drivers-2.6 0/9] ixgb: driver update Message-ID: <20050620002244.GB16859@tuxdriver.com> Mail-Followup-To: Malli Chilakala , "jgarzik@pobox.com" , netdev References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i X-archive-position: 2472 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: linville@tuxdriver.com Precedence: bulk X-list: netdev Content-Length: 648 Lines: 18 On Fri, Jun 17, 2005 at 04:54:36PM -0700, Malli Chilakala wrote: > ixgb: driver update > 1. Set RXDCTL:PTHRESH/HTHRESH to zero > 2. Fix unnecessary link state messages > 3. Use netdev_priv() instead of netdev->priv > 4. Fix Broadcast/Multicast packets received statistics > 5. 
Fix data output by ethtool -d > 6. Ethtool cleanup patch from Stephen Hemminger > 7. Remove unused functions, render some variable static instead of global > 8. Redefined buffer_info-dma to be dma_addr_t instead of uint64 > 9. Driver version & white space fixes Hmmm...I only got parts 1 & 2...anyone else missing parts? -- John W. Linville linville@tuxdriver.com From pavel@ucw.cz Mon Jun 20 04:31:21 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 20 Jun 2005 04:31:27 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5KBVAH9029178 for ; Mon, 20 Jun 2005 04:31:13 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id 64C718B8EC; Mon, 20 Jun 2005 13:29:50 +0200 (CEST) Date: Mon, 20 Jun 2005 13:29:50 +0200 From: Pavel Machek To: Netdev list , Andrew Morton , Jeff Garzik , "James P. Ketrenos" Subject: [patch] ipw2100: remove by-hand function entry/exit debugging Message-ID: <20050620112950.GA12102@elf.ucw.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Warning: Reading this can be dangerous to your mental health. User-Agent: Mutt/1.5.9i X-archive-position: 2473 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 8904 Lines: 427 This removes the debug prints at function entry and exit. This level of debugging should probably be done with gdb or similar. 
Signed-off-by: Pavel Machek --- clean-mm/drivers/net/wireless/ipw2100.c 2005-06-20 12:34:33.000000000 +0200 +++ linux-mm/drivers/net/wireless/ipw2100.c 2005-06-20 12:55:03.000000000 +0200 @@ -1083,8 +1046,6 @@ { struct ipw2100_ordinals *ord = &priv->ordinals; - IPW_DEBUG_INFO("enter\n"); - read_register(priv->net_dev, IPW_MEM_HOST_SHARED_ORDINALS_TABLE_1, &ord->table1_addr); @@ -1095,10 +1056,6 @@ read_nic_dword(priv->net_dev, ord->table2_addr, &ord->table2_size); ord->table2_size &= 0x0000FFFF; - - IPW_DEBUG_INFO("table 1 size: %d\n", ord->table1_size); - IPW_DEBUG_INFO("table 2 size: %d\n", ord->table2_size); - IPW_DEBUG_INFO("exit\n"); } static inline void ipw2100_hw_set_gpio(struct ipw2100_priv *priv) @@ -1196,8 +1153,6 @@ int i; u32 inta, inta_mask, gpio; - IPW_DEBUG_INFO("enter\n"); - if (priv->status & STATUS_RUNNING) return 0; @@ -1284,9 +1239,6 @@ /* The adapter has been reset; we are not associated */ priv->status &= ~(STATUS_ASSOCIATING | STATUS_ASSOCIATED); - - IPW_DEBUG_INFO("exit\n"); - return 0; } @@ -1596,8 +1548,6 @@ }; int err; - IPW_DEBUG_INFO("enter\n"); - IPW_DEBUG_SCAN("setting scan options\n"); cmd.host_command_parameters[0] = 0; @@ -1641,8 +1591,6 @@ return 0; } - IPW_DEBUG_INFO("enter\n"); - /* Not clearing here; doing so makes iwlist always return nothing... * * We should modify the table logic to use aging tables vs. 
clearing @@ -1655,8 +1603,6 @@ if (err) priv->status &= ~STATUS_SCANNING; - IPW_DEBUG_INFO("exit\n"); - return err; } @@ -3190,8 +3087,6 @@ ipw2100_enable_interrupts(priv); spin_unlock_irqrestore(&priv->low_lock, flags); - - IPW_DEBUG_ISR("exit\n"); } @@ -4149,43 +3460,32 @@ { struct ipw2100_status_queue *q = &priv->status_queue; - IPW_DEBUG_INFO("enter\n"); - q->size = entries * sizeof(struct ipw2100_status); q->drv = (struct ipw2100_status *)pci_alloc_consistent( priv->pci_dev, q->size, &q->nic); if (!q->drv) { - IPW_DEBUG_WARNING( + printk(KERN_WARNING DRV_NAME ": " "Can not allocate status queue.\n"); return -ENOMEM; } memset(q->drv, 0, q->size); - - IPW_DEBUG_INFO("exit\n"); - return 0; } static void status_queue_free(struct ipw2100_priv *priv) { - IPW_DEBUG_INFO("enter\n"); - if (priv->status_queue.drv) { pci_free_consistent( priv->pci_dev, priv->status_queue.size, priv->status_queue.drv, priv->status_queue.nic); priv->status_queue.drv = NULL; } - - IPW_DEBUG_INFO("exit\n"); } static int bd_queue_allocate(struct ipw2100_priv *priv, struct ipw2100_bd_queue *q, int entries) { - IPW_DEBUG_INFO("enter\n"); - memset(q, 0, sizeof(struct ipw2100_bd_queue)); q->entries = entries; @@ -4196,17 +3496,12 @@ return -ENOMEM; } memset(q->drv, 0, q->size); - - IPW_DEBUG_INFO("exit\n"); - return 0; } static void bd_queue_free(struct ipw2100_priv *priv, struct ipw2100_bd_queue *q) { - IPW_DEBUG_INFO("enter\n"); - if (!q) return; @@ -4215,24 +3510,18 @@ q->size, q->drv, q->nic); q->drv = NULL; } - - IPW_DEBUG_INFO("exit\n"); } static void bd_queue_initialize( struct ipw2100_priv *priv, struct ipw2100_bd_queue * q, u32 base, u32 size, u32 r, u32 w) { - IPW_DEBUG_INFO("enter\n"); - IPW_DEBUG_INFO("initializing bd queue at virt=%p, phys=%08x\n", q->drv, q->nic); write_register(priv->net_dev, base, q->nic); write_register(priv->net_dev, size, q->entries); write_register(priv->net_dev, r, q->oldest); write_register(priv->net_dev, w, q->next); - - IPW_DEBUG_INFO("exit\n"); } static 
void ipw2100_kill_workqueue(struct ipw2100_priv *priv) @@ -4256,11 +3545,9 @@ void *v; dma_addr_t p; - IPW_DEBUG_INFO("enter\n"); - err = bd_queue_allocate(priv, &priv->tx_queue, TX_QUEUE_LENGTH); if (err) { - IPW_DEBUG_ERROR("%s: failed bd_queue_allocate\n", + printk(KERN_ERR DRV_NAME ": %s: failed bd_queue_allocate\n", priv->net_dev->name); return err; } @@ -4312,8 +3599,6 @@ { int i; - IPW_DEBUG_INFO("enter\n"); - /* * reinitialize packet info lists */ @@ -4352,17 +3637,12 @@ IPW_MEM_HOST_SHARED_TX_QUEUE_BD_SIZE, IPW_MEM_HOST_SHARED_TX_QUEUE_READ_INDEX, IPW_MEM_HOST_SHARED_TX_QUEUE_WRITE_INDEX); - - IPW_DEBUG_INFO("exit\n"); - } static void ipw2100_tx_free(struct ipw2100_priv *priv) { int i; - IPW_DEBUG_INFO("enter\n"); - bd_queue_free(priv, &priv->tx_queue); if (!priv->tx_buffers) @@ -4383,8 +3663,6 @@ kfree(priv->tx_buffers); priv->tx_buffers = NULL; - - IPW_DEBUG_INFO("exit\n"); } @@ -4393,8 +3671,6 @@ { int i, j, err = -EINVAL; - IPW_DEBUG_INFO("enter\n"); - err = bd_queue_allocate(priv, &priv->rx_queue, RX_QUEUE_LENGTH); if (err) { IPW_DEBUG_INFO("failed bd_queue_allocate\n"); @@ -4416,11 +3692,8 @@ GFP_KERNEL); if (!priv->rx_buffers) { IPW_DEBUG_INFO("can't allocate rx packet buffer table\n"); - bd_queue_free(priv, &priv->rx_queue); - status_queue_free(priv); - return -ENOMEM; } @@ -4461,8 +3734,6 @@ static void ipw2100_rx_initialize(struct ipw2100_priv *priv) { - IPW_DEBUG_INFO("enter\n"); - priv->rx_queue.oldest = 0; priv->rx_queue.available = priv->rx_queue.entries - 1; priv->rx_queue.next = priv->rx_queue.entries - 1; @@ -4479,16 +3750,12 @@ /* set up the status queue */ write_register(priv->net_dev, IPW_MEM_HOST_SHARED_RX_STATUS_BASE, priv->status_queue.nic); - - IPW_DEBUG_INFO("exit\n"); } static void ipw2100_rx_free(struct ipw2100_priv *priv) { int i; - IPW_DEBUG_INFO("enter\n"); - bd_queue_free(priv, &priv->rx_queue); status_queue_free(priv); @@ -4507,8 +3774,6 @@ kfree(priv->rx_buffers); priv->rx_buffers = NULL; - - IPW_DEBUG_INFO("exit\n"); } 
static int ipw2100_read_mac_address(struct ipw2100_priv *priv) @@ -4549,8 +3814,6 @@ IPW_DEBUG_HC("SET_MAC_ADDRESS\n"); - IPW_DEBUG_INFO("enter\n"); - if (priv->config & CFG_CUSTOM_MAC) { memcpy(cmd.host_command_parameters, priv->mac_addr, ETH_ALEN); @@ -4560,8 +3823,6 @@ ETH_ALEN); err = ipw2100_hw_send_command(priv, &cmd); - - IPW_DEBUG_INFO("exit\n"); return err; } @@ -4948,8 +4168,6 @@ int err; int len; - IPW_DEBUG_HC("DISASSOCIATION_BSSID\n"); - len = ETH_ALEN; /* The Firmware currently ignores the BSSID and just disassociates from * the currently associated AP -- but in the off chance that a future @@ -5009,8 +4196,6 @@ }; int err; - IPW_DEBUG_HC("SET_WPA_IE\n"); - if (!batch_mode) { err = ipw2100_disable_adapter(priv); if (err) @@ -5136,8 +4321,6 @@ cmd.host_command_parameters[0] = interval; - IPW_DEBUG_INFO("enter\n"); - if (priv->ieee->iw_mode == IW_MODE_ADHOC) { if (!batch_mode) { err = ipw2100_disable_adapter(priv); @@ -5153,9 +4336,6 @@ return err; } } - - IPW_DEBUG_INFO("exit\n"); - return 0; } @@ -5515,8 +4695,6 @@ int batch_mode = 1; u8 *bssid; - IPW_DEBUG_INFO("enter\n"); - err = ipw2100_disable_adapter(priv); if (err) return err; @@ -5525,9 +4703,6 @@ err = ipw2100_set_channel(priv, priv->channel, batch_mode); if (err) return err; - - IPW_DEBUG_INFO("exit\n"); - return 0; } #endif /* CONFIG_IPW2100_MONITOR */ @@ -5604,9 +4779,6 @@ if (err) return err; */ - - IPW_DEBUG_INFO("exit\n"); - return 0; } @@ -5669,8 +4841,6 @@ struct list_head *element; struct ipw2100_tx_packet *packet; - IPW_DEBUG_INFO("enter\n"); - spin_lock_irqsave(&priv->low_lock, flags); if (priv->status & STATUS_ASSOCIATED) @@ -5692,9 +4862,6 @@ INC_STAT(&priv->tx_free_stat); } spin_unlock_irqrestore(&priv->low_lock, flags); - - IPW_DEBUG_INFO("exit\n"); - return 0; } @@ -6449,8 +5616,6 @@ int registered = 0; u32 val; - IPW_DEBUG_INFO("enter\n"); - mem_start = pci_resource_start(pci_dev, 0); mem_len = pci_resource_len(pci_dev, 0); mem_flags = pci_resource_flags(pci_dev, 0); @@ 
-6598,8 +5763,6 @@ ipw2100_start_scan(priv); } - IPW_DEBUG_INFO("exit\n"); - priv->status |= STATUS_INITIALIZED; up(&priv->action_sem); @@ -6689,17 +5851,11 @@ pci_release_regions(pci_dev); pci_disable_device(pci_dev); - - IPW_DEBUG_INFO("exit\n"); } #ifdef CONFIG_PM -#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,11) -static int ipw2100_suspend(struct pci_dev *pci_dev, u32 state) -#else static int ipw2100_suspend(struct pci_dev *pci_dev, pm_message_t state) -#endif { struct ipw2100_priv *priv = pci_get_drvdata(pci_dev); struct net_device *dev = priv->net_dev; @@ -8288,8 +7444,6 @@ down(&priv->action_sem); - IPW_DEBUG_WX("enter\n"); - up(&priv->action_sem); wrqu.ap_addr.sa_family = ARPHRD_ETHER; -- teflon -- maybe it is a trademark, but it should not be. From pavel@ucw.cz Mon Jun 20 04:34:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 20 Jun 2005 04:34:38 -0700 (PDT) Received: from amd.ucw.cz (gprs189-60.eurotel.cz [160.218.189.60]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5KBYUH9029482 for ; Mon, 20 Jun 2005 04:34:32 -0700 Received: by amd.ucw.cz (Postfix, from userid 8) id 6D4648B8EC; Mon, 20 Jun 2005 13:33:07 +0200 (CEST) Date: Mon, 20 Jun 2005 13:33:07 +0200 From: Pavel Machek To: Netdev list , Andrew Morton , Jeff Garzik , "James P. Ketrenos" Subject: [patch] ipw2100: remove commented-out code Message-ID: <20050620113307.GA14918@elf.ucw.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Warning: Reading this can be dangerous to your mental health. User-Agent: Mutt/1.5.9i X-archive-position: 2474 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pavel@ucw.cz Precedence: bulk X-list: netdev Content-Length: 2845 Lines: 86 This removes various code and defines that were just commented out instead of being deleted. 
--- clean-mm/drivers/net/wireless/ipw2100.c 2005-06-20 12:34:33.000000000 +0200 +++ linux-mm/drivers/net/wireless/ipw2100.c 2005-06-20 12:55:03.000000000 +0200 @@ -915,12 +907,10 @@ if (i == 10000) return -EIO; /* TODO: better error value */ -//#if CONFIG_IPW2100_D0ENABLED /* set D0 standby bit */ read_register(priv->net_dev, IPW_REG_GP_CNTRL, &r); write_register(priv->net_dev, IPW_REG_GP_CNTRL, r | IPW_AUX_HOST_GP_CNTRL_BIT_HOST_ALLOWS_STANDBY); -//#endif return 0; } --- clean-mm/drivers/net/wireless/ipw2100.h 2005-06-20 12:34:33.000000000 +0200 +++ linux-mm/drivers/net/wireless/ipw2100.h 2005-06-20 12:55:10.000000000 +0200 @@ -320,7 +318,7 @@ u16 fragment_size; } __attribute__ ((packed)); -// Host command data structure +/* Host command data structure */ struct host_command { u32 host_command; // COMMAND ID u32 host_command1; // COMMAND ID @@ -663,7 +656,7 @@ #define MSDU_TX_RATES 62 -// Rogue AP Detection +/* Rogue AP Detection */ #define SET_STATION_STAT_BITS 64 #define CLEAR_STATIONS_STAT_BITS 65 #define LEAP_ROGUE_MODE 66 //TODO tbw replaced by CFG_LEAP_ROGUE_AP @@ -673,25 +666,16 @@ -// system configuration bit mask: -//#define IPW_CFG_ANTENNA_SETTING 0x03 -//#define IPW_CFG_ANTENNA_A 0x01 -//#define IPW_CFG_ANTENNA_B 0x02 +/* system configuration bit mask: */ #define IPW_CFG_MONITOR 0x00004 -//#define IPW_CFG_TX_STATUS_ENABLE 0x00008 #define IPW_CFG_PREAMBLE_AUTO 0x00010 #define IPW_CFG_IBSS_AUTO_START 0x00020 -//#define IPW_CFG_KERBEROS_ENABLE 0x00040 #define IPW_CFG_LOOPBACK 0x00100 -//#define IPW_CFG_WNMP_PING_PASS 0x00200 -//#define IPW_CFG_DEBUG_ENABLE 0x00400 #define IPW_CFG_ANSWER_BCSSID_PROBE 0x00800 -//#define IPW_CFG_BT_PRIORITY 0x01000 #define IPW_CFG_BT_SIDEBAND_SIGNAL 0x02000 #define IPW_CFG_802_1x_ENABLE 0x04000 #define IPW_CFG_BSS_MASK 0x08000 #define IPW_CFG_IBSS_MASK 0x10000 -//#define IPW_CFG_DYNAMIC_CW 0x10000 #define IPW_SCAN_NOASSOCIATE (1<<0) #define IPW_SCAN_MIXED_CELL (1<<1) @@ -840,7 +824,7 @@ } rx_data; } __attribute__ ((packed)); 
-// Bit 0-7 are for 802.11b tx rates - . Bit 5-7 are reserved +/* Bit 0-7 are for 802.11b tx rates - . Bit 5-7 are reserved */ #define TX_RATE_1_MBIT 0x0001 #define TX_RATE_2_MBIT 0x0002 #define TX_RATE_5_5_MBIT 0x0004 @@ -1120,7 +1104,6 @@ IPW_ORD_UCODE_VERSION, // Ucode Version IPW_ORD_HW_RF_SWITCH_STATE = 214, // HW RF Kill Switch State } ORDINALTABLE1; -//ENDOF TABLE1 // ordinal table 2 // Variable length data: -- teflon -- maybe it is a trademark, but it should not be. From jmoyer@redhat.com Mon Jun 20 08:02:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 20 Jun 2005 08:02:45 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5KF2eH9010266 for ; Mon, 20 Jun 2005 08:02:41 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j5KF1MVI007780; Mon, 20 Jun 2005 11:01:22 -0400 Received: from mail.boston.redhat.com (mail.boston.redhat.com [172.16.76.12]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j5KF1Hu05219; Mon, 20 Jun 2005 11:01:17 -0400 Received: from segfault.boston.redhat.com (segfault.boston.redhat.com [172.16.80.57]) by mail.boston.redhat.com (8.12.8/8.12.8) with ESMTP id j5KF1HCv027248; Mon, 20 Jun 2005 11:01:17 -0400 Received: from segfault.boston.redhat.com (localhost.localdomain [127.0.0.1]) by segfault.boston.redhat.com (8.13.1/8.13.1) with ESMTP id j5KF1HMp016180; Mon, 20 Jun 2005 11:01:17 -0400 Received: (from jmoyer@localhost) by segfault.boston.redhat.com (8.13.1/8.13.1/Submit) id j5KF1Eg9016176; Mon, 20 Jun 2005 11:01:14 -0400 From: Jeff Moyer MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17078.55866.893715.792418@segfault.boston.redhat.com> Date: Mon, 20 Jun 2005 11:01:14 -0400 To: "John W. 
Linville" Cc: Matt Mackall , netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: netpoll and the bonding driver In-Reply-To: <20050620002118.GA16859@tuxdriver.com> References: <17075.10995.498758.773092@segfault.boston.redhat.com> <20050619181436.GX27572@waste.org> <20050620002118.GA16859@tuxdriver.com> X-Mailer: VM 7.17 under 21.4 (patch 15) "Security Through Obscurity" XEmacs Lucid Reply-To: jmoyer@redhat.com X-PGP-KeyID: 1F78E1B4 X-PGP-CertKey: F6FE 280D 8293 F72C 65FD 5A58 1FF8 A7CA 1F78 E1B4 X-PCLoadLetter: What the f**k does that mean? X-archive-position: 2475 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmoyer@redhat.com Precedence: bulk X-list: netdev Content-Length: 773 Lines: 18 ==> Regarding Re: netpoll and the bonding driver; "John W. Linville" adds: linville> On Sun, Jun 19, 2005 at 11:14:36AM -0700, Matt Mackall wrote: >> On Fri, Jun 17, 2005 at 03:56:35PM -0400, Jeff Moyer wrote: >> > I'm trying to implement a netpoll hook for the bonding driver. >> >> My first question would be: does this really make sense to do? Why not >> just bind netpoll to one of the underlying devices? linville> Depending on the bonding mode, this would be very unlikely to linville> work. The other side of the link will still be expecting to talk linville> to the bond rather than to an individual link. Right, and for those drivers which register a netpoll_rx routine, they may not get all of the packets destined for them. 
-Jeff From davem@davemloft.net Mon Jun 20 13:25:32 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 20 Jun 2005 13:25:39 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5KKPWH9003538 for ; Mon, 20 Jun 2005 13:25:32 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DkSoL-0007y2-IQ; Mon, 20 Jun 2005 13:24:01 -0700 Date: Mon, 20 Jun 2005 13:24:01 -0700 (PDT) Message-Id: <20050620.132401.21595297.davem@davemloft.net> To: herbert@gondor.apana.org.au Cc: jmorris@redhat.com, kaber@trash.net, yoshfuji@linux-ipv6.org, netdev@oss.sgi.com Subject: Re: [net-2.6.13 0/3] [IPSEC] Allow PMTU discovery to be turned off From: "David S. Miller" In-Reply-To: <20050613073353.GA21454@gondor.apana.org.au> References: <20050613073353.GA21454@gondor.apana.org.au> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2476 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1182 Lines: 26 From: Herbert Xu Date: Mon, 13 Jun 2005 17:33:53 +1000 > One of the problems that's been plaguing our IPsec stack is ICMP > blackholes. ICMP blackholes are particularly bad for tunnels because > the most common remedy -- MSS clamping -- has no effect when applied > outside the tunnel. It is often impractical to apply it inside > the tunnel since the point where the clamping is applied may be some > way away from either IPsec endpoint. > > The best solution so far has been to disable PMTU discovery when a > blackhole is detected. We already support that for IPIP/GRE tunnels. > The following patchset adds support for a similar strategy to IPsec > tunnels. 
> > It is by no means ideal but it's something that you need to survive > on today's Internet. All 3 patches applied, thanks Herbert. One thing needs clarification in your description. When I first read "blackhole is detected" I was under the wrong impression as to _who_ does the detection. Your patches allow the administrator to do this, whereas I thought you were going to add some code which dynamically figured out the presence of ICMP black holes and would thus set the bit. From ganesh.venkatesan@gmail.com Mon Jun 20 14:30:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 20 Jun 2005 14:30:52 -0700 (PDT) Received: from zproxy.gmail.com (zproxy.gmail.com [64.233.162.196]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5KLUlH9006941 for ; Mon, 20 Jun 2005 14:30:48 -0700 Received: by zproxy.gmail.com with SMTP id 34so718759nzf for ; Mon, 20 Jun 2005 14:29:29 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=G10TZto1yzJtzGs0fuJ6QpsXMcuRNstL814qk85ya7sUrw8IPhpdO0Nra0BW8985JtV1jktQK2e61h9Py/dF76x/dghYq+QzgZVf1flbl644bN1V1OjrXI8KdxcARYWEZgPZTvInnOtD1fK69WgYuQrji9XfhkjNzOGnGQX46Hg= Received: by 10.36.220.79 with SMTP id s79mr3430522nzg; Mon, 20 Jun 2005 14:29:29 -0700 (PDT) Received: by 10.36.66.9 with HTTP; Mon, 20 Jun 2005 14:29:29 -0700 (PDT) Message-ID: <5fc59ff305062014294069b259@mail.gmail.com> Date: Mon, 20 Jun 2005 14:29:29 -0700 From: Ganesh Venkatesan Reply-To: Ganesh Venkatesan To: Malli Chilakala , "jgarzik@pobox.com" , netdev Subject: Re: [PATCH net-drivers-2.6 0/9] ixgb: driver update In-Reply-To: <20050620002244.GB16859@tuxdriver.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline References: <20050620002244.GB16859@tuxdriver.com> Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 
8bit by oss.sgi.com id j5KLUlH9006941 X-archive-position: 2477 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ganesh.venkatesan@gmail.com Precedence: bulk X-list: netdev Content-Length: 810 Lines: 28 John: Are you subscribed to netdev@vger.kernel.org? ganesh. On 6/19/05, John W. Linville wrote: > On Fri, Jun 17, 2005 at 04:54:36PM -0700, Malli Chilakala wrote: > > ixgb: driver update > > > 1. Set RXDCTL:PTHRESH/HTHRESH to zero > > 2. Fix unnecessary link state messages > > 3. Use netdev_priv() instead of netdev->priv > > 4. Fix Broadcast/Multicast packets received statistics > > 5. Fix data output by ethtool -d > > 6. Ethtool cleanup patch from Stephen Hemminger > > 7. Remove unused functions, render some variable static instead of global > > 8. Redefined buffer_info-dma to be dma_addr_t instead of uint64 > > 9. Driver version & white space fixes > > Hmmm...I only got parts 1 & 2...anyone else missing parts? > > -- > John W. 
Linville > linville@tuxdriver.com > > From simon@thekelleys.org.uk Tue Jun 21 00:48:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 00:48:48 -0700 (PDT) Received: from thekelleys.org.uk (cpc4-cmbg4-4-0-cust124.cmbg.cable.ntl.com [81.108.205.124]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5L7mfH9024523 for ; Tue, 21 Jun 2005 00:48:41 -0700 Received: from vaio.thekelleys.org.uk ([192.168.1.157]) by thekelleys.org.uk with esmtp (Exim 3.35 #1 (Debian)) id 1DkdPq-0007PA-00; Tue, 21 Jun 2005 08:43:26 +0100 Message-ID: <42B7C4D0.9070809@thekelleys.org.uk> Date: Tue, 21 Jun 2005 08:42:08 +0100 From: Simon Kelley User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041007 Debian/1.7.3-5 X-Accept-Language: en MIME-Version: 1.0 To: Jirka Bohac CC: Denis Vlasenko , Pavel Machek , Jeff Garzik , Netdev list , kernel list Subject: Re: ipw2100: firmware problem References: <20050608142310.GA2339@elf.ucw.cz> <200506081744.20687.vda@ilport.com.ua> <20050608145653.GA8844@dwarf.suse.cz> In-Reply-To: <20050608145653.GA8844@dwarf.suse.cz> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2478 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: simon@thekelleys.org.uk Precedence: bulk X-list: netdev Content-Length: 1641 Lines: 48 Jirka Bohac wrote: > On Wed, Jun 08, 2005 at 05:44:20PM +0300, Denis Vlasenko wrote: > >>On Wednesday 08 June 2005 17:23, Pavel Machek wrote: >> >>>What's the preferred way to solve this one? Only load firmware when >>>user does ifconfig eth1 up? [It is wifi, it looks like it would be >>>better to start firmware sooner so that it can associate to the >>>AP...]. >> >>Do you want to associate to an AP when your kernel boots, >>_before_ any iwconfig had a chance to configure anything? >>That's strange. >> >>My position is that wifi drivers must start up in an "OFF" mode. 
>>Do not send anything. Do not join APs or start IBSS. > > > Agreed. > > >>Thus, no need to load fw in early boot. > > > I don't think this is true. Loading the firmware on the first > "ifconfig up" is problematic. Often, people want to rename the > device from ethX/wlanX/... to something stable. This is usually > based on the adapter's MAC address, which is not visible until > the firmware is loaded. > > Prism54 does it this way and it really sucks. You need to bring > the adapter up to load the firmware, then bring it back down, > rename it, and bring it up again. > The atmel driver includes a small firmware stub which does nothing but determine the MAC address, to solve this problem. This is compiled into the driver and so doesn't depend on request_firmware(). The stub was created by reverse engineering the card and is GPL, so there's no problem including it in the kernel. This is not a general solution, since it depends on the ability to create such MAC reader firmware, but it might be a possibility in this case. Cheers, Simon. 
From feyd@nmskb.cz Tue Jun 21 01:31:06 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 01:31:13 -0700 (PDT) Received: from smtp.nmskb.cz (router.nmskb.cz [82.142.73.249] (may be forged)) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5L8V5H9027276 for ; Tue, 21 Jun 2005 01:31:06 -0700 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.nmskb.cz (Postfix) with ESMTP id 90D2E33A860; Tue, 21 Jun 2005 10:29:45 +0200 (CEST) Received: from smtp.nmskb.cz ([127.0.0.1]) by localhost (smtp1 [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 26138-07; Tue, 21 Jun 2005 10:29:44 +0200 (CEST) Received: from alfa.nmskb.cz (unknown [192.168.1.14]) by smtp.nmskb.cz (Postfix) with ESMTP id 6BFAD33A75D; Tue, 21 Jun 2005 10:29:43 +0200 (CEST) Date: Tue, 21 Jun 2005 10:29:21 +0200 From: Feyd To: Simon Kelley Cc: Jirka Bohac , Denis Vlasenko , Pavel Machek , Jeff Garzik , Netdev list , kernel list Subject: Re: ipw2100: firmware problem Message-ID: <20050621102921.5a8c953a@alfa.nmskb.cz> In-Reply-To: <42B7C4D0.9070809@thekelleys.org.uk> References: <20050608142310.GA2339@elf.ucw.cz> <200506081744.20687.vda@ilport.com.ua> <20050608145653.GA8844@dwarf.suse.cz> <42B7C4D0.9070809@thekelleys.org.uk> X-Mailer: Sylpheed-Claws 1.0.3 (GTK+ 1.2.10; i686-suse-linux) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2479 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: feyd@nmskb.cz Precedence: bulk X-list: netdev Content-Length: 289 Lines: 9 On Tue, 21 Jun 2005 08:42:08 +0100 Simon Kelley wrote: > The atmel driver includes a small firmware stub which does nothing but > determine the MAC address, to solve this problem. This is compiled into Does it power-down the card after reading the MAC? 
Feyd From simon@thekelleys.org.uk Tue Jun 21 01:48:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 01:48:39 -0700 (PDT) Received: from thekelleys.org.uk (cpc4-cmbg4-4-0-cust124.cmbg.cable.ntl.com [81.108.205.124]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5L8mZH9028548 for ; Tue, 21 Jun 2005 01:48:36 -0700 Received: from central ([192.168.0.4] helo=[127.0.0.1]) by thekelleys.org.uk with esmtp (Exim 3.35 #1 (Debian)) id 1DkeOX-0007XA-00; Tue, 21 Jun 2005 09:46:09 +0100 Message-ID: <42B7D3D2.8010606@thekelleys.org.uk> Date: Tue, 21 Jun 2005 09:46:10 +0100 From: Simon Kelley User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050509 Debian/1.7.6-1ubuntu2.1 X-Accept-Language: en MIME-Version: 1.0 To: Feyd CC: Jirka Bohac , Denis Vlasenko , Pavel Machek , Jeff Garzik , Netdev list , kernel list Subject: Re: ipw2100: firmware problem References: <20050608142310.GA2339@elf.ucw.cz> <200506081744.20687.vda@ilport.com.ua> <20050608145653.GA8844@dwarf.suse.cz> <42B7C4D0.9070809@thekelleys.org.uk> <20050621102921.5a8c953a@alfa.nmskb.cz> In-Reply-To: <20050621102921.5a8c953a@alfa.nmskb.cz> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2480 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: simon@thekelleys.org.uk Precedence: bulk X-list: netdev Content-Length: 507 Lines: 19 Feyd wrote: > On Tue, 21 Jun 2005 08:42:08 +0100 > Simon Kelley wrote: > > >>The atmel driver includes a small firmware stub which does nothing but >>determine the MAC address, to solve this problem. This is compiled into > > > Does it power-down the card after reading the MAC? > Yes, it loads the special firmware, runs it to get the MAC, and then returns the card to a quiescent state, ready for the real firmware load which happens at device open time. Cheers, Simon. 
From davem@davemloft.net Tue Jun 21 13:22:50 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 13:22:54 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5LKMoH9011736 for ; Tue, 21 Jun 2005 13:22:50 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DkpFH-0004D2-Dl; Tue, 21 Jun 2005 13:21:19 -0700 Date: Tue, 21 Jun 2005 13:21:19 -0700 (PDT) Message-Id: <20050621.132119.85686894.davem@davemloft.net> To: gnb@melbourne.sgi.com Cc: netdev@oss.sgi.com, mchan@broadcom.com Subject: Re: [PATCH]: Tigon3 new NAPI locking v2 From: "David S. Miller" In-Reply-To: <1118139072.2198.119.camel@hole.melbourne.sgi.com> References: <20050603.122558.88474819.davem@davemloft.net> <1118139072.2198.119.camel@hole.melbourne.sgi.com> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2482 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 330 Lines: 9 From: Greg Banks Date: Tue, 07 Jun 2005 20:11:12 +1000 > This patch seems to run well, so far without the lockup we saw > with the first version. It really helps with irq fairness when > we have lots of tg3 and Fibre Channel HBA interrupts going to the > same CPU. A belated thank you for testing Greg. 
From davem@davemloft.net Tue Jun 21 13:22:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 13:22:51 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5LKMVH9011714 for ; Tue, 21 Jun 2005 13:22:36 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DkpEi-0004Cd-Dm; Tue, 21 Jun 2005 13:20:44 -0700 Date: Tue, 21 Jun 2005 13:20:44 -0700 (PDT) Message-Id: <20050621.132044.115910664.davem@davemloft.net> To: shemminger@osdl.org Cc: mitch.a.williams@intel.com, john.ronciak@intel.com, mchan@broadcom.com, hadi@cyberus.ca, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: <42A5284C.3060808@osdl.org> References: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> <42A5284C.3060808@osdl.org> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2481 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 859 Lines: 19 From: Stephen Hemminger Date: Mon, 06 Jun 2005 21:53:32 -0700 > I noticed that the tg3 driver copies packets less than a certain > threshold to a new buffer, but e1000 always passes the big buffer up > the stack. Could this be having an impact? I bet it does, this makes ACK processing a lot more expensive. And it is so much cheaper to just recycle the big buffer back to the chip if you copy to a small buffer, and it warms up the caches for the packet headers as a side effect as well. Actually, it has a _HUGE_ _HUGE_ impact. 
If you pass the big buffer up, the receiving socket gets charged for the size of the huge buffer, not for just the size of the packet contained within. This makes sockets get overcharged for data reception, and it can cause all kinds of performance problems. I highly recommend that this gets fixed. From davem@davemloft.net Tue Jun 21 13:38:51 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 13:38:55 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5LKcpH9013412 for ; Tue, 21 Jun 2005 13:38:51 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DkpUW-0004F6-PC; Tue, 21 Jun 2005 13:37:04 -0700 Date: Tue, 21 Jun 2005 13:37:04 -0700 (PDT) Message-Id: <20050621.133704.08321534.davem@davemloft.net> To: gandalf@wlug.westbo.se Cc: hadi@cyberus.ca, shemminger@osdl.org, mitch.a.williams@intel.com, john.ronciak@intel.com, mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: References: <42A5284C.3060808@osdl.org> <1118147904.6320.108.camel@localhost.localdomain> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2483 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 2079 Lines: 40 From: Martin Josefsson Date: Tue, 7 Jun 2005 14:06:18 +0200 (CEST) > One thing that jumps to mind is that e1000 starts at lastrxdescriptor+1 > and loops and checks the status of each descriptor and stops when it finds > a descriptor that isn't finished. 
Another way to do it is to read out the
> current position of the ring and loop from lastrxdescriptor+1 up to the
> current position. Scott Feldman implemented this for TX and there it
> increased performance somewhat (discussed here on netdev some months ago).
> I wonder if it could also decrease RX latency, I mean, we have to get the
> cache miss sometime anyway.
>
> I haven't checked how tg3 does it.

I don't think this matters all that much. tg3 does loop on the RX producer
index, so it doesn't touch descriptors unless the RX producer index states
there is a ready packet there.

One thing I noticed with Super TSO testing is that e1000 has very expensive
TSO transmit processing. The big problem is the context descriptor. This is
4 extra 32-bit words eaten up in the transmit ring for every TSO packet.
Whereas tg3 stores all the TSO offload information directly in the normal TX
descriptor (which is the same size, 16 bytes, as the e1000 normal TX
descriptor).

It accounts for a non-trivial amount of overhead. On my SunBlade1500 with
Super TSO, the e1000 transmitter eats 40% of CPU to fill a gigabit pipe
whereas tg3 takes 30%. All of the extra time, based upon quick scans of
oprofile dumps, shows up in the e1000 driver.

Also, e1000 sends full MTU sized SKBs down into the stack even if the packet
is very small. This also hurts performance a lot. As discussed elsewhere, it
should use a "small packet" cut-off just like other drivers do. If the RX
frame is less than this cut-off value, a new smaller sized SKB is allocated
and the RX data copied into it. The RX ring SKB is left in-place and given
back to the chip.

My only guess is that the e1000 driver implemented things this way to
simplify the RX recycling logic.
Well, it is an area ripe for improvement in this driver :) From rick.jones2@hp.com Tue Jun 21 13:40:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 13:40:06 -0700 (PDT) Received: from palrel12.hp.com (palrel12.hp.com [156.153.255.237]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5LKe0H9013610 for ; Tue, 21 Jun 2005 13:40:01 -0700 Received: from tardy.cup.hp.com (tardy.cup.hp.com [15.244.44.58]) by palrel12.hp.com (Postfix) with ESMTP id 8EE1F4021EC; Tue, 21 Jun 2005 13:38:40 -0700 (PDT) Received: from hp.com (localhost [127.0.0.1]) by tardy.cup.hp.com (8.9.3 (PHNE_28810)/8.9.3 SMKit7.02) with ESMTP id NAA02570; Tue, 21 Jun 2005 13:38:39 -0700 (PDT) Message-ID: <42B87ACF.3080800@hp.com> Date: Tue, 21 Jun 2005 13:38:39 -0700 From: Rick Jones User-Agent: Mozilla/5.0 (X11; U; HP-UX 9000/785; en-US; rv:1.6) Gecko/20040304 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" Cc: shemminger@osdl.org, mitch.a.williams@intel.com, john.ronciak@intel.com, mchan@broadcom.com, hadi@cyberus.ca, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch References: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> <42A5284C.3060808@osdl.org> <20050621.132044.115910664.davem@davemloft.net> In-Reply-To: <20050621.132044.115910664.davem@davemloft.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2484 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rick.jones2@hp.com Precedence: bulk X-list: netdev Content-Length: 1578 Lines: 41 David S. 
Miller wrote: > From: Stephen Hemminger > Date: Mon, 06 Jun 2005 21:53:32 -0700 > > >>I noticed that the tg3 driver copies packets less than a certain >>threshold to a new buffer, but e1000 always passes the big buffer up >>the stack. Could this be having an impact? > > > I bet it does, this makes ACK processing a lot more expensive. Why would ACK processing care about the size of the buffer containing the ACK segment? > And it > is so much cheaper to just recycle the big buffer back to the chip > if you copy to a small buffer, and it warms up the caches for the > packet headers as a side effect as well. I would think that the cache business would be a wash either way. With 64 byte cache lines (128 in some cases) just accessing the link-level header has brought the IP header into the cache, and probably the TCP header as well. Isn't the decision point between the sum of allocating a small buffer and doing the copy, versus allocating a new large buffer and (re)mapping it for DMA? I guess that would come down to copy versus mapping overhead. > Actually, it has a _HUGE_ _HUGE_ impact. If you pass the big buffer > up, the receiving socket gets charged for the size of the huge buffer, > not for just the size of the packet contained within. This makes > sockets get overcharged for data reception, and it can cause all kinds > of performance problems. Then copy when the socket is about to fill with overhead bytes? > I highly recommend that this gets fixed. What is the cut-off point for the copy? 
rick jones From davem@davemloft.net Tue Jun 21 13:49:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 13:50:03 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5LKnwH9014930 for ; Tue, 21 Jun 2005 13:49:58 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DkpfS-0004GF-Ba; Tue, 21 Jun 2005 13:48:22 -0700 Date: Tue, 21 Jun 2005 13:48:22 -0700 (PDT) Message-Id: <20050621.134822.21926602.davem@davemloft.net> To: pmeda@akamai.com Cc: jgarzik@pobox.com, akpm@osdl.org, netdev@oss.sgi.com Subject: Re: [patch] devinet: cleanup if statements From: "David S. Miller" In-Reply-To: <200506072032.NAA06207@allur.sanmateo.akamai.com> References: <200506072032.NAA06207@allur.sanmateo.akamai.com> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2485 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 886 Lines: 26 From: pmeda@akamai.com Date: Tue, 7 Jun 2005 13:32:44 -0700 > Cleanup the devinet if statements. > - when there is no colon, interface name is same as device. > - ifa_label is an array, not a pointer, and so can never be null. > > Signed-Off-by: Prasanna Meda Ok, I can see how your first change is correct. When there is a colon, we've modified ifr.ifr_name by patching the ':' character to be a '\0'. This is for the __dev_get_by_name() lookup. After that lookup, we re-patch the ':' character back into ifr.ifr_name. So indeed, always using the ifr_name in that code block would be correct. The second hunk of your patch seems to defeat the intention of that code. 
I believe the idea is that if the label and the device name differ, use the label. This whole area is pretty messy, we should examine the true intended semantics of the ifa_label stuff. From davem@davemloft.net Tue Jun 21 13:56:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 13:56:55 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5LKujH9015645 for ; Tue, 21 Jun 2005 13:56:45 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dkplt-0004Hf-29; Tue, 21 Jun 2005 13:55:01 -0700 Date: Tue, 21 Jun 2005 13:55:00 -0700 (PDT) Message-Id: <20050621.135500.35467865.davem@davemloft.net> To: rick.jones2@hp.com Cc: shemminger@osdl.org, mitch.a.williams@intel.com, john.ronciak@intel.com, mchan@broadcom.com, hadi@cyberus.ca, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: <42B87ACF.3080800@hp.com> References: <42A5284C.3060808@osdl.org> <20050621.132044.115910664.davem@davemloft.net> <42B87ACF.3080800@hp.com> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2486 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 226 Lines: 8 From: Rick Jones Date: Tue, 21 Jun 2005 13:38:39 -0700 > > I highly recommend that this gets fixed. > > What is the cut-off point for the copy? 256 has been found to be a well functioning value to use. 
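As a rough illustration of the small-packet cut-off discussed in this thread, here is a minimal user-space sketch in C. It is not code from tg3 or e1000. RX_BUF_SIZE, struct rx_result and rx_copybreak() are hypothetical names, malloc() stands in for SKB allocation, and the "charged" field only approximates what a socket would be billed for. The 256-byte threshold is the value suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RX_BUF_SIZE  2048  /* assumed size of one full-sized ring buffer */
#define RX_COPYBREAK  256  /* cut-off value suggested in this thread */

/* Hypothetical result record; not a kernel structure. */
struct rx_result {
	unsigned char *data;    /* buffer handed up the stack (caller frees) */
	size_t len;             /* bytes of packet data in that buffer */
	size_t charged;         /* bytes the receiver is billed for (simplified) */
	int ring_buf_recycled;  /* 1 if the big buffer stayed in the ring */
};

/*
 * *ring_buf points at an RX_BUF_SIZE buffer holding len bytes of packet.
 * Small frames are copied into a right-sized buffer and the big buffer
 * is left in the ring; large frames are passed up as-is and the ring
 * slot is refilled with a fresh big buffer.
 */
static int rx_copybreak(unsigned char **ring_buf, size_t len,
			struct rx_result *out)
{
	assert(len <= RX_BUF_SIZE);

	if (len < RX_COPYBREAK) {
		unsigned char *small = malloc(len ? len : 1);

		if (!small)
			return -1;
		memcpy(small, *ring_buf, len);
		out->data = small;
		out->len = len;
		out->charged = len;          /* billed for the packet only */
		out->ring_buf_recycled = 1;  /* big buffer goes back to the chip */
		return 0;
	}

	unsigned char *fresh = malloc(RX_BUF_SIZE);

	if (!fresh)
		return -1;
	out->data = *ring_buf;
	out->len = len;
	out->charged = RX_BUF_SIZE;          /* billed for the whole buffer */
	out->ring_buf_recycled = 0;
	*ring_buf = fresh;                   /* refill the ring slot */
	return 0;
}
```

In this model a 64-byte ACK costs the receiver 64 bytes and no new ring buffer, while a driver without the cut-off would bill RX_BUF_SIZE bytes for the same frame and would have to allocate a replacement buffer for every packet.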
From jmoyer@redhat.com Tue Jun 21 14:43:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 14:43:04 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5LLh0H9018551 for ; Tue, 21 Jun 2005 14:43:00 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j5LLfZ76005561; Tue, 21 Jun 2005 17:41:35 -0400 Received: from mail.boston.redhat.com (mail.boston.redhat.com [172.16.76.12]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j5LLfZu09092; Tue, 21 Jun 2005 17:41:35 -0400 Received: from segfault.boston.redhat.com (segfault.boston.redhat.com [172.16.80.57]) by mail.boston.redhat.com (8.12.8/8.12.8) with ESMTP id j5LLfZc0029620; Tue, 21 Jun 2005 17:41:35 -0400 Received: from segfault.boston.redhat.com (localhost.localdomain [127.0.0.1]) by segfault.boston.redhat.com (8.13.1/8.13.1) with ESMTP id j5LLfZXD004871; Tue, 21 Jun 2005 17:41:35 -0400 Received: (from jmoyer@localhost) by segfault.boston.redhat.com (8.13.1/8.13.1/Submit) id j5LLfYMb004868; Tue, 21 Jun 2005 17:41:34 -0400 From: Jeff Moyer MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17080.35214.507402.998984@segfault.boston.redhat.com> Date: Tue, 21 Jun 2005 17:41:34 -0400 To: mpm@selenic.com CC: netdev@oss.sgi.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [patch,rfc] allow registration of multiple netpolls per interface X-Mailer: VM 7.17 under 21.4 (patch 15) "Security Through Obscurity" XEmacs Lucid Reply-To: jmoyer@redhat.com X-PGP-KeyID: 1F78E1B4 X-PGP-CertKey: F6FE 280D 8293 F72C 65FD 5A58 1FF8 A7CA 1F78 E1B4 X-PCLoadLetter: What the f**k does that mean? 
X-archive-position: 2487 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmoyer@redhat.com Precedence: bulk X-list: netdev Content-Length: 10919 Lines: 334 Hi, This patch restores functionality that was removed when the recursive ->poll bug was fixed. Namely, it allows multiple netpoll clients to register against the same network interface. In order to put things into perspective, I'm going to provide some background information. So, here is how things used to work: Multiple users of the netpoll interface could register themselves to send packets over the same interface. Any number of these netpoll clients could register an rx_hook, as well. However, only the very first in the list (hence the last one that registered), that matched the incoming interface, would be called when a packet arrived. The reason for this was not design, it was an oversight in the implementation. In practice, however, no one ever stumbled over this. (There are more subtleties when dealing with multiple rx_hooks registered to the same interface, but we'll ignore these, since no one ever ran into such problems.) Note that each netpoll client that registered an rx_hook was put on a netpoll_rx_list. This list was protected by a spinlock, and so operations which touched the rx routines would incur a locking penalty and a list traversal. I am mentioning this because the list and associated lock were removed when the code was refactored, and the patches I propose will reintroduce the lock, but not the list. Moving to what we have today: Multiple netpoll clients can register to send packets over the same interface. That's right, you can actually do this. However, there are ugly side effects. Because we now have a pointer from the net_device to a struct netpoll, the last netpoll client to register will be pointed to by the net_device->np. 
What this means is that if you had two clients, the first registers an
rx_hook and the second does not, then the netpoll code will not know that
any device has actually registered an rx_hook (since the np pointer in the
struct net_device is overwritten)!  As a result, no incoming packets will
be delivered to the registered rx routine.  This is clearly undesirable
behaviour.

So what does the patch do?  I created a new structure:

	struct netpoll_info {
		spinlock_t poll_lock;
		int poll_owner;
		int rx_flags;
		spinlock_t rx_lock;
		struct netpoll *rx_np; /* netpoll that registered an rx_hook */
	};

This is the structure which gets pointed to by the net_device.  All of the
flags and locks which are specific to the INTERFACE go here.  Any variables
which must be kept per struct netpoll were left in the struct netpoll.  So
now, we have a cleaner separation of data and its scope.

Since we never really supported having more than one struct netpoll
register an rx_hook, I got rid of the rx_list.  This is replaced by a
single pointer in the netpoll_info structure (rx_np).  We still need to
protect addition or removal of the rx_np pointer, and so keep the lock
(rx_lock).  There is one lock per struct net_device, and I am certain that
it will be 0 contention, as rx_np will only be changed during an insmod or
rmmod.  If people think this would be a good rcu candidate, let me know and
I'll change it to use that locking scheme.

In the process of making these changes, I've fixed a couple of other minor
bugs [1].  These fixes are included in this patch, but I will break them
out if people agree with this approach.

I have tested this by registering multiple netpoll clients, and verifying
that they both function properly.  I have not yet tried registering an
rx_hook, but I believe the code should be sufficient to handle that case.

And so, here is the full patch.  I'd appreciate comments.  Once we've
reached consensus, I will resubmit as a patch series.
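The split between per-interface and per-client state described above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not the kernel code: every model_* name is hypothetical, locking is left out because the model is single-threaded, and the rx_hook signature is simplified to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_RX_ENABLED 1  /* stand-in for NETPOLL_RX_ENABLED */

struct model_net_device;

/* Per-client state: one instance for each registered netpoll user. */
struct model_netpoll {
	struct model_net_device *dev;
	void (*rx_hook)(struct model_netpoll *np);  /* simplified signature */
	int rx_count;                               /* packets delivered */
};

/* Per-interface state: a single instance shared by all clients. */
struct model_netpoll_info {
	int rx_flags;
	struct model_netpoll *rx_np;  /* the one client with an rx_hook */
};

struct model_net_device {
	struct model_netpoll_info *npinfo;
};

/* Example hook that only counts deliveries. */
static void model_count_rx(struct model_netpoll *np)
{
	np->rx_count++;
}

/* Register a client; allocate the shared info on first use only. */
static int model_netpoll_setup(struct model_netpoll *np,
			       struct model_net_device *dev)
{
	struct model_netpoll_info *npinfo = dev->npinfo;

	assert(np && dev);
	if (!npinfo) {
		npinfo = calloc(1, sizeof(*npinfo));
		if (!npinfo)
			return -1;
	}
	np->dev = dev;
	if (np->rx_hook) {
		npinfo->rx_flags |= MODEL_RX_ENABLED;
		npinfo->rx_np = np;
	}
	dev->npinfo = npinfo;  /* link to the device last */
	return 0;
}

/* Deliver one packet; returns 1 if an rx_hook was called. */
static int model_netpoll_rx(struct model_net_device *dev)
{
	struct model_netpoll_info *npinfo = dev->npinfo;

	if (!npinfo || !npinfo->rx_np)
		return 0;
	npinfo->rx_np->rx_hook(npinfo->rx_np);
	return 1;
}
```

With this layout, registering a second client that has no rx_hook leaves rx_np pointing at the first client, so the failure case described above (a later registration hiding an earlier rx_hook) does not occur in the model.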
Oh, and I've cc'd both netdev@oss.sgi.com and @vger.kernel.org. Is it safe to just use the vger list? Thanks, Jeff [1] netpoll_poll_unlock unlocked and then set the poll_owner. I've reversed the order of those operations. The netpoll_cleanup code could dereference a null pointer, that was fixed by virtue of being very different in the new case. --- linux-2.6.12-rc6/net/core/netpoll.c.orig 2005-06-20 19:51:56.000000000 -0400 +++ linux-2.6.12-rc6/net/core/netpoll.c 2005-06-21 16:03:22.409620400 -0400 @@ -131,18 +131,19 @@ static int checksum_udp(struct sk_buff * static void poll_napi(struct netpoll *np) { int budget = 16; + struct netpoll_info *npinfo = np->dev->npinfo; if (test_bit(__LINK_STATE_RX_SCHED, &np->dev->state) && - np->poll_owner != smp_processor_id() && - spin_trylock(&np->poll_lock)) { - np->rx_flags |= NETPOLL_RX_DROP; + npinfo->poll_owner != smp_processor_id() && + spin_trylock(&npinfo->poll_lock)) { + npinfo->rx_flags |= NETPOLL_RX_DROP; atomic_inc(&trapped); np->dev->poll(np->dev, &budget); atomic_dec(&trapped); - np->rx_flags &= ~NETPOLL_RX_DROP; - spin_unlock(&np->poll_lock); + npinfo->rx_flags &= ~NETPOLL_RX_DROP; + spin_unlock(&npinfo->poll_lock); } } @@ -245,6 +246,7 @@ repeat: static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb) { int status; + struct netpoll_info *npinfo; repeat: if(!np || !np->dev || !netif_running(np->dev)) { @@ -253,7 +255,8 @@ repeat: } /* avoid recursion */ - if(np->poll_owner == smp_processor_id() || + npinfo = np->dev->npinfo; + if(npinfo->poll_owner == smp_processor_id() || np->dev->xmit_lock_owner == smp_processor_id()) { if (np->drop) np->drop(skb); @@ -346,7 +349,15 @@ static void arp_reply(struct sk_buff *sk int size, type = ARPOP_REPLY, ptype = ETH_P_ARP; u32 sip, tip; struct sk_buff *send_skb; - struct netpoll *np = skb->dev->np; + struct netpoll *np; + struct netpoll_info *npinfo = skb->dev->npinfo; + + if (!npinfo) return; + + spin_lock_irqsave(&npinfo->rx_lock, flags); + if 
(npinfo->rx_np->dev == skb->dev) + np = npinfo->rx_np; + spin_unlock_irqrestore(&npinfo->rx_lock, flags); if (!np) return; @@ -429,9 +440,9 @@ int __netpoll_rx(struct sk_buff *skb) int proto, len, ulen; struct iphdr *iph; struct udphdr *uh; - struct netpoll *np = skb->dev->np; + struct netpoll *np = skb->dev->npinfo->rx_np; - if (!np->rx_hook) + if (!np) goto out; if (skb->dev->type != ARPHRD_ETHER) goto out; @@ -611,9 +622,8 @@ int netpoll_setup(struct netpoll *np) { struct net_device *ndev = NULL; struct in_device *in_dev; - - np->poll_lock = SPIN_LOCK_UNLOCKED; - np->poll_owner = -1; + struct netpoll_info *npinfo; + unsigned long flags; if (np->dev_name) ndev = dev_get_by_name(np->dev_name); @@ -624,7 +634,17 @@ int netpoll_setup(struct netpoll *np) } np->dev = ndev; - ndev->np = np; + if (!ndev->npinfo) { + npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL); + if (!npinfo) + goto release; + + npinfo->rx_np = NULL; + npinfo->poll_lock = SPIN_LOCK_UNLOCKED; + npinfo->poll_owner = -1; + npinfo->rx_lock = SPIN_LOCK_UNLOCKED; + } else + npinfo = ndev->npinfo; if (!ndev->poll_controller) { printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n", @@ -692,13 +712,20 @@ int netpoll_setup(struct netpoll *np) np->name, HIPQUAD(np->local_ip)); } - if(np->rx_hook) - np->rx_flags = NETPOLL_RX_ENABLED; + if(np->rx_hook) { + spin_lock_irqsave(&npinfo->rx_lock, flags); + npinfo->rx_flags |= NETPOLL_RX_ENABLED; + npinfo->rx_np = np; + spin_unlock_irqrestore(&npinfo->rx_lock, flags); + } + /* last thing to do is link it to the net device structure */ + ndev->npinfo = npinfo; return 0; release: - ndev->np = NULL; + if (!ndev->npinfo) + kfree(npinfo); np->dev = NULL; dev_put(ndev); return -1; @@ -706,9 +733,17 @@ int netpoll_setup(struct netpoll *np) void netpoll_cleanup(struct netpoll *np) { - if (np->dev) - np->dev->np = NULL; - dev_put(np->dev); + struct netpoll_info *npinfo; + + if (np->dev) { + npinfo = np->dev->npinfo; + if (npinfo && npinfo->rx_np == np) { + npinfo->rx_np = 
NULL; + npinfo->rx_flags &= ~NETPOLL_RX_ENABLED; + } + dev_put(np->dev); + } + np->dev = NULL; } --- linux-2.6.12-rc6/net/core/dev.c.orig 2005-06-20 19:51:59.000000000 -0400 +++ linux-2.6.12-rc6/net/core/dev.c 2005-06-21 13:53:51.583407710 -0400 @@ -1656,6 +1656,7 @@ int netif_receive_skb(struct sk_buff *sk unsigned short type; /* if we've gotten here through NAPI, check netpoll */ + /* how else can we get here? --phro */ if (skb->dev->poll && netpoll_rx(skb)) return NET_RX_DROP; --- linux-2.6.12-rc6/include/linux/netpoll.h.orig 2005-06-20 19:51:47.000000000 -0400 +++ linux-2.6.12-rc6/include/linux/netpoll.h 2005-06-21 15:29:48.994422229 -0400 @@ -16,14 +16,19 @@ struct netpoll; struct netpoll { struct net_device *dev; char dev_name[16], *name; - int rx_flags; void (*rx_hook)(struct netpoll *, int, char *, int); void (*drop)(struct sk_buff *skb); u32 local_ip, remote_ip; u16 local_port, remote_port; unsigned char local_mac[6], remote_mac[6]; +}; + +struct netpoll_info { spinlock_t poll_lock; int poll_owner; + int rx_flags; + spinlock_t rx_lock; + struct netpoll *rx_np; /* netpoll that registered an rx_hook */ }; void netpoll_poll(struct netpoll *np); @@ -39,22 +44,35 @@ void netpoll_queue(struct sk_buff *skb); #ifdef CONFIG_NETPOLL static inline int netpoll_rx(struct sk_buff *skb) { - return skb->dev->np && skb->dev->np->rx_flags && __netpoll_rx(skb); + struct netpoll_info *npinfo = skb->dev->npinfo; + unsigned long flags; + int ret = 0; + + if (!npinfo || (!npinfo->rx_np && !npinfo->rx_flags)) + return 0; + + spin_lock_irqsave(&npinfo->rx_lock, flags); + /* check rx_flags again with the lock held */ + if (npinfo->rx_flags && __netpoll_rx(skb)) + ret = 1; + spin_unlock_irqrestore(&npinfo->rx_lock, flags); + + return ret; } static inline void netpoll_poll_lock(struct net_device *dev) { - if (dev->np) { - spin_lock(&dev->np->poll_lock); - dev->np->poll_owner = smp_processor_id(); + if (dev->npinfo) { + spin_lock(&dev->npinfo->poll_lock); + dev->npinfo->poll_owner = 
smp_processor_id(); } } static inline void netpoll_poll_unlock(struct net_device *dev) { - if (dev->np) { - spin_unlock(&dev->np->poll_lock); - dev->np->poll_owner = -1; + if (dev->npinfo) { + dev->npinfo->poll_owner = -1; + spin_unlock(&dev->npinfo->poll_lock); } } --- linux-2.6.12-rc6/include/linux/netdevice.h.orig 2005-06-20 20:26:21.000000000 -0400 +++ linux-2.6.12-rc6/include/linux/netdevice.h 2005-06-21 14:46:52.093190854 -0400 @@ -41,7 +41,7 @@ struct divert_blk; struct vlan_group; struct ethtool_ops; -struct netpoll; +struct netpoll_info; /* source back-compat hooks */ #define SET_ETHTOOL_OPS(netdev,ops) \ ( (netdev)->ethtool_ops = (ops) ) @@ -468,7 +468,7 @@ struct net_device unsigned char *haddr); int (*neigh_setup)(struct net_device *dev, struct neigh_parms *); #ifdef CONFIG_NETPOLL - struct netpoll *np; + struct netpoll_info *npinfo; #endif #ifdef CONFIG_NET_POLL_CONTROLLER void (*poll_controller)(struct net_device *dev); From ak@suse.de Tue Jun 21 14:48:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 14:48:34 -0700 (PDT) Received: from mx2.suse.de (ns2.suse.de [195.135.220.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5LLmTH9019216 for ; Tue, 21 Jun 2005 14:48:30 -0700 Received: from Relay1.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id A19EF1D798; Tue, 21 Jun 2005 23:47:08 +0200 (CEST) To: Rick Jones Cc: netdev@oss.sgi.com, davem@redhat.com Subject: Re: RFC: NAPI packet weighting patch References: <468F3FDA28AA87429AD807992E22D07E0450C00B@orsmsx408> <42A5284C.3060808@osdl.org> <20050621.132044.115910664.davem@davemloft.net> <42B87ACF.3080800@hp.com> From: Andi Kleen Date: 21 Jun 2005 23:47:07 +0200 In-Reply-To: <42B87ACF.3080800@hp.com> Message-ID: User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2488 
X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev Content-Length: 835 Lines: 21 Rick Jones writes: > > Actually, it has a _HUGE_ _HUGE_ impact. If you pass the big buffer > > up, the receiving socket gets charged for the size of the huge buffer, > > not for just the size of the packet contained within. This makes > > sockets get overcharged for data reception, and it can cause all kinds > > of performance problems. > > Then copy when the socket is about to fill with overhead bytes? The stack has supported that since 2.4. Mostly because it is the only sane way to handle devices with very big MTU. But it turns off all kinds of fast paths before it happens, I guess that is what David was referring to. However I suspect the cut-off points with rx-copybreak in common driver have been often tuned before that code was introduced and it might be worth to do some retesting. -Andi From ak@suse.de Tue Jun 21 14:52:32 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 14:52:36 -0700 (PDT) Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5LLqVH9019959 for ; Tue, 21 Jun 2005 14:52:32 -0700 Received: from Relay1.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id A2CB71D758; Tue, 21 Jun 2005 23:51:11 +0200 (CEST) To: "David S. 
Miller" Cc: pmeda@akamai.com, netdev@oss.sgi.com Subject: Re: [patch] devinet: cleanup if statements References: <200506072032.NAA06207@allur.sanmateo.akamai.com> <20050621.134822.21926602.davem@davemloft.net> From: Andi Kleen Date: 21 Jun 2005 23:51:11 +0200 In-Reply-To: <20050621.134822.21926602.davem@davemloft.net> Message-ID: User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2489 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev Content-Length: 485 Lines: 14 "David S. Miller" writes: > > This whole area is pretty messy, we should examine the true > intended semantics of the ifa_label stuff. Perhaps it would be best to just retire that 2.0 alias compatibility cruft now. Everybody should be using ifconfig or iproute2 by now that can add or remove addresses directly without the compatibility syntax. They never worked fully in corner cases anyways, e.g. SIOCSIFNAME with these aliases was always a problem. 
-Andi From becker@scyld.com Tue Jun 21 15:30:45 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 15:30:51 -0700 (PDT) Received: from bluewest.scyld.com (bluewest.scyld.com [64.240.166.233]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5LMUjH9022187 for ; Tue, 21 Jun 2005 15:30:45 -0700 Received: from bluewest.scyld.com (localhost.localdomain [127.0.0.1]) by localhost.localdomain (8.12.11/8.12.11) with ESMTP id j5LMMn3O030210; Tue, 21 Jun 2005 15:22:49 -0700 Received: from localhost (becker@localhost) by bluewest.scyld.com (8.12.11/8.12.11/Submit) with ESMTP id j5LMMn9L030207; Tue, 21 Jun 2005 15:22:49 -0700 X-Authentication-Warning: bluewest.scyld.com: becker owned process doing -bs Date: Tue, 21 Jun 2005 15:22:49 -0700 (PDT) From: Donald Becker To: Andi Kleen cc: Rick Jones , , Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2490 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: becker@scyld.com Precedence: bulk X-list: netdev Content-Length: 1783 Lines: 44 On 21 Jun 2005, Andi Kleen wrote: > Rick Jones writes: > > > > Actually, it has a _HUGE_ _HUGE_ impact. If you pass the big buffer > > > up, the receiving socket gets charged for the size of the huge buffer, > > > not for just the size of the packet contained within. This makes > > > sockets get overcharged for data reception, and it can cause all kinds > > > of performance problems. > > > > Then copy when the socket is about to fill with overhead bytes? Or better, predict when the frame you are currently stuffing into the queue will be there when the queue fills up. And then use the same crystal ball to... > Mostly because it is the only sane way to handle devices with very big > MTU. But it turns off all kinds of fast paths before it happens, I > guess that is what David was referring to. 
> > However I suspect the cut-off points with rx-copybreak in common driver > have been often tuned before that code was introduced and it might > be worth to do some retesting. Most of that analysis and tuning was done in the 1996-99 timeframe. While much has changed since then, the same basic parameters remain - cache line size - frame header size (MAC+IP+ProtocolHeader) - hot cache lines from copying or type classification - cold memory lines from PCI writes I suspect you'll find that a good rx_copybreak is pretty much the same as it was when I did the original evaluation. If you are looking for an area that has changed: the hidden cost of maintaining consistent cache lines on SMP systems is far higher than it was back in the days of the Pentium Pro. Donald Becker becker@scyld.com Scyld Software A Penguin Computing company 914 Bay Ridge Road, Suite 220 www.scyld.com Annapolis MD 21403 410-990-9993 From ak@suse.de Tue Jun 21 15:35:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 15:36:00 -0700 (PDT) Received: from mx1.suse.de (mx1.suse.de [195.135.220.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5LMZvH9023289 for ; Tue, 21 Jun 2005 15:35:57 -0700 Received: from Relay2.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.suse.de (Postfix) with ESMTP id 04065EFF6; Wed, 22 Jun 2005 00:34:37 +0200 (CEST) Date: Wed, 22 Jun 2005 00:34:36 +0200 From: Andi Kleen To: Donald Becker Cc: Andi Kleen , Rick Jones , netdev@oss.sgi.com, davem@redhat.com Subject: Re: RFC: NAPI packet weighting patch Message-ID: <20050621223436.GG14251@wotan.suse.de> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-archive-position: 2491 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev 
Content-Length: 529 Lines: 20 > While much has changed since then, the same basic parameters remain > - cache line size In 96 we had 32 byte cache lines. These days 64-128 are common, with some 256 byte cache line systems around. > - frame header size (MAC+IP+ProtocolHeader) In 96 people tended to not use time stamps. > - hot cache lines from copying or type classification Not sure what you mean with that. > - cold memory lines from PCI writes I suspect in '96 chipsets also didn't do as aggressive prefetching as they do today. -Andi From oxymoron@waste.org Tue Jun 21 15:54:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 15:54:22 -0700 (PDT) Received: from waste.org (waste.org [216.27.176.166]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5LMsFH9024402 for ; Tue, 21 Jun 2005 15:54:15 -0700 Received: from waste.org (localhost [127.0.0.1]) by waste.org (8.13.4/8.13.4/Debian-3) with ESMTP id j5LMqq6a028487 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Tue, 21 Jun 2005 17:52:53 -0500 Received: (from oxymoron@localhost) by waste.org (8.13.4/8.13.4/Submit) id j5LMqqgj028479; Tue, 21 Jun 2005 17:52:52 -0500 Date: Tue, 21 Jun 2005 15:52:52 -0700 From: Matt Mackall To: Jeff Moyer Cc: netdev@oss.sgi.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [patch,rfc] allow registration of multiple netpolls per interface Message-ID: <20050621225252.GY27572@waste.org> References: <17080.35214.507402.998984@segfault.boston.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <17080.35214.507402.998984@segfault.boston.redhat.com> User-Agent: Mutt/1.5.9i X-archive-position: 2492 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mpm@selenic.com Precedence: bulk X-list: netdev Content-Length: 12098 Lines: 352 On Tue, Jun 21, 2005 at 05:41:34PM -0400, Jeff Moyer wrote: > Hi, 
> > This patch restores functionality that was removed when the recursive > ->poll bug was fixed. Namely, it allows multiple netpoll clients to > register against the same network interface. Thanks. I've been neglecting this for a bit while I've been busy with other things. > In order to put things into perspective, I'm going to provide some > background information. So, here is how things used to work: > > Multiple users of the netpoll interface could register themselves to send > packets over the same interface. Any number of these netpoll clients could > register an rx_hook, as well. However, only the very first in the list > (hence the last one that registered), that matched the incoming interface, > would be called when a packet arrived. The reason for this was not design, > it was an oversight in the implementation. In practice, however, no one > ever stumbled over this. (There are more subtleties when dealing with > multiple rx_hooks registered to the same interface, but we'll ignore these, > since no one ever ran into such problems.) Hmm. It's conceivable we'll want netdump and kgdb on the same interface so we'll have to address this eventually.. > Note that each netpoll client that registered an rx_hook was put on a > netpoll_rx_list. This list was protected by a spinlock, and so operations > which touched the rx routines would incur a locking penalty and a list > traversal. I am mentioning this because the list and associated lock were > removed when the code was refactored, and the patches I propose will > reintroduce the lock, but not the list. ..so we'll probably want the list back in some form. Sigh. > Moving to what we have today: > > Multiple netpoll clients can register to send packets over the same > interface. That's right, you can actually do this. However, there are > ugly side effects. Because we now have a pointer from the net_device to a > struct netpoll, the last netpoll client to register will be pointed to by > the net_device->np. 
What this means is that if you had two clients, the > first registers an rx_hook and the second does not, then the netpoll code > will not know that any device has actually registered an rx_hook (since the > np pointer in the struct net_device is overwritten)! As a result, no > incoming packets will be delivered to the registered rx routine. This is > clearly undesirable behaviour. > > So what does the patch do? > > I created a new structure: > > struct netpoll_info { > spinlock_t poll_lock; > int poll_owner; > int rx_flags; > spinlock_t rx_lock; > struct netpoll *rx_np; /* netpoll that registered an rx_hook */ > }; > > This is the structure which gets pointed to by the net_device. All of the > flags and locks which are specific to the INTERFACE go here. Any variables > which must be kept per struct netpoll were left in the struct netpoll. So > now, we have a cleaner separation of data and its scope. > > Since we never really supported having more than one struct netpoll > register an rx_hook, I got rid of the rx_list. This is replaced by a > single pointer in the netpoll_info structure (rx_np). We still need to > protect addition or removal of the rx_np pointer, and so keep the lock > (rx_lock). There is one lock per struct net_device, and I am certain that > it will be 0 contention, as rx_np will only be changed during an insmod or > rmmod. If people think this would be a good rcu candidate, let me know and > I'll change it to use that locking scheme. It might be simpler to have a single lock here..? > In the process of making these changes, I've fixed a couple other minor > bugs [1]. These fixes are included in this patch, but I will break them > out if people agree with this approach. > > I have tested this by registering multiple netpoll clients, and verifying > that they both function properly. I have not yet tried registering an > rx_hook, but I believe the code should be sufficient to handle that case. > > And so, here is the full patch. 
I'd appreciate comments. Once we've > reached consensus, I will resubmit as a patch series. I think the general idea is sound. So let's take a look at the patch itself. > Oh, and I've cc'd both netdev@oss.sgi.com and @vger.kernel.org. Is it safe > to just use the vger list? Yes. > [1] netpoll_poll_unlock unlocked and then set the poll_owner. I've > reversed the order of those operations. The netpoll_cleanup code could > dereference a null pointer, that was fixed by virtue of being very > different in the new case. Ok, let's fix the lock ordering bit first. > --- linux-2.6.12-rc6/net/core/netpoll.c.orig 2005-06-20 19:51:56.000000000 -0400 > +++ linux-2.6.12-rc6/net/core/netpoll.c 2005-06-21 16:03:22.409620400 -0400 > @@ -131,18 +131,19 @@ static int checksum_udp(struct sk_buff * > static void poll_napi(struct netpoll *np) > { > int budget = 16; > + struct netpoll_info *npinfo = np->dev->npinfo; As a minor point of style, I like to put the "get my private info" lines first. > @@ -245,6 +246,7 @@ repeat: > static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb) > { > int status; > + struct netpoll_info *npinfo; > > repeat: > if(!np || !np->dev || !netif_running(np->dev)) { > @@ -253,7 +255,8 @@ repeat: > } > > /* avoid recursion */ > - if(np->poll_owner == smp_processor_id() || > + npinfo = np->dev->npinfo; Again, the npinfo assignment ought to happen as soon as possible. > + if(npinfo->poll_owner == smp_processor_id() || > np->dev->xmit_lock_owner == smp_processor_id()) { > if (np->drop) > np->drop(skb); > @@ -346,7 +349,15 @@ static void arp_reply(struct sk_buff *sk > int size, type = ARPOP_REPLY, ptype = ETH_P_ARP; > u32 sip, tip; > struct sk_buff *send_skb; > - struct netpoll *np = skb->dev->np; > + struct netpoll *np; > + struct netpoll_info *npinfo = skb->dev->npinfo; > + > + if (!npinfo) return; We should only be replying to ARPs if we're trapped, right? How do we get here with npinfo unset? The return ought to be on a separate line, btw. 
> + spin_lock_irqsave(&npinfo->rx_lock, flags); > + if (npinfo->rx_np->dev == skb->dev) > + np = npinfo->rx_np; > + spin_unlock_irqrestore(&npinfo->rx_lock, flags); And I think that means we don't need the lock here either. > if (!np) return; And the same question and style criticism of my own code. > @@ -429,9 +440,9 @@ int __netpoll_rx(struct sk_buff *skb) > int proto, len, ulen; > struct iphdr *iph; > struct udphdr *uh; > - struct netpoll *np = skb->dev->np; > + struct netpoll *np = skb->dev->npinfo->rx_np; > > - if (!np->rx_hook) > + if (!np) > goto out; > if (skb->dev->type != ARPHRD_ETHER) > goto out; > @@ -611,9 +622,8 @@ int netpoll_setup(struct netpoll *np) > { > struct net_device *ndev = NULL; > struct in_device *in_dev; > - > - np->poll_lock = SPIN_LOCK_UNLOCKED; > - np->poll_owner = -1; > + struct netpoll_info *npinfo; > + unsigned long flags; > > if (np->dev_name) > ndev = dev_get_by_name(np->dev_name); > @@ -624,7 +634,17 @@ int netpoll_setup(struct netpoll *np) > } > > np->dev = ndev; > - ndev->np = np; > + if (!ndev->npinfo) { > + npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL); > + if (!npinfo) > + goto release; > + > + npinfo->rx_np = NULL; > + npinfo->poll_lock = SPIN_LOCK_UNLOCKED; > + npinfo->poll_owner = -1; > + npinfo->rx_lock = SPIN_LOCK_UNLOCKED; > + } else > + npinfo = ndev->npinfo; > > if (!ndev->poll_controller) { > printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n", > @@ -692,13 +712,20 @@ int netpoll_setup(struct netpoll *np) > np->name, HIPQUAD(np->local_ip)); > } > > - if(np->rx_hook) > - np->rx_flags = NETPOLL_RX_ENABLED; > + if(np->rx_hook) { > + spin_lock_irqsave(&npinfo->rx_lock, flags); > + npinfo->rx_flags |= NETPOLL_RX_ENABLED; > + npinfo->rx_np = np; > + spin_unlock_irqrestore(&npinfo->rx_lock, flags); > + } > + /* last thing to do is link it to the net device structure */ > + ndev->npinfo = npinfo; > > return 0; > > release: > - ndev->np = NULL; > + if (!ndev->npinfo) > + kfree(npinfo); > np->dev = NULL; > 
dev_put(ndev); > return -1; > @@ -706,9 +733,17 @@ int netpoll_setup(struct netpoll *np) > > void netpoll_cleanup(struct netpoll *np) > { > - if (np->dev) > - np->dev->np = NULL; > - dev_put(np->dev); > + struct netpoll_info *npinfo; > + > + if (np->dev) { > + npinfo = np->dev->npinfo; > + if (npinfo && npinfo->rx_np == np) { > + npinfo->rx_np = NULL; > + npinfo->rx_flags &= ~NETPOLL_RX_ENABLED; > + } > + dev_put(np->dev); > + } > + > np->dev = NULL; > } > > --- linux-2.6.12-rc6/net/core/dev.c.orig 2005-06-20 19:51:59.000000000 -0400 > +++ linux-2.6.12-rc6/net/core/dev.c 2005-06-21 13:53:51.583407710 -0400 > @@ -1656,6 +1656,7 @@ int netif_receive_skb(struct sk_buff *sk > unsigned short type; > > /* if we've gotten here through NAPI, check netpoll */ > + /* how else can we get here? --phro */ We can get here in the usual route of non-NAPI delivery, IIRC. > if (skb->dev->poll && netpoll_rx(skb)) > return NET_RX_DROP; > > --- linux-2.6.12-rc6/include/linux/netpoll.h.orig 2005-06-20 19:51:47.000000000 -0400 > +++ linux-2.6.12-rc6/include/linux/netpoll.h 2005-06-21 15:29:48.994422229 -0400 > @@ -16,14 +16,19 @@ struct netpoll; > struct netpoll { > struct net_device *dev; > char dev_name[16], *name; > - int rx_flags; > void (*rx_hook)(struct netpoll *, int, char *, int); > void (*drop)(struct sk_buff *skb); > u32 local_ip, remote_ip; > u16 local_port, remote_port; > unsigned char local_mac[6], remote_mac[6]; > +}; > + > +struct netpoll_info { > spinlock_t poll_lock; > int poll_owner; > + int rx_flags; > + spinlock_t rx_lock; > + struct netpoll *rx_np; /* netpoll that registered an rx_hook */ > }; > > void netpoll_poll(struct netpoll *np); > @@ -39,22 +44,35 @@ void netpoll_queue(struct sk_buff *skb); > #ifdef CONFIG_NETPOLL > static inline int netpoll_rx(struct sk_buff *skb) > { > - return skb->dev->np && skb->dev->np->rx_flags && __netpoll_rx(skb); > + struct netpoll_info *npinfo = skb->dev->npinfo; > + unsigned long flags; > + int ret = 0; > + > + if (!npinfo || 
(!npinfo->rx_np && !npinfo->rx_flags)) > + return 0; > + > + spin_lock_irqsave(&npinfo->rx_lock, flags); > + /* check rx_flags again with the lock held */ > + if (npinfo->rx_flags && __netpoll_rx(skb)) > + ret = 1; > + spin_unlock_irqrestore(&npinfo->rx_lock, flags); > + > + return ret; > } This is perhaps a problem due to cache line bouncing. Perhaps we can use an atomic op and a memory barrier instead? > static inline void netpoll_poll_lock(struct net_device *dev) > { > - if (dev->np) { > - spin_lock(&dev->np->poll_lock); > - dev->np->poll_owner = smp_processor_id(); > + if (dev->npinfo) { > + spin_lock(&dev->npinfo->poll_lock); > + dev->npinfo->poll_owner = smp_processor_id(); > } > } > > static inline void netpoll_poll_unlock(struct net_device *dev) > { > - if (dev->np) { > - spin_unlock(&dev->np->poll_lock); > - dev->np->poll_owner = -1; > + if (dev->npinfo) { > + dev->npinfo->poll_owner = -1; > + spin_unlock(&dev->npinfo->poll_lock); > } > } > > --- linux-2.6.12-rc6/include/linux/netdevice.h.orig 2005-06-20 20:26:21.000000000 -0400 > +++ linux-2.6.12-rc6/include/linux/netdevice.h 2005-06-21 14:46:52.093190854 -0400 > @@ -41,7 +41,7 @@ > struct divert_blk; > struct vlan_group; > struct ethtool_ops; > -struct netpoll; > +struct netpoll_info; > /* source back-compat hooks */ > #define SET_ETHTOOL_OPS(netdev,ops) \ > ( (netdev)->ethtool_ops = (ops) ) > @@ -468,7 +468,7 @@ struct net_device > unsigned char *haddr); > int (*neigh_setup)(struct net_device *dev, struct neigh_parms *); > #ifdef CONFIG_NETPOLL > - struct netpoll *np; > + struct netpoll_info *npinfo; > #endif > #ifdef CONFIG_NET_POLL_CONTROLLER > void (*poll_controller)(struct net_device *dev); -- Mathematics is the supreme nostalgia of our time. 
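As a rough illustration of the split discussed in this thread (per-interface state in one shared structure, per-client state in each struct netpoll), here is a minimal userspace sketch. It is not kernel code and not the posted patch: every mock_* name is hypothetical, locking is left out entirely, and the only point is to show several clients sharing one info block while at most one of them owns the rx side. One assumption worth flagging: the kmalloc() path in the patch as quoted does not appear to set rx_flags before the later |=, so the sketch zeroes it explicitly.

```c
/* Userspace sketch only. All mock_* names are hypothetical; no locking. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MOCK_RX_ENABLED 1

struct mock_netpoll;

/* Per-interface state, shared by every client registered on the device. */
struct mock_netpoll_info {
	int poll_owner;               /* -1 when nobody is polling */
	int rx_flags;                 /* nonzero once an rx_hook is registered */
	struct mock_netpoll *rx_np;   /* the single client that owns rx */
};

struct mock_dev {
	struct mock_netpoll_info *npinfo;
};

/* Per-client state. */
struct mock_netpoll {
	struct mock_dev *dev;
	void (*rx_hook)(struct mock_netpoll *np);
};

/* Trivial rx_hook used for demonstration: counts deliveries. */
static int mock_rx_count;

static void mock_count_hook(struct mock_netpoll *np)
{
	(void)np;
	mock_rx_count++;
}

/* Register a client; only the first client allocates the shared info. */
static int mock_setup(struct mock_netpoll *np, struct mock_dev *dev)
{
	struct mock_netpoll_info *info = dev->npinfo;

	if (!info) {
		info = malloc(sizeof(*info));
		if (!info)
			return -1;
		info->poll_owner = -1;
		info->rx_flags = 0;   /* assumption: start with rx disabled */
		info->rx_np = NULL;
	}

	np->dev = dev;
	if (np->rx_hook) {
		info->rx_flags |= MOCK_RX_ENABLED;
		info->rx_np = np;
	}

	/* Publish the shared info last, once it is fully initialised. */
	dev->npinfo = info;
	return 0;
}

/* Deliver to the rx client, if any. Returns 1 if a hook was called. */
static int mock_rx(struct mock_dev *dev)
{
	struct mock_netpoll_info *info = dev->npinfo;

	if (!info || !info->rx_flags || !info->rx_np)
		return 0;

	assert(info->rx_np->rx_hook != NULL);
	info->rx_np->rx_hook(info->rx_np);
	return 1;
}

/* Unregister a client; only the rx owner clears the rx state. */
static void mock_cleanup(struct mock_netpoll *np)
{
	struct mock_netpoll_info *info;

	if (!np->dev)
		return;

	info = np->dev->npinfo;
	if (info && info->rx_np == np) {
		info->rx_np = NULL;
		info->rx_flags &= ~MOCK_RX_ENABLED;
	}
	np->dev = NULL;
}
```

With this layout, a second client that registers without an rx_hook leaves rx_np untouched, which is the case the old single net_device->np pointer could not handle. Freeing the shared block when the last client goes away is deliberately not shown.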
From davem@davemloft.net Tue Jun 21 16:58:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 16:58:10 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5LNvwH9001255 for ; Tue, 21 Jun 2005 16:58:02 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1Dksbb-0004MQ-0m; Tue, 21 Jun 2005 16:56:35 -0700 Date: Tue, 21 Jun 2005 16:56:34 -0700 (PDT) Message-Id: <20050621.165634.07642938.davem@davemloft.net> To: iod00d@hp.com Cc: mchan@broadcom.com, netdev@oss.sgi.com Subject: Re: [PATCH] tg3_msi() and weakly ordered memory From: "David S. Miller" In-Reply-To: <20050614211530.GB25516@esmail.cup.hp.com> References: <20050614154625.GB24371@esmail.cup.hp.com> <1118771563.7059.30.camel@rh4> <20050614211530.GB25516@esmail.cup.hp.com> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2493 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1614 Lines: 49 Ok, here is the patch I came up with as a result of this thread. Michael stated he would investigate using a pure tag comparison in place of tg3_has_work() when the chip is using tagged interrupts. Thanks. [TG3]: Fix missing memory barriers and SD_STATUS_UPDATED bit clearing. There must be a rmb() between reading the status block tag and calling tg3_has_work(). This was missing in tg3_msi() and tg3_interrupt_tagged(). tg3_poll() got it right. Also, SD_STATUS_UPDATED must be cleared in the status block right before we call tg3_has_work(). Only tg3_poll() got this wrong. Based upon patches and commentary from Grant Grundler and Michael Chan. Signed-off-by: David S. 
Miller --- 1/drivers/net/tg3.c.~1~ 2005-06-21 16:39:19.000000000 -0700 +++ 2/drivers/net/tg3.c 2005-06-21 16:47:55.000000000 -0700 @@ -2929,6 +2929,7 @@ if (tp->tg3_flags & TG3_FLAG_TAGGED_STATUS) tp->last_tag = sblk->status_tag; rmb(); + sblk->status &= ~SD_STATUS_UPDATED; /* if no more work, tell net stack and NIC we're done */ done = !tg3_has_work(tp); @@ -2964,6 +2965,7 @@ */ tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x00000001); tp->last_tag = sblk->status_tag; + rmb(); sblk->status &= ~SD_STATUS_UPDATED; if (likely(tg3_has_work(tp))) netif_rx_schedule(dev); /* schedule NAPI poll */ @@ -3051,6 +3053,7 @@ tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x00000001); tp->last_tag = sblk->status_tag; + rmb(); sblk->status &= ~SD_STATUS_UPDATED; if (likely(tg3_has_work(tp))) netif_rx_schedule(dev); /* schedule NAPI poll */ From becker@scyld.com Tue Jun 21 17:16:22 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 17:16:26 -0700 (PDT) Received: from bluewest.scyld.com (scyld.com [64.240.166.233]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5M0GMH9002343 for ; Tue, 21 Jun 2005 17:16:22 -0700 Received: from bluewest.scyld.com (localhost.localdomain [127.0.0.1]) by localhost.localdomain (8.12.11/8.12.11) with ESMTP id j5M08NJp032413; Tue, 21 Jun 2005 17:08:23 -0700 Received: from localhost (becker@localhost) by bluewest.scyld.com (8.12.11/8.12.11/Submit) with ESMTP id j5M08NPr032410; Tue, 21 Jun 2005 17:08:23 -0700 X-Authentication-Warning: bluewest.scyld.com: becker owned process doing -bs Date: Tue, 21 Jun 2005 17:08:23 -0700 (PDT) From: Donald Becker To: Andi Kleen cc: Rick Jones , , Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: <20050621223436.GG14251@wotan.suse.de> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2494 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: 
becker@scyld.com Precedence: bulk X-list: netdev Content-Length: 2294 Lines: 58 On Wed, 22 Jun 2005, Andi Kleen wrote: > > While much has changed since then, the same basic parameters remain > > - cache line size > > In 96 we had 32 byte cache lines. These days 64-128 are common, > with some 256 byte cache line systems around. Good point. I believe that the most common line size is 64 bytes for L1 cache. Most L2 caches that have larger line sizes still fill only 64 byte blocks unless prefetching is triggered. (Feel free to correct me with non-obscure CPUs and relevant cases. For instance, I know that on the Itanium the 128 byte line L2 cache is used as L1, but only for FPU fetches. That doesn't count.) The implication here is that as soon as we look at the first byte of the MAC address, we have read in 64 bytes. That's a whole minimum-size Ethernet frame. > > - frame header size (MAC+IP+ProtocolHeader) > > In 96 people tended to not use time stamps. Ehh, not a big difference. > > - hot cache lines from copying or type classification > Not sure what you mean with that. See the comment above. We decide if a packet is multicast vs. unicast, IP vs. other at approximately interrupt/"rx_copybreak" time. Very few NICs provide this info in status bits, so we end up looking at the packet header. That read moves the previously known-uncached data (after all, it just came in from a bus write) into the L1 cache for the CPU handling the device. Once it's there, the copy is almost free. [[ Background: Yes, allocating the new skbuff is very expensive. But we can either allocate a new, correctly-sized skbuff to copy into, or allocate a new full-sized skbuff to replace the one we will send to the Rx queue. ]] > > - cold memory lines from PCI writes > > I suspect in '96 chipsets also didn't do as aggressive prefetching > as they do today. Prefetching helps linear read bandwidth, but we shouldn't be triggering it.
And I claim that cache line prefetching only restores the relative balance between L1/L2 caches, otherwise the long L2 cache lines would be very expensive with bump-read-bump-read with linear scans through memory. -- Donald Becker becker@scyld.com Scyld Software Scyld Beowulf cluster systems 914 Bay Ridge Road, Suite 220 www.scyld.com Annapolis MD 21403 410-990-9993 From cfriesen@nortel.com Tue Jun 21 21:46:09 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 21:46:15 -0700 (PDT) Received: from zcars04e.ca.nortel.com (zcars04e.nortelnetworks.com [47.129.242.56]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5M4k5H9019256 for ; Tue, 21 Jun 2005 21:46:08 -0700 Received: from zcard303.ca.nortel.com (zcard303.ca.nortel.com [47.129.242.59]) by zcars04e.ca.nortel.com (Switch-2.2.0/Switch-2.2.0) with ESMTP id j5M4hF912291; Wed, 22 Jun 2005 00:43:16 -0400 (EDT) Received: from nortel.com (acart070.ca.nortel.com [47.130.16.222]) by zcard303.ca.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id MRAF3559; Wed, 22 Jun 2005 00:44:19 -0400 Message-ID: <42B8ECA0.5060904@nortel.com> Date: Tue, 21 Jun 2005 22:44:16 -0600 X-Sybari-Space: 00000000 00000000 00000000 00000000 From: Chris Friesen User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040115 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Donald Becker CC: Andi Kleen , Rick Jones , netdev@oss.sgi.com, davem@redhat.com Subject: Re: RFC: NAPI packet weighting patch References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2495 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: cfriesen@nortel.com Precedence: bulk X-list: netdev Content-Length: 482 Lines: 18 Donald Becker wrote: > On Wed, 22 Jun 2005, Andi Kleen wrote: > > >>>While much has changed since then, the same basic parameters remain 
>>> - cache line size >> >>In 96 we had 32 byte cache lines. These days 64-128 are common, >>with some 256 byte cache line systems around. > > > Good point. > I believe that the most common line size is 64 bytes for L1 cache. If I recall, G4 chips are 32 bytes, and G5s are 128 bytes. Most current x86 chips are 64 bytes though. Chris From grundler@cup.hp.com Tue Jun 21 22:19:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 21 Jun 2005 22:19:07 -0700 (PDT) Received: from palrel10.hp.com (palrel10.hp.com [156.153.255.245]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5M5IpH9021475 for ; Tue, 21 Jun 2005 22:19:04 -0700 Received: from esmail.cup.hp.com (esmail.cup.hp.com [15.0.65.164]) by palrel10.hp.com (Postfix) with ESMTP id 438E01227; Tue, 21 Jun 2005 22:17:31 -0700 (PDT) Received: from localhost.localdomain (debian.cup.hp.com [15.244.57.47]) by esmail.cup.hp.com (8.9.3 (PHNE_29774)/8.8.6) with ESMTP id WAA13450; Tue, 21 Jun 2005 22:11:18 -0700 (PDT) Received: by localhost.localdomain (Postfix, from userid 1000) id 7087B90231; Tue, 21 Jun 2005 22:20:12 -0700 (PDT) Date: Tue, 21 Jun 2005 22:20:12 -0700 From: Grant Grundler To: "David S. Miller" Cc: iod00d@hp.com, mchan@broadcom.com, netdev@oss.sgi.com Subject: Re: [PATCH] tg3_msi() and weakly ordered memory Message-ID: <20050622052012.GA17224@esmail.cup.hp.com> References: <20050614154625.GB24371@esmail.cup.hp.com> <1118771563.7059.30.camel@rh4> <20050614211530.GB25516@esmail.cup.hp.com> <20050621.165634.07642938.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050621.165634.07642938.davem@davemloft.net> User-Agent: Mutt/1.5.9i X-archive-position: 2496 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: iod00d@hp.com Precedence: bulk X-list: netdev Content-Length: 1050 Lines: 30 On Tue, Jun 21, 2005 at 04:56:34PM -0700, David S. 
Miller wrote: > > Ok, here is the patch I came up with as a result of this thread. Looks good to me. > Michael stated he would investigate using a pure tag comparison in > place of tg3_has_work() when the chip is using tagged interrupts. The more I think about it, the more I like the idea of each ISR calling into a different tg3_poll routine. The specific _poll() routine could do the "is there more work" checking instead of the TX/RX ring cleanup code. The main reason is the "more work" checks can be better optimized for MSI (use tags) vs IRQ Line interrupt (use ring indices) handlers. I also hope to reduce cacheline movement by touching the status block fewer times. This isn't a trivial patch and I'm short on time (preparing stuff for OLS and HP World before my vacation). If there is still interest, I can prototype a patch in late August or Sept (about 8 weeks from now). > Thanks. Welcome and thanks too. BTW, I greatly appreciate Michael clarifying tg3 behavior. thanks, grant From dada1@cosmosbay.com Wed Jun 22 00:35:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 00:35:42 -0700 (PDT) Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5M7ZQH9004909 for ; Wed, 22 Jun 2005 00:35:27 -0700 Received: from [172.16.2.14] ([172.16.2.14]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j5M7RjI9003870; Wed, 22 Jun 2005 09:28:01 +0200 Message-ID: <42B912EE.9020909@cosmosbay.com> Date: Wed, 22 Jun 2005 09:27:42 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: "David S.
Miller" CC: gandalf@wlug.westbo.se, hadi@cyberus.ca, shemminger@osdl.org, mitch.a.williams@intel.com, john.ronciak@intel.com, mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch References: <42A5284C.3060808@osdl.org> <1118147904.6320.108.camel@localhost.localdomain> <20050621.133704.08321534.davem@davemloft.net> In-Reply-To: <20050621.133704.08321534.davem@davemloft.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [172.16.8.80]); Wed, 22 Jun 2005 09:30:25 +0200 (CEST) X-archive-position: 2497 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Content-Length: 2216 Lines: 69 David S. Miller wrote: > > Also, e1000 sends full MTU sized SKBs down into the stack even if the > packet is very small. This also hurts performance a lot. As > discussed elsewhere, it should use a "small packet" cut-off just like > other drivers do. If the RX frame is less than this cut-off value, a > new smaller sized SKB is allocated and the RX data copied into it. > The RX ring SKB is left in-place and given back to the chip. > > My only guess is that the e1000 driver implemented things this way > to simplify the RX recycling logic. Well, it is an area ripe for > improvement in this driver :) > > Here is a copy of a mail from Scott Feldman (19/11/2003), from when I asked him to add this copybreak feature to the e1000 driver: It did improve performance on my workload. It also reduced the memory requirement a *lot* (it was using 300,000 active TCP sockets, mostly receiving short frames). Eric Dumazet --------------------------------------------------- Try this (untested) patch.
It's against 5.2.26 (which you don't have), so hand patch it. (Sorry). Do you have any way to measure performance? CPU utilization? The copy isn't free. Oh, also, this patch doesn't try to recycle the 4K skb that was copied from. Instead, it's freed and re-allocated. Shouldn't be a big deal because your total system memory allocation should remain constant (except for outstanding copybreak skb's). Let us know how it goes. -scott ---------------- diff -Naurp e1000-5.2.26/src/e1000_main.c e1000-5.2.26-cb/src/e1000_main.c --- e1000-5.2.26/src/e1000_main.c 2003-11-17 19:23:38.000000000 -0800 +++ e1000-5.2.26-cb/src/e1000_main.c 2003-11-18 18:18:07.000000000 -0800 @@ -2343,6 +2343,20 @@ e1000_clean_rx_irq(struct e1000_adapter } } + /* RONCH 11/18/03 - code added for copybreak test */ +#define E1000_CB_LENGTH 128 + if(length < E1000_CB_LENGTH ) { + struct sk_buff *new_skb = dev_alloc_skb(length +2); + if(new_skb) { + skb_reserve(new_skb, 2); + new_skb->dev = netdev; + memcpy(new_skb->data, skb->data, length); + dev_kfree_skb(skb); + skb = new_skb; + } + } + /* end copybreak code */ + /* Good Receive */ skb_put(skb, length - ETHERNET_FCS_SIZE); From vda@ilport.com.ua Wed Jun 22 01:33:30 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 01:33:33 -0700 (PDT) Received: from port.imtp.ilyichevsk.odessa.ua (167.imtp.Ilyichevsk.Odessa.UA [195.66.192.167] (may be forged)) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j5M8XLH9008518 for ; Wed, 22 Jun 2005 01:33:26 -0700 Received: (qmail 11549 invoked by alias); 22 Jun 2005 08:31:53 -0000 Received: from unknown (172.17.13.22) by 0 (195.66.192.170) with ESMTP; 22 Jun 2005 08:31:47 -0000 From: Denis Vlasenko To: davem@davemloft.net Subject: [PATCH] Micro optimization in eth_header() Date: Wed, 22 Jun 2005 11:31:43 +0300 User-Agent: KMail/1.5.4 Cc: netdev@oss.sgi.com MIME-Version: 1.0 Content-Type: Multipart/Mixed; boundary="Boundary-00=_vHSuC2OnpQh1+22" Message-Id:
<200506221131.43049.vda@ilport.com.ua> X-archive-position: 2498 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vda@ilport.com.ua Precedence: bulk X-list: netdev Content-Length: 2946 Lines: 75 --Boundary-00=_vHSuC2OnpQh1+22 Content-Type: text/plain; charset="koi8-r" Content-Transfer-Encoding: 7bit Content-Disposition: inline Compile tested only. -- vda --Boundary-00=_vHSuC2OnpQh1+22 Content-Type: text/x-diff; charset="koi8-r"; name="eth.c.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="eth.c.diff" Micro optimization in eth_header(). Changes in asm code (new on the right): ... 50: shl $0x8,%edx 50: shl $0x8,%edx 53: shr $0x8,%eax 53: shr $0x8,%eax 56: or %eax,%edx 56: or %eax,%edx 58: mov %dx,0xc(%ebx) + 58: test %esi,%esi 5c: mov 0xc(%ebp),%edx + 5a: mov %dx,0xc(%ebx) 5f: test %esi,%esi + 5e: jne 69 61: mov 0xb0(%edx),%al + 60: mov 0xc(%ebp),%esi 67: je 71 + 63: add $0x90,%esi 69: movzbl %al,%eax + 69: mov 0xc(%ebp),%edx 6c: lea 0x6(%ebx),%edi + 6c: movzbl 0xb0(%edx),%eax 6f: jmp 80 + 73: mov %eax,%ecx 71: mov 0xc(%ebp),%edx + 75: lea 0x6(%ebx),%edi 74: movzbl %al,%eax + 78: shr $0x2,%ecx 77: lea 0x6(%ebx),%edi + 7b: repz movsl %ds:(%esi),%es:(%edi) 7a: lea 0x90(%edx),%esi + 7d: mov %eax,%ecx 80: mov %eax,%ecx + 7f: and $0x3,%ecx 82: shr $0x2,%ecx + 82: je 86 85: repz movsl %ds:(%esi),%es:(%edi) + 84: repz movsb %ds:(%esi),%es:(%edi) 87: mov %eax,%ecx + 86: testb $0x88,0x58(%edx) 89: and $0x3,%ecx + 8a: je b1 8c: je 90 + 8c: movzbl 0xb0(%edx),%esi 8e: repz movsb %ds:(%esi),%es:(%edi) + 93: mov %esi,%ecx 90: mov 0xc(%ebp),%eax + 95: shr $0x2,%ecx 93: testb $0x88,0x58(%eax) + 98: mov %ebx,%edi 97: je bc + 9a: xor %eax,%eax 99: movzbl 0xb0(%eax),%edx + 9c: mov %esi,%edx a0: mov %edx,%ecx + 9e: repz stos %eax,%es:(%edi) a2: xor %eax,%eax a4: shr $0x2,%ecx a7: mov %ebx,%edi a9: repz stos %eax,%es:(%edi) .. 
--- linux-2.6.12-rc2.src/net/ethernet/eth.c.orig Thu Mar 3 09:31:21 2005 +++ linux-2.6.12-rc2.src/net/ethernet/eth.c Wed Jun 22 11:13:54 2005 @@ -92,10 +92,9 @@ int eth_header(struct sk_buff *skb, stru * Set the source hardware address. */ - if(saddr) - memcpy(eth->h_source,saddr,dev->addr_len); - else - memcpy(eth->h_source,dev->dev_addr,dev->addr_len); + if(!saddr) + saddr = dev->dev_addr; + memcpy(eth->h_source,saddr,dev->addr_len); /* * Anyway, the loopback-device should never use this function... --Boundary-00=_vHSuC2OnpQh1+22-- From P@draigBrady.com Wed Jun 22 01:45:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 01:45:20 -0700 (PDT) Received: from corvil.com (gate.corvil.net [213.94.219.177]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5M8j9H9010012 for ; Wed, 22 Jun 2005 01:45:10 -0700 Received: from draigBrady.com (pixelbeat.local.corvil.com [172.18.1.170]) by corvil.com (8.13.3/8.13.3) with ESMTP id j5M8guap086118; Wed, 22 Jun 2005 09:42:57 +0100 (IST) (envelope-from P@draigBrady.com) Message-ID: <42B92490.40005@draigBrady.com> Date: Wed, 22 Jun 2005 09:42:56 +0100 From: P@draigBrady.com User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040124 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. 
Miller" CC: gandalf@wlug.westbo.se, hadi@cyberus.ca, shemminger@osdl.org, mitch.a.williams@intel.com, john.ronciak@intel.com, mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch References: <42A5284C.3060808@osdl.org> <1118147904.6320.108.camel@localhost.localdomain> <20050621.133704.08321534.davem@davemloft.net> In-Reply-To: <20050621.133704.08321534.davem@davemloft.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-archive-position: 2499 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: P@draigBrady.com Precedence: bulk X-list: netdev Content-Length: 1010 Lines: 22 David S. Miller wrote: > Also, e1000 sends full MTU sized SKBs down into the stack even if the > packet is very small. This also hurts performance a lot. As > discussed elsewhere, it should use a "small packet" cut-off just like > other drivers do. If the RX frame is less than this cut-off value, a > new smaller sized SKB is allocated and the RX data copied into it. > The RX ring SKB is left in-place and given back to the chip. Yes the copy is essentially free here as the data is already cached. As a data point, I went the whole hog and used buffer recycling in my essentially packet sniffing application. I.e. there are no allocs per packet at all, and this makes a HUGE difference. On a 2x3.4GHz 2xe1000 system I can receive 620Kpps per port sustained into my userspace app which does a LOT of processing per packet. Without the buffer recycling it was around 250Kpps. Note I don't reuse an skb until the packet is copied into a PACKET_MMAP buffer.
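A minimal userspace sketch of the copybreak decision quoted above, for illustration only. The `rx_slot` structure and `rx_deliver()` helper are hypothetical names (neither is e1000 code), and plain `malloc()` buffers stand in for skbs. Frames shorter than the cut-off are copied into a right-sized buffer and the ring buffer stays with the chip; larger frames are passed up as-is and the slot is refilled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one RX ring entry; not the e1000 layout. */
struct rx_slot {
    unsigned char *buf;   /* buffer currently owned by the "chip" */
    size_t cap;           /* size of that buffer */
};

/*
 * Deliver a received frame of `len` bytes from `slot`.
 * Returns the buffer to pass up the stack (the caller frees it), or
 * NULL on allocation failure. *recycled is set to 1 when the ring
 * buffer stayed in place and can go straight back to the hardware.
 */
unsigned char *rx_deliver(struct rx_slot *slot, size_t len,
                          size_t copybreak, int *recycled)
{
    unsigned char *out;
    unsigned char *fresh;

    if (len < copybreak) {
        /* Small frame: copy into a right-sized buffer, keep the ring buffer. */
        out = malloc(len ? len : 1);
        if (!out)
            return NULL;
        memcpy(out, slot->buf, len);
        *recycled = 1;
        return out;
    }

    /* Large frame: hand the ring buffer up and refill the slot. */
    fresh = malloc(slot->cap);
    if (!fresh)
        return NULL;
    out = slot->buf;
    slot->buf = fresh;
    *recycled = 0;
    return out;
}
```

The small-frame path still pays for one allocation per packet; the full recycling described in this message goes further and avoids per-packet allocation entirely, which this sketch does not attempt.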
-- Pádraig Brady - http://www.pixelbeat.org -- From michael.vittrup.larsen@ericsson.com Wed Jun 22 02:18:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 02:18:40 -0700 (PDT) Received: from mailgw4.ericsson.se (mailgw4.ericsson.se [193.180.251.62]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5M9IYH9012441 for ; Wed, 22 Jun 2005 02:18:35 -0700 Received: from esealmw126.eemea.ericsson.se (unknown [153.88.254.123]) by mailgw4.ericsson.se (Symantec Mail Security) with ESMTP id 64213BA1; Wed, 22 Jun 2005 11:17:11 +0200 (CEST) Received: from esealmw128.eemea.ericsson.se ([153.88.254.172]) by esealmw126.eemea.ericsson.se with Microsoft SMTPSVC(6.0.3790.211); Wed, 22 Jun 2005 11:17:10 +0200 Received: from unixmail.ted.dk.eu.ericsson.se ([213.159.188.246]) by esealmw128.eemea.ericsson.se with Microsoft SMTPSVC(6.0.3790.211); Wed, 22 Jun 2005 11:17:10 +0200 Received: from begonia.ted.dk.eu.ericsson.se (tedmvla@begonia.ted.ericsson.se [213.159.189.32]) by unixmail.ted.dk.eu.ericsson.se (8.10.1/8.10.1/TEDmain-1.0) with ESMTP id j5M9H5M01167; Wed, 22 Jun 2005 11:17:06 +0200 (MEST) From: Michael Vittrup Larsen Organization: Ericsson To: "David S. 
Miller" Subject: Re: [PATCH] tcp: efficient port randomistion (rev 3) Date: Wed, 22 Jun 2005 11:17:03 +0200 User-Agent: KMail/1.7.2 Cc: Stephen Hemminger , netdev@oss.sgi.com References: <20041220153916.6c00c114.davem@davemloft.net> In-Reply-To: <20041220153916.6c00c114.davem@davemloft.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200506221117.04334.michael.vittrup.larsen@ericsson.com> X-OriginalArrivalTime: 22 Jun 2005 09:17:10.0147 (UTC) FILETIME=[2B1B1D30:01C5770B] X-Brightmail-Tracker: AAAAAA== X-archive-position: 2500 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael.vittrup.larsen@ericsson.com Precedence: bulk X-list: netdev Content-Length: 597 Lines: 18 On Tuesday 21 December 2004 00:39, David S. Miller wrote: > On Fri, 10 Dec 2004 17:09:00 -0800 > > Stephen Hemminger wrote: > > okay, here is the revised version. Testing shows that it > > is more consistent, and just as fast as existing code, > > probably because of the getting rid of portalloc_lock and > > better distribution. > > > > Signed-off-by: Stephen Hemminger > > Queued up for 2.6.11, thanks Stephen. What's the status of this - I see it is not part of 2.6.12? Is there a general dislike of the port randomisation mechanism or? 
/Michael From maca02@atlas.cz Wed Jun 22 03:08:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 03:08:37 -0700 (PDT) Received: from localhost.localdomain (maca.fortech.cz [213.250.192.50]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MA8TH9019226 for ; Wed, 22 Jun 2005 03:08:35 -0700 Received: from localhost (localhost [127.0.0.1]) by localhost.localdomain (8.12.11/8.12.8) with ESMTP id j5MA70MW003389 for ; Wed, 22 Jun 2005 11:07:00 +0100 Date: Wed, 22 Jun 2005 12:07:00 +0200 (CEST) From: Tomáš Macek X-X-Sender: root@localhost.localdomain To: netdev@oss.sgi.com Subject: Re: receive only one record from the routing table In-Reply-To: <20050618202359.GP22463@postel.suug.ch> Message-ID: References: <20050617141527.GN22463@postel.suug.ch> <20050617191340.GO22463@postel.suug.ch> <20050618202359.GP22463@postel.suug.ch> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2501 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: maca02@atlas.cz Precedence: bulk X-list: netdev Content-Length: 3316 Lines: 96 >> ...or give libnl a second chance I would like to give it a second chance, but when I type 'make', it outputs this: ...
Entering lib MAKE libnl.so.0.5.1 CC helpers.c helpers.c:417: error: `ARPHRD_EUI64' undeclared here (not in a function) helpers.c:417: error: initializer element is not constant helpers.c:417: error: (near initialization for `llprotos[13].i') helpers.c:417: error: initializer element is not constant helpers.c:417: error: (near initialization for `llprotos[13]') helpers.c:418: error: initializer element is not constant helpers.c:418: error: (near initialization for `llprotos[14]') helpers.c:419: error: initializer element is not constant helpers.c:419: error: (near initialization for `llprotos[15]') helpers.c:420: error: initializer element is not constant helpers.c:420: error: (near initialization for `llprotos[16]') helpers.c:421: error: initializer element is not constant helpers.c:421: error: (near initialization for `llprotos[17]') ... On Sat, 18 Jun 2005, Thomas Graf wrote: > * Tomáš Macek 2005-06-18 20:55 >> The 'rtm_dst_len = 16' should mean the mask of the route I'm looking for, correct? > > Yes. > >> The whole code before sending the packet is below: >> >> >> /* Create Socket */ >> if((sock = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE)) < 0) >> perror("Socket Creation: "); >> >> /* Initialize the buffer */ >> memset(msgBuf, 0, BUFSIZE); >> >> /* point the header and the msg structure pointers into the buffer */ >> nlMsg = (struct nlmsghdr *)msgBuf; >> rtMsg = (struct rtmsg *)NLMSG_DATA(nlMsg); >> rtMsg->rtm_family = AF_INET; >> rtMsg->rtm_dst_len = 16; >> >> /* Fill in the nlmsg header*/ >> nlMsg->nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)); // Length of message. >> nlMsg->nlmsg_type = RTM_GETROUTE; // Get the routes from kernel routing table . >> nlMsg->nlmsg_flags = NLM_F_REQUEST; // The message is a request for dump. >> nlMsg->nlmsg_seq = msgSeq++; // Sequence of the message packet. >> nlMsg->nlmsg_pid = getpid(); // PID of process sending the request.
>> >> char *cp; >> unsigned int xx[4]; int i = 0; >> unsigned char *ap = (unsigned char *)xx; >> for (cp = argv[1], i = 0; *cp; cp++) { >> if (*cp <= '9' && *cp >= '0') { >> ap[i] = 10*ap[i] + (*cp-'0'); >> continue; >> } >> if (*cp == '.' && ++i <= 3) >> continue; >> return -1; >> } >> >> NetlinkAddAttr(nlMsg, sizeof(nlMsg), RTA_DST, &xx, 4); > > This looks good but your NetlinkAddAttr is bogus, it should > be something like this: > > int nl_msg_append_tlv(struct nlmsghdr *n, int type, void *data, size_t len) > { > int tlen; > struct rtattr *rta; > > tlen = NLMSG_ALIGN(n->nlmsg_len) + RTA_LENGTH(NLMSG_ALIGN(len)); > > rta = (struct rtattr *) NLMSG_TAIL(n); > rta->rta_type = type; > rta->rta_len = RTA_LENGTH(NLMSG_ALIGN(len)); > memcpy(RTA_DATA(rta), data, len); > n->nlmsg_len = tlen; > > return 0; > } > > Your code is missing various alignment requirements. I can't tell > whether this is the last bug. I recommend you to read ip/iproute.c > in the iproute2 source or give libnl a second chance. 
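For reference, a bounds-checked attribute append might look roughly like the sketch below. It follows the pattern of iproute2's `addattr_l()`, but the name `nl_append_attr()` and its signature are assumptions made for illustration, not an existing library call. `maxlen` is meant to be the size of the whole message buffer (`BUFSIZE` in the code quoted above); passing `sizeof` of a pointer to the buffer, as the quoted `NetlinkAddAttr()` call appears to do, would make a check like this reject every attribute.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/*
 * Append one rtattr to the netlink message `n`, which lives in a buffer
 * of `maxlen` bytes. Returns 0 on success, -1 if the attribute would
 * not fit. (Illustrative helper; the name is hypothetical.)
 */
int nl_append_attr(struct nlmsghdr *n, size_t maxlen, int type,
                   const void *data, size_t alen)
{
    size_t len = RTA_LENGTH(alen);   /* attribute header + payload */
    struct rtattr *rta;

    if (NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len) > maxlen)
        return -1;

    /* The new attribute starts at the aligned end of the message. */
    rta = (struct rtattr *)((char *)n + NLMSG_ALIGN(n->nlmsg_len));
    rta->rta_type = type;
    rta->rta_len = len;              /* unpadded length goes in the header */
    memcpy(RTA_DATA(rta), data, alen);

    /* The message grows by the padded attribute length. */
    n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len);
    return 0;
}
```

With a helper of this shape, the call in the quoted program would become something like `nl_append_attr(nlMsg, BUFSIZE, RTA_DST, addr, 4)`, where `addr` holds the four address bytes in network order.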
> > > > > > From ak@suse.de Wed Jun 22 04:33:07 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 04:33:13 -0700 (PDT) Received: from mx1.suse.de (ns.suse.de [195.135.220.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MBX5H9001358 for ; Wed, 22 Jun 2005 04:33:07 -0700 Received: from Relay2.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.suse.de (Postfix) with ESMTP id 8E200EE23; Wed, 22 Jun 2005 13:31:40 +0200 (CEST) Date: Wed, 22 Jun 2005 13:31:32 +0200 From: Andi Kleen To: Chris Friesen Cc: Donald Becker , Andi Kleen , Rick Jones , netdev@oss.sgi.com, davem@redhat.com Subject: Re: RFC: NAPI packet weighting patch Message-ID: <20050622113132.GR14251@wotan.suse.de> References: <42B8ECA0.5060904@nortel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <42B8ECA0.5060904@nortel.com> X-archive-position: 2502 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev Content-Length: 190 Lines: 6 > If I recall, G4 chips are 32 bytes, and G5s are 128 bytes. Most current > x86 chips are 64 bytes though. P4s are effectively 128 byte. And that is the most common x86 right now. 
-Andi From jmoyer@redhat.com Wed Jun 22 04:49:04 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 04:49:13 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MBn3H9008875 for ; Wed, 22 Jun 2005 04:49:03 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j5MBlciM029286; Wed, 22 Jun 2005 07:47:38 -0400 Received: from mail.boston.redhat.com (mail.boston.redhat.com [172.16.76.12]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j5MBlcu15384; Wed, 22 Jun 2005 07:47:38 -0400 Received: from segfault.boston.redhat.com (segfault.boston.redhat.com [172.16.80.57]) by mail.boston.redhat.com (8.12.8/8.12.8) with ESMTP id j5MBlcc0024090; Wed, 22 Jun 2005 07:47:38 -0400 Received: from segfault.boston.redhat.com (localhost.localdomain [127.0.0.1]) by segfault.boston.redhat.com (8.13.1/8.13.1) with ESMTP id j5MBlbVM017334; Wed, 22 Jun 2005 07:47:37 -0400 Received: (from jmoyer@localhost) by segfault.boston.redhat.com (8.13.1/8.13.1/Submit) id j5MBlbpG017331; Wed, 22 Jun 2005 07:47:37 -0400 From: Jeff Moyer MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17081.20441.714191.528270@segfault.boston.redhat.com> Date: Wed, 22 Jun 2005 07:47:37 -0400 To: Matt Mackall Cc: netdev@oss.sgi.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [patch,rfc] allow registration of multiple netpolls per interface In-Reply-To: <20050621225252.GY27572@waste.org> References: <17080.35214.507402.998984@segfault.boston.redhat.com> <20050621225252.GY27572@waste.org> X-Mailer: VM 7.17 under 21.4 (patch 15) "Security Through Obscurity" XEmacs Lucid Reply-To: jmoyer@redhat.com X-PGP-KeyID: 1F78E1B4 X-PGP-CertKey: F6FE 280D 8293 F72C 65FD 5A58 1FF8 A7CA 1F78 E1B4 X-PCLoadLetter: What the f**k does that mean? 
X-archive-position: 2503 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmoyer@redhat.com Precedence: bulk X-list: netdev Content-Length: 12219 Lines: 348 ==> Regarding Re: [patch,rfc] allow registration of multiple netpolls per interface; Matt Mackall adds: mpm> On Tue, Jun 21, 2005 at 05:41:34PM -0400, Jeff Moyer wrote: >> Hi, >> >> This patch restores functionality that was removed when the recursive >> poll bug was fixed. Namely, it allows multiple netpoll clients to >> register against the same network interface. mpm> Thanks. I've been neglecting this for a bit while I've been busy with mpm> other things. >> In order to put things into perspective, I'm going to provide some >> background information. So, here is how things used to work: >> >> Multiple users of the netpoll interface could register themselves to send >> packets over the same interface. Any number of these netpoll clients could >> register an rx_hook, as well. However, only the very first in the list >> (hence the last one that registered), that matched the incoming interface, >> would be called when a packet arrived. The reason for this was not design, >> it was an oversight in the implementation. In practice, however, no one >> ever stumbled over this. (There are more subtleties when dealing with >> multiple rx_hooks registered to the same interface, but we'll ignore these, >> since no one ever ran into such problems.) mpm> Hmm. It's conceivable we'll want netdump and kgdb on the same mpm> interface so we'll have to address this eventually.. Well, do you want to address it eventually, or now? As I said, it's never bitten anyone before. >> Note that each netpoll client that registered an rx_hook was put on a >> netpoll_rx_list. This list was protected by a spinlock, and so operations >> which touched the rx routines would incur a locking penalty and a list >> traversal.
I am mentioning this because the list and associated lock were >> removed when the code was refactored, and the patches I propose will >> reintroduce the lock, but not the list. mpm> ..so we'll probably want the list back in some form. Sigh. >> Moving to what we have today: >> >> Multiple netpoll clients can register to send packets over the same >> interface. That's right, you can actually do this. However, there are >> ugly side effects. Because we now have a pointer from the net_device to a >> struct netpoll, the last netpoll client to register will be pointed to by >> the net_device->np. What this means is that if you had two clients, the >> first registers an rx_hook and the second does not, then the netpoll code >> will not know that any device has actually registered an rx_hook (since the >> np pointer in the struct net_device is overwritten)! As a result, no >> incoming packets will be delivered to the registered rx routine. This is >> clearly undesirable behaviour. >> >> So what does the patch do? >> >> I created a new structure: >> >> struct netpoll_info { >> spinlock_t poll_lock; >> int poll_owner; >> int rx_flags; >> spinlock_t rx_lock; >> struct netpoll *rx_np; /* netpoll that registered an rx_hook */ >> }; >> >> This is the structure which gets pointed to by the net_device. All of the >> flags and locks which are specific to the INTERFACE go here. Any variables >> which must be kept per struct netpoll were left in the struct netpoll. So >> now, we have a cleaner separation of data and its scope. >> >> Since we never really supported having more than one struct netpoll >> register an rx_hook, I got rid of the rx_list. This is replaced by a >> single pointer in the netpoll_info structure (rx_np). We still need to >> protect addition or removal of the rx_np pointer, and so keep the lock >> (rx_lock). There is one lock per struct net_device, and I am certain that >> it will be 0 contention, as rx_np will only be changed during an insmod or >> rmmod.
If people think this would be a good rcu candidate, let me know and >> I'll change it to use that locking scheme. mpm> It might be simpler to have a single lock here..? Maybe. You can't really have netpoll code running on multiple cpus at the same time, right? This is the rx path, remember, so the other cpu should be spinning on the poll_lock. Keeping separate locks would allow you to unregister a struct netpoll associated with another net device without causing lock contention. This is a very minor win, obviously. I still feel like this npinfo struct is the right place for this, though. If you're strongly opposed to that, I'll change it. >> In the process of making these changes, I've fixed a couple other minor >> bugs [1]. These fixes are included in this patch, but I will break them >> out if people agree with this approach. >> >> I have tested this by registering multiple netpoll clients, and verifying >> that they both function properly. I have not yet tried registering an >> rx_hook, but I believe the code should be sufficient to handle that case. >> >> And so, here is the full patch. I'd appreciate comments. Once we've >> reached consensus, I will resubmit as a patch series. mpm> I think the general idea is sound. So let's take a look at the patch itself. >> Oh, and I've cc'd both netdev@oss.sgi.com and @vger.kernel.org. Is it safe >> to just use the vger list? mpm> Yes. >> [1] netpoll_poll_unlock unlocked and then set the poll_owner. I've >> reversed the order of those operations. The netpoll_cleanup code could >> dereference a null pointer, that was fixed by virtue of being very >> different in the new case. mpm> Ok, let's fix the lock ordering bit first. 
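The unlock-ordering point in footnote [1] can be illustrated with a small userspace sketch. It assumes C11 atomics as a stand-in for the kernel spinlock, and the names `poll_state`, `poll_state_lock()` and `poll_state_unlock()` are hypothetical, not netpoll code. If the lock were released before the owner field is reset, another CPU could take the lock and record itself as owner, only to have that value overwritten with -1 by the late store, which would defeat the recursion check.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for poll_lock and poll_owner. */
struct poll_state {
    atomic_flag lock;   /* set while the poll lock is held */
    atomic_int owner;   /* CPU id of the holder, -1 when unowned */
};

void poll_state_init(struct poll_state *ps)
{
    atomic_flag_clear(&ps->lock);
    atomic_init(&ps->owner, -1);
}

void poll_state_lock(struct poll_state *ps, int cpu)
{
    /* Spin until the flag was previously clear, i.e. we acquired it. */
    while (atomic_flag_test_and_set_explicit(&ps->lock, memory_order_acquire))
        ;
    /* Record ownership only after the lock is held. */
    atomic_store_explicit(&ps->owner, cpu, memory_order_relaxed);
}

void poll_state_unlock(struct poll_state *ps)
{
    /* Reset the owner while still holding the lock ... */
    atomic_store_explicit(&ps->owner, -1, memory_order_relaxed);
    /* ... and only then release it, so no other holder's id is clobbered. */
    atomic_flag_clear_explicit(&ps->lock, memory_order_release);
}
```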
>> --- linux-2.6.12-rc6/net/core/netpoll.c.orig 2005-06-20 19:51:56.000000000 -0400 >> +++ linux-2.6.12-rc6/net/core/netpoll.c 2005-06-21 16:03:22.409620400 -0400 >> @@ -131,18 +131,19 @@ static int checksum_udp(struct sk_buff * >> static void poll_napi(struct netpoll *np) >> { >> int budget = 16; >> + struct netpoll_info *npinfo = np->dev->npinfo; mpm> As a minor point of style, I like to put the "get my private info" mpm> lines first. Quite the minor nit! It's the second line in the function! I'll fix it, though. ;) >> @@ -245,6 +246,7 @@ repeat: >> static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb) >> { >> int status; >> + struct netpoll_info *npinfo; >> >> repeat: >> if(!np || !np->dev || !netif_running(np->dev)) { >> @@ -253,7 +255,8 @@ repeat: >> } >> >> /* avoid recursion */ >> - if(np->poll_owner == smp_processor_id() || >> + npinfo = np->dev->npinfo; mpm> Again, the npinfo assignment ought to happen as soon as possible. This is as soon as possible. Note that above we check to see if np and np->dev are valid pointers. We can't get to the npinfo struct before we know that. >> + if(npinfo->poll_owner == smp_processor_id() || np-> dev->xmit_lock_owner == smp_processor_id()) { >> if (np->drop) np-> drop(skb); >> @@ -346,7 +349,15 @@ static void arp_reply(struct sk_buff *sk >> int size, type = ARPOP_REPLY, ptype = ETH_P_ARP; >> u32 sip, tip; >> struct sk_buff *send_skb; >> - struct netpoll *np = skb->dev->np; >> + struct netpoll *np; >> + struct netpoll_info *npinfo = skb->dev->npinfo; >> + >> + if (!npinfo) return; mpm> We should only be replying to ARPs if we're trapped, right? How do we mpm> get here with npinfo unset? Good point. mpm> The return ought to be on a separate line, btw. Agreed. >> + spin_lock_irqsave(&npinfo->rx_lock, flags); >> + if (npinfo->rx_np->dev == skb->dev) >> + np = npinfo->rx_np; >> + spin_unlock_irqrestore(&npinfo->rx_lock, flags); mpm> And I think that means we don't need the lock here either. Sure we do. 
We need to protect against rmmod's. >> if (!np) return; mpm> And the same question and style criticism of my own code. ;) >> @@ -429,9 +440,9 @@ int __netpoll_rx(struct sk_buff *skb) >> int proto, len, ulen; >> struct iphdr *iph; >> struct udphdr *uh; >> - struct netpoll *np = skb->dev->np; >> + struct netpoll *np = skb->dev->npinfo->rx_np; >> >> - if (!np->rx_hook) >> + if (!np) >> goto out; >> if (skb->dev->type != ARPHRD_ETHER) >> goto out; >> @@ -611,9 +622,8 @@ int netpoll_setup(struct netpoll *np) >> { >> struct net_device *ndev = NULL; >> struct in_device *in_dev; >> - >> - np->poll_lock = SPIN_LOCK_UNLOCKED; >> - np->poll_owner = -1; >> + struct netpoll_info *npinfo; >> + unsigned long flags; >> >> if (np->dev_name) >> ndev = dev_get_by_name(np->dev_name); >> @@ -624,7 +634,17 @@ int netpoll_setup(struct netpoll *np) >> } >> np-> dev = ndev; >> - ndev->np = np; >> + if (!ndev->npinfo) { >> + npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL); >> + if (!npinfo) >> + goto release; >> + >> + npinfo->rx_np = NULL; >> + npinfo->poll_lock = SPIN_LOCK_UNLOCKED; >> + npinfo->poll_owner = -1; >> + npinfo->rx_lock = SPIN_LOCK_UNLOCKED; >> + } else >> + npinfo = ndev->npinfo; >> >> if (!ndev->poll_controller) { >> printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n", >> @@ -692,13 +712,20 @@ int netpoll_setup(struct netpoll *np) np-> name, HIPQUAD(np->local_ip)); >> } >> >> - if(np->rx_hook) >> - np->rx_flags = NETPOLL_RX_ENABLED; >> + if(np->rx_hook) { >> + spin_lock_irqsave(&npinfo->rx_lock, flags); >> + npinfo->rx_flags |= NETPOLL_RX_ENABLED; >> + npinfo->rx_np = np; >> + spin_unlock_irqsave(&npinfo->rx_lock, flags); >> + } >> + /* last thing to do is link it to the net device structure */ >> + ndev->npinfo = npinfo; >> >> return 0; >> >> release: >> - ndev->np = NULL; >> + if (!ndev->npinfo) >> + kfree(npinfo); np-> dev = NULL; >> dev_put(ndev); >> return -1; >> @@ -706,9 +733,17 @@ int netpoll_setup(struct netpoll *np) >> >> void netpoll_cleanup(struct 
netpoll *np) >> { >> - if (np->dev) >> - np->dev->np = NULL; >> - dev_put(np->dev); >> + struct netpoll_info *npinfo; >> + >> + if (np->dev) { >> + npinfo = np->dev->npinfo; >> + if (npinfo && npinfo->rx_np == np) { >> + npinfo->rx_np = NULL; >> + npinfo->rx_flags &= ~NETPOLL_RX_ENABLED; >> + } >> + dev_put(np->dev); >> + } >> + np-> dev = NULL; >> } >> >> --- linux-2.6.12-rc6/net/core/dev.c.orig 2005-06-20 19:51:59.000000000 -0400 >> +++ linux-2.6.12-rc6/net/core/dev.c 2005-06-21 13:53:51.583407710 -0400 >> @@ -1656,6 +1656,7 @@ int netif_receive_skb(struct sk_buff *sk >> unsigned short type; >> >> /* if we've gotten here through NAPI, check netpoll */ >> + /* how else can we get here? --phro */ mpm> We can get here in the usual route of non-NAPI delivery, IIRC. I couldn't find that path. I'll look again. >> if (skb->dev->poll && netpoll_rx(skb)) >> return NET_RX_DROP; >> >> --- linux-2.6.12-rc6/include/linux/netpoll.h.orig 2005-06-20 19:51:47.000000000 -0400 >> +++ linux-2.6.12-rc6/include/linux/netpoll.h 2005-06-21 15:29:48.994422229 -0400 >> @@ -16,14 +16,19 @@ struct netpoll; >> struct netpoll { >> struct net_device *dev; >> char dev_name[16], *name; >> - int rx_flags; >> void (*rx_hook)(struct netpoll *, int, char *, int); >> void (*drop)(struct sk_buff *skb); >> u32 local_ip, remote_ip; >> u16 local_port, remote_port; >> unsigned char local_mac[6], remote_mac[6]; >> +}; >> + >> +struct netpoll_info { >> spinlock_t poll_lock; >> int poll_owner; >> + int rx_flags; >> + spinlock_t rx_lock; >> + struct netpoll *rx_np; /* netpoll that registered an rx_hook */ >> }; >> >> void netpoll_poll(struct netpoll *np); >> @@ -39,22 +44,35 @@ void netpoll_queue(struct sk_buff *skb); >> #ifdef CONFIG_NETPOLL >> static inline int netpoll_rx(struct sk_buff *skb) >> { >> - return skb->dev->np && skb->dev->np->rx_flags && __netpoll_rx(skb); >> + struct netpoll_info *npinfo = skb->dev->npinfo; >> + unsigned long flags; >> + int ret = 0; >> + >> + if (!npinfo || (!npinfo->rx_np && 
!npinfo->rx_flags)) >> + return 0; >> + >> + spin_lock_irqsave(&npinfo->rx_lock, flags); >> + /* check rx_flags again with the lock held */ >> + if (npinfo->rx_flags && __netpoll_rx(skb)) >> + ret = 1; >> + spin_unlock_irqrestore(&npinfo->rx_lock, flags); >> + >> + return ret; >> } mpm> This is perhaps a problem due to cache line bouncing. Perhaps we can mpm> use an atomic op and a memory barrier instead? It really should be a 0 contention lock. Let's not optimize something that doesn't need it. If we find that it causes problems, I'll be more than happy to fix it. Thanks for the review, Matt. I'll put together another patch, test it, and repost later today. -Jeff From dada1@cosmosbay.com Wed Jun 22 05:58:21 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 05:58:24 -0700 (PDT) Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MCwKH9012896 for ; Wed, 22 Jun 2005 05:58:21 -0700 Received: from [172.16.2.14] ([172.16.2.14]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j5MCuu6L025845; Wed, 22 Jun 2005 14:56:56 +0200 Message-ID: <42B96017.1050803@cosmosbay.com> Date: Wed, 22 Jun 2005 14:56:55 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: "David S. 
Miller" CC: netdev@oss.sgi.com Subject: [PATCH] dont use strlen() but the result from a prior sprintf() References: <20050614154625.GB24371@esmail.cup.hp.com> <1118771563.7059.30.camel@rh4> <20050614211530.GB25516@esmail.cup.hp.com> <20050621.165634.07642938.davem@davemloft.net> In-Reply-To: <20050621.165634.07642938.davem@davemloft.net> Content-Type: multipart/mixed; boundary="------------030608070101010705060000" X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [172.16.8.80]); Wed, 22 Jun 2005 14:56:56 +0200 (CEST) X-archive-position: 2504 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Content-Length: 1598 Lines: 52 This is a multi-part message in MIME format. --------------030608070101010705060000 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi David Small patch to save an unnecessary call to strlen(): sprintf() gave us the length, just trust it. 
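A minimal user-space sketch of the idea, for illustration only: struct name_buf and format_sock_name() are hypothetical names, not kernel code, and the caller is assumed to pass a buffer large enough for the formatted number. It relies on the standard behaviour that sprintf() returns the number of characters written, excluding the terminating NUL, which is what strlen() would compute on the result.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the name/len pair filled in by the patch. */
struct name_buf {
    const char *name;
    unsigned int len;
};

/*
 * Format "[<ino>]" into buf and record the length reported by sprintf().
 * buf is assumed to be large enough (32 bytes is plenty for an unsigned long).
 */
struct name_buf format_sock_name(char *buf, unsigned long ino)
{
    struct name_buf nb;
    int n = sprintf(buf, "[%lu]", ino);

    /* The return value already is the string length, so strlen() is redundant. */
    assert(n >= 0 && (size_t)n == strlen(buf));

    nb.name = buf;
    nb.len = (unsigned int)n;
    return nb;
}
```

Note that this holds for sprintf(); the standard C snprintf() returns the length that would have been written, which differs from strlen() when the output is truncated.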
Thank you Eric Dumazet diff -Nu linux-2.6.12-orig/net/socket.c linux-2.6.12/net/socket.c --- linux-2.6.12-orig/net/socket.c 2005-06-22 14:47:56.000000000 +0200 +++ linux-2.6.12/net/socket.c 2005-06-22 14:49:22.000000000 +0200 @@ -382,9 +382,8 @@ goto out; } - sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino); + this.len = sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino); this.name = name; - this.len = strlen(name); this.hash = SOCK_INODE(sock)->i_ino; file->f_dentry = d_alloc(sock_mnt->mnt_sb->s_root, &this); --------------030608070101010705060000 Content-Type: text/plain; name="patch.1" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="patch.1" --- linux-2.6.12-orig/net/socket.c 2005-06-22 14:47:56.000000000 +0200 +++ linux-2.6.12/net/socket.c 2005-06-22 14:49:22.000000000 +0200 @@ -382,9 +382,8 @@ goto out; } - sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino); + this.len = sprintf(name, "[%lu]", SOCK_INODE(sock)->i_ino); this.name = name; - this.len = strlen(name); this.hash = SOCK_INODE(sock)->i_ino; file->f_dentry = d_alloc(sock_mnt->mnt_sb->s_root, &this); --------------030608070101010705060000-- From maca02@atlas.cz Wed Jun 22 06:55:16 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 06:55:18 -0700 (PDT) Received: from localhost.localdomain (maca.fortech.cz [213.250.192.50]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MDtEH9016514 for ; Wed, 22 Jun 2005 06:55:15 -0700 Received: from localhost (localhost [127.0.0.1]) by localhost.localdomain (8.12.11/8.12.8) with ESMTP id j5MDrsx0003823 for ; Wed, 22 Jun 2005 14:53:54 +0100 Date: Wed, 22 Jun 2005 15:53:54 +0200 (CEST) From: =?ISO-8859-2?Q?Tom=E1=B9_Macek?= X-X-Sender: root@localhost.localdomain To: netdev@oss.sgi.com Subject: Print one record only - addition In-Reply-To: <20050618202359.GP22463@postel.suug.ch> Message-ID: References: <20050617141527.GN22463@postel.suug.ch> <20050617191340.GO22463@postel.suug.ch> <20050618202359.GP22463@postel.suug.ch> 
MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2505 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: maca02@atlas.cz Precedence: bulk X-list: netdev Content-Length: 3747 Lines: 96 On Sat, 18 Jun 2005, Thomas Graf wrote: > * Tomáš Macek 2005-06-18 20:55 >> The 'rtm_dst_len = 16' should mean the mask of the route I'm looking for, correct? > > Yes. > >> The whole code before sending the packet is below: >> >> >> /* Create Socket */ >> if((sock = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE)) < 0) >> perror("Socket Creation: "); >> >> /* Initialize the buffer */ >> memset(msgBuf, 0, BUFSIZE); >> >> /* point the header and the msg structure pointers into the buffer */ >> nlMsg = (struct nlmsghdr *)msgBuf; >> rtMsg = (struct rtmsg *)NLMSG_DATA(nlMsg); >> rtMsg->rtm_family = AF_INET; >> rtMsg->rtm_dst_len = 16; >> >> /* Fill in the nlmsg header*/ >> nlMsg->nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)); // Length of message. >> nlMsg->nlmsg_type = RTM_GETROUTE; // Get the routes from kernel routing table . >> nlMsg->nlmsg_flags = NLM_F_REQUEST; // The message is a request for dump. >> nlMsg->nlmsg_seq = msgSeq++; // Sequence of the message packet. >> nlMsg->nlmsg_pid = getpid(); // PID of process sending the request. >> >> char *cp; >> unsigned int xx[4]; int i = 0; >> unsigned char *ap = (unsigned char *)xx; >> for (cp = argv[1], i = 0; *cp; cp++) { >> if (*cp <= '9' && *cp >= '0') { >> ap[i] = 10*ap[i] + (*cp-'0'); >> continue; >> } >> if (*cp == '.' 
&& ++i <= 3) >> continue; >> return -1; >> } >> >> NetlinkAddAttr(nlMsg, sizeof(nlMsg), RTA_DST, &xx, 4); > > This looks good but your NetlinkAddAttr is bogus, it should > be something like this: > > int nl_msg_append_tlv(struct nlmsghdr *n, int type, void *data, size_t len) > { > int tlen; > struct rtattr *rta; > > tlen = NLMSG_ALIGN(n->nlmsg_len) + RTA_LENGTH(NLMSG_ALIGN(len)); > > rta = (struct rtattr *) NLMSG_TAIL(n); > rta->rta_type = type; > rta->rta_len = RTA_LENGTH(NLMSG_ALIGN(len)); > memcpy(RTA_DATA(rta), data, len); > n->nlmsg_len = tlen; > > return 0; > } > > Your code is missing various alignment requirements. I can't tell > whether this is the last bug. I recommend you to read ip/iproute.c > in the iproute2 source or give libnl a second chance. > The code now works this way: [root@localhost route]# route 1.1.1.0 * 255.255.255.0 U 0 0 0 eth0 3.3.0.0 * 255.255.0.0 U 0 0 0 eth1 default meric 0.0.0.0 UG 0 0 0 eth0 [root@localhost route]# ./a.out 2.2.2.2 16 Destination Gateway Interface Source Netmask 2.2.2.2 213.250.192.33 eth0 255.255.255.255 [root@localhost route]# ./a.out 1.1.1.2 16 Destination Gateway Interface Source Netmask 1.1.1.2 *.*.*.* eth0 255.255.255.255 [root@localhost route]# ./a.out 3.3.3.2 16 Destination Gateway Interface Source Netmask 3.3.3.2 *.*.*.* eth1 255.255.255.255 So it returns the route the data would take if its DST address were the one given by argv[1] with the mask given by argv[2]. I am not sure whether we understood each other and whether this is what you thought it should do. If I write './a.out 3.3.0.0 16' on the command line, it should print a line like this if the record is present: 3.3.0.0 * 255.255.0.0 U 0 0 0 eth1 If I write './a.out 3.3.3.1 32' it MUST print nothing! 
:) Thank you for your help From dada1@cosmosbay.com Wed Jun 22 08:27:09 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 08:27:12 -0700 (PDT) Received: from gw1.cosmosbay.com (gw1.cosmosbay.com [62.23.185.226]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MFR1H9029247 for ; Wed, 22 Jun 2005 08:27:05 -0700 Received: from [172.16.2.14] ([172.16.2.14]) by gw1.cosmosbay.com (8.13.3/8.13.3) with ESMTP id j5MFP8Wf030017; Wed, 22 Jun 2005 17:25:10 +0200 Message-ID: <42B982D4.9040704@cosmosbay.com> Date: Wed, 22 Jun 2005 17:25:08 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com, mchan@broadcom.com Subject: [TG3]: About hw coalescing infrastructure. References: <20050511.141530.57445142.davem@davemloft.net> In-Reply-To: <20050511.141530.57445142.davem@davemloft.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [172.16.8.80]); Wed, 22 Jun 2005 17:25:12 +0200 (CEST) X-archive-position: 2506 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Content-Length: 8565 Lines: 243 David S. Miller wrote: > Ok, now that we have the tagged status stuff sorted I began > to work on putting the hw mitigation bits back into the > driver. The discussion on the DMA rw-ctrl settings is still > ongoing, but I will get back to it shortly. > > This is the first step, we cache the settings in the tg3 > struct and put those values into the chip via tg3_set_coalesce(). > > ETHTOOL_GCOALESCE is supported, setting is not. > Hi David I am using 2.6.12 now, but still experience a high number of interrupts per second on my tg3 NIC, on a dual Opteron based machine. 
(about 7300 interrupts per second generated by one interface eth0, 100Mbit/s link) Is there anything I can try to tune the coalescing? Being able to handle 100 packets per interrupt instead of one or two would certainly help. I don't mind the latency. But of course I would like not to drop packets :) But maybe the BCM5702 is not able to delay an interrupt? Thank you Eric Dumazet ---------------------------------------------------------------------------------------- # lspci -v 02:03.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702 Gigabit Ethernet (rev 02) Subsystem: Broadcom Corporation BCM5702 1000Base-T Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 27 Memory at 00000000fe000000 (64-bit, non-prefetchable) [size=64K] Expansion ROM at [disabled] [size=64K] Capabilities: [40] PCI-X non-bridge device. Capabilities: [48] Power Management version 2 Capabilities: [50] Vital Product Data Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable- # ethtool -c eth0 Coalesce parameters for eth0: Adaptive RX: off TX: off stats-block-usecs: 1000000 sample-interval: 0 pkt-rate-low: 0 pkt-rate-high: 0 rx-usecs: 20 rx-frames: 5 rx-usecs-irq: 20 rx-frames-irq: 5 tx-usecs: 72 tx-frames: 53 tx-usecs-irq: 20 tx-frames-irq: 5 rx-usecs-low: 0 rx-frame-low: 0 tx-usecs-low: 0 tx-frame-low: 0 rx-usecs-high: 0 rx-frame-high: 0 tx-usecs-high: 0 tx-frame-high: 0 # ethtool -S eth0 NIC statistics: rx_octets: 104634072366 rx_fragments: 0 rx_ucast_packets: 852685070 rx_mcast_packets: 0 rx_bcast_packets: 20429 rx_fcs_errors: 0 rx_align_errors: 0 rx_xon_pause_rcvd: 0 rx_xoff_pause_rcvd: 0 rx_mac_ctrl_rcvd: 0 rx_xoff_entered: 0 rx_frame_too_long_errors: 0 rx_jabbers: 0 rx_undersize_packets: 0 rx_in_length_errors: 0 rx_out_length_errors: 0 rx_64_or_less_octet_packets: 451956709 rx_65_to_127_octet_packets: 272058231 rx_128_to_255_octet_packets: 63364655 rx_256_to_511_octet_packets: 35814973 rx_512_to_1023_octet_packets: 11867701 
rx_1024_to_1522_octet_packets: 17643210 rx_1523_to_2047_octet_packets: 0 rx_2048_to_4095_octet_packets: 0 rx_4096_to_8191_octet_packets: 0 rx_8192_to_9022_octet_packets: 0 tx_octets: 134640205605 tx_collisions: 0 tx_xon_sent: 0 tx_xoff_sent: 0 tx_flow_control: 0 tx_mac_errors: 0 tx_single_collisions: 0 tx_mult_collisions: 0 tx_deferred: 0 tx_excessive_collisions: 0 tx_late_collisions: 0 tx_collide_2times: 0 tx_collide_3times: 0 tx_collide_4times: 0 tx_collide_5times: 0 tx_collide_6times: 0 tx_collide_7times: 0 tx_collide_8times: 0 tx_collide_9times: 0 tx_collide_10times: 0 tx_collide_11times: 0 tx_collide_12times: 0 tx_collide_13times: 0 tx_collide_14times: 0 tx_collide_15times: 0 tx_ucast_packets: 774312055 tx_mcast_packets: 13 tx_bcast_packets: 246 tx_carrier_sense_errors: 0 tx_discards: 0 tx_errors: 0 dma_writeq_full: 21375 dma_write_prioq_full: 0 rxbds_empty: 0 rx_discards: 2644 rx_errors: 0 rx_threshold_hit: 57384403 dma_readq_full: 27100189 dma_read_prioq_full: 1557267 tx_comp_queue_full: 35712755 ring_set_send_prod_index: 747986769 ring_status_update: 502110997 nic_irqs: 446148615 nic_avoided_irqs: 55962382 nic_tx_threshold_hit: 37282069 # ethtool -g eth0 Ring parameters for eth0: Pre-set maximums: RX: 511 RX Mini: 0 RX Jumbo: 255 TX: 0 Current hardware settings: RX: 200 RX Mini: 0 RX Jumbo: 100 TX: 511 # ethtool eth0 Settings for eth0: Supported ports: [ MII ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full Supports auto-negotiation: Yes Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full Advertised auto-negotiation: Yes Speed: 100Mb/s Duplex: Full Port: Twisted Pair PHYAD: 1 Transceiver: internal Auto-negotiation: on Supports Wake-on: g Wake-on: d Current message level: 0x000000ff (255) Link detected: yes # cat /proc/interrupts (HZ=200) CPU0 CPU1 0: 164055 14038348 IO-APIC-edge timer 2: 0 0 XT-PIC cascade 8: 0 0 IO-APIC-edge rtc 14: 4073 368224 
IO-APIC-edge ide0 15: 0 20 IO-APIC-edge ide1 27: 35985951 421578656 IO-APIC-level eth0, eth1 NMI: 874625217 905019517 (oprofile running) LOC: 14201857 14201976 ERR: 0 MIS: 0 oprofile data : # more /tmp/vmlinux.oprofile CPU: Hammer, speed 2205.08 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000 samples cum. samples % cum. % symbol name 20208503 20208503 7.7982 7.7982 ipt_do_table 8336463 28544966 3.2169 11.0151 tcp_v4_rcv 7746814 36291780 2.9894 14.0045 handle_IRQ_event 7117968 43409748 2.7467 16.7512 tg3_poll 6585377 49995125 2.5412 19.2924 memcpy 5184695 55179820 2.0007 21.2931 ip_route_input 4346890 59526710 1.6774 22.9705 kfree 4214007 63740717 1.6261 24.5967 copy_user_generic_c 4093885 67834602 1.5798 26.1764 tcp_ack 4006753 71841355 1.5462 27.7226 tg3_interrupt_tagged 3778976 75620331 1.4583 29.1809 tcp_rcv_established 3756498 79376829 1.4496 30.6304 ip_queue_xmit 3418999 82795828 1.3193 31.9498 schedule 3274459 86070287 1.2636 33.2134 try_to_wake_up 3034809 89105096 1.1711 34.3844 tcp_sendmsg 2846436 91951532 1.0984 35.4828 kmem_cache_alloc 2745147 94696679 1.0593 36.5422 free_block 2679056 97375735 1.0338 37.5760 kmem_cache_free 2595289 99971024 1.0015 38.5775 fn_hash_lookup 2582072 102553096 0.9964 39.5738 __memset 2576462 105129558 0.9942 40.5681 tcp_transmit_skb 2528313 107657871 0.9756 41.5437 tcp_recvmsg 2392370 110050241 0.9232 42.4669 timer_interrupt 2365615 112415856 0.9129 43.3797 system_call 2358666 114774522 0.9102 44.2899 sockfd_lookup 2357192 117131714 0.9096 45.1995 tcp_poll 2340568 119472282 0.9032 46.1027 ip_rcv 2315805 121788087 0.8936 46.9964 tcp_match 2276212 124064299 0.8784 47.8747 sys_epoll_wait 2260913 126325212 0.8725 48.7472 __mod_timer 2173905 128499117 0.8389 49.5861 tg3_start_xmit 2057738 130556855 0.7941 50.3801 __switch_to 2022435 132579290 0.7804 51.1605 ep_poll_callback 2020449 134599739 0.7797 51.9402 sock_wfree 1913008 136512747 0.7382 52.6784 
find_busiest_group 1891578 138404325 0.7299 53.4083 local_bh_enable 1860130 140264455 0.7178 54.1261 ip_local_deliver 1793639 142058094 0.6921 54.8183 __ip_route_output_key 1789287 143847381 0.6905 55.5087 alloc_skb 1770972 145618353 0.6834 56.1921 tcp_write_timer 1727286 147345639 0.6665 56.8587 __wake_up 1634111 148979750 0.6306 57.4893 skb_release_data 1625157 150604907 0.6271 58.1164 __kmalloc 1567198 152172105 0.6048 58.7211 tcp_v4_do_rcv 1562495 153734600 0.6029 59.3241 __kfree_skb From leonid.grossman@neterion.com Wed Jun 22 09:25:35 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 09:25:44 -0700 (PDT) Received: from ns1.s2io.com (ns1.s2io.com [142.46.200.198]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MGPYH9003426 for ; Wed, 22 Jun 2005 09:25:35 -0700 Received: from guinness.s2io.com (sentry.s2io.com [142.46.200.199]) by ns1.s2io.com (8.12.10/8.12.10) with ESMTP id j5MGNqcx020766; Wed, 22 Jun 2005 12:23:52 -0400 (EDT) Received: from lgt40 ([10.16.16.68]) by guinness.s2io.com (8.12.6/8.12.6) with ESMTP id j5MGNkxS001340; Wed, 22 Jun 2005 12:23:46 -0400 (EDT) Message-Id: <200506221623.j5MGNkxS001340@guinness.s2io.com> From: "Leonid Grossman" To: "'Donald Becker'" , "'Andi Kleen'" Cc: "'Rick Jones'" , , Subject: RE: RFC: NAPI packet weighting patch Date: Wed, 22 Jun 2005 09:23:41 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook, Build 11.0.5510 In-Reply-To: Thread-Index: AcV2v4rVSHmGR7WHSEe+5LC0pdyXbgAhJlQA X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409 X-Scanned-By: MIMEDefang 2.34 X-archive-position: 2507 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: leonid.grossman@neterion.com Precedence: bulk X-list: netdev Content-Length: 1595 Lines: 45 > > See the comment above. We decide if a packet is multicast vs. > unicast, IP vs. 
other at approximately > interrupt/"rx_copybreak" time. Very few NIC provide this > info in status bits, so we end up looking at the packet > header. That read moves the previously known-uncached data > (after all, it was just came in from a bus write) into the L1 > cache for the CPU handling the device. Once it's there, the > copy is almost free. What status bits a NIC has to provide, in order for the stack to avoid touching headers? In our case, the headers are separated by the hardware so ideally we would like to avoid any header processing altogether, and reduce the number of cache misses. > > [[ Background: Yes, the allocating the new skbuff is very > expensive. But we can either allocate a new, correctly-sized > skbuff to copy into, or allocate a new full-sized skbuff to > replace the one we will send to the Rx queue. ]] > > > > - cold memory lines from PCI writes > > > > I suspect in '96 chipsets also didn't do as aggressive > prefetching as > > they do today. > > Prefetching helps linear read bandwidth, but we shouldn't be > triggering it. And I claim that cache line prefetching only > restores the relative balance between L1/L2 caches, otherwise > the long L2 cache lines would be very expensive with > bump-read-bump-read with linear scans through memory. 
> > -- > Donald Becker becker@scyld.com > Scyld Software Scyld Beowulf > cluster systems > 914 Bay Ridge Road, Suite 220 www.scyld.com > Annapolis MD 21403 410-990-9993 > > > From hadi@cyberus.ca Wed Jun 22 09:38:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 09:38:43 -0700 (PDT) Received: from mx03.cybersurf.com (mx03.cybersurf.com [209.197.145.106]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MGcXH9004816 for ; Wed, 22 Jun 2005 09:38:36 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx03.cybersurf.com with esmtp (Exim 4.30) id 1Dl8Dz-0003Hs-8P for netdev@oss.sgi.com; Wed, 22 Jun 2005 12:37:15 -0400 Received: from [216.209.86.2] (helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1Dl8Dv-00045o-7w; Wed, 22 Jun 2005 12:37:11 -0400 Subject: RE: RFC: NAPI packet weighting patch From: jamal Reply-To: hadi@cyberus.ca To: Leonid Grossman Cc: "'Donald Becker'" , "'Andi Kleen'" , "'Rick Jones'" , netdev@oss.sgi.com, davem@redhat.com In-Reply-To: <200506221623.j5MGNkxS001340@guinness.s2io.com> References: <200506221623.j5MGNkxS001340@guinness.s2io.com> Content-Type: text/plain Organization: unknown Date: Wed, 22 Jun 2005 12:37:06 -0400 Message-Id: <1119458226.6918.142.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2508 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 1423 Lines: 31 On Wed, 2005-22-06 at 09:23 -0700, Leonid Grossman wrote: > > > > See the comment above. We decide if a packet is multicast vs. > > unicast, IP vs. other at approximately > > interrupt/"rx_copybreak" time. Very few NIC provide this > > info in status bits, so we end up looking at the packet > > header. 
That read moves the previously known-uncached data > > (after all, it was just came in from a bus write) into the L1 > > cache for the CPU handling the device. Once it's there, the > > copy is almost free. > > What status bits a NIC has to provide, in order for the stack to avoid > touching headers? > In our case, the headers are separated by the hardware so ideally we would > like to avoid any header processing altogether, > and reduce the number of cache misses. > Provide metadata that can be used to totally replace eth_type_trans(), i.e. answer the questions: is it multi/uni/broadcast, is the packet for us (you would need to be programmed with what "for us" means), is it IP, ARP, etc. I am sure any standard NIC these days can do a subset of these. If you want to go one step further, then allow the user to download a number of filters and tell you what tag you should put on the descriptor when sending the packet to user space on a match or mismatch. If, say, you allowed 1024 such filters (not very different from the current multicast filters), you could cut down a lot of CPU time. cheers, jamal From shemminger@osdl.org Wed Jun 22 09:46:14 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 09:46:19 -0700 (PDT) Received: from smtp.osdl.org (smtp.osdl.org [65.172.181.4]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MGkDH9005671 for ; Wed, 22 Jun 2005 09:46:14 -0700 Received: from shell0.pdx.osdl.net (fw.osdl.org [65.172.181.6]) by smtp.osdl.org (8.12.8/8.12.8) with ESMTP id j5MGihjA001995 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Wed, 22 Jun 2005 09:44:43 -0700 Received: from dxpl.pdx.osdl.net (dxpl.pdx.osdl.net [10.8.0.74]) by shell0.pdx.osdl.net (8.13.1/8.11.6) with ESMTP id j5MGigis018674; Wed, 22 Jun 2005 09:44:43 -0700 Date: Wed, 22 Jun 2005 09:44:42 -0700 From: Stephen Hemminger To: Michael Vittrup Larsen Cc: "David S. 
Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] tcp: efficient port randomistion (rev 3) Message-ID: <20050622094442.65a0e1bc@dxpl.pdx.osdl.net> In-Reply-To: <200506221117.04334.michael.vittrup.larsen@ericsson.com> References: <20041220153916.6c00c114.davem@davemloft.net> <200506221117.04334.michael.vittrup.larsen@ericsson.com> Organization: Open Source Development Lab X-Mailer: Sylpheed-Claws 1.0.4 (GTK+ 1.2.10; x86_64-unknown-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-MIMEDefang-Filter: osdl$Revision: 1.111 $ X-Scanned-By: MIMEDefang 2.36 X-archive-position: 2509 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev Content-Length: 1047 Lines: 27 On Wed, 22 Jun 2005 11:17:03 +0200 Michael Vittrup Larsen wrote: > On Tuesday 21 December 2004 00:39, David S. Miller wrote: > > On Fri, 10 Dec 2004 17:09:00 -0800 > > > > Stephen Hemminger wrote: > > > okay, here is the revised version. Testing shows that it > > > is more consistent, and just as fast as existing code, > > > probably because of the getting rid of portalloc_lock and > > > better distribution. > > > > > > Signed-off-by: Stephen Hemminger > > > > Queued up for 2.6.11, thanks Stephen. > > What's the status of this - I see it is not part of 2.6.12? > > Is there a general dislike of the port randomisation mechanism or? There is port randomization in 2.6.11 and 2.6.12; look for secure_tcp_port_ephemeral in the source. 2.6.12 also does random port allocation for IPv6. We still do the non-random stuff for explicit binds (tcp_v4_get_port), but there is no state to seed in that case and it only impacts apps that do an explicit bind to 0. 
From oxymoron@waste.org Wed Jun 22 10:02:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 10:02:55 -0700 (PDT) Received: from waste.org (waste.org [216.27.176.166]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MH2qH9007013 for ; Wed, 22 Jun 2005 10:02:53 -0700 Received: from waste.org (localhost [127.0.0.1]) by waste.org (8.13.4/8.13.4/Debian-3) with ESMTP id j5MH1Sfi012832 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Wed, 22 Jun 2005 12:01:28 -0500 Received: (from oxymoron@localhost) by waste.org (8.13.4/8.13.4/Submit) id j5MH1SdL012829; Wed, 22 Jun 2005 12:01:28 -0500 Date: Wed, 22 Jun 2005 10:01:28 -0700 From: Matt Mackall To: Jeff Moyer Cc: netdev@oss.sgi.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [patch,rfc] allow registration of multiple netpolls per interface Message-ID: <20050622170128.GV27572@waste.org> References: <17080.35214.507402.998984@segfault.boston.redhat.com> <20050621225252.GY27572@waste.org> <17081.20441.714191.528270@segfault.boston.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <17081.20441.714191.528270@segfault.boston.redhat.com> User-Agent: Mutt/1.5.9i X-archive-position: 2510 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mpm@selenic.com Precedence: bulk X-list: netdev Content-Length: 3463 Lines: 88 On Wed, Jun 22, 2005 at 07:47:37AM -0400, Jeff Moyer wrote: > mpm> Hmm. It's conceivable we'll want netdump and kgdb on the same > mpm> interface so we'll have to address this eventually.. > > Well, do you want to address it eventually, or now? As I said, it's never > bitten anyone before. Later's fine. I just don't want to design it out by accident again. 
> >> struct netpoll_info { > >> spinlock_t poll_lock; > >> int poll_owner; > >> int rx_flags; > >> spinlock_t rx_lock; > >> struct netpoll *rx_np; /* netpoll that registered an rx_hook */ > >> }; > >> > >> This is the structure which gets pointed to by the net_device. All of the > >> flags and locks which are specific to the INTERFACE go here. Any variables > >> which must be kept per struct netpoll were left in the struct netpoll. So > >> now, we have a cleaner separation of data and its scope. > >> > >> Since we never really supported having more than one struct netpoll > >> register an rx_hook, I got rid of the rx_list. This is replaced by a > >> single pointer in the netpoll_info structure (rx_np). We still need to > >> protect addition or removal of the rx_np pointer, and so keep the lock > >> (rx_lock). There is one lock per struct net_device, and I am certain that > >> it will be 0 contention, as rx_np will only be changed during an insmod or > >> rmmod. If people think this would be a good rcu candidate, let me know and > >> I'll change it to use that locking scheme. > > mpm> It might be simpler to have a single lock here..? > > Maybe. You can't really have netpoll code running on multiple cpus at the > same time, right? This is the rx path, remember, so the other cpu should > be spinning on the poll_lock. > > Keeping separate locks would allow you to unregister a struct netpoll > associated with another net device without causing lock contention. This > is a very minor win, obviously. > > I still feel like this npinfo struct is the right place for this, though. > If you're strongly opposed to that, I'll change it. No, certainly having it in npinfo makes sense. I just was wondering if we really need two locks in there. 
> >> + spin_lock_irqsave(&npinfo->rx_lock, flags); > >> + if (npinfo->rx_np->dev == skb->dev) > >> + np = npinfo->rx_np; > >> + spin_unlock_irqrestore(&npinfo->rx_lock, flags); > > mpm> And I think that means we don't need the lock here either. > > Sure we do. We need to protect against rmmod's. How can we have an rmmod when we're trapped? > >> static inline int netpoll_rx(struct sk_buff *skb) > >> { > >> - return skb->dev->np && skb->dev->np->rx_flags && __netpoll_rx(skb); > >> + struct netpoll_info *npinfo = skb->dev->npinfo; > >> + unsigned long flags; > >> + int ret = 0; > >> + > >> + if (!npinfo || (!npinfo->rx_np && !npinfo->rx_flags)) > >> + return 0; > >> + > >> + spin_lock_irqsave(&npinfo->rx_lock, flags); > >> + /* check rx_flags again with the lock held */ > >> + if (npinfo->rx_flags && __netpoll_rx(skb)) > >> + ret = 1; > >> + spin_unlock_irqrestore(&npinfo->rx_lock, flags); > >> + > >> + return ret; > >> } > > mpm> This is perhaps a problem due to cache line bouncing. Perhaps we can > mpm> use an atomic op and a memory barrier instead? > > It really should be a 0 contention lock. Let's not optimize something that > doesn't need it. If we find that it causes problems, I'll be more than > happy to fix it. Ok, fair enough. -- Mathematics is the supreme nostalgia of our time. 
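The "atomic op and a memory barrier" alternative raised above can be illustrated with a small userspace analogy in C11. This is only a sketch under stated assumptions, not the netpoll code: struct rx_client, struct poll_info, poll_rx() and the other names are hypothetical, and C11 acquire/release atomics stand in for whatever primitives the kernel would use. It also deliberately ignores the lifetime problem (a reader may still hold the pointer after unregistration), which in the kernel would need something like an RCU grace period.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a netpoll client that registered an rx hook. */
struct rx_client {
	int (*rx_hook)(const void *pkt, size_t len);
};

/* Hypothetical stand-in for the per-interface info structure. */
struct poll_info {
	_Atomic(struct rx_client *) rx_client;	/* NULL when nobody is registered */
};

static void poll_info_init(struct poll_info *pi)
{
	atomic_init(&pi->rx_client, (struct rx_client *)NULL);
}

/* Publish the client. The release store makes the client's fields
 * visible to any reader that later observes the pointer. */
static void poll_info_register(struct poll_info *pi, struct rx_client *c)
{
	atomic_store_explicit(&pi->rx_client, c, memory_order_release);
}

/* Unpublish. This does NOT wait for readers that already loaded the
 * pointer; freeing the client safely needs a separate mechanism. */
static void poll_info_unregister(struct poll_info *pi)
{
	atomic_store_explicit(&pi->rx_client, (struct rx_client *)NULL,
			      memory_order_release);
}

/* Receive fast path: one acquire load, no lock taken.
 * Returns 1 if the hook consumed the packet, 0 otherwise. */
static int poll_rx(struct poll_info *pi, const void *pkt, size_t len)
{
	struct rx_client *c;

	if (pi == NULL)
		return 0;

	c = atomic_load_explicit(&pi->rx_client, memory_order_acquire);
	if (c == NULL || c->rx_hook == NULL)
		return 0;

	return c->rx_hook(pkt, len) ? 1 : 0;
}

/* Example hook for demonstration: consumes any non-empty packet. */
static int example_hook(const void *pkt, size_t len)
{
	(void)pkt;
	return len > 0;
}
```

The trade-off matches the exchange above: the lock-free read avoids bouncing the lock's cache line on every received packet, but it moves the complexity to unregistration, so it only pays off if the lock actually shows up as a problem.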
From ak@suse.de Wed Jun 22 10:07:17 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 10:07:19 -0700 (PDT) Received: from mx1.suse.de (cantor.suse.de [195.135.220.2]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MH7GH9007738 for ; Wed, 22 Jun 2005 10:07:17 -0700 Received: from Relay2.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.suse.de (Postfix) with ESMTP id 95475EF8B; Wed, 22 Jun 2005 19:05:53 +0200 (CEST) Date: Wed, 22 Jun 2005 19:05:49 +0200 From: Andi Kleen To: Leonid Grossman Cc: "'Donald Becker'" , "'Andi Kleen'" , "'Rick Jones'" , netdev@oss.sgi.com, davem@redhat.com Subject: Re: RFC: NAPI packet weighting patch Message-ID: <20050622170549.GV14251@wotan.suse.de> References: <200506221623.j5MGNkxS001340@guinness.s2io.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200506221623.j5MGNkxS001340@guinness.s2io.com> X-archive-position: 2511 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev Content-Length: 1209 Lines: 29 On Wed, Jun 22, 2005 at 09:23:41AM -0700, Leonid Grossman wrote: > > > > > See the comment above. We decide if a packet is multicast vs. > > unicast, IP vs. other at approximately > > interrupt/"rx_copybreak" time. Very few NIC provide this > > info in status bits, so we end up looking at the packet > > header. That read moves the previously known-uncached data > > (after all, it was just came in from a bus write) into the L1 > > cache for the CPU handling the device. Once it's there, the > > copy is almost free. > > What status bits a NIC has to provide, in order for the stack to avoid > touching headers? To avoid it completely is pretty hard - you would need to supply nearly everything in the header. 
But when you supply the L2 protocol, unicast/broadcast/multicast information, and whether or not the packet is destined for the local host, the headers can be fetched early with a prefetch, and then with some luck they are already in cache when the header is processed later. BTW, contrary to what Donald stated, quite a few modern NICs actually provide this information (sometimes with restrictions, such as only working without multicast), but it hasn't been widely used yet. -Andi From leonid.grossman@neterion.com Wed Jun 22 11:02:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 11:03:07 -0700 (PDT) Received: from ns1.s2io.com (ns1.s2io.com [142.46.200.198]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MI2wH9010838 for ; Wed, 22 Jun 2005 11:02:58 -0700 Received: from guinness.s2io.com (sentry.s2io.com [142.46.200.199]) by ns1.s2io.com (8.12.10/8.12.10) with ESMTP id j5MI13cx021227; Wed, 22 Jun 2005 14:01:03 -0400 (EDT) Received: from lgt40 ([10.16.16.68]) by guinness.s2io.com (8.12.6/8.12.6) with ESMTP id j5MI11xS021866; Wed, 22 Jun 2005 14:01:01 -0400 (EDT) Message-Id: <200506221801.j5MI11xS021866@guinness.s2io.com> From: "Leonid Grossman" To: Cc: "'Donald Becker'" , "'Andi Kleen'" , "'Rick Jones'" , , Subject: RE: RFC: NAPI packet weighting patch Date: Wed, 22 Jun 2005 11:00:56 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook, Build 11.0.5510 In-Reply-To: <1119458226.6918.142.camel@localhost.localdomain> Thread-Index: AcV3SKUHT+0Rqw1STk+WLDKudlg7UAAClk6Q X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409 X-Scanned-By: MIMEDefang 2.34 X-archive-position: 2512 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: leonid.grossman@neterion.com Precedence: bulk X-list: netdev Content-Length: 2318 Lines: 65 > -----Original Message----- > From: jamal [mailto:hadi@cyberus.ca] > 
Sent: Wednesday, June 22, 2005 9:37 AM > To: Leonid Grossman > Cc: 'Donald Becker'; 'Andi Kleen'; 'Rick Jones'; netdev@oss.sgi.com; davem@redhat.com > Subject: RE: RFC: NAPI packet weighting patch > > On Wed, 2005-22-06 at 09:23 -0700, Leonid Grossman wrote: > > > > > > See the comment above. We decide if a packet is multicast vs. > > > unicast, IP vs. other at approximately interrupt/"rx_copybreak" > > > time. Very few NIC provide this info in status bits, so we end up > > > looking at the packet header. That read moves the previously > > > known-uncached data (after all, it was just came in from a bus > > > write) into the L1 cache for the CPU handling the device. Once it's > > > there, the copy is almost free. > > > > What status bits a NIC has to provide, in order for the stack to avoid > > touching headers? > > In our case, the headers are separated by the hardware so ideally we > > would like to avoid any header processing altogether, and reduce the > > number of cache misses. > > > > Provide metadata that can be used to totally replace > eth_type_trans(), i.e. answer the questions: is it > multi/uni/broadcast, is the packet for us (you would need to > be programmed with what "for us" means), is it IP, ARP etc. I > am sure any standard NIC these days can do a subset of these. > If you want to go one step further, then allow the user to > download a number of filters and tell you what tag you should > put on the descriptor when sending the packet to user space > on a match or mismatch. > If say you allowed 1024 such filters (not very different from > the current multicast filters), you could cut down a lot of CPU time. Well, this is all supported in the hardware. The number of filters is only 256 (not 1024) for direct match, but it is unlimited for a hash match. Of course, the upper layer still needs to be able to take advantage of the filters... 
Outside of the filters capability, from the (granted, pretty limited) testing we see some noticeable improvement from providing status bits but it is not as big as I would expect. It looks like the headers are still being touched somewhere... We will look at this some more. Thanks, Leonid > > cheers, > jamal > > From ak@suse.de Wed Jun 22 11:08:21 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 11:08:23 -0700 (PDT) Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MI8KH9011669 for ; Wed, 22 Jun 2005 11:08:21 -0700 Received: from Relay2.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id 76CE31D5CA; Wed, 22 Jun 2005 20:06:59 +0200 (CEST) Date: Wed, 22 Jun 2005 20:06:55 +0200 From: Andi Kleen To: Leonid Grossman Cc: hadi@cyberus.ca, "'Donald Becker'" , "'Andi Kleen'" , "'Rick Jones'" , netdev@oss.sgi.com, davem@redhat.com Subject: Re: RFC: NAPI packet weighting patch Message-ID: <20050622180654.GX14251@wotan.suse.de> References: <1119458226.6918.142.camel@localhost.localdomain> <200506221801.j5MI11xS021866@guinness.s2io.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200506221801.j5MI11xS021866@guinness.s2io.com> X-archive-position: 2513 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev Content-Length: 723 Lines: 17 > Outside of the filters capability, from the (granted, pretty limited) > testing we see some noticeable improvement from providing status bits but it > is not as big as I would expect. > It looks like the headers are still being touched somewhere... We will look > at this some more. The headers are read of course in the main stack. No way around that. 
It basically helps only when you can issue the prefetch for the header far enough ahead that the data is in cache when you need it. However, it is tricky because CPUs have only a limited number of load queue entries, and doing too many prefetches will just overflow them. This can be done by batching L2 packet processing, but doing so is not good for your latency. -Andi From hadi@cyberus.ca Wed Jun 22 12:39:17 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 12:39:32 -0700 (PDT) Received: from mx04.cybersurf.com (mx04.cybersurf.com [209.197.145.108]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MJdEH9021624 for ; Wed, 22 Jun 2005 12:39:17 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx04.cybersurf.com with esmtp (Exim 4.30) id 1DlB2r-00013j-7N for netdev@oss.sgi.com; Wed, 22 Jun 2005 15:37:57 -0400 Received: from [216.209.86.2] (helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DlB2m-0003Dq-CU; Wed, 22 Jun 2005 15:37:52 -0400 Subject: Re: RFC: NAPI packet weighting patch From: jamal Reply-To: hadi@cyberus.ca To: P@draigBrady.com Cc: "David S. 
Miller" , gandalf@wlug.westbo.se, shemminger@osdl.org, mitch.a.williams@intel.com, john.ronciak@intel.com, mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com In-Reply-To: <42B92490.40005@draigBrady.com> References: <42A5284C.3060808@osdl.org> <1118147904.6320.108.camel@localhost.localdomain> <20050621.133704.08321534.davem@davemloft.net> <42B92490.40005@draigBrady.com> Content-Type: text/plain Organization: unknown Date: Wed, 22 Jun 2005 15:37:46 -0400 Message-Id: <1119469066.6918.168.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2514 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 724 Lines: 20 On Wed, 2005-22-06 at 09:42 +0100, P@draigBrady.com wrote: > > Yes the copy is essentially free here as the data is already cached. > > As a data point, I went the whole hog and used buffer recycling > in my essentially packet sniffing application. I.E. there are no > allocs per packet at all, and this makes a HUGE difference. On a > 2x3.4GHz 2xe1000 system I can receive 620Kpps per port sustained > into my userspace app which does a LOT of processing per packet. > Without the buffer recycling it was around 250Kpps. > Note I don't reuse an skb until the packet is copied into a > PACKET_MMAP buffer. Was this machine SMP? NAPI involved? I take it nothing interfering in the middle with the headers? 
cheers, jamal From mchan@broadcom.com Wed Jun 22 13:02:34 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 13:02:36 -0700 (PDT) Received: from MMS2.broadcom.com (mms2.broadcom.com [216.31.210.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MK2YH9023367 for ; Wed, 22 Jun 2005 13:02:34 -0700 Received: from 10.10.64.121 by MMS2.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Wed, 22 Jun 2005 13:01:03 -0700 X-Server-Uuid: 1F20ACF3-9CAF-44F7-AB47-F294E2D5B4EA Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Wed, 22 Jun 2005 13:00:55 -0700 Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BFN02253; Wed, 22 Jun 2005 13:00:54 -0700 (PDT) Received: from nt-irva-0741.brcm.ad.broadcom.com ( nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with ESMTP id NAA07955; Wed, 22 Jun 2005 13:00:54 -0700 (PDT) Received: from 10.7.18.153 ([10.7.18.153]) by NT-IRVA-0741.brcm.ad.broadcom.com ([10.8.194.54]) with Microsoft Exchange Server HTTP-DAV ; Wed, 22 Jun 2005 20:00:53 +0000 Received: from rh4 by nt-irva-0741; 22 Jun 2005 12:03:32 -0700 Subject: Re: [TG3]: About hw coalescing infrastructure. From: "Michael Chan" To: "Eric Dumazet" cc: "David S. 
Miller" , netdev@oss.sgi.com In-Reply-To: <42B982D4.9040704@cosmosbay.com> References: <20050511.141530.57445142.davem@davemloft.net> <42B982D4.9040704@cosmosbay.com> Date: Wed, 22 Jun 2005 12:03:32 -0700 Message-ID: <1119467012.5325.15.camel@rh4> MIME-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-3) X-WSS-ID: 6EA71CF52782643824-01-01 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-archive-position: 2515 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mchan@broadcom.com Precedence: bulk X-list: netdev Content-Length: 882 Lines: 22 On Wed, 2005-06-22 at 17:25 +0200, Eric Dumazet wrote: > Is there anything I can try to tune the coalescing ? > Being able to handle 100 packets each interrupt instead of one or two would certainly help. > I dont mind about latency. But of course I would like not to drop packets :) > But maybe the BCM5702 is not able to delay an interrupt ? > On the 5702 that supports CLRTCKS mode, you need to play around with the following parameters in tg3.h. To reduce interrupts, you generally have to increase the values. 
#define LOW_RXCOL_TICKS_CLRTCKS		0x00000014
#define LOW_TXCOL_TICKS_CLRTCKS		0x00000048
#define LOW_RXMAX_FRAMES		0x00000005
#define LOW_TXMAX_FRAMES		0x00000035
#define DEFAULT_RXCOAL_TICK_INT_CLRTCKS	0x00000014
#define DEFAULT_TXCOAL_TICK_INT_CLRTCKS	0x00000014
#define DEFAULT_RXCOAL_MAXF_INT		0x00000005
#define DEFAULT_TXCOAL_MAXF_INT		0x00000005

From davem@davemloft.net Wed Jun 22 13:24:27 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 13:24:34 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MKORH9003467 for ; Wed, 22 Jun 2005 13:24:27 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DlBk9-0003i1-F3; Wed, 22 Jun 2005 13:22:41 -0700 Date: Wed, 22 Jun 2005 13:22:41 -0700 (PDT) Message-Id: <20050622.132241.21929037.davem@davemloft.net> To: ak@suse.de Cc: leonid.grossman@neterion.com, hadi@cyberus.ca, becker@scyld.com, rick.jones2@hp.com, netdev@oss.sgi.com, davem@redhat.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: <20050622180654.GX14251@wotan.suse.de> References: <1119458226.6918.142.camel@localhost.localdomain> <200506221801.j5MI11xS021866@guinness.s2io.com> <20050622180654.GX14251@wotan.suse.de> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2516 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 2291 Lines: 77 From: Andi Kleen Date: Wed, 22 Jun 2005 20:06:55 +0200 > However it is tricky because CPUs have only a limited load queue > entries and doing too many prefetches will just overflow that. 
Several processors can queue about 8 prefetch requests, and these slots are independent of those consumed by a load. Yes, if you queue too many prefetches, the queue overflows.

I think the optimal scheme would be:

1) eth_type_trans() info in RX descriptor
2) prefetch(skb->data) done as early as possible in driver RX handling

Actually, I believe the most optimal scheme is:

	foo_driver_rx()
	{
		for_each_rx_descriptor() {
			...
			skb = driver_priv->rx_skbs[index];
			prefetch(skb->data);

			skb = realloc_or_recycle_rx_descriptor(skb, index);
			if (skb == NULL)
				goto next_rxd;

			skb->protocol = eth_type_trans(skb, driver_priv->dev);
			netif_receive_skb(skb);
			...
		next_rxd:
			...
		}
	}

The idea is that first the prefetch goes into flight, then you do the recycle or reallocation of the RX descriptor SKB, then you try to touch the data. This makes it very likely the prefetch will be in the cpu in time.

Everyone seems to have this absolute fetish about batching the RX descriptor refilling work. It's wrong, it should be done when you pull a receive packet off the ring, for many reasons. Off the top of my head:

1) Descriptors are refilled as soon as possible, decreasing the chance of the device hitting the end of the RX ring and thus unable to receive a packet.

2) As shown above, it gives you compute time which can be used to schedule the prefetch. This nearly makes RX replenishment free. Instead of having the CPU spin on a cache miss when we run eth_type_trans() during those cycles, we do useful work.

I'm going to play around with these ideas in the tg3 driver. Obvious patch below.

--- 1/drivers/net/tg3.c.~1~	2005-06-22 12:33:07.000000000 -0700
+++ 2/drivers/net/tg3.c	2005-06-22 13:19:13.000000000 -0700
@@ -2772,6 +2772,13 @@
 			goto next_pkt_nopost;
 		}
 
+		/* Prefetch now.  The recycle/realloc of the RX
+		 * entry is moderately expensive, so by the time
+		 * that is complete the data should have reached
+		 * the cpu.
+		 */
+		prefetch(skb->data);
+
 		work_mask |= opaque_key;
 
 		if ((desc->err_vlan & RXD_ERR_MASK) != 0 &&

From rick.jones2@hp.com Wed Jun 22 13:37:14 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 13:37:17 -0700 (PDT) Received: from palrel13.hp.com (palrel13.hp.com [156.153.255.238]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MKbCH9004692 for ; Wed, 22 Jun 2005 13:37:14 -0700 Received: from tardy.cup.hp.com (tardy.cup.hp.com [15.244.44.58]) by palrel13.hp.com (Postfix) with ESMTP id 84B311C00117; Wed, 22 Jun 2005 13:35:47 -0700 (PDT) Received: from hp.com (localhost [127.0.0.1]) by tardy.cup.hp.com (8.9.3 (PHNE_28810)/8.9.3 SMKit7.02) with ESMTP id NAA13841; Wed, 22 Jun 2005 13:35:46 -0700 (PDT) Message-ID: <42B9CBA2.5050208@hp.com> Date: Wed, 22 Jun 2005 13:35:46 -0700 From: Rick Jones User-Agent: Mozilla/5.0 (X11; U; HP-UX 9000/785; en-US; rv:1.6) Gecko/20040304 X-Accept-Language: en-us, en MIME-Version: 1.0 To: netdev@oss.sgi.com Cc: hadi@cyberus.ca, becker@scyld.com Subject: Re: RFC: NAPI packet weighting patch References: <1119458226.6918.142.camel@localhost.localdomain> <200506221801.j5MI11xS021866@guinness.s2io.com> <20050622180654.GX14251@wotan.suse.de> <20050622.132241.21929037.davem@davemloft.net> In-Reply-To: <20050622.132241.21929037.davem@davemloft.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2517 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rick.jones2@hp.com Precedence: bulk X-list: netdev Content-Length: 447 Lines: 13 > Everyone seems to have this absolute fetish about batching the RX > descriptor refilling work. It's wrong, it should be done when you > pull a receive packet off the ring, for many reasons. 
Off the top of > my head: > > 1) Descriptors are refilled as soon as possible, decreasing > the chance of the device hitting the end of the RX ring > and thus unable to receive a packet. IFF one pokes the NIC for each buffer right? rick jones From davem@davemloft.net Wed Jun 22 13:45:18 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 13:45:22 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MKjIH9005588 for ; Wed, 22 Jun 2005 13:45:18 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DlC4a-0005kX-C7; Wed, 22 Jun 2005 13:43:48 -0700 Date: Wed, 22 Jun 2005 13:43:48 -0700 (PDT) Message-Id: <20050622.134348.23010554.davem@davemloft.net> To: rick.jones2@hp.com Cc: netdev@oss.sgi.com, hadi@cyberus.ca, becker@scyld.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: <42B9CBA2.5050208@hp.com> References: <20050622180654.GX14251@wotan.suse.de> <20050622.132241.21929037.davem@davemloft.net> <42B9CBA2.5050208@hp.com> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2518 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 570 Lines: 16 From: Rick Jones Date: Wed, 22 Jun 2005 13:35:46 -0700 > > Everyone seems to have this absolute fetish about batching the RX > > descriptor refilling work. It's wrong, it should be done when you > > pull a receive packet off the ring, for many reasons. Off the top of > > my head: > > > > 1) Descriptors are refilled as soon as possible, decreasing > > the chance of the device hitting the end of the RX ring > > and thus unable to receive a packet. 
> > IFF one pokes the NIC for each buffer right? Or "every 5" or something like that. From jmoyer@redhat.com Wed Jun 22 14:06:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 14:06:42 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5ML6bH9007911 for ; Wed, 22 Jun 2005 14:06:37 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j5ML5GJ5027103; Wed, 22 Jun 2005 17:05:16 -0400 Received: from mail.boston.redhat.com (mail.boston.redhat.com [172.16.76.12]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j5ML5Fu29063; Wed, 22 Jun 2005 17:05:15 -0400 Received: from segfault.boston.redhat.com (segfault.boston.redhat.com [172.16.80.57]) by mail.boston.redhat.com (8.12.8/8.12.8) with ESMTP id j5ML5Fc0013094; Wed, 22 Jun 2005 17:05:15 -0400 Received: from segfault.boston.redhat.com (localhost.localdomain [127.0.0.1]) by segfault.boston.redhat.com (8.13.1/8.13.1) with ESMTP id j5ML5FtW031740; Wed, 22 Jun 2005 17:05:15 -0400 Received: (from jmoyer@localhost) by segfault.boston.redhat.com (8.13.1/8.13.1/Submit) id j5ML5F5x031737; Wed, 22 Jun 2005 17:05:15 -0400 From: Jeff Moyer MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17081.53899.201190.106025@segfault.boston.redhat.com> Date: Wed, 22 Jun 2005 17:05:15 -0400 To: Matt Mackall Cc: netdev@oss.sgi.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [patch,rfc] allow registration of multiple netpolls per interface In-Reply-To: <20050622170128.GV27572@waste.org> References: <17080.35214.507402.998984@segfault.boston.redhat.com> <20050621225252.GY27572@waste.org> <17081.20441.714191.528270@segfault.boston.redhat.com> <20050622170128.GV27572@waste.org> X-Mailer: VM 7.17 under 21.4 (patch 15) "Security Through Obscurity" XEmacs Lucid Reply-To: 
jmoyer@redhat.com X-PGP-KeyID: 1F78E1B4 X-PGP-CertKey: F6FE 280D 8293 F72C 65FD 5A58 1FF8 A7CA 1F78 E1B4 X-PCLoadLetter: What the f**k does that mean? X-archive-position: 2519 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jmoyer@redhat.com Precedence: bulk X-list: netdev Content-Length: 11838 Lines: 405 ==> Regarding Re: [patch,rfc] allow registration of multiple netpolls per interface; Matt Mackall adds: mpm> On Wed, Jun 22, 2005 at 07:47:37AM -0400, Jeff Moyer wrote: mpm> Hmm. It's conceivable we'll want netdump and kgdb on the same mpm> interface so we'll have to address this eventually.. >> >> Well, do you want to address it eventually, or now? As I said, it's never >> bitten anyone before. mpm> Later's fine. I just don't want to design it out by accident again. OK. >> >> struct netpoll_info { >> >> spinlock_t poll_lock; >> >> int poll_owner; >> >> int rx_flags; >> >> spinlock_t rx_lock; >> >> struct netpoll *rx_np; /* netpoll that registered an rx_hook */ >> >> }; [snip] mpm> It might be simpler to have a single lock here..? >> >> Maybe. You can't really have netpoll code running on multiple cpus at the >> same time, right? This is the rx path, remember, so the other cpu should >> be spinning on the poll_lock. >> >> Keeping separate locks would allow you to unregister a struct netpoll >> associated with another net device without causing lock contention. This >> is a very minor win, obviously. >> >> I still feel like this npinfo struct is the right place for this, though. >> If you're strongly opposed to that, I'll change it. mpm> No, certainly having it in npinfo makes sense. I just was wondering if mpm> we really need two locks in there. Oh, I misunderstood. Well, one protects recursing into the driver's poll routine, the other protects access to the rx_np pointer, which may later become a list. I don't think we can lump these two together, do you? 
>> >> + spin_lock_irqsave(&npinfo->rx_lock, flags); >> >> + if (npinfo->rx_np->dev == skb->dev) >> >> + np = npinfo->rx_np; >> >> + spin_unlock_irqrestore(&npinfo->rx_lock, flags); >> mpm> And I think that means we don't need the lock here either. >> >> Sure we do. We need to protect against rmmod's. mpm> How can we have an rmmod when we're trapped? Looking over the code, I don't see what would prevent this. Could you point me to the code which prevents this? It's a good thing we're discussing it, since I found that I didn't take the lock in netpoll_cleanup. Okay, so here's the full patch again, with the changes we've discussed. I've also included an interdiff. As you can see, the first version I sent didn't have some basic compile fixes, sorry about that. Anyway, I have booted and tested this version with multiple netpoll clients. Barring any negative feedback, I'll break this up and send it as a patch series. Thanks, Jeff (Interdiff first) diff -u linux-2.6.12-rc6/net/core/netpoll.c linux-2.6.12-rc6/net/core/netpoll.c --- linux-2.6.12-rc6/net/core/netpoll.c 2005-06-21 16:03:22.409620400 -0400 +++ linux-2.6.12-rc6/net/core/netpoll.c 2005-06-22 16:51:24.336062231 -0400 @@ -130,8 +130,8 @@ */ static void poll_napi(struct netpoll *np) { - int budget = 16; struct netpoll_info *npinfo = np->dev->npinfo; + int budget = 16; if (test_bit(__LINK_STATE_RX_SCHED, &np->dev->state) && npinfo->poll_owner != smp_processor_id() && @@ -344,22 +344,22 @@ static void arp_reply(struct sk_buff *skb) { + struct netpoll_info *npinfo = skb->dev->npinfo; struct arphdr *arp; unsigned char *arp_ptr; int size, type = ARPOP_REPLY, ptype = ETH_P_ARP; u32 sip, tip; + unsigned long flags; struct sk_buff *send_skb; - struct netpoll *np; - struct netpoll_info *npinfo = skb->dev->npinfo; - - if (!npinfo) return; + struct netpoll *np = NULL; spin_lock_irqsave(&npinfo->rx_lock, flags); - if (npinfo->rx_np->dev == skb->dev) + if (npinfo->rx_np && npinfo->rx_np->dev == skb->dev) np = npinfo->rx_np; 
spin_unlock_irqrestore(&npinfo->rx_lock, flags); - if (!np) return; + if (!np) + return; /* No arp on this interface */ if (skb->dev->flags & IFF_NOARP) @@ -716,7 +716,7 @@ spin_lock_irqsave(&npinfo->rx_lock, flags); npinfo->rx_flags |= NETPOLL_RX_ENABLED; npinfo->rx_np = np; - spin_unlock_irqsave(&npinfo->rx_lock, flags); + spin_unlock_irqrestore(&npinfo->rx_lock, flags); } /* last thing to do is link it to the net device structure */ ndev->npinfo = npinfo; @@ -734,12 +734,15 @@ void netpoll_cleanup(struct netpoll *np) { struct netpoll_info *npinfo; + unsigned long flags; if (np->dev) { npinfo = np->dev->npinfo; if (npinfo && npinfo->rx_np == np) { + spin_lock_irqsave(&npinfo->rx_lock, flags); npinfo->rx_np = NULL; npinfo->rx_flags &= ~NETPOLL_RX_ENABLED; + spin_unlock_irqrestore(&npinfo->rx_lock, flags); } dev_put(np->dev); } reverted: --- linux-2.6.12-rc6/net/core/dev.c 2005-06-21 13:53:51.583407710 -0400 +++ linux-2.6.12-rc6/net/core/dev.c.orig 2005-06-20 19:51:59.000000000 -0400 @@ -1656,7 +1656,6 @@ unsigned short type; /* if we've gotten here through NAPI, check netpoll */ - /* how else can we get here? 
--phro */ if (skb->dev->poll && netpoll_rx(skb)) return NET_RX_DROP; And now, the full diff: --- linux-2.6.12-rc6/net/core/netpoll.c.orig 2005-06-20 19:51:56.000000000 -0400 +++ linux-2.6.12-rc6/net/core/netpoll.c 2005-06-22 16:51:24.336062231 -0400 @@ -130,19 +130,20 @@ static int checksum_udp(struct sk_buff * */ static void poll_napi(struct netpoll *np) { + struct netpoll_info *npinfo = np->dev->npinfo; int budget = 16; if (test_bit(__LINK_STATE_RX_SCHED, &np->dev->state) && - np->poll_owner != smp_processor_id() && - spin_trylock(&np->poll_lock)) { - np->rx_flags |= NETPOLL_RX_DROP; + npinfo->poll_owner != smp_processor_id() && + spin_trylock(&npinfo->poll_lock)) { + npinfo->rx_flags |= NETPOLL_RX_DROP; atomic_inc(&trapped); np->dev->poll(np->dev, &budget); atomic_dec(&trapped); - np->rx_flags &= ~NETPOLL_RX_DROP; - spin_unlock(&np->poll_lock); + npinfo->rx_flags &= ~NETPOLL_RX_DROP; + spin_unlock(&npinfo->poll_lock); } } @@ -245,6 +246,7 @@ repeat: static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb) { int status; + struct netpoll_info *npinfo; repeat: if(!np || !np->dev || !netif_running(np->dev)) { @@ -253,7 +255,8 @@ repeat: } /* avoid recursion */ - if(np->poll_owner == smp_processor_id() || + npinfo = np->dev->npinfo; + if(npinfo->poll_owner == smp_processor_id() || np->dev->xmit_lock_owner == smp_processor_id()) { if (np->drop) np->drop(skb); @@ -341,14 +344,22 @@ void netpoll_send_udp(struct netpoll *np static void arp_reply(struct sk_buff *skb) { + struct netpoll_info *npinfo = skb->dev->npinfo; struct arphdr *arp; unsigned char *arp_ptr; int size, type = ARPOP_REPLY, ptype = ETH_P_ARP; u32 sip, tip; + unsigned long flags; struct sk_buff *send_skb; - struct netpoll *np = skb->dev->np; + struct netpoll *np = NULL; + + spin_lock_irqsave(&npinfo->rx_lock, flags); + if (npinfo->rx_np && npinfo->rx_np->dev == skb->dev) + np = npinfo->rx_np; + spin_unlock_irqrestore(&npinfo->rx_lock, flags); - if (!np) return; + if (!np) + return; /* No arp 
on this interface */ if (skb->dev->flags & IFF_NOARP) @@ -429,9 +440,9 @@ int __netpoll_rx(struct sk_buff *skb) int proto, len, ulen; struct iphdr *iph; struct udphdr *uh; - struct netpoll *np = skb->dev->np; + struct netpoll *np = skb->dev->npinfo->rx_np; - if (!np->rx_hook) + if (!np) goto out; if (skb->dev->type != ARPHRD_ETHER) goto out; @@ -611,9 +622,8 @@ int netpoll_setup(struct netpoll *np) { struct net_device *ndev = NULL; struct in_device *in_dev; - - np->poll_lock = SPIN_LOCK_UNLOCKED; - np->poll_owner = -1; + struct netpoll_info *npinfo; + unsigned long flags; if (np->dev_name) ndev = dev_get_by_name(np->dev_name); @@ -624,7 +634,17 @@ int netpoll_setup(struct netpoll *np) } np->dev = ndev; - ndev->np = np; + if (!ndev->npinfo) { + npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL); + if (!npinfo) + goto release; + + npinfo->rx_np = NULL; + npinfo->poll_lock = SPIN_LOCK_UNLOCKED; + npinfo->poll_owner = -1; + npinfo->rx_lock = SPIN_LOCK_UNLOCKED; + } else + npinfo = ndev->npinfo; if (!ndev->poll_controller) { printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n", @@ -692,13 +712,20 @@ int netpoll_setup(struct netpoll *np) np->name, HIPQUAD(np->local_ip)); } - if(np->rx_hook) - np->rx_flags = NETPOLL_RX_ENABLED; + if(np->rx_hook) { + spin_lock_irqsave(&npinfo->rx_lock, flags); + npinfo->rx_flags |= NETPOLL_RX_ENABLED; + npinfo->rx_np = np; + spin_unlock_irqrestore(&npinfo->rx_lock, flags); + } + /* last thing to do is link it to the net device structure */ + ndev->npinfo = npinfo; return 0; release: - ndev->np = NULL; + if (!ndev->npinfo) + kfree(npinfo); np->dev = NULL; dev_put(ndev); return -1; @@ -706,9 +733,20 @@ int netpoll_setup(struct netpoll *np) void netpoll_cleanup(struct netpoll *np) { - if (np->dev) - np->dev->np = NULL; - dev_put(np->dev); + struct netpoll_info *npinfo; + unsigned long flags; + + if (np->dev) { + npinfo = np->dev->npinfo; + if (npinfo && npinfo->rx_np == np) { + spin_lock_irqsave(&npinfo->rx_lock, flags); + npinfo->rx_np = 
NULL; + npinfo->rx_flags &= ~NETPOLL_RX_ENABLED; + spin_unlock_irqrestore(&npinfo->rx_lock, flags); + } + dev_put(np->dev); + } + np->dev = NULL; } --- linux-2.6.12-rc6/include/linux/netpoll.h.orig 2005-06-20 19:51:47.000000000 -0400 +++ linux-2.6.12-rc6/include/linux/netpoll.h 2005-06-21 15:29:48.000000000 -0400 @@ -16,14 +16,19 @@ struct netpoll; struct netpoll { struct net_device *dev; char dev_name[16], *name; - int rx_flags; void (*rx_hook)(struct netpoll *, int, char *, int); void (*drop)(struct sk_buff *skb); u32 local_ip, remote_ip; u16 local_port, remote_port; unsigned char local_mac[6], remote_mac[6]; +}; + +struct netpoll_info { spinlock_t poll_lock; int poll_owner; + int rx_flags; + spinlock_t rx_lock; + struct netpoll *rx_np; /* netpoll that registered an rx_hook */ }; void netpoll_poll(struct netpoll *np); @@ -39,22 +44,35 @@ void netpoll_queue(struct sk_buff *skb); #ifdef CONFIG_NETPOLL static inline int netpoll_rx(struct sk_buff *skb) { - return skb->dev->np && skb->dev->np->rx_flags && __netpoll_rx(skb); + struct netpoll_info *npinfo = skb->dev->npinfo; + unsigned long flags; + int ret = 0; + + if (!npinfo || (!npinfo->rx_np && !npinfo->rx_flags)) + return 0; + + spin_lock_irqsave(&npinfo->rx_lock, flags); + /* check rx_flags again with the lock held */ + if (npinfo->rx_flags && __netpoll_rx(skb)) + ret = 1; + spin_unlock_irqrestore(&npinfo->rx_lock, flags); + + return ret; } static inline void netpoll_poll_lock(struct net_device *dev) { - if (dev->np) { - spin_lock(&dev->np->poll_lock); - dev->np->poll_owner = smp_processor_id(); + if (dev->npinfo) { + spin_lock(&dev->npinfo->poll_lock); + dev->npinfo->poll_owner = smp_processor_id(); } } static inline void netpoll_poll_unlock(struct net_device *dev) { - if (dev->np) { - spin_unlock(&dev->np->poll_lock); - dev->np->poll_owner = -1; + if (dev->npinfo) { + dev->npinfo->poll_owner = -1; + spin_unlock(&dev->npinfo->poll_lock); } } --- linux-2.6.12-rc6/include/linux/netdevice.h.orig 2005-06-20 
20:26:21.000000000 -0400 +++ linux-2.6.12-rc6/include/linux/netdevice.h 2005-06-21 14:46:52.000000000 -0400 @@ -41,7 +41,7 @@ struct divert_blk; struct vlan_group; struct ethtool_ops; -struct netpoll; +struct netpoll_info; /* source back-compat hooks */ #define SET_ETHTOOL_OPS(netdev,ops) \ ( (netdev)->ethtool_ops = (ops) ) @@ -468,7 +468,7 @@ struct net_device unsigned char *haddr); int (*neigh_setup)(struct net_device *dev, struct neigh_parms *); #ifdef CONFIG_NETPOLL - struct netpoll *np; + struct netpoll_info *npinfo; #endif #ifdef CONFIG_NET_POLL_CONTROLLER void (*poll_controller)(struct net_device *dev); From ak@suse.de Wed Jun 22 14:12:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 14:12:43 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de [195.135.220.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MLCdH9008711 for ; Wed, 22 Jun 2005 14:12:40 -0700 Received: from Relay1.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id E9A211B18F; Wed, 22 Jun 2005 23:11:13 +0200 (CEST) Date: Wed, 22 Jun 2005 23:10:58 +0200 From: Andi Kleen To: "David S. 
Miller"
Cc: ak@suse.de, leonid.grossman@neterion.com, hadi@cyberus.ca, becker@scyld.com, rick.jones2@hp.com, netdev@oss.sgi.com, davem@redhat.com
Subject: Re: RFC: NAPI packet weighting patch
Message-ID: <20050622211058.GY14251@wotan.suse.de>
References: <1119458226.6918.142.camel@localhost.localdomain> <200506221801.j5MI11xS021866@guinness.s2io.com> <20050622180654.GX14251@wotan.suse.de> <20050622.132241.21929037.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050622.132241.21929037.davem@davemloft.net>
X-archive-position: 2520
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: ak@suse.de
Precedence: bulk
X-list: netdev
Content-Length: 1000
Lines: 29

On Wed, Jun 22, 2005 at 01:22:41PM -0700, David S. Miller wrote:
> From: Andi Kleen
> Date: Wed, 22 Jun 2005 20:06:55 +0200
>
> > However it is tricky because CPUs have only a limited number of load
> > queue entries and doing too many prefetches will just overflow that.
>
> Several processors can queue about 8 prefetch requests, and
> these slots are independent of those consumed by a load.

8 entries? That sounds very small. Is that an old Sparc or something? :)

An Opteron has 44 entries, effectively 32 for L2. Netburst or POWER4
derived CPUs have more than that.

> Yes, if you queue too many prefetches, the queue overflows.
>
> I think the optimal scheme would be:
>
> 1) eth_type_trans() info in RX descriptor
> 2) prefetch(skb->data) done as early as possible in driver
>    RX handling
>
> Actually, I believe the most optimal scheme is:

Looks reasonable. Not sure about the "most optimal" though; some benchmarking
of different schemes would probably be a good idea.
-Andi From davem@davemloft.net Wed Jun 22 14:18:10 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 14:18:13 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MLI9H9009556 for ; Wed, 22 Jun 2005 14:18:10 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DlCaA-0004zQ-OI; Wed, 22 Jun 2005 14:16:26 -0700 Date: Wed, 22 Jun 2005 14:16:26 -0700 (PDT) Message-Id: <20050622.141626.03111803.davem@davemloft.net> To: ak@suse.de Cc: leonid.grossman@neterion.com, hadi@cyberus.ca, becker@scyld.com, rick.jones2@hp.com, netdev@oss.sgi.com, davem@redhat.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. Miller" In-Reply-To: <20050622211058.GY14251@wotan.suse.de> References: <20050622180654.GX14251@wotan.suse.de> <20050622.132241.21929037.davem@davemloft.net> <20050622211058.GY14251@wotan.suse.de> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2521 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 346 Lines: 11 From: Andi Kleen Date: Wed, 22 Jun 2005 23:10:58 +0200 > 8 entries? That sounds very small. Is that an old Sparc or something? :) Hey, Sparc does suck, this isn't news for anyone :-) > Looks reasonable. Not sure about the "most optimal" though, some benchmarking > of different schemes would be probably a good idea. Absolutely. 
From oxymoron@waste.org Wed Jun 22 14:28:38 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 14:28:44 -0700 (PDT) Received: from waste.org ([216.27.176.166]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MLScH9010760 for ; Wed, 22 Jun 2005 14:28:38 -0700 Received: from waste.org (localhost [127.0.0.1]) by waste.org (8.13.4/8.13.4/Debian-3) with ESMTP id j5MLR7VZ016072 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Wed, 22 Jun 2005 16:27:07 -0500 Received: (from oxymoron@localhost) by waste.org (8.13.4/8.13.4/Submit) id j5MLR7DS016069; Wed, 22 Jun 2005 16:27:07 -0500 Date: Wed, 22 Jun 2005 14:27:07 -0700 From: Matt Mackall To: Jeff Moyer Cc: netdev@oss.sgi.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [patch,rfc] allow registration of multiple netpolls per interface Message-ID: <20050622212707.GD27572@waste.org> References: <17080.35214.507402.998984@segfault.boston.redhat.com> <20050621225252.GY27572@waste.org> <17081.20441.714191.528270@segfault.boston.redhat.com> <20050622170128.GV27572@waste.org> <17081.53899.201190.106025@segfault.boston.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <17081.53899.201190.106025@segfault.boston.redhat.com> User-Agent: Mutt/1.5.9i X-archive-position: 2522 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mpm@selenic.com Precedence: bulk X-list: netdev Content-Length: 1874 Lines: 48 On Wed, Jun 22, 2005 at 05:05:15PM -0400, Jeff Moyer wrote: > mpm> It might be simpler to have a single lock here..? > >> > >> Maybe. You can't really have netpoll code running on multiple cpus at the > >> same time, right? This is the rx path, remember, so the other cpu should > >> be spinning on the poll_lock. 
> >>
> >> Keeping separate locks would allow you to unregister a struct netpoll
> >> associated with another net device without causing lock contention. This
> >> is a very minor win, obviously.
> >>
> >> I still feel like this npinfo struct is the right place for this, though.
> >> If you're strongly opposed to that, I'll change it.
>
> mpm> No, certainly having it in npinfo makes sense. I just was wondering if
> mpm> we really need two locks in there.
>
> Oh, I misunderstood. Well, one protects recursing into the driver's poll
> routine, the other protects access to the rx_np pointer, which may later
> become a list. I don't think we can lump these two together, do you?

I don't see why we couldn't, but let's worry about it later.

> >> >> + spin_lock_irqsave(&npinfo->rx_lock, flags);
> >> >> + if (npinfo->rx_np->dev == skb->dev)
> >> >> + np = npinfo->rx_np;
> >> >> + spin_unlock_irqrestore(&npinfo->rx_lock, flags);
> >> > mpm> And I think that means we don't need the lock here either.
> >>
> >> Sure we do. We need to protect against rmmod's.
>
> mpm> How can we have an rmmod when we're trapped?
>
> Looking over the code, I don't see what would prevent this. Could you
> point me to the code which prevents this?

I forgot we overloaded trapped for dealing with NAPI. Formerly
trapping meant "I'm stopping the box, drop every packet that's not
addressed to me" which also implied no one should be pulling the rug
out from under us.

> (Interdiff first)

Looks fine.

--
Mathematics is the supreme nostalgia of our time.
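
[Archive note] The two-lock split argued over above can be sketched in
userspace. This is only an illustration under stated assumptions: C11 atomic
flags stand in for kernel spinlocks, and every name here (np_info,
np_poll_enter, np_set_rx, ...) is hypothetical, not taken from the patch.
One lock plus an owner field guards against recursing into the poll routine;
the other only guards the rx_np pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct np_client { int id; };        /* hypothetical stand-in for struct netpoll */

struct np_info {                     /* hypothetical stand-in for struct netpoll_info */
    atomic_flag poll_lock;           /* held while the poll routine runs */
    atomic_int poll_owner;           /* CPU id of the poller, -1 if none */
    atomic_flag rx_lock;             /* guards rx_np only */
    struct np_client *rx_np;         /* client that registered an rx hook */
};

static void np_info_init(struct np_info *info)
{
    atomic_flag_clear(&info->poll_lock);
    atomic_init(&info->poll_owner, -1);
    atomic_flag_clear(&info->rx_lock);
    info->rx_np = NULL;
}

/* Try to enter the poll path; returns 1 on success, 0 if refused. */
static int np_poll_enter(struct np_info *info, int cpu)
{
    if (atomic_load(&info->poll_owner) == cpu)
        return 0;                    /* same CPU again: avoid recursion */
    if (atomic_flag_test_and_set_explicit(&info->poll_lock,
                                          memory_order_acquire))
        return 0;                    /* another CPU is already polling */
    atomic_store(&info->poll_owner, cpu);
    return 1;
}

/* Leave the poll path: clear the owner first, then drop the lock. */
static void np_poll_exit(struct np_info *info)
{
    assert(atomic_load(&info->poll_owner) != -1);
    atomic_store(&info->poll_owner, -1);
    atomic_flag_clear_explicit(&info->poll_lock, memory_order_release);
}

/* Register or unregister the rx client under rx_lock. */
static void np_set_rx(struct np_info *info, struct np_client *np)
{
    while (atomic_flag_test_and_set_explicit(&info->rx_lock,
                                             memory_order_acquire))
        ;                            /* spin */
    info->rx_np = np;
    atomic_flag_clear_explicit(&info->rx_lock, memory_order_release);
}

/* Read the rx client under rx_lock. */
static struct np_client *np_get_rx(struct np_info *info)
{
    struct np_client *np;

    while (atomic_flag_test_and_set_explicit(&info->rx_lock,
                                             memory_order_acquire))
        ;                            /* spin */
    np = info->rx_np;
    atomic_flag_clear_explicit(&info->rx_lock, memory_order_release);
    return np;
}
```

With separate locks, np_set_rx() can run while another CPU sits inside the
poll path; folding both into one lock would be simpler but would serialize
the two.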
From dada1@cosmosbay.com Wed Jun 22 14:39:51 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 14:39:53 -0700 (PDT) Received: from smtp.cegetel.net (mf00.sitadelle.com [212.94.174.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MLdoH9011715 for ; Wed, 22 Jun 2005 14:39:50 -0700 Received: from [192.168.0.5] (84-4-148-199.dti.cegetel.net [84.4.148.199]) by smtp.cegetel.net (Postfix) with ESMTP id 69AC01A4EF5; Wed, 22 Jun 2005 23:38:21 +0200 (CEST) Message-ID: <42B9DA4D.5090103@cosmosbay.com> Date: Wed, 22 Jun 2005 23:38:21 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: "David S. Miller" Cc: ak@suse.de, leonid.grossman@neterion.com, hadi@cyberus.ca, becker@scyld.com, rick.jones2@hp.com, netdev@oss.sgi.com, davem@redhat.com Subject: Re: RFC: NAPI packet weighting patch References: <1119458226.6918.142.camel@localhost.localdomain> <200506221801.j5MI11xS021866@guinness.s2io.com> <20050622180654.GX14251@wotan.suse.de> <20050622.132241.21929037.davem@davemloft.net> In-Reply-To: <20050622.132241.21929037.davem@davemloft.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-archive-position: 2523 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Content-Length: 721 Lines: 26 David S. Miller a écrit : > > 2) As shown above, it gives you compute time which can be used to > schedule the prefetch. This nearly makes RX replenishment free. > Instead of having the CPU spin on a cache miss when we run > eth_type_trans() during those cycles, we do useful work. > > I'm going to play around with these ideas in the tg3 driver. > Obvious patch below. Then maybe we could also play with prefetchw() in the case the incoming frame is small enough to be copied to a new skb. 
drivers/net/tg3.c copy_skb = dev_alloc_skb(len + 2); if (copy_skb == NULL) goto drop_it_no_recycle; + prefetchw(copy_skb->data); copy_skb->dev = tp->dev; skb_reserve(copy_skb, 2); skb_put(copy_skb, len); From cfriesen@nortel.com Wed Jun 22 14:56:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 14:56:38 -0700 (PDT) Received: from zcars04f.nortelnetworks.com (zcars04f.nortelnetworks.com [47.129.242.57]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MLuZH9013004 for ; Wed, 22 Jun 2005 14:56:36 -0700 Received: from zcard303.ca.nortel.com (zcard303.ca.nortel.com [47.129.242.59]) by zcars04f.nortelnetworks.com (Switch-2.2.6/Switch-2.2.0) with ESMTP id j5MLrm313773; Wed, 22 Jun 2005 17:53:49 -0400 (EDT) Received: from nortel.com (acart266.ca.nortel.com [47.130.17.135]) by zcard303.ca.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id MRAFPB4R; Wed, 22 Jun 2005 17:53:34 -0400 Message-ID: <42B9DDDA.5040405@nortel.com> Date: Wed, 22 Jun 2005 15:53:30 -0600 X-Sybari-Space: 00000000 00000000 00000000 00000000 From: Chris Friesen User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040115 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Andi Kleen CC: "David S. 
Miller" , leonid.grossman@neterion.com, hadi@cyberus.ca, becker@scyld.com, rick.jones2@hp.com, netdev@oss.sgi.com Subject: Re: RFC: NAPI packet weighting patch References: <1119458226.6918.142.camel@localhost.localdomain> <200506221801.j5MI11xS021866@guinness.s2io.com> <20050622180654.GX14251@wotan.suse.de> <20050622.132241.21929037.davem@davemloft.net> <20050622211058.GY14251@wotan.suse.de> In-Reply-To: <20050622211058.GY14251@wotan.suse.de> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2524 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: cfriesen@nortel.com Precedence: bulk X-list: netdev Content-Length: 155 Lines: 8 Andi Kleen wrote: > 8 entries? That sounds very small. Is that an old Sparc or something? :) The G5 has 8 prefetch streams. Not an ancient cpu. Chris From ak@suse.de Wed Jun 22 15:13:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 15:13:43 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de [195.135.220.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MMDeH9014204 for ; Wed, 22 Jun 2005 15:13:40 -0700 Received: from Relay2.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id 8402D1D7E9; Thu, 23 Jun 2005 00:12:14 +0200 (CEST) Date: Thu, 23 Jun 2005 00:11:58 +0200 From: Andi Kleen To: Chris Friesen Cc: Andi Kleen , "David S. 
Miller" , leonid.grossman@neterion.com, hadi@cyberus.ca, becker@scyld.com, rick.jones2@hp.com, netdev@oss.sgi.com
Subject: Re: RFC: NAPI packet weighting patch
Message-ID: <20050622221158.GZ14251@wotan.suse.de>
References: <1119458226.6918.142.camel@localhost.localdomain> <200506221801.j5MI11xS021866@guinness.s2io.com> <20050622180654.GX14251@wotan.suse.de> <20050622.132241.21929037.davem@davemloft.net> <20050622211058.GY14251@wotan.suse.de> <42B9DDDA.5040405@nortel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <42B9DDDA.5040405@nortel.com>
X-archive-position: 2525
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: ak@suse.de
Precedence: bulk
X-list: netdev
Content-Length: 862
Lines: 22

On Wed, Jun 22, 2005 at 03:53:30PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
>
> >8 entries? That sounds very small. Is that an old Sparc or something? :)
>
> The G5 has 8 prefetch streams. Not an ancient cpu.

A prefetch stream is a context of the auto prefetcher. It is different
from a load queue entry, which is just a load of a cache line that can
be triggered by user instructions or by the auto prefetcher. Each prefetch
stream would consume a lot of them, so just for your 8 streams above
you probably need a large two-digit number or more.

I don't have exact numbers for the PPC970, but afaik its LS unit
has a very long queue. On POWER4 (which is a very similar CPU)
we see a lot of races that don't happen on other platforms. That
seems to be because it reorders writes very aggressively. I suppose
this is true for reads as well.
-Andi From dada1@cosmosbay.com Wed Jun 22 15:14:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 15:14:50 -0700 (PDT) Received: from smtp.cegetel.net (mf01.sitadelle.com [212.94.174.68]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MMEjH9014397 for ; Wed, 22 Jun 2005 15:14:47 -0700 Received: from [192.168.0.5] (84-4-148-199.dti.cegetel.net [84.4.148.199]) by smtp.cegetel.net (Postfix) with ESMTP id 02956318A81; Thu, 23 Jun 2005 00:13:20 +0200 (CEST) Message-ID: <42B9E281.1090109@cosmosbay.com> Date: Thu, 23 Jun 2005 00:13:21 +0200 From: Eric Dumazet User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: fr, en MIME-Version: 1.0 To: Eric Dumazet Cc: "David S. Miller" , ak@suse.de, leonid.grossman@neterion.com, hadi@cyberus.ca, becker@scyld.com, rick.jones2@hp.com, netdev@oss.sgi.com, davem@redhat.com Subject: Re: RFC: NAPI packet weighting patch References: <1119458226.6918.142.camel@localhost.localdomain> <200506221801.j5MI11xS021866@guinness.s2io.com> <20050622180654.GX14251@wotan.suse.de> <20050622.132241.21929037.davem@davemloft.net> <42B9DA4D.5090103@cosmosbay.com> In-Reply-To: <42B9DA4D.5090103@cosmosbay.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-archive-position: 2526 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: dada1@cosmosbay.com Precedence: bulk X-list: netdev Content-Length: 1162 Lines: 41 Eric Dumazet a écrit : > > Then maybe we could also play with prefetchw() in the case the incoming > frame > is small enough to be copied to a new skb. > > drivers/net/tg3.c > > copy_skb = dev_alloc_skb(len + 2); > if (copy_skb == NULL) > goto drop_it_no_recycle; > + prefetchw(copy_skb->data); > > copy_skb->dev = tp->dev; > skb_reserve(copy_skb, 2); > skb_put(copy_skb, len); > > > I also found that the memcpy() done to copy the data to the new skb suffers from misalignment. 
This is because of the skb_reserve(skb, 2) that was done on both skbs, and memcpy() (at least on x86_64) doing long word copies without checking
alignment of source or destination.

Maybe we could:

1) make sure both skbs had the same skb_reserve() of 2 (that's not clear because tg3.c mixes the '2' and tp->rx_offset,
and according to a comment:
    rx_offset != 2 iff this is a 5701 card running
    in PCI-X mode)

2) and do:

- memcpy(copy_skb->data, skb->data, len);
+ memcpy(copy_skb->data-2, skb->data-2, len+2);

(That is, copy 2 more bytes, but gain an aligned copy to speed up memcpy())

Eric Dumazet

From davem@davemloft.net Wed Jun 22 15:25:20 2005
Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 15:25:25 -0700 (PDT)
Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MMPJH9016032 for ; Wed, 22 Jun 2005 15:25:19 -0700
Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DlDcz-0005UN-MR; Wed, 22 Jun 2005 15:23:25 -0700
Date: Wed, 22 Jun 2005 15:23:25 -0700 (PDT)
Message-Id: <20050622.152325.15263910.davem@davemloft.net>
To: dada1@cosmosbay.com
Cc: ak@suse.de, leonid.grossman@neterion.com, hadi@cyberus.ca, becker@scyld.com, rick.jones2@hp.com, netdev@oss.sgi.com, davem@redhat.com
Subject: Re: RFC: NAPI packet weighting patch
From: "David S.
Miller" In-Reply-To: <42B9DA4D.5090103@cosmosbay.com> References: <20050622180654.GX14251@wotan.suse.de> <20050622.132241.21929037.davem@davemloft.net> <42B9DA4D.5090103@cosmosbay.com> X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2527 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 334 Lines: 8 From: Eric Dumazet Date: Wed, 22 Jun 2005 23:38:21 +0200 > Then maybe we could also play with prefetchw() in the case the > incoming frame is small enough to be copied to a new skb. That's a good idea too. In fact, this would deal with platforms that use non-temporal stores in their memcpy() implementation. From davem@davemloft.net Wed Jun 22 15:32:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 15:32:27 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MMWNH9016877 for ; Wed, 22 Jun 2005 15:32:23 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DlDju-0005VU-VE; Wed, 22 Jun 2005 15:30:35 -0700 Date: Wed, 22 Jun 2005 15:30:34 -0700 (PDT) Message-Id: <20050622.153034.107939995.davem@davemloft.net> To: dada1@cosmosbay.com Cc: ak@suse.de, leonid.grossman@neterion.com, hadi@cyberus.ca, becker@scyld.com, rick.jones2@hp.com, netdev@oss.sgi.com, davem@redhat.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. 
Miller"
In-Reply-To: <42B9E281.1090109@cosmosbay.com>
References: <20050622.132241.21929037.davem@davemloft.net> <42B9DA4D.5090103@cosmosbay.com> <42B9E281.1090109@cosmosbay.com>
X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-archive-position: 2528
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@davemloft.net
Precedence: bulk
X-list: netdev
Content-Length: 973
Lines: 25

From: Eric Dumazet
Date: Thu, 23 Jun 2005 00:13:21 +0200

> I also found that the memcpy() done to copy the data to the new skb suffers from misalignment.
>
> This is because of the skb_reserve(skb, 2) that was done on both skbs, and memcpy() (at least on x86_64) doing long word copies without checking
> alignment of source or destination.
>
> Maybe we could:
>
> 1) make sure both skbs had the same skb_reserve() of 2 (that's not clear because tg3.c mixes the '2' and tp->rx_offset,
> and according to a comment:
>     rx_offset != 2 iff this is a 5701 card running
>     in PCI-X mode)
>
> 2) and do:
>
> - memcpy(copy_skb->data, skb->data, len);
> + memcpy(copy_skb->data-2, skb->data-2, len+2);
>
> (That is, copy 2 more bytes, but gain an aligned copy to speed up memcpy())

Yep, good idea.

Actually, the driver should be using NET_IP_ALIGN for rx_offset
unless it's the case of a 5701 card running in PCI-X mode.
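
[Archive note] The alignment trick proposed above can be checked in
userspace. A minimal sketch, assuming both buffers reserve the same 2-byte
headroom in front of the frame (the precondition from point 1); RX_RESERVE,
copy_frame_aligned() and is_aligned() are hypothetical names for
illustration, not tg3 code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RX_RESERVE 2   /* hypothetical: headroom reserved before the frame */

/*
 * Copy a frame of len bytes, starting RX_RESERVE bytes early so that both
 * pointers handed to memcpy() sit on the start of the underlying buffers.
 * Assumes both buffers really have RX_RESERVE bytes of headroom.
 */
static void copy_frame_aligned(unsigned char *dst_data,
                               const unsigned char *src_data, size_t len)
{
    memcpy(dst_data - RX_RESERVE, src_data - RX_RESERVE, len + RX_RESERVE);
}

/* Return nonzero when p is a multiple of align bytes. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}
```

The cost is two extra bytes copied per frame; whether the aligned copy is
actually faster depends on the memcpy() implementation and would need
measuring.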
From leonid.grossman@neterion.com Wed Jun 22 15:45:08 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 15:45:11 -0700 (PDT) Received: from ns1.s2io.com (ns1.s2io.com [142.46.200.198]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MMj7H9017950 for ; Wed, 22 Jun 2005 15:45:07 -0700 Received: from guinness.s2io.com (sentry.s2io.com [142.46.200.199]) by ns1.s2io.com (8.12.10/8.12.10) with ESMTP id j5MMggcx022438; Wed, 22 Jun 2005 18:42:42 -0400 (EDT) Received: from lgt40 ([10.16.16.68]) by guinness.s2io.com (8.12.6/8.12.6) with ESMTP id j5MMgbxS009935; Wed, 22 Jun 2005 18:42:38 -0400 (EDT) Message-Id: <200506222242.j5MMgbxS009935@guinness.s2io.com> From: "Leonid Grossman" To: "'David S. Miller'" , Cc: , , , , Subject: RE: RFC: NAPI packet weighting patch Date: Wed, 22 Jun 2005 15:42:30 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook, Build 11.0.5510 In-Reply-To: <20050622.132241.21929037.davem@davemloft.net> Thread-Index: AcV3aC+r7kcNEVWzR+yi+7i2r8Xf7QAEqubw X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409 X-Scanned-By: MIMEDefang 2.34 X-archive-position: 2529 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: leonid.grossman@neterion.com Precedence: bulk X-list: netdev Content-Length: 3070 Lines: 99 > -----Original Message----- > From: David S. Miller [mailto:davem@davemloft.net] > Sent: Wednesday, June 22, 2005 1:23 PM > To: ak@suse.de > Cc: leonid.grossman@neterion.com; hadi@cyberus.ca; > becker@scyld.com; rick.jones2@hp.com; netdev@oss.sgi.com; > davem@redhat.com > Subject: Re: RFC: NAPI packet weighting patch > > From: Andi Kleen > Date: Wed, 22 Jun 2005 20:06:55 +0200 > > > However it is tricky because CPUs have only a limited load queue > > entries and doing too many prefetches will just overflow that. 
>
> Several processors can queue about 8 prefetch requests, and
> these slots are independent of those consumed by a load.
>
> Yes, if you queue too many prefetches, the queue overflows.
>
> I think the optimal scheme would be:
>
> 1) eth_type_trans() info in RX descriptor
> 2) prefetch(skb->data) done as early as possible in driver
>    RX handling
>
> Actually, I believe the most optimal scheme is:
>
> foo_driver_rx()
> {
>     for_each_rx_descriptor() {
>         ...
>         skb = driver_priv->rx_skbs[index];
>         prefetch(skb->data);
>
>         skb = realloc_or_recycle_rx_descriptor(skb, index);
>         if (skb == NULL)
>             goto next_rxd;
>
>         skb->protocol = eth_type_trans(skb, driver_priv->dev);
>         netif_receive_skb(skb);
>         ...
>     next_rxd:
>         ...
>     }
> }
>
> The idea is that first the prefetch goes into flight, then
> you do the recycle or reallocation of the RX descriptor SKB,
> then you try to touch the data.
>
> This makes it very likely the prefetch will be in the cpu in time.
>
> Everyone seems to have this absolute fetish about batching
> the RX descriptor refilling work. It's wrong, it should be
> done when you pull a receive packet off the ring, for many
> reasons. Off the top of my head:

This is very hw-dependent, since there are NICs that read descriptors in
batches anyway - but the second argument below is compelling.

>
> 1) Descriptors are refilled as soon as possible, decreasing
>    the chance of the device hitting the end of the RX ring
>    and thus unable to receive a packet.
>
> 2) As shown above, it gives you compute time which can be used to
>    schedule the prefetch. This nearly makes RX replenishment free.
>    Instead of having the CPU spin on a cache miss when we run
>    eth_type_trans() during those cycles, we do useful work.
>
> I'm going to play around with these ideas in the tg3 driver.
> Obvious patch below.

We will play around with the s2io driver as well; there seem to be several
interesting ideas to try - thanks a lot for the input!
Cheers, Leonid > > --- 1/drivers/net/tg3.c.~1~ 2005-06-22 12:33:07.000000000 -0700 > +++ 2/drivers/net/tg3.c 2005-06-22 13:19:13.000000000 -0700 > @@ -2772,6 +2772,13 @@ > goto next_pkt_nopost; > } > > + /* Prefetch now. The recycle/realloc of the RX > + * entry is moderately expensive, so by the time > + * that is complete the data should have reached > + * the cpu. > + */ > + prefetch(skb->data); > + > work_mask |= opaque_key; > > if ((desc->err_vlan & RXD_ERR_MASK) != 0 && > From ak@suse.de Wed Jun 22 16:14:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 16:14:48 -0700 (PDT) Received: from mx2.suse.de (ns2.suse.de [195.135.220.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MNEfH9019685 for ; Wed, 22 Jun 2005 16:14:42 -0700 Received: from Relay1.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id C04651D7A4; Thu, 23 Jun 2005 01:13:15 +0200 (CEST) Date: Thu, 23 Jun 2005 01:13:00 +0200 From: Andi Kleen To: Leonid Grossman Cc: "'David S. Miller'" , ak@suse.de, hadi@cyberus.ca, becker@scyld.com, rick.jones2@hp.com, netdev@oss.sgi.com, davem@redhat.com Subject: Re: RFC: NAPI packet weighting patch Message-ID: <20050622231300.GC14251@wotan.suse.de> References: <20050622.132241.21929037.davem@davemloft.net> <200506222242.j5MMgbxS009935@guinness.s2io.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200506222242.j5MMgbxS009935@guinness.s2io.com> X-archive-position: 2530 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev Content-Length: 553 Lines: 12 > This is very hw-dependent, since there are NICs that read descriptors in > batches anyways - but the second argument below is compelling. 
The computing time must be quite long to be really a win. You need to waste a few hundred cycles at least on a modern fast CPU. -Andi > > > > 2) As shown above, it gives you compute time which can be used to > > schedule the prefetch. This nearly makes RX replenishment free. > > Instead of having the CPU spin on a cache miss when we run > > eth_type_trans() during those cycles, we do useful work. From davem@redhat.com Wed Jun 22 16:21:41 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 16:21:45 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MNLfH9024203 for ; Wed, 22 Jun 2005 16:21:41 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j5MNK7JB028412; Wed, 22 Jun 2005 19:20:07 -0400 Received: from devserv.devel.redhat.com (devserv.devel.redhat.com [172.16.58.1]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j5MNJvu27504; Wed, 22 Jun 2005 19:19:57 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by devserv.devel.redhat.com (8.12.11/8.12.11) with ESMTP id j5MNJusB013779; Wed, 22 Jun 2005 19:19:56 -0400 Date: Wed, 22 Jun 2005 19:19:56 -0400 (EDT) Message-Id: <20050622.191956.39166724.davem@redhat.com> To: ak@suse.de Cc: leonid.grossman@neterion.com, davem@davemloft.net, hadi@cyberus.ca, becker@scyld.com, rick.jones2@hp.com, netdev@oss.sgi.com Subject: Re: RFC: NAPI packet weighting patch From: "David S. 
Miller" In-Reply-To: <20050622231300.GC14251@wotan.suse.de> References: <20050622.132241.21929037.davem@davemloft.net> <200506222242.j5MMgbxS009935@guinness.s2io.com> <20050622231300.GC14251@wotan.suse.de> X-Mailer: Mew version 4.2.52 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2531 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev Content-Length: 439 Lines: 11 From: Andi Kleen Date: Thu, 23 Jun 2005 01:13:00 +0200 > The computing time must be quite long to be really a win. > You need to waste a few hundred cycles at least on a modern fast CPU. SKB allocation more than fits this requirement, and that is exactly what the RX descriptor replenishment will do. Even if SKB allocation was only half the necessary number of cycles for the prefetch to hit the cpu, it'd still be a win. From ak@suse.de Wed Jun 22 16:25:18 2005 Received: with ECARTIS (v1.0.0; list netdev); Wed, 22 Jun 2005 16:25:21 -0700 (PDT) Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5MNPIH9024813 for ; Wed, 22 Jun 2005 16:25:18 -0700 Received: from Relay1.suse.de (mail2.suse.de [195.135.221.8]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id 8C9171D7A4; Thu, 23 Jun 2005 01:23:56 +0200 (CEST) Date: Thu, 23 Jun 2005 01:23:45 +0200 From: Andi Kleen To: "David S. 
Miller" Cc: ak@suse.de, leonid.grossman@neterion.com, davem@davemloft.net, hadi@cyberus.ca, becker@scyld.com, rick.jones2@hp.com, netdev@oss.sgi.com Subject: Re: RFC: NAPI packet weighting patch Message-ID: <20050622232345.GD14251@wotan.suse.de> References: <20050622.132241.21929037.davem@davemloft.net> <200506222242.j5MMgbxS009935@guinness.s2io.com> <20050622231300.GC14251@wotan.suse.de> <20050622.191956.39166724.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050622.191956.39166724.davem@redhat.com> X-archive-position: 2532 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ak@suse.de Precedence: bulk X-list: netdev Content-Length: 699 Lines: 20 On Wed, Jun 22, 2005 at 07:19:56PM -0400, David S. Miller wrote: > From: Andi Kleen > Date: Thu, 23 Jun 2005 01:13:00 +0200 > > > The computing time must be quite long to be really a win. > > You need to waste a few hundred cycles at least on a modern fast CPU. > > SKB allocation more than fits this requirement, and that is exactly > what the RX descriptor replenishment will do. It shouldn't in theory. Unless they did something bad to the slab allocator again when I wasn't looking ;-) > > Even if SKB allocation was only half the necessary number of cycles > for the prefetch to hit the cpu, it'd still be a win. If it's too small then it might be left in the noise. 
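The disagreement above can be put into a rough model. This is only a sketch under simplifying assumptions: a fixed miss latency, a prefetch that is never dropped, and no memory-bandwidth limits. The function name and the cycle counts in the usage note are hypothetical and do not come from any measurement in this thread.

```c
#include <stdint.h>

/*
 * Simplified model (an assumption, not a measurement): a prefetch is
 * issued, then `overlap` cycles of independent work run (for example
 * the RX replenish / SKB allocation), then the data is touched.
 * Returns the cycles the CPU would still stall on that first touch.
 */
static uint32_t residual_stall(uint32_t miss_latency, uint32_t overlap)
{
	/* Work that outlasts the miss hides it completely. */
	if (overlap >= miss_latency)
		return 0;
	/* Otherwise only the uncovered remainder is paid as a stall. */
	return miss_latency - overlap;
}
```

With a hypothetical 300-cycle miss, 150 cycles of overlapped work would leave a 150-cycle stall instead of 300, which is the "half is still a win" case; whether a saving of that size rises above measurement noise is the open question.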
-Andi From P@draigBrady.com Thu Jun 23 01:58:55 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 23 Jun 2005 01:59:06 -0700 (PDT) Received: from corvil.com (gate.corvil.net [213.94.219.177]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5N8wrH9032520 for ; Thu, 23 Jun 2005 01:58:54 -0700 Received: from draigBrady.com (pixelbeat.local.corvil.com [172.18.1.170]) by corvil.com (8.13.3/8.13.3) with ESMTP id j5N8uUeB055457; Thu, 23 Jun 2005 09:56:36 +0100 (IST) (envelope-from P@draigBrady.com) Message-ID: <42BA793E.4080008@draigBrady.com> Date: Thu, 23 Jun 2005 09:56:30 +0100 From: P@draigBrady.com User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040124 X-Accept-Language: en-us, en MIME-Version: 1.0 To: hadi@cyberus.ca CC: "David S. Miller" , gandalf@wlug.westbo.se, shemminger@osdl.org, mitch.a.williams@intel.com, john.ronciak@intel.com, mchan@broadcom.com, buytenh@wantstofly.org, jdmason@us.ibm.com, netdev@oss.sgi.com, Robert.Olsson@data.slu.se, ganesh.venkatesan@intel.com, jesse.brandeburg@intel.com Subject: Re: RFC: NAPI packet weighting patch References: <42A5284C.3060808@osdl.org> <1118147904.6320.108.camel@localhost.localdomain> <20050621.133704.08321534.davem@davemloft.net> <42B92490.40005@draigBrady.com> <1119469066.6918.168.camel@localhost.localdomain> In-Reply-To: <1119469066.6918.168.camel@localhost.localdomain> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-archive-position: 2533 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: P@draigBrady.com Precedence: bulk X-list: netdev Content-Length: 1017 Lines: 34 jamal wrote: > On Wed, 2005-22-06 at 09:42 +0100, P@draigBrady.com wrote: > > >>Yes the copy is essentially free here as the data is already cached. >> >>As a data point, I went the whole hog and used buffer recycling >>in my essentially packet sniffing application. I.E. 
there are no >>allocs per packet at all, and this make a HUGE difference. On a >>2x3.4GHz 2xe1000 system I can receive 620Kpps per port sustained >>into my userspace app which does a LOT of processing per packet. >>Without the buffer recycling is was around 250Kpps. >>Note I don't reuse an skb until the packet is copied into a >>PACKET_MMAP buffer. > > > Was this machine SMP? Yes. 2 x 3.4GHz P4s 1 logical CPU per port (irq affinity) 1 thread (NB on same logical CPU as irq (sched_affinity)) to do user space per packet processing. > NAPI involved? Yep. > I take it nothing interfering in > the middle with the headers? It uses the standard path to PACKET_MMAP buffer e1000_clean_rx_irq -> netif_receive_skb -> tpacket_rcv Pádraig. From herbert@gondor.apana.org.au Thu Jun 23 04:40:36 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 23 Jun 2005 04:40:44 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5NBeYH9013886 for ; Thu, 23 Jun 2005 04:40:35 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DlQ29-0002zv-00; Thu, 23 Jun 2005 21:38:14 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DlQ1w-0003yI-00; Thu, 23 Jun 2005 21:38:00 +1000 From: Herbert Xu To: davem@davemloft.net (David S. 
Miller) Subject: Re: [patch] devinet: cleanup if statements Cc: pmeda@akamai.com, jgarzik@pobox.com, akpm@osdl.org, netdev@oss.sgi.com Organization: Core In-Reply-To: <20050621.134822.21926602.davem@davemloft.net> X-Newsgroups: apana.lists.os.linux.netdev User-Agent: tin/1.7.4-20040225 ("Benbecula") (UNIX) (Linux/2.4.27-hx-1-686-smp (i686)) Message-Id: Date: Thu, 23 Jun 2005 21:38:00 +1000 X-archive-position: 2534 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 898 Lines: 22 David S. Miller wrote: > > The second hunk of your patch seems to defeat the intention > of that code. I believe the idea is that if the label and > the device name differ, use the label. Actually I think Prasanna is right. The if conditional is testing whether ifa->ifa_label is NULL. As ifa->ifa_label is an array and it's not the first element in the structure, it can't possibly be NULL. With your interpretation above his patch is correct as well. 
If we want to use the label when it is different from the device name, then it is equivalent to always use the label since the only time we'd use the device name is when it's equal to the label :) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From hadi@cyberus.ca Thu Jun 23 05:15:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 23 Jun 2005 05:15:52 -0700 (PDT) Received: from mx01.cybersurf.com (mx01.cybersurf.com [209.197.145.104]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5NCFjH9016095 for ; Thu, 23 Jun 2005 05:15:47 -0700 Received: from mail.cyberus.ca ([209.197.145.21]) by mx01.cybersurf.com with esmtp (Exim 4.30) id 1DlQb5-0004WI-2l for netdev@oss.sgi.com; Thu, 23 Jun 2005 06:14:19 -0600 Received: from cpe0030ab124d2f-cm014500000962.cpe.net.cable.rogers.com ([24.103.99.32] helo=[10.0.0.229]) by mail.cyberus.ca with esmtp (Exim 4.20) id 1DlQb4-0007yj-LU; Thu, 23 Jun 2005 08:14:18 -0400 Subject: Re: RFC: NAPI packet weighting patch From: jamal Reply-To: hadi@cyberus.ca To: "David S. 
Miller" Cc: Lennert Buytenhek , davidm@hpl.hp.com, netdev , dada1@cosmosbay.com, ak@suse.de, leonid.grossman@neterion.com, becker@scyld.com, rick.jones2@hp.com, davem@redhat.com In-Reply-To: <20050622.152325.15263910.davem@davemloft.net> References: <20050622180654.GX14251@wotan.suse.de> <20050622.132241.21929037.davem@davemloft.net> <42B9DA4D.5090103@cosmosbay.com> <20050622.152325.15263910.davem@davemloft.net> Content-Type: text/plain Organization: unknown Date: Thu, 23 Jun 2005 08:14:11 -0400 Message-Id: <1119528852.11975.65.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-archive-position: 2535 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: hadi@cyberus.ca Precedence: bulk X-list: netdev Content-Length: 2533 Lines: 70 On Wed, 2005-22-06 at 15:23 -0700, David S. Miller wrote: > From: Eric Dumazet > Date: Wed, 22 Jun 2005 23:38:21 +0200 > > > Then maybe we could also play with prefetchw() in the case the > > incoming frame is small enough to be copied to a new skb. > > That's a good idea too. In fact, this would deal with platforms > that use non-temporal stores in their memcpy() implementation. For the fans of the e1000 (or even the tg3 deprived people), heres a patch which originated from David Mosberger that i played around (about 9 months back) - it will need some hand patching for the latest driver. Similar approach: prefetch skb->data,twiddle twiddle not little star, touch header. I found the aggressive mode effective on a xeon but i belive David is using this on x86_64. So Lennert, I lied to you saying it was never effective on x86. You just have to do the right juju such as factoring in the memory load-latency and how much cache you have on your specific CPU. CCing davidm (in addition To: davem of course ;->) so he may provide more insight on his tests. 
Interesting, of course, is that if you miss the "twiddle here" (as I saw
in my experiments) and do the obvious (such as defining AGGRESSIVE to 0),
you in fact end up paying a penalty in performance.

cheers,
jamal

===== drivers/net/e1000/e1000_main.c 1.134 vs edited =====
--- 1.134/drivers/net/e1000/e1000_main.c 2004-09-12 16:52:48 -07:00
+++ edited/drivers/net/e1000/e1000_main.c 2004-09-30 06:05:11 -07:00
@@ -2278,12 +2278,30 @@
 	uint8_t last_byte;
 	unsigned int i;
 	boolean_t cleaned = FALSE;
+#define AGGRESSIVE 1
 
 	i = rx_ring->next_to_clean;
+#if AGGRESSIVE
+	prefetch(rx_ring->buffer_info[i].skb->data);
+#endif
 	rx_desc = E1000_RX_DESC(*rx_ring, i);
 
 	while(rx_desc->status & E1000_RXD_STAT_DD) {
 		buffer_info = &rx_ring->buffer_info[i];
+# if AGGRESSIVE
+		{
+			struct e1000_rx_desc *next_rx;
+			unsigned int j = i + 1;
+
+			if (j == rx_ring->count)
+				j = 0;
+			next_rx = E1000_RX_DESC(*rx_ring, j);
+			if (next_rx->status & E1000_RXD_STAT_DD)
+				prefetch(rx_ring->buffer_info[j].skb->data);
+		}
+# else
+		prefetch(buffer_info->skb->data);
+# endif
 #ifdef CONFIG_E1000_NAPI
 		if(*work_done >= work_to_do)
 			break;

From akepner@sgi.com Thu Jun 23 09:40:34 2005
Received: with ECARTIS (v1.0.0; list netdev); Thu, 23 Jun 2005 09:40:39 -0700 (PDT)
Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5NGeYH9000916 for ; Thu, 23 Jun 2005 09:40:34 -0700
Received: from nodin.corp.sgi.com (nodin.corp.sgi.com [192.26.51.193]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j5NITANG018716 for ; Thu, 23 Jun 2005 11:29:10 -0700
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by nodin.corp.sgi.com (SGI-8.12.5/8.12.10/SGI_generic_relay-1.2) with ESMTP id j5NGcCbT87405850 for ; Thu, 23 Jun 2005 09:38:12 -0700 (PDT)
Received: from [192.168.2.20] (mtv-vpn-sw-corp-0-69.corp.sgi.com [134.15.0.69]) by cthulhu.engr.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id j5NGb5dP40289365; Thu, 23 Jun 2005 09:37:05
-0700 (PDT) Date: Thu, 23 Jun 2005 09:33:35 -0700 (PDT) From: Arthur Kepner X-X-Sender: akepner@resonance.WorkGroup To: netdev@oss.sgi.com cc: Rick Jones , Herbert Xu Subject: [RFC/PATCH] "safer ipv4 reassembly" (fwd) Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2536 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akepner@sgi.com Precedence: bulk X-list: netdev Content-Length: 14527 Lines: 473 What with the recent migration to vger.kernel.org, I'm forwarding this to oss.sgi.com, just in case any interested parties missed it. ---------- Forwarded message ---------- Date: Wed, 22 Jun 2005 16:00:55 -0700 (PDT) From: Arthur Kepner To: netdev@vger.kernel.org Subject: [RFC/PATCH] "safer ipv4 reassembly" A little more than a month ago I sent a RFC/PATCH for something I called "strict ipv4 reassembly". This was an attempt to make it much less likely that IP fragments from different IP datagrams were reassembled together when the IP id wraps. That patch was considered unacceptable because it required fragments to arrive in order or they'd be dropped. One idea that resulted from that thread was to keep a count of IP datagrams for a (src,dst,proto) and use that as a kind of sequence number to check that a fragment is valid. (I believe that Rick Jones and Herbert Xu each independently came up with this idea, or something very close to it.) Following is a patch which implements that idea. A new sysctl "sysctl_ip_reassembly_count" is used to control how much reordering of IP fragments we'll tolerate. If it's zero, the patch is a no-op. If sysctl_ip_reassembly_count is non-zero, it defines a "window size" for IP fragments. 
When a new fragment queue is made, the "bottom" of the window is
defined by the number of IP packets which have been received for
the associated (src,dst,proto), and each time a fragment is added
to the queue, the bottom of the window is advanced. But before
adding a fragment to the queue, a check is made that the number
of IP fragments in the queue falls within the window. If not, the
queue is dropped.

Comments?

 include/linux/sysctl.h     |    1
 include/net/ip.h           |    1
 net/ipv4/ip_fragment.c     |  206 +++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/ip_input.c        |   24 ++++-
 net/ipv4/sysctl_net_ipv4.c |   11 ++
 5 files changed, 240 insertions(+), 3 deletions(-)

Signed-off-by: Arthur Kepner

diff -rup linux.orig/include/linux/sysctl.h linux.new/include/linux/sysctl.h
--- linux.orig/include/linux/sysctl.h 2005-06-14 11:35:18.611069887 -0700
+++ linux.new/include/linux/sysctl.h 2005-06-22 14:04:17.384853993 -0700
@@ -347,6 +347,7 @@ enum
 	NET_TCP_MODERATE_RCVBUF=106,
 	NET_TCP_TSO_WIN_DIVISOR=107,
 	NET_TCP_BIC_BETA=108,
+	NET_IPV4_REASM_COUNT=109,
 };
 
 enum {
diff -rup linux.orig/include/net/ip.h linux.new/include/net/ip.h
--- linux.orig/include/net/ip.h 2005-06-14 11:52:09.878700520 -0700
+++ linux.new/include/net/ip.h 2005-06-22 14:04:33.508057469 -0700
@@ -300,6 +300,7 @@ enum ip_defrag_users
 };
 
 struct sk_buff *ip_defrag(struct sk_buff *skb, u32 user);
+void ip_count(u32 saddr, u32 daddr, u8 protocol);
 extern int ip_frag_nqueues;
 extern atomic_t ip_frag_mem;
 
diff -rup linux.orig/net/ipv4/ip_fragment.c linux.new/net/ipv4/ip_fragment.c
--- linux.orig/net/ipv4/ip_fragment.c 2005-06-13 16:49:55.290992303 -0700
+++ linux.new/net/ipv4/ip_fragment.c 2005-06-22 14:17:54.136940893 -0700
@@ -56,6 +56,8 @@
 int sysctl_ipfrag_high_thresh = 256*1024;
 int sysctl_ipfrag_low_thresh = 192*1024;
 
+extern int sysctl_ip_reassembly_count;
+
 /* Important NOTE! Fragment queue must be destroyed before MSL expires.
  * RFC791 is wrong proposing to prolongate timer each fragment arrival by TTL.
*/ @@ -69,6 +71,25 @@ struct ipfrag_skb_cb #define FRAG_CB(skb) ((struct ipfrag_skb_cb*)((skb)->cb)) +/* struct ipc contains a count of the number of IP datagrams + * received for a (saddr, daddr, protocol) tuple - but one of + * these structures exists for a given (saddr, daddr, protocol) + * if and only if there is a queue of IP fragments associated + * with that 3-tuple and sysctl_ip_reassembly_count is non-zero. + */ +struct ipc { + struct hlist_node node; + u32 saddr; + u32 daddr; + u8 protocol; + atomic_t refcnt; /* how many ipqs hold refs to us */ + atomic_t seq; /* how many ip datagrams for this + * (saddr,daddr,protocol) since we + * were created */ + struct timer_list timer; + struct rcu_head rcu; +}; + /* Describe an entry in the "incomplete datagrams" queue. */ struct ipq { struct ipq *next; /* linked list pointers */ @@ -92,6 +113,14 @@ struct ipq { struct ipq **pprev; int iif; struct timeval stamp; + struct ipc *ipc; + atomic_t seq; + /* ipq->seq defines the "bottom" of the window of sequence numbers + * that are valid for this fragment - the "top" of the window is + * (ipq->seq + sysctl_ip_reassembly_count). ipq->seq is initialized + * to the value in the associated ipc when the fragment queue is + * created, and incremented each time a fragment is added to the + * queue */ }; /* Hash table. 
*/ @@ -105,6 +134,12 @@ static u32 ipfrag_hash_rnd; static LIST_HEAD(ipq_lru_list); int ip_frag_nqueues = 0; +#define IPC_HASHSZ IPQ_HASHSZ +static struct { + struct hlist_head head; + spinlock_t lock; +} ipc_hash[IPC_HASHSZ]; + static __inline__ void __ipq_unlink(struct ipq *qp) { if(qp->next) @@ -121,6 +156,11 @@ static __inline__ void ipq_unlink(struct write_unlock(&ipfrag_lock); } +static unsigned int ipchashfn(u32 saddr, u32 daddr, u8 prot) +{ + return jhash_3words(prot, saddr, daddr, 0) & (IPC_HASHSZ - 1); +} + static unsigned int ipqhashfn(u16 id, u32 saddr, u32 daddr, u8 prot) { return jhash_3words((u32)id << 16 | prot, saddr, daddr, @@ -231,8 +271,16 @@ static __inline__ void ipq_put(struct ip */ static void ipq_kill(struct ipq *ipq) { + struct ipc *cp = ipq->ipc; + if (del_timer(&ipq->timer)) atomic_dec(&ipq->refcnt); + if (cp) { + atomic_dec(&cp->refcnt); + /* no particular reason to use sysctl_ipfrag_time + * for this timer */ + mod_timer(&cp->timer, jiffies + sysctl_ipfrag_time); + } if (!(ipq->last_in & COMPLETE)) { ipq_unlink(ipq); @@ -348,10 +396,109 @@ static struct ipq *ip_frag_intern(unsign return qp; } +static inline void __ipc_destroy(struct rcu_head *head) +{ + kfree(container_of(head, struct ipc, rcu)); +} + +static void ipc_destroy(unsigned long arg) +{ + struct ipc *cp = (struct ipc *) arg; + unsigned int hash = ipchashfn(cp->saddr, cp->daddr, cp->protocol); + + spin_lock(&ipc_hash[hash].lock); + BUG_ON((atomic_read(&cp->refcnt)) < 0); + if (atomic_read(&cp->refcnt) == 0) { + hlist_del_rcu(&cp->node); + call_rcu(&cp->rcu, __ipc_destroy); + } + spin_unlock(&ipc_hash[hash].lock); +} + +/* + * must hold spinlock for the appropriate hash list head when + * __ipc_create is called + */ + +static inline struct ipc *__ipc_create(struct iphdr *iph, + const unsigned int hash) +{ + struct ipc *cp = kmalloc(sizeof(struct ipc), GFP_ATOMIC); + /* XXX should we account size to ip_frag_mem ??? 
*/ + if (cp) { + cp->saddr = iph->saddr; + cp->daddr = iph->daddr; + cp->protocol = iph->protocol; + atomic_set(&cp->seq, 0); + atomic_set(&cp->refcnt, 1); + INIT_HLIST_NODE(&cp->node); + hlist_add_head_rcu(&cp->node, &ipc_hash[hash].head); + init_timer(&cp->timer); + cp->timer.data = (unsigned long) cp; + cp->timer.function = ipc_destroy; + } else { + NETDEBUG(if (net_ratelimit()) + printk(KERN_ERR "__ipc_create: no memory left !\n")); + } + return cp; +} + +/* + * must be "rcu safe" when __ipc_find is called - either use + * rcu_read_lock (if you intend only to read the returned struct) + * or grab the spinlock for the appropriate hash list head (if + * you might modify the returned struct) + */ +static inline struct ipc *__ipc_find(u32 saddr, u32 daddr, u8 protocol, + const unsigned int hash) +{ + struct hlist_node *p; + + hlist_for_each_rcu(p, &ipc_hash[hash].head) { + struct ipc * cp = (struct ipc *)p; + if(cp->saddr == saddr && + cp->daddr == daddr && + cp->protocol == protocol) { + return cp; + } + } + return NULL; +} + +static struct ipc *ipc_find(struct iphdr *iph) +{ + struct ipc *cp; + unsigned int hash = ipchashfn(iph->saddr, iph->daddr, iph->protocol); + + rcu_read_lock(); + if((cp = __ipc_find(iph->saddr, iph->daddr, + iph->protocol, hash)) != NULL) { + atomic_inc(&cp->refcnt); + rcu_read_unlock(); + return cp; + } + rcu_read_unlock(); + spin_lock(&ipc_hash[hash].lock); + if((cp = __ipc_find(iph->saddr, iph->daddr, + iph->protocol, hash)) != NULL) { + atomic_inc(&cp->refcnt); + spin_unlock(&ipc_hash[hash].lock); + return cp; + } + cp = __ipc_create(iph, hash); + spin_unlock(&ipc_hash[hash].lock); + return cp; +} + + /* Add an entry to the 'ipq' queue for a newly received IP datagram. 
*/ static struct ipq *ip_frag_create(unsigned hash, struct iphdr *iph, u32 user) { struct ipq *qp; + struct ipc *cp = NULL; + + if (sysctl_ip_reassembly_count && (cp = ipc_find(iph)) == NULL) + return NULL; if ((qp = frag_alloc_queue()) == NULL) goto out_nomem; @@ -366,6 +513,10 @@ static struct ipq *ip_frag_create(unsign qp->meat = 0; qp->fragments = NULL; qp->iif = 0; + qp->ipc = cp; + if (sysctl_ip_reassembly_count && cp) { + atomic_set(&qp->seq, atomic_read(&cp->seq)); + } /* Initialize a timer for this entry. */ init_timer(&qp->timer); @@ -381,6 +532,51 @@ out_nomem: return NULL; } +void ip_count(u32 saddr, u32 daddr, u8 protocol) +{ + struct ipc *cp = NULL; + unsigned int hash = ipchashfn(saddr, daddr, protocol); + + rcu_read_lock(); + if((cp = __ipc_find(saddr, daddr, protocol, hash)) != NULL) { + atomic_inc(&cp->seq); + } + rcu_read_unlock(); +} + +static inline int in_window(int bottom, int size, int seq) { + return (((seq - bottom) >= 0) && ((seq - (bottom + size)) < 0)); +} + +static int __ip_reassembly_count_check(const struct iphdr *iph, struct ipq *qp) +{ + struct ipc *cp = qp->ipc; + int cseq, qseq; + + /* qp->ipc may be NULL if sysctl_ip_reassembly_count was off + * at the time the fragment queue was created */ + if (cp == NULL) + return 0; + + cseq = atomic_read(&cp->seq); + qseq = atomic_inc_return(&qp->seq); + + if (!in_window(qseq, sysctl_ip_reassembly_count, cseq)) { + atomic_inc(&qp->refcnt); + read_unlock(&ipfrag_lock); + spin_lock(&qp->lock); + if (!(qp->last_in&COMPLETE)) + ipq_kill(qp); + spin_unlock(&qp->lock); + ipq_put(qp, NULL); + IP_INC_STATS_BH(IPSTATS_MIB_REASMFAILS); + read_lock(&ipfrag_lock); + return 1; + } + return 0; +} + + /* Find the correct entry in the "incomplete datagrams" queue for * this IP datagram, and create new one, if nothing is found. 
*/ @@ -400,6 +596,10 @@ static inline struct ipq *ip_find(struct qp->daddr == daddr && qp->protocol == protocol && qp->user == user) { + if (sysctl_ip_reassembly_count && + __ip_reassembly_count_check(iph, qp)) { + break; + } atomic_inc(&qp->refcnt); read_unlock(&ipfrag_lock); return qp; @@ -679,9 +879,15 @@ struct sk_buff *ip_defrag(struct sk_buff void ipfrag_init(void) { + int i; ipfrag_hash_rnd = (u32) ((num_physpages ^ (num_physpages>>7)) ^ (jiffies ^ (jiffies >> 6))); + for (i = 0; i < IPC_HASHSZ; i++ ) { + INIT_HLIST_HEAD(&ipc_hash[i].head); + spin_lock_init(&ipc_hash[i].lock); + } + init_timer(&ipfrag_secret_timer); ipfrag_secret_timer.function = ipfrag_secret_rebuild; ipfrag_secret_timer.expires = jiffies + sysctl_ipfrag_secret_interval; diff -rup linux.orig/net/ipv4/ip_input.c linux.new/net/ipv4/ip_input.c --- linux.orig/net/ipv4/ip_input.c 2005-06-13 16:23:41.824620856 -0700 +++ linux.new/net/ipv4/ip_input.c 2005-06-22 14:02:35.705155734 -0700 @@ -146,6 +146,14 @@ #include #include +/* + * A non-zero value for sysctl_ip_reassembly_count defines the + * size of the window of ip fragments that are considered valid. + * This is useful for preventing reassembly of fragments from + * different IP datagrams when the 16-bit IP id wraps. + * A value of zero means the window is unlimited. + */ +int sysctl_ip_reassembly_count = 0; /* * SNMP management statistics */ @@ -286,13 +294,17 @@ static inline int ip_rcv_finish(struct s { struct net_device *dev = skb->dev; struct iphdr *iph = skb->nh.iph; + __u32 saddr = iph->saddr; + __u32 daddr = iph->daddr; + __u8 proto = iph->protocol; + int ret; /* * Initialise the virtual path cache for the packet. It describes * how the packet travels inside Linux networking. 
*/ if (skb->dst == NULL) { - if (ip_route_input(skb, iph->daddr, iph->saddr, iph->tos, dev)) + if (ip_route_input(skb, daddr, saddr, iph->tos, dev)) goto drop; } @@ -334,7 +346,7 @@ static inline int ip_rcv_finish(struct s if (!IN_DEV_SOURCE_ROUTE(in_dev)) { if (IN_DEV_LOG_MARTIANS(in_dev) && net_ratelimit()) printk(KERN_INFO "source route option %u.%u.%u.%u -> %u.%u.%u.%u\n", - NIPQUAD(iph->saddr), NIPQUAD(iph->daddr)); + NIPQUAD(saddr), NIPQUAD(iph->daddr)); in_dev_put(in_dev); goto drop; } @@ -345,7 +357,13 @@ static inline int ip_rcv_finish(struct s } } - return dst_input(skb); + ret = dst_input(skb); + + if (sysctl_ip_reassembly_count) { + ip_count(saddr, daddr, proto); + } + + return ret; inhdr_error: IP_INC_STATS_BH(IPSTATS_MIB_INHDRERRORS); diff -rup linux.orig/net/ipv4/sysctl_net_ipv4.c linux.new/net/ipv4/sysctl_net_ipv4.c --- linux.orig/net/ipv4/sysctl_net_ipv4.c 2005-06-14 11:36:29.923218508 -0700 +++ linux.new/net/ipv4/sysctl_net_ipv4.c 2005-06-22 14:03:50.869948048 -0700 @@ -29,6 +29,7 @@ extern int sysctl_ipfrag_low_thresh; extern int sysctl_ipfrag_high_thresh; extern int sysctl_ipfrag_time; extern int sysctl_ipfrag_secret_interval; +extern int sysctl_ip_reassembly_count; /* From ip_output.c */ extern int sysctl_ip_dynaddr; @@ -49,6 +50,7 @@ extern int inet_peer_gc_mintime; extern int inet_peer_gc_maxtime; #ifdef CONFIG_SYSCTL +static int zero; static int tcp_retr1_max = 255; static int ip_local_port_range_min[] = { 1, 1 }; static int ip_local_port_range_max[] = { 65535, 65535 }; @@ -595,6 +597,15 @@ ctl_table ipv4_table[] = { .strategy = &sysctl_jiffies }, { + .ctl_name = NET_IPV4_REASM_COUNT, + .procname = "ip_reassembly_count", + .data = &sysctl_ip_reassembly_count, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_minmax, + .extra1 = &zero + }, + { .ctl_name = NET_TCP_NO_METRICS_SAVE, .procname = "tcp_no_metrics_save", .data = &sysctl_tcp_nometrics_save, -- Arthur - To unsubscribe from this list: send the line "unsubscribe 
netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html From davidm@napali.hpl.hp.com Thu Jun 23 10:37:47 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 23 Jun 2005 10:37:57 -0700 (PDT) Received: from palrel11.hp.com (palrel11.hp.com [156.153.255.246]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5NHblH9006019 for ; Thu, 23 Jun 2005 10:37:47 -0700 Received: from hplms2.hpl.hp.com (hplms2.hpl.hp.com [15.0.152.33]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by palrel11.hp.com (Postfix) with ESMTP id E89A210AFB; Thu, 23 Jun 2005 10:36:24 -0700 (PDT) Received: from napali.hpl.hp.com (napali.hpl.hp.com [15.4.89.123]) by hplms2.hpl.hp.com (8.13.1/8.13.1/HPL-PA Hub) with ESMTP id j5NHaLdc027589; Thu, 23 Jun 2005 10:36:21 -0700 (PDT) Received: from napali.hpl.hp.com (napali [127.0.0.1]) by napali.hpl.hp.com (8.13.4/8.13.4/Debian-3) with ESMTP id j5NHaKkS004115; Thu, 23 Jun 2005 10:36:20 -0700 Received: (from davidm@localhost) by napali.hpl.hp.com (8.13.4/8.13.4/Submit) id j5NHaBai004110; Thu, 23 Jun 2005 10:36:11 -0700 From: David Mosberger MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17082.62219.353794.762348@napali.hpl.hp.com> Date: Thu, 23 Jun 2005 10:36:11 -0700 To: hadi@cyberus.ca Cc: "David S. 
Miller" , Lennert Buytenhek , davidm@hpl.hp.com, netdev , dada1@cosmosbay.com, ak@suse.de, leonid.grossman@neterion.com, becker@scyld.com, rick.jones2@hp.com, davem@redhat.com Subject: Re: RFC: NAPI packet weighting patch In-Reply-To: <1119528852.11975.65.camel@localhost.localdomain> References: <20050622180654.GX14251@wotan.suse.de> <20050622.132241.21929037.davem@davemloft.net> <42B9DA4D.5090103@cosmosbay.com> <20050622.152325.15263910.davem@davemloft.net> <1119528852.11975.65.camel@localhost.localdomain> X-Mailer: VM 7.19 under Emacs 21.4.1 Reply-To: davidm@hpl.hp.com X-URL: http://www.hpl.hp.com/personal/David_Mosberger/ X-archive-position: 2537 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davidm@napali.hpl.hp.com Precedence: bulk X-list: netdev Content-Length: 8268 Lines: 194 >>>>> On Thu, 23 Jun 2005 08:14:11 -0400, jamal said: Jamal> For the fans of the e1000 (or even the tg3 deprived people), Jamal> heres a patch which originated from David Mosberger that i Jamal> played around (about 9 months back) - it will need some hand Jamal> patching for the latest driver. Similar approach: prefetch Jamal> skb->data,twiddle twiddle not little star, touch header. Jamal> I found the aggressive mode effective on a xeon but i belive Jamal> David is using this on x86_64. So Lennert, I lied to you Jamal> saying it was never effective on x86. You just have to do the Jamal> right juju such as factoring in the memory load-latency and Jamal> how much cache you have on your specific CPU. CCing davidm Jamal> (in addition To: davem of course ;->) so he may provide more Jamal> insight on his tests. I didn't remember what experiments I did, but I found the original mail, with all the data. The experiments were done on ia64 (naturally ;-). Enjoy, --david --- From: David Mosberger To: hadi@cyberus.ca Cc: Alexey , "David S. 
Miller" , Robert Olsson , Lennert Buytenhek , davidm@hpl.hp.com, eranian@linux.hpl.hp.com, grundler@parisc-linux.org Subject: Re: prefetch Date: Thu, 30 Sep 2004 06:51:29 -0700 Reply-To: davidm@hpl.hp.com X-URL: http://www.hpl.hp.com/personal/David_Mosberger/ >>>>> On 27 Sep 2004 11:08:00 -0400, jamal said: Jamal> one of the top abusers of cpu cycles in the netcode is Jamal> eth_type_trans() on x86 type hardware. This is where the Jamal> first time the skb->data is touched (hence a cache miss). Jamal> Clearly a good place to prefecth is in eth_type_trans itself Jamal> maybe right at the top you could prefetch skb->data or after Jamal> skb_pull() you could prefetch skb->mac.ethernet. Jamal> oprofile shows me the cycles being abused Jamal> (GLOBAL_POWER_EVENTS on xeon box) went down when i do either; Jamal> i cut down more cycles on doing skb->mac.ethernet that Jamal> skb->data - but thats a different topic. Jamal> My test is purely forwarding: packets come in through eth0, Jamal> get exercised by routing code and come out eth1. So the Jamal> important parameters for my test case are primarly throughput Jamal> and secondary is latency. Adding the prefetch above while Jamal> showing lower CPU cycles, results in decreeased throughput Jamal> numbers and higher latency numbers. What gives? Jamal> I am CCing the HP folks since they have some interesting Jamal> tools i heard David talk about at SUCON. I don't have a good setup to measure packet forwarding performance. However, prefetching skb->data certainly does reduce CPU utilization on ia64 as the measurements below show. I tried three versions: - original 2.6.9-rc3 (ORIGINAL) - 2.6.9-rc3 with a prefetch in e1000_clean_rx_irq (OBVIOUS) - 2.6.9-rc3 which prefetches the _next_ rx buffer (AGGRESSIVE) All 3 cases use an e1000 board with NAPI enabled. 
netperf results for 3 runs of ORIGINAL and AGGRESSIVE:

ORIGINAL:

$ netperf -l30 -c -C -H 192.168.10.15 -- -m1 -D
TCP STREAM TEST to 192.168.10.15 : nodelay
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send      Recv
Size   Size    Size     Time     Throughput  local    remote   local     remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB     us/KB

 87380  16384      1    30.00         1.59   99.93    10.94    5155.593  2257.461
 87380  16384      1    30.00         1.62   99.87    11.19    5045.549  2260.294
 87380  16384      1    30.00         1.62   99.89    11.29    5045.269  2281.327

AGGRESSIVE:

$ netperf -l30 -c -C -H 192.168.10.15 -- -m1 -D
TCP STREAM TEST to 192.168.10.15 : nodelay
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send      Recv
Size   Size    Size     Time     Throughput  local    remote   local     remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB     us/KB

 87380  16384      1    30.00         1.62   99.98    10.51    5062.204  2128.695
 87380  16384      1    30.00         1.62   99.99    10.51    5064.528  2128.940
 87380  16384      1    30.00         1.62   99.98    10.67    5053.365  2156.333

As you can see, not much of a throughput difference (I'd not expect
that, given the test...), but service demand on the receiver is down
significantly.
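As a rough cross-check of the receiver-side numbers above, the following C sketch averages the remote (Recv) service demand of the three runs in each configuration and computes the relative reduction. It is only an illustration and was not part of the original measurements; the array and function names are made up here, and the values are copied from the netperf output.

```c
#include <assert.h>
#include <stddef.h>

/* Receiver service demand (us/KB), copied from the three runs of each kernel. */
static const double original_recv_sd[]   = { 2257.461, 2260.294, 2281.327 };
static const double aggressive_recv_sd[] = { 2128.695, 2128.940, 2156.333 };

/* Arithmetic mean of n samples (n must be non-zero). */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative reduction of `after` versus `before`, in percent. */
static double reduction_pct(double before, double after)
{
	return 100.0 * (before - after) / before;
}
```

Calling reduction_pct(mean(original_recv_sd, 3), mean(aggressive_recv_sd, 3)) yields a reduction of roughly 5.7% in receiver service demand (about 2266 us/KB versus about 2138 us/KB).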
This is also confirmed with the following three profiles (collected
with q-syscollect):

ORIGINAL:

% time      self     cumul     calls self/call  tot/call name
 53.73     32.05     32.05      471k     68.1u     68.1u default_idle
  4.59      2.74     34.79     12.0M      228n      259n eth_type_trans

OBVIOUS:

% time      self     cumul     calls self/call  tot/call name
 55.72     33.25     33.25      469k     70.8u     70.8u default_idle
  4.49      2.68     35.93     12.0M      222n      278n tcp_v4_rcv
  2.84      1.70     37.63      473k     3.59u     32.6u e1000_clean
  2.81      1.68     39.30     12.2M      137n      525n tcp_rcv_established
  2.71      1.62     40.92     12.1M      134n      711n netif_receive_skb
  2.39      1.43     42.34     12.0M      119n      148n eth_type_trans

AGGRESSIVE:

% time      self     cumul     calls self/call  tot/call name
 57.51     34.34     34.34      395k     86.9u     86.9u default_idle
  4.40      2.62     36.96     12.3M      214n      265n tcp_v4_rcv
  3.12      1.86     38.82      455k     4.09u     31.3u e1000_clean
  3.09      1.84     40.66     12.0M      154n      584n tcp_rcv_established
  2.89      1.72     42.39     12.0M      144n      723n netif_receive_skb
  1.94      1.16     43.55      918k     1.26u     1.26u _spin_unlock_irq
  1.90      1.13     44.68     12.3M     92.4n      115n ip_route_input
  1.87      1.11     45.79     12.6M     88.4n     89.6n kfree
  1.87      1.11     46.91     12.1M     91.8n      572n ip_rcv
  1.68      1.00     47.91     12.1M     82.4n      351n ip_local_deliver
  1.21      0.72     48.63     12.6M     57.7n     58.9n __kmalloc
  1.01      0.60     49.23     12.3M     48.8n     53.7n sba_unmap_single
  1.00      0.59     49.83     12.0M     49.4n     81.0n eth_type_trans

Comparing ORIGINAL and AGGRESSIVE, we see that the latter spends an
additional 2.29 seconds in the idle-loop (default_idle), which
corresponds closely to the 2.19 seconds savings we're seeing in
eth_type_trans(), so the saving the prefetch achieves is real and not
offset by extra costs in other places.

The above also shows that the OBVIOUS prefetch is unable to cover the
entire load-latency. Thus, I suspect it would really be best to use
the AGGRESSIVE prefetching policy. If we were to do this, then the
code at label next_desc could be simplified, since we already
precomputed the next value of i/rx_desc as part of the prefetch.

It would be interesting to know how (modern) x86 CPUs behave.
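The simplification mentioned above (reusing the already computed next index when advancing the ring) can be sketched outside the kernel roughly as follows. This is a toy model, not e1000 code: the toy_* types, RING_SIZE and STAT_DD are hypothetical stand-ins for the driver's ring, descriptor and E1000_RXD_STAT_DD, and __builtin_prefetch is the GCC/Clang builtin used here in place of the kernel's prefetch().

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8u
#define STAT_DD   0x01u   /* "descriptor done" bit (hypothetical stand-in) */

/* Toy stand-ins for the driver's descriptor and buffer bookkeeping. */
struct toy_desc { unsigned int status; };
struct toy_buf  { unsigned char data[64]; };

struct toy_ring {
	struct toy_desc desc[RING_SIZE];
	struct toy_buf  buf[RING_SIZE];
	unsigned int next_to_clean;
};

/* Next ring index, wrapping at the end of the ring. */
static unsigned int ring_next(unsigned int i)
{
	return (i + 1 == RING_SIZE) ? 0 : i + 1;
}

/*
 * Clean up to `budget` completed descriptors. While descriptor i is
 * handled, the buffer of descriptor j = next(i) is prefetched if that
 * descriptor is already done; j is then reused to advance the loop,
 * so the index is computed only once per iteration.
 * Returns the number of descriptors cleaned.
 */
static unsigned int toy_clean_rx(struct toy_ring *ring, unsigned int budget)
{
	unsigned int i = ring->next_to_clean;
	unsigned int cleaned = 0;

	/* Prime the pipeline for the first buffer. */
	__builtin_prefetch(ring->buf[i].data);

	while ((ring->desc[i].status & STAT_DD) && cleaned < budget) {
		unsigned int j = ring_next(i);

		if (ring->desc[j].status & STAT_DD)
			__builtin_prefetch(ring->buf[j].data);

		/* Packet processing would happen here; the toy only
		 * hands the slot back to the "hardware". */
		ring->desc[i].status = 0;
		cleaned++;

		i = j;	/* reuse the index computed for the prefetch */
	}

	ring->next_to_clean = i;
	return cleaned;
}
```

Whether this ordering helps on a given CPU depends on how much work separates the prefetch from the first touch of the data, which is the load-latency question raised above.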
If somebody wants to try this, I attached a patch below (setting
AGGRESSIVE to 1 gives you the AGGRESSIVE version, setting it to 0
gives you the OBVIOUS version).

Cheers,

	--david

===== drivers/net/e1000/e1000_main.c 1.134 vs edited =====
--- 1.134/drivers/net/e1000/e1000_main.c	2004-09-12 16:52:48 -07:00
+++ edited/drivers/net/e1000/e1000_main.c	2004-09-30 06:05:11 -07:00
@@ -2278,12 +2278,30 @@
 	uint8_t last_byte;
 	unsigned int i;
 	boolean_t cleaned = FALSE;
+#define AGGRESSIVE 1
 
 	i = rx_ring->next_to_clean;
+#if AGGRESSIVE
+	prefetch(rx_ring->buffer_info[i].skb->data);
+#endif
 	rx_desc = E1000_RX_DESC(*rx_ring, i);
 
 	while(rx_desc->status & E1000_RXD_STAT_DD) {
 		buffer_info = &rx_ring->buffer_info[i];
+# if AGGRESSIVE
+		{
+			struct e1000_rx_desc *next_rx;
+			unsigned int j = i + 1;
+
+			if (j == rx_ring->count)
+				j = 0;
+			next_rx = E1000_RX_DESC(*rx_ring, j);
+			if (next_rx->status & E1000_RXD_STAT_DD)
+				prefetch(rx_ring->buffer_info[j].skb->data);
+		}
+# else
+		prefetch(buffer_info->skb->data);
+# endif
 #ifdef CONFIG_E1000_NAPI
 		if(*work_done >= work_to_do)
 			break;

From pmeda@akamai.com Thu Jun 23 11:15:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 23 Jun 2005 11:15:17 -0700 (PDT) Received: from smtp3.akamai.com (smtp3.akamai.com [63.116.109.25]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5NIFCH9008350 for ; Thu, 23 Jun 2005 11:15:15 -0700 Received: from smtp3.akamai.com (vwall1.sanmateo.corp.akamai.com [172.23.1.71]) by smtp3.akamai.com (8.12.10/8.12.10) with ESMTP id j5NIDpS1022252 for ; Thu, 23 Jun 2005 11:13:54 -0700 (PDT) Received: from akamai.com (allur.sanmateo.corp.akamai.com [172.23.11.58]) by smtp3.akamai.com (8.12.10/8.12.10) with ESMTP id j5NIDnB6022250; Thu, 23 Jun 2005 11:13:50 -0700 (PDT) Message-ID: <42BAFBD9.80B28D89@akamai.com> Date: Thu, 23 Jun 2005 11:13:45 -0700 From: Prasanna Meda X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.16-3 i686) X-Accept-Language: en MIME-Version: 1.0 To: Herbert Xu CC: "David S. 
Miller" , jgarzik@pobox.com, akpm@osdl.org, netdev@oss.sgi.com Subject: Re: [patch] devinet: cleanup if statements References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2538 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pmeda@akamai.com Precedence: bulk X-list: netdev Content-Length: 396 Lines: 15 Herbert Xu wrote: > > > With your interpretation above his patch is correct as well. If > we want to use the label when it is different from the device name, > then it is equivalent to always use the label since the only time > we'd use the device name is when it's equal to the label :) Sounds correct to me. Actually I did not think about second interpretation in first. Thanks, Prasanna. From andre@tomt.net Thu Jun 23 13:33:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 23 Jun 2005 13:33:22 -0700 (PDT) Received: from mx1.skjellin.no (mail1.skjellin.no [80.239.42.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5NKXFH9018549 for ; Thu, 23 Jun 2005 13:33:20 -0700 Received: from localhost (localhost [127.0.0.1]) by mx1.skjellin.no (Postfix) with ESMTP id A8649885A2 for ; Thu, 23 Jun 2005 22:31:52 +0200 (CEST) Received: from puppen.pasop.tomt.net (gw-fe-1.pasop.tomt.net [10.255.1.1]) by mail1.skjellin.no (Postfix) with ESMTP id 5A93388597 for ; Thu, 23 Jun 2005 22:31:52 +0200 (CEST) Received: from [10.255.1.10] (slurv.pasop.tomt.net [10.255.1.10]) by puppen.pasop.tomt.net (Postfix) with ESMTP id 2687C22B96 for ; Thu, 23 Jun 2005 22:31:52 +0200 (CEST) Message-ID: <42BB1C3B.20707@tomt.net> Date: Thu, 23 Jun 2005 22:31:55 +0200 From: Andre Tomt User-Agent: Mozilla Thunderbird 1.0.2 (Windows/20050317) X-Accept-Language: en-us, en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: status of TSO in 2.6.12 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-archive-position: 2539 
X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: andre@tomt.net Precedence: bulk X-list: netdev Content-Length: 251 Lines: 9

What is the status of TSO in 2.6.12?

In 2.6.11 I used to disable it by default on production kernels as not
being ready for prime time; is that still a wise thing to do?

The TSO discussions kind-of overflowed for me eventually :-)

--
Cheers,
André Tomt

From davem@davemloft.net Thu Jun 23 21:06:39 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 23 Jun 2005 21:06:46 -0700 (PDT) Received: from sunset.davemloft.net (dsl027-180-168.sfo1.dsl.speakeasy.net [216.27.180.168]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5O46cH9012654 for ; Thu, 23 Jun 2005 21:06:38 -0700 Received: from localhost ([127.0.0.1] ident=davem) by sunset.davemloft.net with esmtp (Exim 4.50) id 1DlfRF-0003eV-2L; Thu, 23 Jun 2005 21:05:09 -0700 Date: Thu, 23 Jun 2005 21:05:08 -0700 (PDT) Message-Id: <20050623.210508.10293826.davem@davemloft.net> To: tgraf@suug.ch CC: netdev@oss.sgi.com Subject: Kconfig NET_EMATCH_TEXT From: "David S. Miller" X-Mailer: Mew version 4.2 on Emacs 21.4 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2540 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 275 Lines: 8

Shouldn't there be some dependency between NET_EMATCH_TEXT and
TEXTSEARCH so that the user gets the right thing to happen if he
enables the former?

For IPSEC we do this by making things like INET_ESP depend upon CRYPTO, CRYPTO_HMAC, CRYPTO_MD5, etc.
see net/ipv4/Kconfig From ja@ssi.bg Fri Jun 24 01:39:05 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 24 Jun 2005 01:39:11 -0700 (PDT) Received: from u.domain.uli (ja.ssi.bg [217.79.71.194]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5O8csH9030420 for ; Fri, 24 Jun 2005 01:39:05 -0700 Received: from localhost (localhost [127.0.0.1]) by u.domain.uli (8.12.10/8.12.10) with ESMTP id j5O8kLOb002021; Fri, 24 Jun 2005 11:46:34 +0300 Date: Fri, 24 Jun 2005 11:46:21 +0300 (EEST) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: Neil Horman cc: linux-kernel , Wensong Zhang , akpm@osdl.org, netdev@oss.sgi.com Subject: Re: [Patch] ipvs: close race conditions on ip_vs_conn_tab list modification In-Reply-To: <20050623183926.GI16783@hmsendeavour.rdu.redhat.com> Message-ID: References: <20050623183926.GI16783@hmsendeavour.rdu.redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2541 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ja@ssi.bg Precedence: bulk X-list: netdev Content-Length: 2021 Lines: 60 Hello, adding netdev to CC On Thu, 23 Jun 2005, Neil Horman wrote: > Hello there- > Patch to close a race condition in ip_vs_conn_flush. In an smp system, > it is possible for an connection timer to expire, calling ip_vs_conn_expire > while the connection table is being flushed, before ct_write_lock_bh is > acquired. Since the list iterator loop in ip_vs_con_flush releases and > re-acquires the spinlock (even though it doesn't re-enable softirqs), it is > possible for the expiration function to modify the connection list, while it is > being traversed in ip_vs_conn_flush. The result is that the next pointer gets > set to NULL, and subsequently dereferenced, resulting in an oops. This patch > removes the lock release and re-aquisition from the loop, closing the race > window. 
Tested by myself, and those who origionally experienced the crash and > reported it to me, with successful results. > > Signed-off-by: Neil Horman > > ip_vs_conn.c | 2 -- > 1 files changed, 2 deletions(-) > > > --- linux-2.6.git/net/ipv4/ipvs/ip_vs_conn.c.orig 2005-06-23 13:11:00.910372471 -0400 > +++ linux-2.6.git/net/ipv4/ipvs/ip_vs_conn.c 2005-06-23 13:15:54.459852393 -0400 > @@ -840,7 +838,6 @@ > > list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) { > atomic_inc(&cp->refcnt); > - ct_write_unlock(idx); > > if ((ct = cp->control)) > atomic_inc(&ct->refcnt); > @@ -850,7 +847,6 @@ > IP_VS_DBG(4, "del conn template\n"); > ip_vs_conn_expire_now(ct); > } > - ct_write_lock(idx); > } > ct_write_unlock_bh(idx); > } Looks ok but can you test an extended version: - remove these atomic_inc for cp and ct and the corresponding __ip_vs_conn_put from ip_vs_conn_expire_now - do the same for ip_vs_random_dropentry, it looks wrong in the same way because it is not running anymore together with the connection expiration in same sltimer_handler Also, 2.4 needs the same changes, I hope you can continue? 
Regards -- Julian Anastasov From nhorman@redhat.com Fri Jun 24 07:50:03 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 24 Jun 2005 07:50:09 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5OEo2H9017574 for ; Fri, 24 Jun 2005 07:50:03 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j5OEmW0D031575; Fri, 24 Jun 2005 10:48:32 -0400 Received: from hmsendeavour.rdu.redhat.com (hmsendeavour.rdu.redhat.com [172.16.57.156]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j5OEmWu03091; Fri, 24 Jun 2005 10:48:32 -0400 Received: from hmsendeavour.rdu.redhat.com (localhost.localdomain [127.0.0.1]) by hmsendeavour.rdu.redhat.com (8.13.1/8.13.1) with ESMTP id j5OEmRg8023651; Fri, 24 Jun 2005 10:48:27 -0400 Received: (from nhorman@localhost) by hmsendeavour.rdu.redhat.com (8.13.1/8.13.1/Submit) id j5OEmMA7023650; Fri, 24 Jun 2005 10:48:22 -0400 Date: Fri, 24 Jun 2005 10:48:22 -0400 From: Neil Horman To: Julian Anastasov Cc: Neil Horman , linux-kernel , Wensong Zhang , akpm@osdl.org, netdev@oss.sgi.com Subject: Re: [Patch] ipvs: close race conditions on ip_vs_conn_tab list modification Message-ID: <20050624144822.GD21499@hmsendeavour.rdu.redhat.com> References: <20050623183926.GI16783@hmsendeavour.rdu.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i X-archive-position: 2542 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: nhorman@redhat.com Precedence: bulk X-list: netdev Content-Length: 2292 Lines: 93 On Fri, Jun 24, 2005 at 11:46:21AM +0300, Julian Anastasov wrote: > > Hello, > > adding netdev to CC > > On Thu, 23 Jun 2005, Neil Horman wrote: > > Looks ok but can you test an extended version: > > - remove these atomic_inc for 
cp and ct and the corresponding > __ip_vs_conn_put from ip_vs_conn_expire_now > > - do the same for ip_vs_random_dropentry, it looks wrong in the same > way because it is not running anymore together with the connection > expiration in same sltimer_handler > > Also, 2.4 needs the same changes, I hope you can continue? > > Regards > > -- > Julian Anastasov No problem. New patch attached with the above corrections/enhancements made. I've tested them here, and had good results. Signed-off-by: Neil Horman ip_vs_conn.c | 15 --------------- 1 files changed, 15 deletions(-) --- linux-2.6.git/net/ipv4/ipvs/ip_vs_conn.c.orig 2005-06-23 13:11:00.000000000 -0400 +++ linux-2.6.git/net/ipv4/ipvs/ip_vs_conn.c 2005-06-24 08:57:30.000000000 -0400 @@ -548,7 +548,6 @@ { if (del_timer(&cp->timer)) mod_timer(&cp->timer, jiffies); - __ip_vs_conn_put(cp); } @@ -801,21 +800,12 @@ continue; } - /* - * Drop the entry, and drop its ct if not referenced - */ - atomic_inc(&cp->refcnt); - ct_write_unlock(hash); - - if ((ct = cp->control)) - atomic_inc(&ct->refcnt); IP_VS_DBG(4, "del connection\n"); ip_vs_conn_expire_now(cp); if (ct) { IP_VS_DBG(4, "del conn template\n"); ip_vs_conn_expire_now(ct); } - ct_write_lock(hash); } ct_write_unlock(hash); } @@ -839,18 +829,13 @@ ct_write_lock_bh(idx); list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) { - atomic_inc(&cp->refcnt); - ct_write_unlock(idx); - if ((ct = cp->control)) - atomic_inc(&ct->refcnt); IP_VS_DBG(4, "del connection\n"); ip_vs_conn_expire_now(cp); if (ct) { IP_VS_DBG(4, "del conn template\n"); ip_vs_conn_expire_now(ct); } - ct_write_lock(idx); } ct_write_unlock_bh(idx); } -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. 
*nhorman@redhat.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ From ja@ssi.bg Fri Jun 24 08:13:52 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 24 Jun 2005 08:13:56 -0700 (PDT) Received: from l.himel.bg ([213.91.247.3]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5OFDoH9019441 for ; Fri, 24 Jun 2005 08:13:51 -0700 Received: from linux.himel.bg (IDENT:ja@linux.himel.bg [127.0.0.1]) by l.himel.bg (8.11.6/8.9.3) with ESMTP id j5OF9eE05931; Fri, 24 Jun 2005 18:09:40 +0300 Date: Fri, 24 Jun 2005 18:09:40 +0300 (EEST) From: Julian Anastasov X-X-Sender: ja@l To: Neil Horman cc: linux-kernel , Wensong Zhang , , Subject: Re: [Patch] ipvs: close race conditions on ip_vs_conn_tab list modification In-Reply-To: <20050624144822.GD21499@hmsendeavour.rdu.redhat.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2543 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ja@ssi.bg Precedence: bulk X-list: netdev Content-Length: 271 Lines: 18 Hello, On Fri, 24 Jun 2005, Neil Horman wrote: > if (ct) { > IP_VS_DBG(4, "del conn template\n"); > ip_vs_conn_expire_now(ct); > } Don't forget to use cp->control instead of ct, ct is not needed anymore. 
Regards -- Julian Anastasov From didier@barvaux.org Fri Jun 24 08:49:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 24 Jun 2005 08:50:03 -0700 (PDT) Received: from mail.b2i-toulouse.com (mail.b2i-toulouse.com [195.115.64.51]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5OFnqH9024863 for ; Fri, 24 Jun 2005 08:49:57 -0700 Received: from catherine.b2i-toulouse.com ([172.20.0.127]) by mail.b2i-toulouse.com (8.12.8/8.12.8) with SMTP id j5OFmbu7030671 for ; Fri, 24 Jun 2005 17:48:40 +0200 Date: Fri, 24 Jun 2005 17:48:20 +0200 From: Didier Barvaux To: netdev@oss.sgi.com Subject: IPv6 and QoS Message-Id: <20050624174820.74807f96.didier@barvaux.org> X-Mailer: Sylpheed version 1.0.0 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2544 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: didier@barvaux.org Precedence: bulk X-list: netdev Content-Length: 288 Lines: 14 Hello, I would like to know if it's possible to do some QoS with IPv6. The LARTC howto says no [1], but perhaps the situation has evolved since the writing of this page. What is the current situation ? 
Regards, Didier Barvaux [1] http://lartc.org/howto/lartc.adv-filter.ipv6.html From nhorman@redhat.com Fri Jun 24 10:42:31 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 24 Jun 2005 10:42:36 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com [66.187.233.31]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5OHgVH9029734 for ; Fri, 24 Jun 2005 10:42:31 -0700 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j5OHexaW024642; Fri, 24 Jun 2005 13:40:59 -0400 Received: from hmsendeavour.rdu.redhat.com (hmsendeavour.rdu.redhat.com [172.16.57.156]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j5OHexu26393; Fri, 24 Jun 2005 13:40:59 -0400 Received: from hmsendeavour.rdu.redhat.com (localhost.localdomain [127.0.0.1]) by hmsendeavour.rdu.redhat.com (8.13.1/8.13.1) with ESMTP id j5OHexlL023875; Fri, 24 Jun 2005 13:40:59 -0400 Received: (from nhorman@localhost) by hmsendeavour.rdu.redhat.com (8.13.1/8.13.1/Submit) id j5OHesd1023874; Fri, 24 Jun 2005 13:40:54 -0400 Date: Fri, 24 Jun 2005 13:40:54 -0400 From: Neil Horman To: Julian Anastasov Cc: Neil Horman , linux-kernel , Wensong Zhang , akpm@osdl.org, netdev@oss.sgi.com, davem@davemloft.net Subject: Re: [Patch] ipvs: close race conditions on ip_vs_conn_tab list modification Message-ID: <20050624174054.GE21499@hmsendeavour.rdu.redhat.com> References: <20050624144822.GD21499@hmsendeavour.rdu.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i X-archive-position: 2545 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: nhorman@redhat.com Precedence: bulk X-list: netdev Content-Length: 2407 Lines: 106 On Fri, Jun 24, 2005 at 06:09:40PM +0300, Julian Anastasov wrote: > > Hello, > > On Fri, 24 Jun 2005, Neil Horman wrote: > > > if (ct) { > > IP_VS_DBG(4, 
"del conn template\n"); > > ip_vs_conn_expire_now(ct); > > } > > Don't forget to use cp->control instead of ct, ct is not needed > anymore. > > Regards > > -- > Julian Anastasov > Good catch. Sorry, should have seen that earlier. New patch attached with corrections. When you're comfortable with this, I'll post the 2.4 version of the patch. Regards Neil Signed-off-by: Neil Horman ip_vs_conn.c | 24 ++++-------------------- 1 files changed, 4 insertions(+), 20 deletions(-) --- linux-2.6.git/net/ipv4/ipvs/ip_vs_conn.c.orig 2005-06-23 13:11:00.000000000 -0400 +++ linux-2.6.git/net/ipv4/ipvs/ip_vs_conn.c 2005-06-24 13:33:03.000000000 -0400 @@ -548,7 +548,6 @@ { if (del_timer(&cp->timer)) mod_timer(&cp->timer, jiffies); - __ip_vs_conn_put(cp); } @@ -801,21 +800,12 @@ continue; } - /* - * Drop the entry, and drop its ct if not referenced - */ - atomic_inc(&cp->refcnt); - ct_write_unlock(hash); - - if ((ct = cp->control)) - atomic_inc(&ct->refcnt); IP_VS_DBG(4, "del connection\n"); ip_vs_conn_expire_now(cp); - if (ct) { + if (cp->control) { IP_VS_DBG(4, "del conn template\n"); - ip_vs_conn_expire_now(ct); + ip_vs_conn_expire_now(cp->control); } - ct_write_lock(hash); } ct_write_unlock(hash); } @@ -829,7 +819,6 @@ { int idx; struct ip_vs_conn *cp; - struct ip_vs_conn *ct; flush_again: for (idx=0; idxrefcnt); - ct_write_unlock(idx); - if ((ct = cp->control)) - atomic_inc(&ct->refcnt); IP_VS_DBG(4, "del connection\n"); ip_vs_conn_expire_now(cp); - if (ct) { + if (cp->control) { IP_VS_DBG(4, "del conn template\n"); - ip_vs_conn_expire_now(ct); + ip_vs_conn_expire_now(cp->control); } - ct_write_lock(idx); } ct_write_unlock_bh(idx); } -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. 
*nhorman@redhat.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ From mbizon@freebox.fr Fri Jun 24 11:54:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 24 Jun 2005 11:54:46 -0700 (PDT) Received: from sakura.staff.proxad.net (sakura.staff.proxad.net [213.228.1.107]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5OIscH9001290 for ; Fri, 24 Jun 2005 11:54:42 -0700 Received: from localhost ([127.0.0.1]) by sakura.staff.proxad.net with esmtp (Exim 3.36 #1 (Debian)) id 1DltIe-0003Dt-00 for ; Fri, 24 Jun 2005 20:53:12 +0200 Subject: [PATCH] ipconfig.c: fix dhcp timeout behaviour From: Maxime Bizon To: netdev@oss.sgi.com Content-Type: text/plain Date: Fri, 24 Jun 2005 20:53:09 +0200 Message-Id: <1119639189.14765.154.camel@sakura.staff.proxad.net> Mime-Version: 1.0 X-Mailer: Evolution 2.2.2 Content-Transfer-Encoding: 7bit X-archive-position: 2546 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mbizon@freebox.fr Precedence: bulk X-list: netdev Content-Length: 953 Lines: 37 Hello, I think there is a small bug in ipconfig.c in case IPCONFIG_DHCP is set and dhcp is used. When a DHCPOFFER is received, ip address is kept until we get DHCPACK. If no ack is received, ic_dynamic() returns negatively, but leaves the offered ip address in ic_myaddr. This makes the main loop in ip_auto_config() break and uses the maybe incomplete configuration. Not sure if it's the best way to do, but the following trivial patch correct this. 
Signed-off-by: Maxime Bizon

--- linux-2.6.12.1/net/ipv4/ipconfig.c.orig	2005-06-22 21:33:05.000000000 +0200
+++ linux-2.6.12.1/net/ipv4/ipconfig.c	2005-06-24 17:55:11.000000000 +0200
@@ -1149,8 +1149,10 @@ static int __init ic_dynamic(void)
 		ic_rarp_cleanup();
 #endif
 
-	if (!ic_got_reply)
+	if (!ic_got_reply) {
+		ic_myaddr = INADDR_NONE;
 		return -1;
+	}
 
 	printk("IP-Config: Got %s answer from %u.%u.%u.%u, ",
 		((ic_got_reply & IC_RARP) ? "RARP"

--
Maxime

From res0d26i@verizon.net Fri Jun 24 19:50:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Fri, 24 Jun 2005 19:50:55 -0700 (PDT) Received: from vms048pub.verizon.net (vms048pub.verizon.net [206.46.252.48]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5P2oqH9025823 for ; Fri, 24 Jun 2005 19:50:53 -0700 Received: from [192.168.0.2] ([71.104.112.195]) by vms048.mailsrvcs.net (Sun Java System Messaging Server 6.2 HotFix 0.04 (built Dec 24 2004)) with ESMTPA id <0IIM00CLRD6BC265@vms048.mailsrvcs.net> for netdev@oss.sgi.com; Fri, 24 Jun 2005 21:49:24 -0500 (CDT) Date: Fri, 24 Jun 2005 19:49:25 -0700 From: Damian Subject: sis900 and fedora troubles To: netdev@oss.sgi.com Reply-to: res0d26i@verizon.net Message-id: <1119667765.8496.6.camel@localhost.localdomain> Organization: Lantron, Inc. MIME-version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-type: text/plain Content-transfer-encoding: 7bit X-archive-position: 2547 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: res0d26i@verizon.net Precedence: bulk X-list: netdev Content-Length: 799 Lines: 22

1) I'm having trouble with the sis900 chipset on an Acer Aspire 3000
Laptop using Fedora Core 4, which I believe uses 2.6.11. The NIC is
detected perfectly by fedora but it fails to initialize. When I set it
for dhcp the dhcp request simply times out. I have tried this laptop
with ubuntu linux and it works perfectly out of the box so I cannot
explain why it wouldn't work in fedora.
2) Error: Determining IP information for eth0... failed. (redhat gui network config) Error: eth0: Transmit timeout, status 00000005 000002 (/var/log/messages) 3) dmesg: card is detected. I am so very sorry I can't give you more output, but the laptop is virtually connectionless. All I can tell you is that Ubuntu ran the card fine but Fedora for some reason doesn't. Any info is highly appreciated. Damian From olel@ans.pl Sat Jun 25 16:25:46 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 25 Jun 2005 16:25:48 -0700 (PDT) Received: from bizon.gios.gov.pl (bizon.gios.gov.pl [212.244.124.8]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5PNPiH9003245 for ; Sat, 25 Jun 2005 16:25:45 -0700 Received: from bizon.gios.gov.pl (olel@localhost6 [IPv6:::1]) by bizon.gios.gov.pl (8.13.4/8.13.4) with ESMTP id j5PNOFLQ017057 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Sun, 26 Jun 2005 01:24:17 +0200 Received: from localhost (olel@localhost) by bizon.gios.gov.pl (8.13.4/8.13.4/Submit) with ESMTP id j5PNOEK0017054 for ; Sun, 26 Jun 2005 01:24:15 +0200 X-Authentication-Warning: bizon.gios.gov.pl: olel owned process doing -bs Date: Sun, 26 Jun 2005 01:24:14 +0200 (CEST) From: Krzysztof Oledzki X-X-Sender: olel@bizon.gios.gov.pl To: netdev@oss.sgi.com Subject: BCM5721(tg3)&MSI Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="-187430788-1833883756-1119741854=:16847" X-archive-position: 2548 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: olel@ans.pl Precedence: bulk X-list: netdev Content-Length: 2343 Lines: 61 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. ---187430788-1833883756-1119741854=:16847 Content-Type: TEXT/PLAIN; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Hello, Which BCM chips support MSI? 
I have a mainboard with two BCM5721, it seems
both support MSI (Message Signalled Interrupts entry in Capabilities) but
with 2.6.12.1 tg3 driver does not enable it. Why?

04:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 11)
        Subsystem: Super Micro Computer Inc: Unknown device 02c6
        Flags: bus master, fast devsel, latency 0, IRQ 201
        Memory at d0100000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
        Capabilities: [d0] #10 [0001]

05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 11)
        Subsystem: Super Micro Computer Inc: Unknown device 02c6
        Flags: bus master, fast devsel, latency 0, IRQ 217
        Memory at d0200000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
        Capabilities: [d0] #10 [0001]

# cat /proc/interrupts |grep eth
201:    1569474          0   IO-APIC-level  uhci_hcd:usb4, eth0
217:    1403373          0   IO-APIC-level  libata, uhci_hcd:usb3, eth1

eth0: Tigon3 [partno(BCM95721) rev 4101 PHY(5750)] (PCIX:100MHz:32-bit) 10/100/1000BaseT Ethernet 00:30:48:81:60:8e
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[76180000]
eth1: Tigon3 [partno(BCM95721) rev 4101 PHY(5750)] (PCIX:100MHz:32-bit) 10/100/1000BaseT Ethernet 00:30:48:81:60:8f
eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[1]
eth1: dma_rwctrl[76180000]

# zgrep MSI /proc/config.gz
CONFIG_PCI_MSI=y

Best regards,

				Krzysztof Olędzki

---187430788-1833883756-1119741854=:16847--

From jgarzik@pobox.com Sat Jun 25 17:49:13 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 25 Jun 2005 17:49:16 -0700 (PDT) Received: from
mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5Q0nDH9006763 for ; Sat, 25 Jun 2005 17:49:13 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1DmLJH-0007fL-Au; Sun, 26 Jun 2005 00:47:44 +0000 Message-ID: <42BDFB2B.5000103@pobox.com> Date: Sat, 25 Jun 2005 20:47:39 -0400 From: Jeff Garzik User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Krzysztof Oledzki CC: netdev@oss.sgi.com Subject: Re: BCM5721(tg3)&MSI References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2549 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 300 Lines: 13 Krzysztof Oledzki wrote: > Hello, > > Which BCM chips support MSI? I have a mainbord with two BCM5721, it > seems both support MSI (Message Signalled Interrupts entry in > Capabilities) but with 2.6.12.1 tg3 driver does not enable it. Why? Most likely, your system doesn't support MSI? 
Jeff From mchan@broadcom.com Sat Jun 25 21:50:40 2005 Received: with ECARTIS (v1.0.0; list netdev); Sat, 25 Jun 2005 21:50:42 -0700 (PDT) Received: from MMS2.broadcom.com (mms2.broadcom.com [216.31.210.18]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5Q4odH9020881 for ; Sat, 25 Jun 2005 21:50:40 -0700 Received: from 10.10.64.121 by MMS2.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Sat, 25 Jun 2005 21:49:06 -0700 X-Server-Uuid: 1F20ACF3-9CAF-44F7-AB47-F294E2D5B4EA Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Sat, 25 Jun 2005 21:48:54 -0700 Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BGD59972; Sat, 25 Jun 2005 21:48:52 -0700 (PDT) Received: from nt-irva-0741.brcm.ad.broadcom.com ( nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with ESMTP id VAA24047; Sat, 25 Jun 2005 21:48:52 -0700 (PDT) X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Subject: Re: BCM5721(tg3)&MSI Date: Sat, 25 Jun 2005 21:48:51 -0700 Message-ID: Thread-Topic: BCM5721(tg3)&MSI Thread-Index: AcV53hLdTuknS+hbTXSCgCMtErh+LQAK36jA From: "Michael Chan" To: "Krzysztof Oledzki" , netdev@oss.sgi.com X-WSS-ID: 6EA0EC482783874311-01-01 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j5Q4odH9020881 X-archive-position: 2550 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mchan@broadcom.com Precedence: bulk X-list: netdev Content-Length: 454 Lines: 15 Krzysztof Oledzki wrote: > Which BCM chips support MSI? 
I have a mainbord with two > BCM5721, it seems > both support MSI (Message Signalled Interrupts entry in > Capabilities) but > with 2.6.12.1 tg3 driver does not enable it. Why? > > 04:00.0 Ethernet controller: Broadcom Corporation NetXtreme > BCM5721 Gigabit Ethernet PCI Express (rev 11) MSI is only enabled on 5721 C0 and newer chips that have fully working MSI. You are using 5721 B1. From romieu@fr.zoreil.com Sun Jun 26 01:52:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 26 Jun 2005 01:52:39 -0700 (PDT) Received: from fr.zoreil.com (electric-eye.fr.zoreil.com [213.41.134.224]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5Q8qZH9002182 for ; Sun, 26 Jun 2005 01:52:36 -0700 Received: from electric-eye.fr.zoreil.com (localhost.localdomain [127.0.0.1]) by fr.zoreil.com (8.13.4/8.12.1) with ESMTP id j5Q8opWf022433; Sun, 26 Jun 2005 10:50:51 +0200 Received: (from romieu@localhost) by electric-eye.fr.zoreil.com (8.13.4/8.12.1) id j5Q8oouJ022432; Sun, 26 Jun 2005 10:50:50 +0200 Date: Sun, 26 Jun 2005 10:50:50 +0200 From: Francois Romieu To: Damian Cc: netdev@oss.sgi.com Subject: Re: sis900 and fedora troubles Message-ID: <20050626085050.GA22215@electric-eye.fr.zoreil.com> References: <1119667765.8496.6.camel@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1119667765.8496.6.camel@localhost.localdomain> User-Agent: Mutt/1.4.2.1i X-Organisation: Land of Sunshine Inc. X-archive-position: 2551 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: romieu@fr.zoreil.com Precedence: bulk X-list: netdev Content-Length: 653 Lines: 16 Damian : [...] > I am so very sorry I can't give you more output, but the laptop is > virtually connectionless. All I can tell you is that Ubuntu ran the card > fine but Fedora for some reason doesn't. Any info is highly appreciated. 
The exact revisions for both kernels are needed as a start. The complete dmesg would be welcome as well (if it gets truncated, the missing bits are probably stored in some distro-specific localtion, see /var/log for instance). So far it is not even clear if it should be bugzilled in Fedora or in Ubuntu (anyway, if changes can not be tested due to the lack of connection... :o/ ). -- Ueimor From olel@ans.pl Sun Jun 26 03:30:48 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 26 Jun 2005 03:30:52 -0700 (PDT) Received: from bizon.gios.gov.pl (bizon.gios.gov.pl [212.244.124.8]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5QAUkH9006409 for ; Sun, 26 Jun 2005 03:30:47 -0700 Received: from bizon.gios.gov.pl (olel@localhost6 [IPv6:::1]) by bizon.gios.gov.pl (8.13.4/8.13.4) with ESMTP id j5QATGdd002843 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 26 Jun 2005 12:29:18 +0200 Received: from localhost (olel@localhost) by bizon.gios.gov.pl (8.13.4/8.13.4/Submit) with ESMTP id j5QATFDe002840; Sun, 26 Jun 2005 12:29:16 +0200 X-Authentication-Warning: bizon.gios.gov.pl: olel owned process doing -bs Date: Sun, 26 Jun 2005 12:29:15 +0200 (CEST) From: Krzysztof Oledzki X-X-Sender: olel@bizon.gios.gov.pl To: Michael Chan cc: netdev@oss.sgi.com Subject: Re: BCM5721(tg3)&MSI In-Reply-To: Message-ID: References: MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="-187430788-1989374933-1119781755=:2421" X-archive-position: 2552 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: olel@ans.pl Precedence: bulk X-list: netdev Content-Length: 947 Lines: 30 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. 
---187430788-1989374933-1119781755=:2421 Content-Type: TEXT/PLAIN; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE On Sat, 25 Jun 2005, Michael Chan wrote: > > Krzysztof Oledzki wrote: >> Which BCM chips support MSI? I have a mainbord with two >> BCM5721, it seems >> both support MSI (Message Signalled Interrupts entry in >> Capabilities) but >> with 2.6.12.1 tg3 driver does not enable it. Why? >> >> 04:00.0 Ethernet controller: Broadcom Corporation NetXtreme >> BCM5721 Gigabit Ethernet PCI Express (rev 11) > > MSI is only enabled on 5721 C0 and newer chips that have fully > working MSI. You are using 5721 B1. Oh, I see... BTW: What is wrong with AX/BX revisions? Best regards, =09=09=09Krzysztof Ol=EAdzki ---187430788-1989374933-1119781755=:2421-- From venza@brownhat.org Sun Jun 26 04:32:10 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 26 Jun 2005 04:32:14 -0700 (PDT) Received: from renditai.milesteg.arr (adsl-70-225.38-151.net24.it [151.38.225.70]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id j5QBW4H9013279 for ; Sun, 26 Jun 2005 04:32:10 -0700 Received: (qmail 7446 invoked from network); 26 Jun 2005 13:30:36 +0200 Received: from unknown (HELO ?192.168.0.205?) 
(192.168.0.205) by renditai.milesteg.arr with SMTP; 26 Jun 2005 13:30:36 +0200 In-Reply-To: <20050626085050.GA22215@electric-eye.fr.zoreil.com> References: <1119667765.8496.6.camel@localhost.localdomain> <20050626085050.GA22215@electric-eye.fr.zoreil.com> Mime-Version: 1.0 (Apple Message framework v730) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <5FFAEBAA-3C89-4F61-AE59-2E8D7ED42933@brownhat.org> Cc: Damian , netdev@oss.sgi.com Content-Transfer-Encoding: 7bit From: Daniele Venzano Subject: Re: sis900 and fedora troubles Date: Sun, 26 Jun 2005 13:30:35 +0200 To: Francois Romieu X-Mailer: Apple Mail (2.730) X-archive-position: 2553 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: venza@brownhat.org Precedence: bulk X-list: netdev Content-Length: 777 Lines: 23 Il giorno 26/giu/05, alle ore 10:50, Francois Romieu ha scritto: > The exact revisions for both kernels are needed as a start. The > complete > dmesg would be welcome as well (if it gets truncated, the missing bits > are probably stored in some distro-specific localtion, see /var/log > for > instance). > > So far it is not even clear if it should be bugzilled in Fedora or > in Ubuntu > (anyway, if changes can not be tested due to the lack of > connection... :o/ ). I asked Damian (but forgot to CC the list) to check the sis900 driver version strings printed by the two configurations. I fear that Fedora is still using an old version of sis900, as I remember to have fixed a similar issue on the same hardware. 
-- Daniele Venzano http://www.brownhat.org From herbert@gondor.apana.org.au Sun Jun 26 05:57:55 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 26 Jun 2005 05:58:01 -0700 (PDT) Received: from arnor.apana.org.au (arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5QCvrH9017203 for ; Sun, 26 Jun 2005 05:57:54 -0700 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1DmWgQ-0008H5-00; Sun, 26 Jun 2005 22:56:22 +1000 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1DmWgN-0003Gm-00; Sun, 26 Jun 2005 22:56:19 +1000 Date: Sun, 26 Jun 2005 22:56:19 +1000 To: Arthur Kepner Cc: netdev@oss.sgi.com, Rick Jones Subject: Re: [RFC/PATCH] "safer ipv4 reassembly" (fwd) Message-ID: <20050626125619.GA31967@gondor.apana.org.au> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.9i From: Herbert Xu X-archive-position: 2554 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 2098 Lines: 58 On Thu, Jun 23, 2005 at 09:33:35AM -0700, Arthur Kepner wrote: > > What with the recent migration to vger.kernel.org, I'm > forwarding this to oss.sgi.com, just in case any interested > parties missed it. Thanks for writing this patch Arthur. > +struct ipc { > + struct hlist_node node; > + u32 saddr; > + u32 daddr; > + u8 protocol; > + atomic_t refcnt; /* how many ipqs hold refs to us */ > + atomic_t seq; /* how many ip datagrams for this > + * (saddr,daddr,protocol) since we > + * were created */ > + struct timer_list timer; > + struct rcu_head rcu; Is RCU worth it here? The only time we'd be taking the locks on this is when the first fragment of a packet comes in. 
At that point we'll be taking write_lock(&ipfrag_lock) anyway.

The only other use of RCU in your patch is ip_count. That should be changed to be done in ip_defrag instead. At that point you can simply find the ipc by dereferencing ipq, so no need for __ipc_find and hence RCU.

The reason you need to change it in this way is because you can't make assumptions about ip_rcv_finish being the first place where a packet is defragmented. With connection tracking enabled, conntrack is the first place where defragmentation occurs.

> +#define IPC_HASHSZ	IPQ_HASHSZ
> +static struct {
> +	struct hlist_head head;
> +	spinlock_t lock;
> +} ipc_hash[IPC_HASHSZ];

I'd store ipc entries in the main ipq hash table since they can use the same keys for lookup as ipq entries. You just need to set protocol to zero and map the user to values specific to ipc for ipc entries. One mapping would be to set the top bit of user for ipc entries, e.g.

#define IP_DEFRAG_IPC	0x80000000
	ipc->user = ipq->user | IP_DEFRAG_IPC;

Of course you also need to make sure that the two structures share the leading elements. You can then use the user field to distinguish between ipc/ipq entries.
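To make the suggestion above concrete, here is a minimal user-space sketch of the "shared leading elements plus top bit of user" idea. It is an illustration under assumptions, not code from the patch: struct frag_key, struct ipq_sketch, struct ipc_sketch, key_is_ipc() and ipc_key_from_ipq() are hypothetical names, and the sketch only covers the user-bit mapping, leaving the other key fields untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Top bit of "user" marks an ipc entry, as in the message above. */
#define IP_DEFRAG_IPC 0x80000000u

/* Hypothetical common prefix: the fields that form the hash lookup key.
 * Both entry types start with it, so one table can hold either kind. */
struct frag_key {
	uint32_t saddr;
	uint32_t daddr;
	uint32_t user;
	uint8_t  protocol;
};

struct ipq_sketch { struct frag_key key; int len; };          /* hypothetical */
struct ipc_sketch { struct frag_key key; unsigned int seq; }; /* hypothetical */

/* Tell the two kinds of entry apart by looking only at the shared prefix. */
static int key_is_ipc(const struct frag_key *k)
{
	return (k->user & IP_DEFRAG_IPC) != 0;
}

/* Derive the key of the ipc entry that belongs to a given ipq key. */
static void ipc_key_from_ipq(struct frag_key *ipc, const struct frag_key *ipq)
{
	assert(!key_is_ipc(ipq)); /* ordinary users must not use the top bit */
	*ipc = *ipq;
	ipc->user = ipq->user | IP_DEFRAG_IPC;
}
```

Because the lookup code only ever touches the leading struct frag_key, it would not need to know which kind of entry it is walking over until it tests the top bit of user.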
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From wensong@linux-vs.org Sun Jun 26 10:09:58 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 26 Jun 2005 10:10:06 -0700 (PDT) Received: from dragon.linux-vs.org ([202.109.113.90]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5QH9uH9031394 for ; Sun, 26 Jun 2005 10:09:58 -0700 Received: from localhost (localhost.localdomain [127.0.0.1]) by dragon.linux-vs.org (Postfix) with ESMTP id F2EDB10996; Mon, 27 Jun 2005 01:08:28 +0800 (CST) Received: from dragon.linux-vs.org ([127.0.0.1]) by localhost (dragon.linux-vs.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 16705-07; Mon, 27 Jun 2005 01:08:28 +0800 (CST) Received: from penguin.linux-vs.org (unknown [61.149.153.46]) by dragon.linux-vs.org (Postfix) with ESMTP id 8DC0B1005D; Mon, 27 Jun 2005 01:08:28 +0800 (CST) Received: by penguin.linux-vs.org (Postfix, from userid 500) id 2167930B4F; Mon, 27 Jun 2005 01:05:55 +0800 (CST) Received: from localhost (localhost [127.0.0.1]) by penguin.linux-vs.org (Postfix) with ESMTP id 1BA5330B4E; Mon, 27 Jun 2005 01:05:55 +0800 (CST) Date: Mon, 27 Jun 2005 01:05:55 +0800 (CST) From: Wensong Zhang To: Neil Horman Cc: Julian Anastasov , linux-kernel , akpm@osdl.org, netdev@oss.sgi.com, davem@davemloft.net Subject: Re: [Patch] ipvs: close race conditions on ip_vs_conn_tab list modification In-Reply-To: <20050624174054.GE21499@hmsendeavour.rdu.redhat.com> Message-ID: References: <20050624144822.GD21499@hmsendeavour.rdu.redhat.com> <20050624174054.GE21499@hmsendeavour.rdu.redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-archive-position: 2555 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: wensong@linux-vs.org Precedence: bulk X-list: netdev 
Content-Length: 2681 Lines: 119 Hello Neil, It's a good fix. Please continue on kernel 2.4 version of your patch. Thanks, Wensong On Fri, 24 Jun 2005, Neil Horman wrote: > On Fri, Jun 24, 2005 at 06:09:40PM +0300, Julian Anastasov wrote: >> >> Hello, >> >> On Fri, 24 Jun 2005, Neil Horman wrote: >> >>> if (ct) { >>> IP_VS_DBG(4, "del conn template\n"); >>> ip_vs_conn_expire_now(ct); >>> } >> >> Don't forget to use cp->control instead of ct, ct is not needed >> anymore. >> >> Regards >> >> -- >> Julian Anastasov >> > > > Good catch. Sorry, should have seen that earlier. New patch attached with > corrections. When you're comfortable with this, I'll post the 2.4 version of > the patch. > > Regards > Neil > > Signed-off-by: Neil Horman > > ip_vs_conn.c | 24 ++++-------------------- > 1 files changed, 4 insertions(+), 20 deletions(-) > > > --- linux-2.6.git/net/ipv4/ipvs/ip_vs_conn.c.orig 2005-06-23 13:11:00.000000000 -0400 > +++ linux-2.6.git/net/ipv4/ipvs/ip_vs_conn.c 2005-06-24 13:33:03.000000000 -0400 > @@ -548,7 +548,6 @@ > { > if (del_timer(&cp->timer)) > mod_timer(&cp->timer, jiffies); > - __ip_vs_conn_put(cp); > } > > > @@ -801,21 +800,12 @@ > continue; > } > > - /* > - * Drop the entry, and drop its ct if not referenced > - */ > - atomic_inc(&cp->refcnt); > - ct_write_unlock(hash); > - > - if ((ct = cp->control)) > - atomic_inc(&ct->refcnt); > IP_VS_DBG(4, "del connection\n"); > ip_vs_conn_expire_now(cp); > - if (ct) { > + if (cp->control) { > IP_VS_DBG(4, "del conn template\n"); > - ip_vs_conn_expire_now(ct); > + ip_vs_conn_expire_now(cp->control); > } > - ct_write_lock(hash); > } > ct_write_unlock(hash); > } > @@ -829,7 +819,6 @@ > { > int idx; > struct ip_vs_conn *cp; > - struct ip_vs_conn *ct; > > flush_again: > for (idx=0; idx @@ -839,18 +828,13 @@ > ct_write_lock_bh(idx); > > list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) { > - atomic_inc(&cp->refcnt); > - ct_write_unlock(idx); > > - if ((ct = cp->control)) > - atomic_inc(&ct->refcnt); > 
IP_VS_DBG(4, "del connection\n"); > ip_vs_conn_expire_now(cp); > - if (ct) { > + if (cp->control) { > IP_VS_DBG(4, "del conn template\n"); > - ip_vs_conn_expire_now(ct); > + ip_vs_conn_expire_now(cp->control); > } > - ct_write_lock(idx); > } > ct_write_unlock_bh(idx); > } > -- > /*************************************************** > *Neil Horman > *Software Engineer > *Red Hat, Inc. > *nhorman@redhat.com > *gpg keyid: 1024D / 0x92A74FA1 > *http://pgp.mit.edu > ***************************************************/ > From jgarzik@pobox.com Sun Jun 26 15:00:42 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 26 Jun 2005 15:00:46 -0700 (PDT) Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5QM0gH9014968 for ; Sun, 26 Jun 2005 15:00:42 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1Dmf9j-0006xo-Ra; Sun, 26 Jun 2005 21:59:13 +0000 Message-ID: <42BF252B.4040605@pobox.com> Date: Sun, 26 Jun 2005 17:59:07 -0400 From: Jeff Garzik User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Kumar Gala CC: linuxppc-embedded@ozlabs.org, netdev@oss.sgi.com Subject: Re: [PATCH] gianfar: Update Marvell PHY name References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2556 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 9 Lines: 2 applied From jgarzik@pobox.com Sun Jun 26 15:28:23 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 26 Jun 2005 15:28:30 -0700 (PDT) Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5QMSNH9016449 for ; Sun, 26 Jun 2005 
15:28:23 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1DmfaV-0006yV-RM; Sun, 26 Jun 2005 22:26:53 +0000 Message-ID: <42BF2BA9.8060502@pobox.com> Date: Sun, 26 Jun 2005 18:26:49 -0400 From: Jeff Garzik User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Adrian Bunk CC: Andrew Morton , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [2.6 patch] drivers/net/hamradio/: cleanups References: <20050502014637.GQ3592@stusta.de> In-Reply-To: <20050502014637.GQ3592@stusta.de> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2557 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 170 Lines: 9 Adrian Bunk wrote: > This patch contains the following cleanups: > - dmascc.c: remove the unused global function dmascc_setup Better to use it, then remove it. 
Jeff From rdunlap@xenotime.net Sun Jun 26 15:55:07 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 26 Jun 2005 15:55:09 -0700 (PDT) Received: from chretien.genwebhost.com (chretien.genwebhost.com [209.59.175.22]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5QMt6H9017760 for ; Sun, 26 Jun 2005 15:55:07 -0700 Received: from pool-71-111-147-75.ptldor.dsl-w.verizon.net ([71.111.147.75]:58073 helo=midway.verizon.net) by chretien.genwebhost.com with esmtpa (Exim 4.51) id 1Dmg0K-0004ys-JZ; Sun, 26 Jun 2005 18:53:32 -0400 Date: Sun, 26 Jun 2005 15:53:18 -0700 From: randy_dunlap To: Jeff Garzik Cc: bunk@stusta.de, akpm@osdl.org, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [2.6 patch] drivers/net/hamradio/: cleanups Message-Id: <20050626155318.7f065d5b.rdunlap@xenotime.net> In-Reply-To: <42BF2BA9.8060502@pobox.com> References: <20050502014637.GQ3592@stusta.de> <42BF2BA9.8060502@pobox.com> Organization: YPO4 X-Mailer: Sylpheed version 1.0.5 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Antivirus-Scanner: Clean mail though you should still use an Antivirus X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - chretien.genwebhost.com X-AntiAbuse: Original Domain - oss.sgi.com X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - xenotime.net X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2558 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rdunlap@xenotime.net Precedence: bulk X-list: netdev Content-Length: 267 Lines: 12 On Sun, 26 Jun 2005 18:26:49 -0400 Jeff Garzik wrote: | Adrian Bunk wrote: | > This patch contains the following cleanups: | > - dmascc.c: remove the unused global function dmascc_setup | | Better to use it, then remove it. than ?? 
--- ~Randy From jgarzik@pobox.com Sun Jun 26 16:01:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 26 Jun 2005 16:01:55 -0700 (PDT) Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5QN1rH9018475 for ; Sun, 26 Jun 2005 16:01:53 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1Dmg6q-0006zS-WB; Sun, 26 Jun 2005 23:00:18 +0000 Message-ID: <42BF337D.1050904@pobox.com> Date: Sun, 26 Jun 2005 19:00:13 -0400 From: Jeff Garzik User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: randy_dunlap CC: bunk@stusta.de, akpm@osdl.org, linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [2.6 patch] drivers/net/hamradio/: cleanups References: <20050502014637.GQ3592@stusta.de> <42BF2BA9.8060502@pobox.com> <20050626155318.7f065d5b.rdunlap@xenotime.net> In-Reply-To: <20050626155318.7f065d5b.rdunlap@xenotime.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2559 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 343 Lines: 17 randy_dunlap wrote: > On Sun, 26 Jun 2005 18:26:49 -0400 Jeff Garzik wrote: > > | Adrian Bunk wrote: > | > This patch contains the following cleanups: > | > - dmascc.c: remove the unused global function dmascc_setup > | > | Better to use it, then remove it. > > than ?? Yes. Use it via __setup() or similar. 
Jeff From jgarzik@pobox.com Sun Jun 26 21:39:11 2005 Received: with ECARTIS (v1.0.0; list netdev); Sun, 26 Jun 2005 21:39:20 -0700 (PDT) Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5R4dAH9008973 for ; Sun, 26 Jun 2005 21:39:11 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1DmlNR-0007D1-0E; Mon, 27 Jun 2005 04:37:45 +0000 Message-ID: <42BF8297.9050503@pobox.com> Date: Mon, 27 Jun 2005 00:37:43 -0400 From: Jeff Garzik User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Malli Chilakala CC: netdev Subject: Re: [PATCH net-drivers-2.6 0/9] ixgb: driver update References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2560 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 760 Lines: 20 Malli Chilakala wrote: > ixgb: driver update > > Signed-off-by: Mallikarjuna R Chilakala > Signed-off-by: Ganesh Venkatesan > Signed-off-by: John Ronciak > > 1. Set RXDCTL:PTHRESH/HTHRESH to zero > 2. Fix unnecessary link state messages > 3. Use netdev_priv() instead of netdev->priv > 4. Fix Broadcast/Multicast packets received statistics > 5. Fix data output by ethtool -d > 6. Ethtool cleanup patch from Stephen Hemminger > 7. Remove unused functions, render some variable static instead of global > 8. Redefined buffer_info-dma to be dma_addr_t instead of uint64 > 9. Driver version & white space fixes also, please take Francois' comments into account... 
From mchan@broadcom.com Mon Jun 27 00:31:02 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 27 Jun 2005 00:31:04 -0700 (PDT) Received: from MMS1.broadcom.com (mms1.broadcom.com [216.31.210.17]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5R7V2H9023485 for ; Mon, 27 Jun 2005 00:31:02 -0700 Received: from 10.10.64.121 by MMS1.broadcom.com with SMTP (Broadcom SMTP Relay (Email Firewall v6.1.0)); Mon, 27 Jun 2005 00:29:48 -0700 X-Server-Uuid: 146C3151-C1DE-4F71-9D02-C3BE503878DD Received: from mail-irva-8.broadcom.com ([10.10.64.221]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-72233U7200L2200S0V35) with ESMTP id com; Mon, 27 Jun 2005 00:29:28 -0700 Received: from mon-irva-10.broadcom.com (mon-irva-10.broadcom.com [10.10.64.171]) by mail-irva-8.broadcom.com (MOS 3.5.6-GR) with ESMTP id BGG44040; Mon, 27 Jun 2005 00:29:24 -0700 (PDT) Received: from nt-irva-0741.brcm.ad.broadcom.com ( nt-irva-0741.brcm.ad.broadcom.com [10.8.194.54]) by mon-irva-10.broadcom.com (8.9.1/8.9.1) with ESMTP id AAA04867; Mon, 27 Jun 2005 00:29:24 -0700 (PDT) X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Subject: RE: BCM5721(tg3)&MSI Date: Mon, 27 Jun 2005 00:29:23 -0700 Message-ID: Thread-Topic: BCM5721(tg3)&MSI Thread-Index: AcV6OfKMwD9sGVPATraBj6jh2ug3igAr7slA From: "Michael Chan" To: "Krzysztof Oledzki" cc: netdev@oss.sgi.com X-WSS-ID: 6EA175662AW4375766-01-01 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j5R7V2H9023485 X-archive-position: 2561 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: mchan@broadcom.com Precedence: bulk X-list: netdev Content-Length: 137 Lines: 7 Krzysztof Oledzki wrote: > Oh, I see... BTW: What is wrong with AX/BX revisions? Interrupts cannot be disabled properly in MSI mode. 
From glen.turner@aarnet.edu.au Mon Jun 27 23:40:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Mon, 27 Jun 2005 23:40:42 -0700 (PDT) Received: from clix.aarnet.edu.au (clix.aarnet.edu.au [192.94.63.10]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5S6eZH9007919 for ; Mon, 27 Jun 2005 23:40:37 -0700 Received: from [202.158.193.5] (andromache.adelaide.aarnet.edu.au [202.158.193.5]) (authenticated bits=0) by clix.aarnet.edu.au (8.12.8/8.12.8) with ESMTP id j5S6cpQ4026830 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Tue, 28 Jun 2005 16:38:52 +1000 Message-ID: <42C0F07B.8040506@aarnet.edu.au> Date: Tue, 28 Jun 2005 16:08:51 +0930 From: Glen Turner Organization: Australia's Academic & Research Network User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Andy Fleming CC: Stephen Hemminger , Netdev , Kumar Gala Subject: Re: RFC: PHY Abstraction Layer II References: <1107b64b01fb8e9a6c84359bb56881a6@freescale.com> <20050531105939.7486e071@dxpl.pdx.osdl.net> <92F1428A-0B26-428B-8C06-35C7E5B9EEE3@freescale.com> <20050601144123.2bc11c06@dxpl.pdx.osdl.net> <9A2D608A-D818-455B-96F4-ED42413556C0@freescale.com> <42A360A0.1040902@aarnet.edu.au> <0A9010B9-D24A-4762-8069-F19607ADD416@freescale.com> In-Reply-To: <0A9010B9-D24A-4762-8069-F19607ADD416@freescale.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-MDSA: Yes X-Scanned-By: MIMEDefang 2.39 X-archive-position: 2562 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: glen.turner@aarnet.edu.au Precedence: bulk X-list: netdev Content-Length: 1201 Lines: 35 Andy Fleming wrote: > I was thinking that it would be easier for the ethernet driver to do > this in the adjust_link() function, since it's going to need to track > when these things change, anyway. 
But if the general consensus is that > it should be in the generic code, I can see about adding it there. Hi Andy, I wasn't at all saying where it should be done -- rather that it's very useful for networking staff like myself that it is done *somewhere*. Without too much knowledge of the code, it would be nice if it were done in the PHY layer, so that the messages can be grepped for in logs. If you leave it to drivers there'll be inconsistent levels of detail and a variety of formats. >> Also, it would be nice to be able to retrieve PHY data >> independent of the interface status (eg, to retrieve >> asset serial numbers, GBIC make/models, etc). > > > I'm not sure what you mean, here. The driver can use phy_read/write to > get/set information anytime it wants. Is there a user space utility to pull the details from a GBIC's EEPROM? Thanks, Glen -- Glen Turner Tel: (08) 8303 3936 or +61 8 8303 3936 Australia's Academic & Research Network www.aarnet.edu.au From jbenc@suse.cz Tue Jun 28 05:24:37 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 28 Jun 2005 05:24:42 -0700 (PDT) Received: from mail.suse.cz (styx.suse.cz [82.119.242.94]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5SCOaH9005036 for ; Tue, 28 Jun 2005 05:24:37 -0700 Received: from griffin.suse.cz (griffin.suse.cz [10.20.1.99]) by mail.suse.cz (SUSE CR ESMTP Mailer) with ESMTP id 91304628316; Tue, 28 Jun 2005 14:23:05 +0200 (CEST) Date: Tue, 28 Jun 2005 14:23:05 +0200 From: Jiri Benc To: NetDev Cc: Jeff Garzik , jbohac@suse.cz Subject: ieee80211 patches Message-ID: <20050628142305.349d70fd@griffin.suse.cz> X-Mailer: Sylpheed-Claws 1.0.4a (GTK+ 1.2.10; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2563 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jbenc@suse.cz Precedence: bulk X-list: netdev Content-Length: 518 
Lines: 18 Our patches against latest ieee80211 branch of netdev tree can be found at http://forge.novell.com/modules/xfmod/cvs/cvsbrowse.php/ieee80211/patches-upstream/ (it is possible to download a tarball from this link too). For CVS checkout enter cvs -z3 -d:ext:anonymous@forgecvs1.novell.com:/cvsroot/ieee80211 co patches-upstream Patches have to be applied in order specified in 'series' file (e. g. using quilt). Jeff, should we also post those patches to netdev? Jiri Benc and Jirka Bohac -- Jiri Benc SUSE Labs From pgaltieri@mvista.com Tue Jun 28 08:14:15 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 28 Jun 2005 08:14:18 -0700 (PDT) Received: from av.mvista.com (gateway-1237.mvista.com [12.44.186.158]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5SFEFH9025399 for ; Tue, 28 Jun 2005 08:14:15 -0700 Received: from playin.mvista.com (av [127.0.0.1]) by av.mvista.com (8.9.3/8.9.3) with ESMTP id IAA11290 for ; Tue, 28 Jun 2005 08:12:46 -0700 Subject: Infinite loop in autoconfig with no dhcp server From: Paolo Galtieri To: netdev@oss.sgi.com Content-Type: text/plain Date: Tue, 28 Jun 2005 08:14:09 -0700 Message-Id: <1119971649.21672.12.camel@playin.mvista.com> Mime-Version: 1.0 X-Mailer: Evolution 2.0.2 (2.0.2-16) Content-Transfer-Encoding: 7bit X-archive-position: 2564 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: pgaltieri@mvista.com Precedence: bulk X-list: netdev Content-Length: 2480 Lines: 91 Folks, it appears that the attempt to do auto configuration will go on forever if there is no DHCP server available. The question I have is is this the intended behavior when IPCONFIG_DYNAMIC is set, but CONFIG_ROOT_NFS is not? 
The function in question is ip_auto_config() in net/ipv4/ipconfig.c.

Here is the situation. At the top of ip_auto_config() we have:

#ifdef IPCONFIG_DYNAMIC
 try_try_again:
#endif

A bit further down we have:

	if (ic_myaddr == INADDR_NONE ||
#ifdef CONFIG_ROOT_NFS
	    (MAJOR(ROOT_DEV) == UNNAMED_MAJOR
	     && root_server_addr == INADDR_NONE
	     && ic_servaddr == INADDR_NONE) ||
#endif
	    ic_first_dev->next) {
#ifdef IPCONFIG_DYNAMIC

We are doing dynamic configuration, so we drop into the if and initialize the retry count:

		int retries = CONF_OPEN_RETRIES;

We call ic_dynamic() to attempt the connection:

		if (ic_dynamic() < 0) {
			ic_close_devs();

There is no DHCP server running, so ic_dynamic() returns -1, indicating that a timeout occurred. We decrement retries to 1 and go to try_try_again:

			if (--retries) {
				printk(KERN_ERR "IP-Config: Reopening network devices...\n");
				goto try_try_again;
			}

We drop back into the if and set retries to 2 again, since it is a local variable initialized inside that block, then call ic_dynamic(), which returns -1 due to a timeout. We decrement retries to 1 and go to try_try_again. We drop into the if again, set retries to 2, call ic_dynamic(), ad infinitum. We have the Energizer Bunny auto configuration :-), it goes on and on and on. We never reach the "Auto-configuration of network failed" printk.

One way to fix the problem is attached below. Does this seem reasonable or should I go find a rock and crawl under it?

Please reply directly to me as I am not on this mailing list.
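The control flow described above can be reproduced in a few lines of user-space C. This is only a sketch of the pattern, not the kernel code: fake_ic_dynamic(), attempts_buggy(), attempts_fixed() and the safety_cap parameter are hypothetical, and the cap exists purely so that the demonstration of the endless loop terminates.

```c
#define CONF_OPEN_RETRIES 2

/* Stand-in for ic_dynamic() when no DHCP server answers: always times out. */
static int fake_ic_dynamic(void)
{
	return -1;
}

/* Counter declared inside the block that the goto re-enters.
 * Returns the number of attempts made; safety_cap bounds the demo. */
static int attempts_buggy(int safety_cap)
{
	int attempts = 0;

try_again:
	{
		int retries = CONF_OPEN_RETRIES; /* re-initialised on every pass */

		attempts++;
		if (fake_ic_dynamic() < 0) {
			if (attempts >= safety_cap)
				return attempts; /* without the cap this never ends */
			if (--retries)
				goto try_again;
		}
	}
	return attempts;
}

/* Counter hoisted to function scope, so it is initialised exactly once. */
static int attempts_fixed(void)
{
	int attempts = 0;
	int retries = CONF_OPEN_RETRIES;

try_again:
	attempts++;
	if (fake_ic_dynamic() < 0) {
		if (--retries)
			goto try_again;
	}
	return attempts;
}
```

With the counter inside the re-entered block, the number of attempts is limited only by the artificial cap; with the counter hoisted, the loop gives up after CONF_OPEN_RETRIES attempts, which is the behaviour the retry check appears to be aiming for.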
Paolo Galtieri --- linux-2.6.12.1/net/ipv4/ipconfig.c 2005-06-23 06:49:56.000000000 -0700 +++ linux-2.6.12.1-new/net/ipv4/ipconfig.c 2005-06-28 08:09:11.143292280 -0700 @@ -1248,6 +1248,10 @@ { u32 addr; +#ifdef IPCONFIG_DYNAMIC + int retries = CONF_OPEN_RETRIES; +#endif + #ifdef CONFIG_PROC_FS proc_net_fops_create("pnp", S_IRUGO, &pnp_seq_fops); #endif /* CONFIG_PROC_FS */ @@ -1284,8 +1288,6 @@ ic_first_dev->next) { #ifdef IPCONFIG_DYNAMIC - int retries = CONF_OPEN_RETRIES; - if (ic_dynamic() < 0) { ic_close_devs(); From akepner@sgi.com Tue Jun 28 15:16:53 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 28 Jun 2005 15:16:58 -0700 (PDT) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5SMGpH9015209 for ; Tue, 28 Jun 2005 15:16:52 -0700 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j5T069k9011024 for ; Tue, 28 Jun 2005 17:06:09 -0700 Received: from [192.168.2.20] (mtv-vpn-sw-corp-0-69.corp.sgi.com [134.15.0.69]) by cthulhu.engr.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id j5SMFMdP42335699; Tue, 28 Jun 2005 15:15:23 -0700 (PDT) Date: Tue, 28 Jun 2005 15:11:39 -0700 (PDT) From: Arthur Kepner X-X-Sender: akepner@resonance.WorkGroup To: Herbert Xu cc: netdev@vger.kernel.org, netdev@oss.sgi.com, Rick Jones Subject: Re: [RFC/PATCH] "safer ipv4 reassembly" (fwd) In-Reply-To: <20050626125619.GA31967@gondor.apana.org.au> Message-ID: References: <20050626125619.GA31967@gondor.apana.org.au> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 2565 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: akepner@sgi.com Precedence: bulk X-list: netdev Content-Length: 2940 Lines: 84 On Sun, 26 Jun 2005, Herbert Xu wrote: > ..... > Thanks for writing this patch Arthur. Likewise, thanks for reviewing it. 
>
> > +struct ipc {
> > +	struct hlist_node node;
> > +	u32 saddr;
> > +	u32 daddr;
> > +	u8 protocol;
> > +	atomic_t refcnt;	/* how many ipqs hold refs to us */
> > +	atomic_t seq;		/* how many ip datagrams for this
> > +				 * (saddr,daddr,protocol) since we
> > +				 * were created */
> > +	struct timer_list timer;
> > +	struct rcu_head rcu;
>
> Is RCU worth it here? The only time we'd be taking the locks on this
> is when the first fragment of a packet comes in. At that point we'll
> be taking write_lock(&ipfrag_lock) anyway.
>
> The only other use of RCU in your patch is ip_count. That should be
> changed to be done in ip_defrag instead. At that point you can simply
> find the ipc by dereferencing ipq, so no need for __ipc_find and hence
> RCU.
>
> The reason you need to change it in this way is because you can't make
> assumptions about ip_rcv_finish being the first place where a packet
> is defragmented. With connection tracking enabled conntrack is the first
> place where defragmentation occurs.
>

Right, I see that now. (I'm not well acquainted with the
conntrack code...)

One reason I used RCU for the "ipc" structures is that I
wanted to be able to find them (in ip_rcv_finish()) without
locking. Since ip_rcv_finish() is the wrong place to do that,
that reason is invalid.

There is a (big) advantage to doing this in ip_defrag() - this
becomes a no-op for non-fragmented datagrams. The disadvantage
is that there could be a situation where you receive:

  1) first fragment of datagram X [for a particular (src,dst,proto)]
  2) a zillion non-fragmented datagrams [for the same (src,dst,proto)]
  3) last fragment of datagram X [for (src,dst,proto)]

and no "disorder" would be detected for the datagrams associated
with (src,dst,proto), even though the ip id could have wrapped in
the meantime. This seems like a very uncommon case, however.
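
The trade-off in that scenario can be put into numbers. The following
is a rough user-space sketch under stated assumptions, not code from
the patch: struct flow_counters, count_datagram(), replay() and
ip_id_after() are hypothetical names, and the sender is assumed to bump
the IP id by one per datagram. The only hard fact used is that the IPv4
identification field is 16 bits wide.

```c
#include <stdint.h>

/* Hypothetical per-(src,dst,proto) counters, for illustration only. */
struct flow_counters {
	uint32_t seen_everywhere;	/* bumped for every datagram received */
	uint32_t seen_in_defrag;	/* bumped only for fragmented datagrams */
};

static void count_datagram(struct flow_counters *fc, int is_fragmented)
{
	fc->seen_everywhere++;
	if (is_fragmented)
		fc->seen_in_defrag++;
}

/* One fragmented datagram X plus n whole datagrams on the same flow. */
static struct flow_counters replay(uint32_t n_unfragmented)
{
	struct flow_counters fc = { 0, 0 };
	uint32_t i;

	count_datagram(&fc, 1);			/* datagram X, fragmented */
	for (i = 0; i < n_unfragmented; i++)
		count_datagram(&fc, 0);		/* never reaches defrag */
	return fc;
}

/* A 16-bit id is back at its starting value after 65536 datagrams. */
static uint16_t ip_id_after(uint16_t start, uint32_t datagrams_sent)
{
	return (uint16_t)(start + datagrams_sent);
}
```

With 65536 unfragmented datagrams in between, a counter kept only in
the defrag path has advanced by one while the id space has gone all the
way around, which is the undetected case described above.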
> > +#define IPC_HASHSZ IPQ_HASHSZ > > +static struct { > > + struct hlist_head head; > > + spinlock_t lock; > > +} ipc_hash[IPC_HASHSZ]; > > I'd store ipc entries in the main ipq hash table since they can use > the same keys for lookup as ipq entries. You just need to set protocol > to zero and map the user to values specific to ipc for ipc entries. > One mapping would be to set the top bit of user for ipc entries, e.g. > > #define IP_DEFRAG_IPC 0x80000000 > ipc->user = ipq->user | IP_DEFRAG_IPC; > > Of course you also need to make sure that the two structures share > the leading elements. You can then use the user field to distinguish > between ipc/ipq entries. Hmmm, let me think about combining the ipc/ipq structures, and also the related question of whether to use RCU for the ipc structures. I'll try to spin another version of the patch before the end of the week. -- Arthur From rscop@matrix.com.br Tue Jun 28 15:30:01 2005 Received: with ECARTIS (v1.0.0; list netdev); Tue, 28 Jun 2005 15:30:06 -0700 (PDT) Received: from hermes.digitel.com.br (hermes.digitel.com.br [200.198.105.36]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5SMTvH9016437 for ; Tue, 28 Jun 2005 15:30:00 -0700 Received: from [10.10.10.4] by hermes.digitel.com.br (GMS 10.03.3304/NU1492.00.9f6ce814) with ESMTP id vcfrpbaa for netdev@oss.sgi.com; Tue, 28 Jun 2005 19:33:13 -0300 Content-Type: text/plain; charset="us-ascii" From: Ricardo Scop Organization: R SCOP Consult. To: netdev@oss.sgi.com Subject: Fwd: Re: GRE tunnel keepalive support? 
Date: Tue, 28 Jun 2005 19:34:41 -0300
User-Agent: KMail/1.4.3
MIME-Version: 1.0
Message-Id: <200506281934.41732.rscop@matrix.com.br>
X-AntiSpam: Checked for restricted content by Gordano's AntiSpam Software
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id j5SMTvH9016437
X-archive-position: 2566
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: rscop@matrix.com.br
Precedence: bulk
X-list: netdev
Content-Length: 1253
Lines: 48

Hi, all

Any further development of the subject above? I don't want to
duplicate efforts either... ;)

Please CC my private address, as I'm not yet subscribed to this list.

Many thanks,

-Scop.

On Tue, Dec 09, 2003 at 10:33:21PM -0800, David S. Miller wrote:
> > > is anybody aware of anyone implementing keepalives for GRE tunnels?
> > > I don't want to duplicate efforts...
> > >
> > > http://www.cisco.com/en/US/products/sw/iosswrel/ps1838/
> > > products_feature_guide09186a0080134a36.html
> >
> > Not to my knowledge. It looks like a rather simple mechanism,
> > and should be easy to code up. Feel free.
>
> OK, anything else than "easy" would be out of scope anyway, given
> significant time constraints and never did any kernel work at all
> (except some micropatches). Given that there is no documentation
> whatsoever, I will have to do some reverse engineering though.
>
> Best regards,
> Daniel
>

--
Ricardo Scop.
\|/ ___ -*- (@ @)/|\ / V \| R SCOP Consult. /( )\ Linux-based communications --^^---^^+------------------------------
rscop@matrix.com.br
+55 51 999-36-777
Porto Alegre, RS - BRazil
--
P. S.: "If you don't have time to do it right, when will you have time to do it over?"
-- Penny Hines From michael@ellerman.id.au Thu Jun 30 03:22:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 03:22:30 -0700 (PDT) Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.com [202.81.18.186]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UAMGH9001254 for ; Thu, 30 Jun 2005 03:22:18 -0700 Received: from sd0208e0.au.ibm.com (d23rh904.au.ibm.com [202.81.18.202]) by ausmtp01.au.ibm.com (8.12.10/8.12.10) with ESMTP id j5UANIes080798 for ; Thu, 30 Jun 2005 20:23:19 +1000 Received: from d23av03.au.ibm.com (d23av03.au.ibm.com [9.190.250.244]) by sd0208e0.au.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5UANbdu037748 for ; Thu, 30 Jun 2005 20:23:37 +1000 Received: from d23av03.au.ibm.com (loopback [127.0.0.1]) by d23av03.au.ibm.com (8.12.11/8.13.3) with ESMTP id j5UAKgd5030677 for ; Thu, 30 Jun 2005 20:20:42 +1000 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av03.au.ibm.com (8.12.11/8.12.11) with ESMTP id j5UAKglH030665; Thu, 30 Jun 2005 20:20:42 +1000 Received: from concordia.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82]) by ozlabs.au.ibm.com (Postfix) with SMTP id 94EA3736CD; Thu, 30 Jun 2005 20:21:03 +1000 (EST) Received: by concordia.ozlabs.ibm.com (sSMTP sendmail emulation); Thu, 30 Jun 2005 20:20:39 +1000 Date: Thu, 30 Jun 2005 20:20:39 +1000 To: linuxppc64-dev@ozlabs.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org From: Michael Ellerman Subject: [PATCH 7/12] iseries_veth: Remove redundant message stack lock In-Reply-To: <200506302016.55125.michael@ellerman.id.au> Message-Id: <1120126839.569702.406803544780.qpatch@concordia> X-archive-position: 2570 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael@ellerman.id.au Precedence: bulk X-list: netdev Content-Length: 1970 Lines: 64 The iseries_veth driver keeps a stack of messages for each connection and a lock to protect the stack. 
However there is also a per-connection lock which makes the message stack redundant. Remove the message stack lock and document the fact that callers of the stack-manipulation functions must hold the connection's lock. --- drivers/net/iseries_veth.c | 12 +++--------- 1 files changed, 3 insertions(+), 9 deletions(-) Index: veth-dev/drivers/net/iseries_veth.c =================================================================== --- veth-dev.orig/drivers/net/iseries_veth.c +++ veth-dev/drivers/net/iseries_veth.c @@ -143,7 +143,6 @@ struct veth_lpar_connection { struct VethCapData remote_caps; u32 ack_timeout; - spinlock_t msg_stack_lock; struct veth_msg *msg_stack_head; }; @@ -190,27 +189,23 @@ static void veth_timed_ack(unsigned long #define veth_debug(fmt, args...) do {} while (0) #endif +/* You must hold the connection's lock when you call this function. */ static inline void veth_stack_push(struct veth_lpar_connection *cnx, struct veth_msg *msg) { - unsigned long flags; - - spin_lock_irqsave(&cnx->msg_stack_lock, flags); msg->next = cnx->msg_stack_head; cnx->msg_stack_head = msg; - spin_unlock_irqrestore(&cnx->msg_stack_lock, flags); } +/* You must hold the connection's lock when you call this function. 
*/ static inline struct veth_msg *veth_stack_pop(struct veth_lpar_connection *cnx) { - unsigned long flags; struct veth_msg *msg; - spin_lock_irqsave(&cnx->msg_stack_lock, flags); msg = cnx->msg_stack_head; if (msg) cnx->msg_stack_head = cnx->msg_stack_head->next; - spin_unlock_irqrestore(&cnx->msg_stack_lock, flags); + return msg; } @@ -643,7 +638,6 @@ static int veth_init_connection(u8 rlp) cnx->msgs = msgs; memset(msgs, 0, VETH_NUMBUFFERS * sizeof(struct veth_msg)); - spin_lock_init(&cnx->msg_stack_lock); for (i = 0; i < VETH_NUMBUFFERS; i++) { msgs[i].token = i; From michael@ellerman.id.au Thu Jun 30 03:22:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 03:22:28 -0700 (PDT) Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.com [202.81.18.186]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UAMGH9001261 for ; Thu, 30 Jun 2005 03:22:18 -0700 Received: from sd0208e0.au.ibm.com (d23rh904.au.ibm.com [202.81.18.202]) by ausmtp01.au.ibm.com (8.12.10/8.12.10) with ESMTP id j5UANKes405208 for ; Thu, 30 Jun 2005 20:23:21 +1000 Received: from d23av03.au.ibm.com (d23av03.au.ibm.com [9.190.250.244]) by sd0208e0.au.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5UANbdu045594 for ; Thu, 30 Jun 2005 20:23:38 +1000 Received: from d23av03.au.ibm.com (loopback [127.0.0.1]) by d23av03.au.ibm.com (8.12.11/8.13.3) with ESMTP id j5UAKgqC030686 for ; Thu, 30 Jun 2005 20:20:42 +1000 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av03.au.ibm.com (8.12.11/8.12.11) with ESMTP id j5UAKgfS030678; Thu, 30 Jun 2005 20:20:42 +1000 Received: from concordia.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82]) by ozlabs.au.ibm.com (Postfix) with SMTP id BA0AA736CA; Thu, 30 Jun 2005 20:21:03 +1000 (EST) Received: by concordia.ozlabs.ibm.com (sSMTP sendmail emulation); Thu, 30 Jun 2005 20:20:39 +1000 Date: Thu, 30 Jun 2005 20:20:39 +1000 To: linuxppc64-dev@ozlabs.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org From: Michael 
Ellerman Subject: [PATCH 8/12] iseries_veth: Replace lock-protected atomic with an ordinary variable In-Reply-To: <200506302016.55125.michael@ellerman.id.au> Message-Id: <1120126839.641415.625704757570.qpatch@concordia> X-archive-position: 2568 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael@ellerman.id.au Precedence: bulk X-list: netdev Content-Length: 2058 Lines: 69 The iseries_veth driver uses atomic ops to manipulate the in_use field of one of its per-connection structures. However all references to the flag occur while the connection's lock is held, so the atomic ops aren't necessary. --- drivers/net/iseries_veth.c | 13 +++++++------ 1 files changed, 7 insertions(+), 6 deletions(-) Index: veth-dev/drivers/net/iseries_veth.c =================================================================== --- veth-dev.orig/drivers/net/iseries_veth.c +++ veth-dev/drivers/net/iseries_veth.c @@ -117,7 +117,7 @@ struct veth_msg { struct veth_msg *next; struct VethFramesData data; int token; - unsigned long in_use; + int in_use; struct sk_buff *skb; struct device *dev; }; @@ -959,6 +959,8 @@ static int veth_transmit_to_one(struct s goto drop; } + msg->in_use = 1; + dma_length = skb->len; dma_address = dma_map_single(port->dev, skb->data, dma_length, DMA_TO_DEVICE); @@ -973,7 +975,6 @@ static int veth_transmit_to_one(struct s msg->data.addr[0] = dma_address; msg->data.len[0] = dma_length; msg->data.eofmask = 1 << VETH_EOF_SHIFT; - set_bit(0, &(msg->in_use)); rc = veth_signaldata(cnx, VethEventTypeFrames, msg->token, &msg->data); if (rc != HvLpEvent_Rc_Good) @@ -983,10 +984,8 @@ static int veth_transmit_to_one(struct s return 0; recycle_and_drop: + /* we free the skb below, so tell veth_recycle_msg() not to. 
*/
 	msg->skb = NULL;
-	/* need to set in use to make veth_recycle_msg in case this
-	 * was a mapping failure */
-	set_bit(0, &msg->in_use);
 	veth_recycle_msg(cnx, msg);
 drop:
 	port->stats.tx_errors++;
@@ -1068,12 +1067,14 @@ static int veth_start_xmit(struct sk_buf
 	return 0;
 }
 
+/* You must hold the connection's lock when you call this function. */
 static void veth_recycle_msg(struct veth_lpar_connection *cnx,
 			     struct veth_msg *msg)
 {
 	u32 dma_address, dma_length;
 
-	if (test_and_clear_bit(0, &msg->in_use)) {
+	if (msg->in_use) {
+		msg->in_use = 0;
 		dma_address = msg->data.addr[0];
 		dma_length = msg->data.len[0];
 

From michael@ellerman.id.au Thu Jun 30 03:22:25 2005
Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 03:22:39 -0700 (PDT)
Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.com [202.81.18.186])
	by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UAMMH9001328
	for ; Thu, 30 Jun 2005 03:22:24 -0700
Received: from sd0208e0.au.ibm.com (d23rh904.au.ibm.com [202.81.18.202])
	by ausmtp01.au.ibm.com (8.12.10/8.12.10) with ESMTP id j5UANRes155040
	for ; Thu, 30 Jun 2005 20:23:27 +1000
Received: from d23av02.au.ibm.com (d23av02.au.ibm.com [9.190.250.243])
	by sd0208e0.au.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5UANjdu129274
	for ; Thu, 30 Jun 2005 20:23:45 +1000
Received: from d23av02.au.ibm.com (loopback [127.0.0.1])
	by d23av02.au.ibm.com (8.12.11/8.13.3) with ESMTP id j5UAKlVL027358
	for ; Thu, 30 Jun 2005 20:20:49 +1000
Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12])
	by d23av02.au.ibm.com (8.12.11/8.12.11) with ESMTP id j5UAKfEG027296;
	Thu, 30 Jun 2005 20:20:41 +1000
Received: from concordia.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82])
	by ozlabs.au.ibm.com (Postfix) with SMTP id 2CCEA73673;
	Thu, 30 Jun 2005 20:21:03 +1000 (EST)
Received: by concordia.ozlabs.ibm.com (sSMTP sendmail emulation); Thu, 30 Jun 2005 20:20:39 +1000
Date: Thu, 30 Jun 2005 20:20:39 +1000
To: linuxppc64-dev@ozlabs.org, netdev@oss.sgi.com,
linux-kernel@vger.kernel.org From: Michael Ellerman Subject: [PATCH 1/12] iseries_veth: Make error messages more user friendly, and add a debug macro In-Reply-To: <200506302016.55125.michael@ellerman.id.au> Message-Id: <1120126839.106943.128468321759.qpatch@concordia> X-archive-position: 2577 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael@ellerman.id.au Precedence: bulk X-list: netdev Content-Length: 1438 Lines: 48 Currently the iseries_veth driver prints the file name and line number in its error messages. This isn't very useful for most users, so just print "iseries_veth: message" instead. Also add a veth_debug() and veth_info() macro to replace the current veth_printk(). --- drivers/net/iseries_veth.c | 15 ++++++++++++--- 1 files changed, 12 insertions(+), 3 deletions(-) Index: veth-dev/drivers/net/iseries_veth.c =================================================================== --- veth-dev.orig/drivers/net/iseries_veth.c +++ veth-dev/drivers/net/iseries_veth.c @@ -79,6 +79,8 @@ #include #include +#define DEBUG 1 + #include "iseries_veth.h" MODULE_AUTHOR("Kyle Lucke "); @@ -176,11 +178,18 @@ static void veth_timed_ack(unsigned long * Utility functions */ -#define veth_printk(prio, fmt, args...) \ - printk(prio "%s: " fmt, __FILE__, ## args) +#define veth_info(fmt, args...) \ + printk(KERN_INFO "iseries_veth: " fmt, ## args) #define veth_error(fmt, args...) \ - printk(KERN_ERR "(%s:%3.3d) ERROR: " fmt, __FILE__, __LINE__ , ## args) + printk(KERN_ERR "iseries_veth: Error: " fmt, ## args) + +#ifdef DEBUG +#define veth_debug(fmt, args...) \ + printk(KERN_DEBUG "iseries_veth: " fmt, ## args) +#else +#define veth_debug(fmt, args...) 
do {} while (0) +#endif static inline void veth_stack_push(struct veth_lpar_connection *cnx, struct veth_msg *msg) From michael@ellerman.id.au Thu Jun 30 03:22:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 03:22:35 -0700 (PDT) Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.com [202.81.18.186]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UAMLH9001322 for ; Thu, 30 Jun 2005 03:22:24 -0700 Received: from sd0208e0.au.ibm.com (d23rh904.au.ibm.com [202.81.18.202]) by ausmtp01.au.ibm.com (8.12.10/8.12.10) with ESMTP id j5UANQes402168 for ; Thu, 30 Jun 2005 20:23:27 +1000 Received: from d23av02.au.ibm.com (d23av02.au.ibm.com [9.190.250.243]) by sd0208e0.au.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5UANjdu120640 for ; Thu, 30 Jun 2005 20:23:45 +1000 Received: from d23av02.au.ibm.com (loopback [127.0.0.1]) by d23av02.au.ibm.com (8.12.11/8.13.3) with ESMTP id j5UAKmrm027373 for ; Thu, 30 Jun 2005 20:20:49 +1000 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av02.au.ibm.com (8.12.11/8.12.11) with ESMTP id j5UAKgBI027308; Thu, 30 Jun 2005 20:20:42 +1000 Received: from concordia.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82]) by ozlabs.au.ibm.com (Postfix) with SMTP id 30458736CB; Thu, 30 Jun 2005 20:21:04 +1000 (EST) Received: by concordia.ozlabs.ibm.com (sSMTP sendmail emulation); Thu, 30 Jun 2005 20:20:40 +1000 Date: Thu, 30 Jun 2005 20:20:40 +1000 To: linuxppc64-dev@ozlabs.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org From: Michael Ellerman Subject: [PATCH 12/12] iseries_veth: Simplify full-queue handling In-Reply-To: <200506302016.55125.michael@ellerman.id.au> Message-Id: <1120126840.155491.927718131055.qpatch@concordia> X-archive-position: 2572 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael@ellerman.id.au Precedence: bulk X-list: netdev Content-Length: 6755 Lines: 227 The iseries_veth driver may have 
multiple netdevices sending packets over a single connection to another
LPAR. If the bandwidth to the other LPAR is exceeded, all the netdevices
must have their queues stopped.

The current code achieves this by queueing one incoming skb on the
per-netdevice port structure. When the connection is able to send more
packets it flushes the queued packet for all netdevices and restarts
their queues. This arrangement makes less sense now that we have
per-connection TX timers, rather than the per-netdevice generic TX timer.

The new code simply detects when one of the connections is full, and
stops the queue of all associated netdevices. Then when a packet is
acked on that connection (i.e. there is space again) all the queues are
woken up.

---

 drivers/net/iseries_veth.c |  108 ++++++++++++++++++++++++++-------------------
 1 files changed, 64 insertions(+), 44 deletions(-)

Index: veth-dev/drivers/net/iseries_veth.c
===================================================================
--- veth-dev.orig/drivers/net/iseries_veth.c
+++ veth-dev/drivers/net/iseries_veth.c
@@ -158,10 +158,11 @@ struct veth_port {
 	u64 mac_addr;
 	HvLpIndexMap lpar_map;
 
-	spinlock_t pending_gate;
-	struct sk_buff *pending_skb;
-	HvLpIndexMap pending_lpmask;
+	/* queue_lock protects the stopped_map and dev's queue. */
+	spinlock_t queue_lock;
+	HvLpIndexMap stopped_map;
 
+	/* mcast_gate protects promiscuous, num_mcast & mcast_addr.
*/ rwlock_t mcast_gate; int promiscuous; int num_mcast; @@ -174,7 +175,8 @@ static struct net_device *veth_dev[HVMAX static int veth_start_xmit(struct sk_buff *skb, struct net_device *dev); static void veth_recycle_msg(struct veth_lpar_connection *, struct veth_msg *); -static void veth_flush_pending(struct veth_lpar_connection *cnx); +static void veth_wake_queues(struct veth_lpar_connection *cnx); +static void veth_stop_queues(struct veth_lpar_connection *cnx); static void veth_receive(struct veth_lpar_connection *, struct VethLpEvent *); static void veth_timed_ack(unsigned long ptr); static void veth_timed_reset(unsigned long ptr); @@ -216,6 +218,12 @@ static inline struct veth_msg *veth_stac return msg; } +/* You must hold the connection's lock when you call this function. */ +static inline int veth_stack_is_empty(struct veth_lpar_connection *cnx) +{ + return cnx->msg_stack_head == NULL; +} + static inline HvLpEvent_Rc veth_signalevent(struct veth_lpar_connection *cnx, u16 subtype, HvLpEvent_AckInd ackind, HvLpEvent_AckType acktype, @@ -384,12 +392,12 @@ static void veth_handle_int(struct VethL } } - if (acked > 0) + if (acked > 0) { cnx->last_contact = jiffies; + veth_wake_queues(cnx); + } spin_unlock_irqrestore(&cnx->lock, flags); - - veth_flush_pending(cnx); break; case VethEventTypeFrames: veth_receive(cnx, event); @@ -485,7 +493,9 @@ static void veth_statemachine(void *p) for (i = 0; i < VETH_NUMBUFFERS; ++i) veth_recycle_msg(cnx, cnx->msgs + i); } + cnx->outstanding_tx = 0; + veth_wake_queues(cnx); /* Drop the lock so we can do stuff that might sleep or * take other locks. 
*/ @@ -494,8 +504,6 @@ static void veth_statemachine(void *p) del_timer_sync(&cnx->ack_timer); del_timer_sync(&cnx->reset_timer); - veth_flush_pending(cnx); - spin_lock_irq(&cnx->lock); if (cnx->state & VETH_STATE_RESET) @@ -852,8 +860,9 @@ static struct net_device * __init veth_p port = (struct veth_port *) dev->priv; - spin_lock_init(&port->pending_gate); + spin_lock_init(&port->queue_lock); rwlock_init(&port->mcast_gate); + port->stopped_map = 0; for (i = 0; i < HVMAXARCHITECTEDLPS; i++) { HvLpVirtualLanIndexMap map; @@ -969,6 +978,9 @@ static int veth_transmit_to_one(struct s cnx->last_contact = jiffies; cnx->outstanding_tx++; + if (veth_stack_is_empty(cnx)) + veth_stop_queues(cnx); + spin_unlock_irqrestore(&cnx->lock, flags); return 0; @@ -1012,7 +1024,6 @@ static int veth_start_xmit(struct sk_buf { unsigned char *frame = skb->data; struct veth_port *port = (struct veth_port *) dev->priv; - unsigned long flags; HvLpIndexMap lpmask; if (! (frame[0] & 0x01)) { @@ -1029,27 +1040,9 @@ static int veth_start_xmit(struct sk_buf lpmask = port->lpar_map; } - spin_lock_irqsave(&port->pending_gate, flags); - - lpmask = veth_transmit_to_many(skb, lpmask, dev); + veth_transmit_to_many(skb, lpmask, dev); - if (! 
lpmask) { - dev_kfree_skb(skb); - } else { - if (port->pending_skb) { - veth_error("%s: TX while skb was pending!\n", - dev->name); - dev_kfree_skb(skb); - spin_unlock_irqrestore(&port->pending_gate, flags); - return 1; - } - - port->pending_skb = skb; - port->pending_lpmask = lpmask; - netif_stop_queue(dev); - } - - spin_unlock_irqrestore(&port->pending_gate, flags); + dev_kfree_skb(skb); return 0; } @@ -1081,9 +1074,10 @@ static void veth_recycle_msg(struct veth } } -static void veth_flush_pending(struct veth_lpar_connection *cnx) +static void veth_wake_queues(struct veth_lpar_connection *cnx) { int i; + for (i = 0; i < HVMAXARCHITECTEDVIRTUALLANS; i++) { struct net_device *dev = veth_dev[i]; struct veth_port *port; @@ -1097,19 +1091,45 @@ static void veth_flush_pending(struct ve if (! (port->lpar_map & (1<remote_lp))) continue; - spin_lock_irqsave(&port->pending_gate, flags); - if (port->pending_skb) { - port->pending_lpmask = - veth_transmit_to_many(port->pending_skb, - port->pending_lpmask, - dev); - if (! port->pending_lpmask) { - dev_kfree_skb_any(port->pending_skb); - port->pending_skb = NULL; - netif_wake_queue(dev); - } + spin_lock_irqsave(&port->queue_lock, flags); + + port->stopped_map &= ~(1 << cnx->remote_lp); + + if (0 == port->stopped_map && netif_queue_stopped(dev)) { + veth_debug("cnx %d: woke queue for %s.\n", + cnx->remote_lp, dev->name); + netif_wake_queue(dev); } - spin_unlock_irqrestore(&port->pending_gate, flags); + spin_unlock_irqrestore(&port->queue_lock, flags); + } +} + +static void veth_stop_queues(struct veth_lpar_connection *cnx) +{ + int i; + + for (i = 0; i < HVMAXARCHITECTEDVIRTUALLANS; i++) { + struct net_device *dev = veth_dev[i]; + struct veth_port *port; + + if (! dev) + continue; + + port = (struct veth_port *)dev->priv; + + /* If this cnx is not on the vlan for this port, continue */ + if (! 
(port->lpar_map & (1 << cnx->remote_lp))) + continue; + + spin_lock(&port->queue_lock); + + netif_stop_queue(dev); + port->stopped_map |= (1 << cnx->remote_lp); + + veth_debug("cnx %d: stopped queue for %s, map = 0x%x.\n", + cnx->remote_lp, dev->name, port->stopped_map); + + spin_unlock(&port->queue_lock); } } From michael@ellerman.id.au Thu Jun 30 03:22:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 03:22:31 -0700 (PDT) Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.com [202.81.18.186]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UAMGH9001262 for ; Thu, 30 Jun 2005 03:22:18 -0700 Received: from sd0208e0.au.ibm.com (d23rh904.au.ibm.com [202.81.18.202]) by ausmtp01.au.ibm.com (8.12.10/8.12.10) with ESMTP id j5UANLes354642 for ; Thu, 30 Jun 2005 20:23:21 +1000 Received: from d23av03.au.ibm.com (d23av03.au.ibm.com [9.190.250.244]) by sd0208e0.au.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5UANcdu046032 for ; Thu, 30 Jun 2005 20:23:38 +1000 Received: from d23av03.au.ibm.com (loopback [127.0.0.1]) by d23av03.au.ibm.com (8.12.11/8.13.3) with ESMTP id j5UAKgA4030699 for ; Thu, 30 Jun 2005 20:20:42 +1000 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av03.au.ibm.com (8.12.11/8.12.11) with ESMTP id j5UAKgDT030688; Thu, 30 Jun 2005 20:20:42 +1000 Received: from concordia.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82]) by ozlabs.au.ibm.com (Postfix) with SMTP id 019CC73681; Thu, 30 Jun 2005 20:21:04 +1000 (EST) Received: by concordia.ozlabs.ibm.com (sSMTP sendmail emulation); Thu, 30 Jun 2005 20:20:39 +1000 Date: Thu, 30 Jun 2005 20:20:39 +1000 To: linuxppc64-dev@ozlabs.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org From: Michael Ellerman Subject: [PATCH 10/12] iseries_veth: Remove TX timeout code In-Reply-To: <200506302016.55125.michael@ellerman.id.au> Message-Id: <1120126839.908017.660889424014.qpatch@concordia> X-archive-position: 2571 X-ecartis-version: Ecartis v1.0.0 Sender: 
netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael@ellerman.id.au Precedence: bulk X-list: netdev Content-Length: 2364 Lines: 82 The iseries_veth driver uses the generic TX timeout watchdog, however a better solution is in the works, so remove this code. --- drivers/net/iseries_veth.c | 48 --------------------------------------------- 1 files changed, 48 deletions(-) Index: veth-dev/drivers/net/iseries_veth.c =================================================================== --- veth-dev.orig/drivers/net/iseries_veth.c +++ veth-dev/drivers/net/iseries_veth.c @@ -813,49 +813,6 @@ static struct ethtool_ops ops = { .get_link = veth_get_link, }; -static void veth_tx_timeout(struct net_device *dev) -{ - struct veth_port *port = (struct veth_port *)dev->priv; - struct net_device_stats *stats = &port->stats; - unsigned long flags; - int i; - - stats->tx_errors++; - - spin_lock_irqsave(&port->pending_gate, flags); - - if (!port->pending_lpmask) { - spin_unlock_irqrestore(&port->pending_gate, flags); - return; - } - - printk(KERN_WARNING "%s: Tx timeout! Resetting lp connections: %08x\n", - dev->name, port->pending_lpmask); - - for (i = 0; i < HVMAXARCHITECTEDLPS; i++) { - struct veth_lpar_connection *cnx = veth_cnx[i]; - - if (! 
(port->pending_lpmask & (1<lock); - cnx->state |= VETH_STATE_RESET; - veth_kick_statemachine(cnx); - spin_unlock(&cnx->lock); - } - - spin_unlock_irqrestore(&port->pending_gate, flags); -} - static struct net_device * __init veth_probe_one(int vlan, struct device *vdev) { struct net_device *dev; @@ -904,9 +861,6 @@ static struct net_device * __init veth_p dev->set_multicast_list = veth_set_multicast_list; SET_ETHTOOL_OPS(dev, &ops); - dev->watchdog_timeo = 2 * (VETH_ACKTIMEOUT * HZ / 1000000); - dev->tx_timeout = veth_tx_timeout; - SET_NETDEV_DEV(dev, vdev); rc = register_netdev(dev); @@ -1047,8 +1001,6 @@ static int veth_start_xmit(struct sk_buf lpmask = veth_transmit_to_many(skb, lpmask, dev); - dev->trans_start = jiffies; - if (! lpmask) { dev_kfree_skb(skb); } else { From michael@ellerman.id.au Thu Jun 30 03:22:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 03:22:28 -0700 (PDT) Received: from ausmtp02.au.ibm.com (ausmtp02.au.ibm.com [202.81.18.187]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UAMHH9001267 for ; Thu, 30 Jun 2005 03:22:19 -0700 Received: from sd0208e0.au.ibm.com (d23rh904.au.ibm.com [202.81.18.202]) by ausmtp02.au.ibm.com (8.12.10/8.12.10) with ESMTP id j5UAG9hV324012 for ; Thu, 30 Jun 2005 20:16:11 +1000 Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.250.237]) by sd0208e0.au.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5UANbdu111742 for ; Thu, 30 Jun 2005 20:23:37 +1000 Received: from d23av04.au.ibm.com (loopback [127.0.0.1]) by d23av04.au.ibm.com (8.12.11/8.13.3) with ESMTP id j5UAKgjF007503 for ; Thu, 30 Jun 2005 20:20:42 +1000 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av04.au.ibm.com (8.12.11/8.12.11) with ESMTP id j5UAKgVn007479; Thu, 30 Jun 2005 20:20:42 +1000 Received: from concordia.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82]) by ozlabs.au.ibm.com (Postfix) with SMTP id 838FB736CC; Thu, 30 Jun 2005 20:21:03 +1000 (EST) Received: by 
concordia.ozlabs.ibm.com (sSMTP sendmail emulation); Thu, 30 Jun 2005 20:20:39 +1000 Date: Thu, 30 Jun 2005 20:20:39 +1000 To: linuxppc64-dev@ozlabs.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org From: Michael Ellerman Subject: [PATCH 6/12] iseries_veth: Fix broken promiscuous handling In-Reply-To: <200506302016.55125.michael@ellerman.id.au> Message-Id: <1120126839.505562.276017099853.qpatch@concordia> X-archive-position: 2569 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael@ellerman.id.au Precedence: bulk X-list: netdev Content-Length: 1806 Lines: 62 Due to a logic bug, once promiscuous mode is enabled in the iseries_veth driver it is never disabled. The driver keeps two flags, promiscuous and all_mcast which have exactly the same effect. This is because we only ever receive packets destined for us, or multicast packets. So consolidate them into one promiscuous flag for simplicity. --- drivers/net/iseries_veth.c | 16 +++++----------- 1 files changed, 5 insertions(+), 11 deletions(-) Index: veth-dev/drivers/net/iseries_veth.c =================================================================== --- veth-dev.orig/drivers/net/iseries_veth.c +++ veth-dev/drivers/net/iseries_veth.c @@ -159,7 +159,6 @@ struct veth_port { rwlock_t mcast_gate; int promiscuous; - int all_mcast; int num_mcast; u64 mcast_addr[VETH_MAX_MCAST]; }; @@ -754,17 +753,15 @@ static void veth_set_multicast_list(stru write_lock_irqsave(&port->mcast_gate, flags); - if (dev->flags & IFF_PROMISC) { /* set promiscuous mode */ - printk(KERN_INFO "%s: Promiscuous mode enabled.\n", - dev->name); + if ((dev->flags & IFF_PROMISC) || (dev->flags & IFF_ALLMULTI) || + (dev->mc_count > VETH_MAX_MCAST)) { port->promiscuous = 1; - } else if ( (dev->flags & IFF_ALLMULTI) - || (dev->mc_count > VETH_MAX_MCAST) ) { - port->all_mcast = 1; } else { struct dev_mc_list *dmi = dev->mc_list; int i; + port->promiscuous = 0; + /* Update 
table */ port->num_mcast = 0; @@ -1147,12 +1144,9 @@ static inline int veth_frame_wanted(stru if ( (mac_addr == port->mac_addr) || (mac_addr == 0xffffffffffff0000) ) return 1; - if (! (((char *) &mac_addr)[0] & 0x01)) - return 0; - read_lock_irqsave(&port->mcast_gate, flags); - if (port->promiscuous || port->all_mcast) { + if (port->promiscuous) { wanted = 1; goto out; } From michael@ellerman.id.au Thu Jun 30 03:22:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 03:22:42 -0700 (PDT) Received: from ausmtp02.au.ibm.com (ausmtp02.au.ibm.com [202.81.18.187]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UAMHH9001252 for ; Thu, 30 Jun 2005 03:22:19 -0700 Received: from sd0208e0.au.ibm.com (d23rh904.au.ibm.com [202.81.18.202]) by ausmtp02.au.ibm.com (8.12.10/8.12.10) with ESMTP id j5UAG8hV324008 for ; Thu, 30 Jun 2005 20:16:08 +1000 Received: from d23av01.au.ibm.com (d23av01.au.ibm.com [9.190.250.242]) by sd0208e0.au.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5UANadu133422 for ; Thu, 30 Jun 2005 20:23:37 +1000 Received: from d23av01.au.ibm.com (loopback [127.0.0.1]) by d23av01.au.ibm.com (8.12.11/8.13.3) with ESMTP id j5UAKfQD016225 for ; Thu, 30 Jun 2005 20:20:42 +1000 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av01.au.ibm.com (8.12.11/8.12.11) with ESMTP id j5UAKfIc016214; Thu, 30 Jun 2005 20:20:41 +1000 Received: from concordia.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82]) by ozlabs.au.ibm.com (Postfix) with SMTP id 63D75736CA; Thu, 30 Jun 2005 20:21:03 +1000 (EST) Received: by concordia.ozlabs.ibm.com (sSMTP sendmail emulation); Thu, 30 Jun 2005 20:20:39 +1000 Date: Thu, 30 Jun 2005 20:20:39 +1000 To: linuxppc64-dev@ozlabs.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org From: Michael Ellerman Subject: [PATCH 4/12] iseries_veth: Remove a FIXME WRT deletion of the ack_timer In-Reply-To: <200506302016.55125.michael@ellerman.id.au> Message-Id: 
<1120126839.355585.362623134076.qpatch@concordia> X-archive-position: 2579 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael@ellerman.id.au Precedence: bulk X-list: netdev Content-Length: 2295 Lines: 71 The iseries_veth driver has a timer which we use to send acks. When the connection is reset or stopped we need to delete the timer. Currently we only call del_timer() when resetting a connection, which means the timer might run again while the connection is being re-setup. As it turns out that's ok, because the flags the timer consults have been reset. It's cleaner though to call del_timer_sync() once we've dropped the lock, although the timer may still run between us dropping the lock and calling del_timer_sync(), but as above that's ok. --- drivers/net/iseries_veth.c | 21 +++++++++++++-------- 1 files changed, 13 insertions(+), 8 deletions(-) Index: veth-dev/drivers/net/iseries_veth.c =================================================================== --- veth-dev.orig/drivers/net/iseries_veth.c +++ veth-dev/drivers/net/iseries_veth.c @@ -450,13 +450,15 @@ static void veth_statemachine(void *p) if (cnx->state & VETH_STATE_RESET) { int i; - del_timer(&cnx->ack_timer); - if (cnx->state & VETH_STATE_OPEN) HvCallEvent_closeLpEventPath(cnx->remote_lp, HvLpEvent_Type_VirtualLan); - /* reset ack data */ + /* + * Reset ack data. This prevents the ack_timer actually + * doing anything, even if it runs one more time when + * we drop the lock below. + */ memset(&cnx->pending_acks, 0xff, sizeof (cnx->pending_acks)); cnx->num_pending_acks = 0; @@ -469,9 +471,16 @@ static void veth_statemachine(void *p) if (cnx->msgs) for (i = 0; i < VETH_NUMBUFFERS; ++i) veth_recycle_msg(cnx, cnx->msgs + i); + + /* Drop the lock so we can do stuff that might sleep or + * take other locks. 
*/ spin_unlock_irq(&cnx->lock); + + del_timer_sync(&cnx->ack_timer); veth_flush_pending(cnx); + spin_lock_irq(&cnx->lock); + if (cnx->state & VETH_STATE_RESET) goto restart; } @@ -658,12 +667,8 @@ static void veth_stop_connection(u8 rlp) veth_kick_statemachine(cnx); spin_unlock_irq(&cnx->lock); + /* Wait for the state machine to run. */ flush_scheduled_work(); - - /* FIXME: not sure if this is necessary - will already have - * been deleted by the state machine, just want to make sure - * its not running any more */ - del_timer_sync(&cnx->ack_timer); } static void veth_destroy_connection(u8 rlp) From michael@ellerman.id.au Thu Jun 30 03:22:26 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 03:22:41 -0700 (PDT) Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.com [202.81.18.186]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UAMLH9001326 for ; Thu, 30 Jun 2005 03:22:24 -0700 Received: from sd0208e0.au.ibm.com (d23rh904.au.ibm.com [202.81.18.202]) by ausmtp01.au.ibm.com (8.12.10/8.12.10) with ESMTP id j5UANQes243644 for ; Thu, 30 Jun 2005 20:23:26 +1000 Received: from d23av02.au.ibm.com (d23av02.au.ibm.com [9.190.250.243]) by sd0208e0.au.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5UANjdu122620 for ; Thu, 30 Jun 2005 20:23:45 +1000 Received: from d23av02.au.ibm.com (loopback [127.0.0.1]) by d23av02.au.ibm.com (8.12.11/8.13.3) with ESMTP id j5UAKmUx027372 for ; Thu, 30 Jun 2005 20:20:49 +1000 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av02.au.ibm.com (8.12.11/8.12.11) with ESMTP id j5UAKgS0027302; Thu, 30 Jun 2005 20:20:42 +1000 Received: from concordia.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82]) by ozlabs.au.ibm.com (Postfix) with SMTP id D5BC1736CE; Thu, 30 Jun 2005 20:21:03 +1000 (EST) Received: by concordia.ozlabs.ibm.com (sSMTP sendmail emulation); Thu, 30 Jun 2005 20:20:39 +1000 Date: Thu, 30 Jun 2005 20:20:39 +1000 To: linuxppc64-dev@ozlabs.org, netdev@oss.sgi.com, 
linux-kernel@vger.kernel.org From: Michael Ellerman Subject: [PATCH 9/12] iseries_veth: Use ref counts to track lifecycle of connection structs In-Reply-To: <200506302016.55125.michael@ellerman.id.au> Message-Id: <1120126839.794259.894526862881.qpatch@concordia> X-archive-position: 2578 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael@ellerman.id.au Precedence: bulk X-list: netdev Content-Length: 5501 Lines: 177 The iseries_veth driver can attach to multiple vlans, which correspond to multiple net devices. However there is only 1 connection between each LPAR, so the connection structure may be shared by multiple net devices. This makes module removal messy, because we can't deallocate the connections until we know there are no net devices still using them. The solution is to use ref counts on the connections, so we can delete them (actually stop) as soon as the ref count hits zero. This patch fixes (part of) a bug we were seeing with IPv6 sending probes to a dead LPAR, which would then hang us forever due to leftover skbs. This patch has the (minor?) side effect that we only start negotiating a connection with LPARs which are on one of our vlans. The previous behaviour was to start negotiation with all LPARs unconditionally, will have to think about that one.
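The ref counting scheme described above can be sketched in plain userspace C. This is only an illustration, not the driver code: struct ref, ref_init(), ref_get(), ref_put(), struct connection and connection_stop() are hypothetical stand-ins for the kernel's struct kref, kref_init(), kref_get(), kref_put(), struct veth_lpar_connection and veth_stop_connection(), and the counter here is a plain int rather than an atomic one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct kref.
 * Not atomic, so only suitable for a single-threaded illustration. */
struct ref {
    int count;
};

/* Hypothetical stand-in for struct veth_lpar_connection. */
struct connection {
    struct ref refcount;
    int stopped;            /* set by the release callback */
};

/* Like kref_init(): the creator starts out holding one reference. */
static void ref_init(struct ref *r)
{
    r->count = 1;
}

/* Like kref_get(): a new user (e.g. a net device) takes a reference. */
static void ref_get(struct ref *r)
{
    assert(r->count > 0);
    r->count++;
}

/* Like kref_put(): drop a reference and call release() on the last one.
 * Returns 1 if release() was called, 0 otherwise. */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
    assert(r->count > 0);
    if (--r->count == 0) {
        release(r);
        return 1;
    }
    return 0;
}

/* Release callback: recover the containing connection (the same idea
 * as container_of() in the kernel) and mark it stopped. */
static void connection_stop(struct ref *r)
{
    struct connection *cnx = (struct connection *)
        ((char *)r - offsetof(struct connection, refcount));

    cnx->stopped = 1;
}
```

The point of the design is that nobody has to know who the last user is: the infrastructure holds one reference, each net device on the vlan holds one more, and whichever ref_put() brings the count to zero triggers the stop.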
--- drivers/net/iseries_veth.c | 89 ++++++++++++++++++++++++++++++--------------- 1 files changed, 61 insertions(+), 28 deletions(-) Index: veth-dev/drivers/net/iseries_veth.c =================================================================== --- veth-dev.orig/drivers/net/iseries_veth.c +++ veth-dev/drivers/net/iseries_veth.c @@ -129,6 +129,7 @@ struct veth_lpar_connection { int num_events; struct VethCapData local_caps; + struct kref refcount; struct timer_list ack_timer; spinlock_t lock; @@ -620,6 +621,10 @@ static int veth_init_connection(u8 rlp) return -ENOMEM; memset(cnx, 0, sizeof(*cnx)); + /* This gets us 1 reference, which is held on behalf of the driver + * infrastructure. It's released at module unload. */ + kref_init(&cnx->refcount); + cnx->remote_lp = rlp; spin_lock_init(&cnx->lock); INIT_WORK(&cnx->statemachine_wq, veth_statemachine, cnx); @@ -658,12 +663,10 @@ static int veth_init_connection(u8 rlp) return 0; } -static void veth_stop_connection(u8 rlp) +static void veth_stop_connection(struct kref *ref) { - struct veth_lpar_connection *cnx = veth_cnx[rlp]; - - if (! cnx) - return; + struct veth_lpar_connection *cnx; + cnx = container_of(ref, struct veth_lpar_connection, refcount); spin_lock_irq(&cnx->lock); cnx->state |= VETH_STATE_RESET | VETH_STATE_SHUTDOWN; @@ -1352,15 +1355,31 @@ static void veth_timed_ack(unsigned long static int veth_remove(struct vio_dev *vdev) { - int i = vdev->unit_address; + struct veth_lpar_connection *cnx; struct net_device *dev; + struct veth_port *port; + int i; - dev = veth_dev[i]; - if (dev != NULL) { - veth_dev[i] = NULL; - unregister_netdev(dev); - free_netdev(dev); + dev = veth_dev[vdev->unit_address]; + + if (! 
dev) + return 0; + + port = netdev_priv(dev); + + for (i = 0; i < HVMAXARCHITECTEDLPS; i++) { + cnx = veth_cnx[i]; + + if (cnx && (port->lpar_map & (1 << i))) { + /* Drop our reference to connections on our VLAN */ + kref_put(&cnx->refcount, veth_stop_connection); + } } + + veth_dev[vdev->unit_address] = NULL; + unregister_netdev(dev); + free_netdev(dev); + return 0; } @@ -1368,6 +1387,7 @@ static int veth_probe(struct vio_dev *vd { int i = vdev->unit_address; struct net_device *dev; + struct veth_port *port; dev = veth_probe_one(i, &vdev->dev); if (dev == NULL) { @@ -1376,11 +1396,19 @@ static int veth_probe(struct vio_dev *vd } veth_dev[i] = dev; - /* Start the state machine on each connection, to commence - * link negotiation */ - for (i = 0; i < HVMAXARCHITECTEDLPS; i++) - if (veth_cnx[i]) + port = (struct veth_port*)netdev_priv(dev); + + /* Start the state machine on each connection on this vlan. If we're + * the first dev to do so this will commence link negotiation */ + for (i = 0; i < HVMAXARCHITECTEDLPS; i++) { + if (! (port->lpar_map & (1 << i))) + continue; + + if (veth_cnx[i]) { + kref_get(&(veth_cnx[i]->refcount)); veth_kick_statemachine(veth_cnx[i]); + } + } return 0; } @@ -1409,26 +1437,31 @@ static struct vio_driver veth_driver = { void __exit veth_module_cleanup(void) { int i; + struct veth_lpar_connection *cnx; - /* Stop the queues first to stop any new packets being sent. */ - for (i = 0; i < HVMAXARCHITECTEDVIRTUALLANS; i++) - if (veth_dev[i]) - netif_stop_queue(veth_dev[i]); + /* Drop the driver's references to the connections. */ + for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) { + cnx = veth_cnx[i]; - /* Stop the connections before we unregister the driver. This - * ensures there's no skbs lying around holding the device open. 
*/ - for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) - veth_stop_connection(i); + if (cnx) { + kref_put(&cnx->refcount, veth_stop_connection); + } + } - HvLpEvent_unregisterHandler(HvLpEvent_Type_VirtualLan); + /* Unregister the driver, which will close all the netdevs and stop + * the connections when they're no longer referenced. */ + vio_unregister_driver(&veth_driver); - /* Hypervisor callbacks may have scheduled more work while we - * were stoping connections. Now that we've disconnected from - * the hypervisor make sure everything's finished. */ + /* Make sure each connection's state machine has run to completion. */ flush_scheduled_work(); - vio_unregister_driver(&veth_driver); + /* Disconnect our "irq" to stop events coming from the Hypervisor. */ + HvLpEvent_unregisterHandler(HvLpEvent_Type_VirtualLan); + + /* Make sure any work queued from Hypervisor callbacks is finished. */ + flush_scheduled_work(); + /* Deallocate everything. */ for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) veth_destroy_connection(i); From michael@ellerman.id.au Thu Jun 30 03:18:33 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 03:18:38 -0700 (PDT) Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.com [202.81.18.186]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UAIUH9000747 for ; Thu, 30 Jun 2005 03:18:32 -0700 Received: from sd0208e0.au.ibm.com (d23rh904.au.ibm.com [202.81.18.202]) by ausmtp01.au.ibm.com (8.12.10/8.12.10) with ESMTP id j5UAJYes376888 for ; Thu, 30 Jun 2005 20:19:35 +1000 Received: from d23av02.au.ibm.com (d23av02.au.ibm.com [9.190.250.243]) by sd0208e0.au.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5UAJrdu114344 for ; Thu, 30 Jun 2005 20:19:54 +1000 Received: from d23av02.au.ibm.com (loopback [127.0.0.1]) by d23av02.au.ibm.com (8.12.11/8.13.3) with ESMTP id j5UAGw8r022834 for ; Thu, 30 Jun 2005 20:16:58 +1000 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av02.au.ibm.com (8.12.11/8.12.11) with ESMTP id 
j5UAGwsC022827; Thu, 30 Jun 2005 20:16:58 +1000 Received: from concordia.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (Client did not present a certificate) by ozlabs.au.ibm.com (Postfix) with ESMTP id E053473673; Thu, 30 Jun 2005 20:17:19 +1000 (EST) From: Michael Ellerman Reply-To: michael@ellerman.id.au To: "PPC64-dev" , netdev@oss.sgi.com, LKML Subject: [RFC/PATCH 0/12] Updates & bug fixes for iseries_veth network driver Date: Thu, 30 Jun 2005 20:16:49 +1000 User-Agent: KMail/1.8 MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart1509026.vOgD9QmspS"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <200506302016.55125.michael@ellerman.id.au> X-archive-position: 2567 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael@ellerman.id.au Precedence: bulk X-list: netdev Content-Length: 937 Lines: 40 --nextPart1509026.vOgD9QmspS Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Hi y'all, The following is a series of patches for the iseries_veth driver. They're not ready for merging yet, as we need to do more extensive testing.= =20 However any feedback you have will be greatly appreciated. cheers =2D-=20 Michael Ellerman IBM OzLabs email: michael:ellerman.id.au inmsg: mpe:jabber.org wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. 
- S.M.A.R.T Person --nextPart1509026.vOgD9QmspS Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQBCw8aXdSjSd0sB4dIRAkhxAKCQId0wJxv/bZLgOoEifQMR5AkmOgCeJIUu dQ6d0lmlSZwBL6ipT6dw0WU= =8omQ -----END PGP SIGNATURE----- --nextPart1509026.vOgD9QmspS-- From michael@ellerman.id.au Thu Jun 30 03:22:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 03:22:37 -0700 (PDT) Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.com [202.81.18.186]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UAMGH9001251 for ; Thu, 30 Jun 2005 03:22:18 -0700 Received: from sd0208e0.au.ibm.com (d23rh904.au.ibm.com [202.81.18.202]) by ausmtp01.au.ibm.com (8.12.10/8.12.10) with ESMTP id j5UANHes257388 for ; Thu, 30 Jun 2005 20:23:17 +1000 Received: from d23av03.au.ibm.com (d23av03.au.ibm.com [9.190.250.244]) by sd0208e0.au.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5UANbdu093684 for ; Thu, 30 Jun 2005 20:23:37 +1000 Received: from d23av03.au.ibm.com (loopback [127.0.0.1]) by d23av03.au.ibm.com (8.12.11/8.13.3) with ESMTP id j5UAKgOd030673 for ; Thu, 30 Jun 2005 20:20:42 +1000 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av03.au.ibm.com (8.12.11/8.12.11) with ESMTP id j5UAKgFa030656; Thu, 30 Jun 2005 20:20:42 +1000 Received: from concordia.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82]) by ozlabs.au.ibm.com (Postfix) with SMTP id 3EBEF73681; Thu, 30 Jun 2005 20:21:03 +1000 (EST) Received: by concordia.ozlabs.ibm.com (sSMTP sendmail emulation); Thu, 30 Jun 2005 20:20:39 +1000 Date: Thu, 30 Jun 2005 20:20:39 +1000 To: linuxppc64-dev@ozlabs.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org From: Michael Ellerman Subject: [PATCH 2/12] iseries_veth: Cleanup error and debug messages In-Reply-To: <200506302016.55125.michael@ellerman.id.au> Message-Id: <1120126839.217047.4847506912.qpatch@concordia> X-archive-position: 2575 X-ecartis-version: Ecartis v1.0.0 Sender: 
netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael@ellerman.id.au Precedence: bulk X-list: netdev Content-Length: 8483 Lines: 240 This patch: * converts uses of veth_printk() to veth_debug()/veth_error() * makes terminology consistent, ie. always refer to LPAR not lpar * be consistent about printing return codes as %d not %x * make printf formats fit in 80 columns --- drivers/net/iseries_veth.c | 87 ++++++++++++++++++++++----------------------- 1 files changed, 43 insertions(+), 44 deletions(-) Index: veth-dev/drivers/net/iseries_veth.c =================================================================== --- veth-dev.orig/drivers/net/iseries_veth.c +++ veth-dev/drivers/net/iseries_veth.c @@ -287,7 +287,7 @@ static void veth_take_cap(struct veth_lp HvLpEvent_Type_VirtualLan); if (cnx->state & VETH_STATE_GOTCAPS) { - veth_error("Received a second capabilities from lpar %d\n", + veth_error("Received a second capabilities from LPAR %d.\n", cnx->remote_lp); event->base_event.xRc = HvLpEvent_Rc_BufferNotAvailable; HvCallEvent_ackLpEvent((struct HvLpEvent *) event); @@ -306,7 +306,7 @@ static void veth_take_cap_ack(struct vet spin_lock_irqsave(&cnx->lock, flags); if (cnx->state & VETH_STATE_GOTCAPACK) { - veth_error("Received a second capabilities ack from lpar %d\n", + veth_error("Received a second capabilities ack from LPAR %d.\n", cnx->remote_lp); } else { memcpy(&cnx->cap_ack_event, event, @@ -323,8 +323,7 @@ static void veth_take_monitor_ack(struct unsigned long flags; spin_lock_irqsave(&cnx->lock, flags); - veth_printk(KERN_DEBUG, "Monitor ack returned for lpar %d\n", - cnx->remote_lp); + veth_debug("cnx %d: lost connection.\n", cnx->remote_lp); cnx->state |= VETH_STATE_RESET; veth_kick_statemachine(cnx); spin_unlock_irqrestore(&cnx->lock, flags); @@ -345,8 +344,8 @@ static void veth_handle_ack(struct VethL veth_take_monitor_ack(cnx, event); break; default: - veth_error("Unknown ack type %d from lpar %d\n", - 
event->base_event.xSubtype, rlp); + veth_error("Unknown ack type %d from LPAR %d.\n", + event->base_event.xSubtype, rlp); }; } @@ -382,8 +381,8 @@ static void veth_handle_int(struct VethL veth_receive(cnx, event); break; default: - veth_error("Unknown interrupt type %d from lpar %d\n", - event->base_event.xSubtype, rlp); + veth_error("Unknown interrupt type %d from LPAR %d.\n", + event->base_event.xSubtype, rlp); }; } @@ -409,8 +408,8 @@ static int veth_process_caps(struct veth || (remote_caps->ack_threshold > VETH_MAX_ACKS_PER_MSG) || (remote_caps->ack_threshold == 0) || (cnx->ack_timeout == 0) ) { - veth_error("Received incompatible capabilities from lpar %d\n", - cnx->remote_lp); + veth_error("Received incompatible capabilities from LPAR %d.\n", + cnx->remote_lp); return HvLpEvent_Rc_InvalidSubtypeData; } @@ -427,8 +426,8 @@ static int veth_process_caps(struct veth cnx->num_ack_events += num; if (cnx->num_ack_events < num_acks_needed) { - veth_error("Couldn't allocate enough ack events for lpar %d\n", - cnx->remote_lp); + veth_error("Couldn't allocate enough ack events " + "for LPAR %d.\n", cnx->remote_lp); return HvLpEvent_Rc_BufferNotAvailable; } @@ -507,9 +506,8 @@ static void veth_statemachine(void *p) } else { if ( (rc != HvLpEvent_Rc_PartitionDead) && (rc != HvLpEvent_Rc_PathClosed) ) - veth_error("Error sending monitor to " - "lpar %d, rc=%x\n", - rlp, (int) rc); + veth_error("Error sending monitor to LPAR %d, " + "rc = %d\n", rlp, rc); /* Oh well, hope we get a cap from the other * end and do better when that kicks us */ @@ -532,9 +530,9 @@ static void veth_statemachine(void *p) } else { if ( (rc != HvLpEvent_Rc_PartitionDead) && (rc != HvLpEvent_Rc_PathClosed) ) - veth_error("Error sending caps to " - "lpar %d, rc=%x\n", - rlp, (int) rc); + veth_error("Error sending caps to LPAR %d, " + "rc = %d\n", rlp, rc); + /* Oh well, hope we get a cap from the other * end and do better when that kicks us */ goto out; @@ -574,10 +572,8 @@ static void 
veth_statemachine(void *p) add_timer(&cnx->ack_timer); cnx->state |= VETH_STATE_READY; } else { - veth_printk(KERN_ERR, "Caps rejected (rc=%d) by " - "lpar %d\n", - cnx->cap_ack_event.base_event.xRc, - rlp); + veth_error("Caps rejected by LPAR %d, rc = %d\n", + rlp, cnx->cap_ack_event.base_event.xRc); goto cant_cope; } } @@ -590,8 +586,8 @@ static void veth_statemachine(void *p) /* FIXME: we get here if something happens we really can't * cope with. The link will never work once we get here, and * all we can do is not lock the rest of the system up */ - veth_error("Badness on connection to lpar %d (state=%04lx) " - " - shutting down\n", rlp, cnx->state); + veth_error("Unrecoverable error on connection to LPAR %d, shutting down" + " (state = 0x%04lx)\n", rlp, cnx->state); cnx->state |= VETH_STATE_SHUTDOWN; spin_unlock_irq(&cnx->lock); } @@ -623,7 +619,7 @@ static int veth_init_connection(u8 rlp) msgs = kmalloc(VETH_NUMBUFFERS * sizeof(struct veth_msg), GFP_KERNEL); if (! msgs) { - veth_error("Can't allocate buffers for lpar %d\n", rlp); + veth_error("Can't allocate buffers for LPAR %d.\n", rlp); return -ENOMEM; } @@ -639,8 +635,7 @@ static int veth_init_connection(u8 rlp) cnx->num_events = veth_allocate_events(rlp, 2 + VETH_NUMBUFFERS); if (cnx->num_events < (2 + VETH_NUMBUFFERS)) { - veth_error("Can't allocate events for lpar %d, only got %d\n", - rlp, cnx->num_events); + veth_error("Can't allocate enough events for LPAR %d.\n", rlp); return -ENOMEM; } @@ -898,15 +893,17 @@ static struct net_device * __init veth_p rc = register_netdev(dev); if (rc != 0) { - veth_printk(KERN_ERR, - "Failed to register ethernet device for vlan %d\n", - vlan); + veth_error("Failed registering net device for vlan%d.\n", vlan); free_netdev(dev); return NULL; } - veth_printk(KERN_DEBUG, "%s attached to iSeries vlan %d (lpar_map=0x%04x)\n", - dev->name, vlan, port->lpar_map); + veth_info("%s attached to iSeries vlan %d.\n", dev->name, vlan); + + for (i = 0; i < HVMAXARCHITECTEDLPS; i++) { 
+ if (port->lpar_map & (1 << i)) + veth_info("%s connected to LPAR %d.\n", dev->name, i); + } return dev; } @@ -1039,7 +1036,7 @@ static int veth_start_xmit(struct sk_buf dev_kfree_skb(skb); } else { if (port->pending_skb) { - veth_error("%s: Tx while skb was pending!\n", + veth_error("%s: TX while skb was pending!\n", dev->name); dev_kfree_skb(skb); spin_unlock_irqrestore(&port->pending_gate, flags); @@ -1075,10 +1072,10 @@ static void veth_recycle_msg(struct veth memset(&msg->data, 0, sizeof(msg->data)); veth_stack_push(cnx, msg); - } else - if (cnx->state & VETH_STATE_OPEN) - veth_error("Bogus frames ack from lpar %d (#%d)\n", - cnx->remote_lp, msg->token); + } else if (cnx->state & VETH_STATE_OPEN) { + veth_error("Non-pending frame (# %d) acked by LPAR %d.\n", + msg->token, cnx->remote_lp); + } } static void veth_flush_pending(struct veth_lpar_connection *cnx) @@ -1188,8 +1185,8 @@ static void veth_flush_acks(struct veth_ 0, &cnx->pending_acks); if (rc != HvLpEvent_Rc_Good) - veth_error("Error 0x%x acking frames from lpar %d!\n", - (unsigned)rc, cnx->remote_lp); + veth_error("Failed acking frames from LPAR %d, rc = %d\n", + cnx->remote_lp, (int)rc); cnx->num_pending_acks = 0; memset(&cnx->pending_acks, 0xff, sizeof(cnx->pending_acks)); @@ -1225,9 +1222,10 @@ static void veth_receive(struct veth_lpa /* make sure that we have at least 1 EOF entry in the * remaining entries */ if (! 
(senddata->eofmask >> (startchunk + VETH_EOF_SHIFT))) { - veth_error("missing EOF frag in event " - "eofmask=0x%x startchunk=%d\n", - (unsigned) senddata->eofmask, startchunk); + veth_error("Missing EOF fragment in event " + "eofmask = 0x%x startchunk = %d\n", + (unsigned)senddata->eofmask, + startchunk); break; } @@ -1246,8 +1244,9 @@ static void veth_receive(struct veth_lpa /* nchunks == # of chunks in this frame */ if ((length - ETH_HLEN) > VETH_MAX_MTU) { - veth_error("Received oversize frame from lpar %d " - "(length=%d)\n", cnx->remote_lp, length); + veth_error("Received oversize frame from LPAR %d " + "(length = %d)\n", + cnx->remote_lp, length); continue; } From michael@ellerman.id.au Thu Jun 30 03:22:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 03:22:36 -0700 (PDT) Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.com [202.81.18.186]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UAMGH9001256 for ; Thu, 30 Jun 2005 03:22:18 -0700 Received: from sd0208e0.au.ibm.com (d23rh904.au.ibm.com [202.81.18.202]) by ausmtp01.au.ibm.com (8.12.10/8.12.10) with ESMTP id j5UANJes241770 for ; Thu, 30 Jun 2005 20:23:19 +1000 Received: from d23av04.au.ibm.com (d23av04.au.ibm.com [9.190.250.237]) by sd0208e0.au.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5UANbdu114830 for ; Thu, 30 Jun 2005 20:23:37 +1000 Received: from d23av04.au.ibm.com (loopback [127.0.0.1]) by d23av04.au.ibm.com (8.12.11/8.13.3) with ESMTP id j5UAKgIP007502 for ; Thu, 30 Jun 2005 20:20:42 +1000 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av04.au.ibm.com (8.12.11/8.12.11) with ESMTP id j5UAKgP3007469; Thu, 30 Jun 2005 20:20:42 +1000 Received: from concordia.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82]) by ozlabs.au.ibm.com (Postfix) with SMTP id 4F41473686; Thu, 30 Jun 2005 20:21:03 +1000 (EST) Received: by concordia.ozlabs.ibm.com (sSMTP sendmail emulation); Thu, 30 Jun 2005 20:20:39 +1000 Date: Thu, 30 Jun 2005 20:20:39 +1000 
To: linuxppc64-dev@ozlabs.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org From: Michael Ellerman Subject: [PATCH 3/12] iseries_veth: Make init_connection() & destroy_connection() symmetrical In-Reply-To: <200506302016.55125.michael@ellerman.id.au> Message-Id: <1120126839.290253.340047065213.qpatch@concordia> X-archive-position: 2573 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael@ellerman.id.au Precedence: bulk X-list: netdev Content-Length: 1923 Lines: 70 This patch makes veth_init_connection() and veth_destroy_connection() symmetrical in that they allocate/deallocate the same data. Currently if there's an error while initialising connections (ie. ENOMEM) we call veth_module_cleanup(), however this will oops because we call driver_unregister() before we've called driver_register(). I've never seen this actually happen though. So instead we explicitly call veth_destroy_connection() in a reverse loop for the connections we've successfully initialised. --- drivers/net/iseries_veth.c | 22 +++++++++++----------- 1 files changed, 11 insertions(+), 11 deletions(-) Index: veth-dev/drivers/net/iseries_veth.c =================================================================== --- veth-dev.orig/drivers/net/iseries_veth.c +++ veth-dev/drivers/net/iseries_veth.c @@ -664,6 +664,14 @@ static void veth_stop_connection(u8 rlp) * been deleted by the state machine, just want to make sure * its not running any more */ del_timer_sync(&cnx->ack_timer); +} + +static void veth_destroy_connection(u8 rlp) +{ + struct veth_lpar_connection *cnx = veth_cnx[rlp]; + + if (! cnx) + return; if (cnx->num_events > 0) mf_deallocate_lp_events(cnx->remote_lp, @@ -675,14 +683,6 @@ static void veth_stop_connection(u8 rlp) HvLpEvent_Type_VirtualLan, cnx->num_ack_events, NULL, NULL); -} - -static void veth_destroy_connection(u8 rlp) -{ - struct veth_lpar_connection *cnx = veth_cnx[rlp]; - - if (! 
cnx) - return; kfree(cnx->msgs); kfree(cnx); @@ -1424,15 +1424,15 @@ module_exit(veth_module_cleanup); int __init veth_module_init(void) { - int i; - int rc; + int i, rc; this_lp = HvLpConfig_getLpIndex_outline(); for (i = 0; i < HVMAXARCHITECTEDLPS; ++i) { rc = veth_init_connection(i); if (rc != 0) { - veth_module_cleanup(); + for (; i >= 0; i--) + veth_destroy_connection(i); return rc; } } From michael@ellerman.id.au Thu Jun 30 03:22:29 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 03:22:38 -0700 (PDT) Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.com [202.81.18.186]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UAMPH9001333 for ; Thu, 30 Jun 2005 03:22:27 -0700 Received: from sd0208e0.au.ibm.com (d23rh904.au.ibm.com [202.81.18.202]) by ausmtp01.au.ibm.com (8.12.10/8.12.10) with ESMTP id j5UANTes191838 for ; Thu, 30 Jun 2005 20:23:30 +1000 Received: from d23av02.au.ibm.com (d23av02.au.ibm.com [9.190.250.243]) by sd0208e0.au.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5UANldu127280 for ; Thu, 30 Jun 2005 20:23:47 +1000 Received: from d23av02.au.ibm.com (loopback [127.0.0.1]) by d23av02.au.ibm.com (8.12.11/8.13.3) with ESMTP id j5UAKnDn027397 for ; Thu, 30 Jun 2005 20:20:49 +1000 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av02.au.ibm.com (8.12.11/8.12.11) with ESMTP id j5UAKgWt027307; Thu, 30 Jun 2005 20:20:42 +1000 Received: from concordia.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82]) by ozlabs.au.ibm.com (Postfix) with SMTP id 1DFE0736CD; Thu, 30 Jun 2005 20:21:04 +1000 (EST) Received: by concordia.ozlabs.ibm.com (sSMTP sendmail emulation); Thu, 30 Jun 2005 20:20:40 +1000 Date: Thu, 30 Jun 2005 20:20:40 +1000 To: linuxppc64-dev@ozlabs.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org From: Michael Ellerman Subject: [PATCH 11/12] iseries_veth: Add a per-connection ack timer In-Reply-To: <200506302016.55125.michael@ellerman.id.au> Message-Id: 
<1120126840.39112.35278125306.qpatch@concordia> X-archive-position: 2576 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael@ellerman.id.au Precedence: bulk X-list: netdev Content-Length: 5187 Lines: 173 Currently the iseries_veth driver contravenes the specification in Documentation/networking/driver.txt, in that if packets are not acked by the other LPAR they will sit around forever. This patch adds a per-connection timer which fires if we've had no acks for five seconds. This is superior to the generic TX timer because it catches the case of a small number of packets being sent and never acked. --- drivers/net/iseries_veth.c | 75 +++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 69 insertions(+), 6 deletions(-) Index: veth-dev/drivers/net/iseries_veth.c =================================================================== --- veth-dev.orig/drivers/net/iseries_veth.c +++ veth-dev/drivers/net/iseries_veth.c @@ -132,6 +132,11 @@ struct veth_lpar_connection { struct kref refcount; struct timer_list ack_timer; + struct timer_list reset_timer; + unsigned int reset_timeout; + unsigned long last_contact; + int outstanding_tx; + spinlock_t lock; unsigned long state; HvLpInstanceId src_inst; @@ -171,7 +176,8 @@ static int veth_start_xmit(struct sk_buf static void veth_recycle_msg(struct veth_lpar_connection *, struct veth_msg *); static void veth_flush_pending(struct veth_lpar_connection *cnx); static void veth_receive(struct veth_lpar_connection *, struct VethLpEvent *); -static void veth_timed_ack(unsigned long connectionPtr); +static void veth_timed_ack(unsigned long ptr); +static void veth_timed_reset(unsigned long ptr); /* * Utility functions @@ -353,7 +359,7 @@ static void veth_handle_int(struct VethL HvLpIndex rlp = event->base_event.xSourceLp; struct veth_lpar_connection *cnx = veth_cnx[rlp]; unsigned long flags; - int i; + int i, acked = 0; BUG_ON(! 
cnx); @@ -367,13 +373,22 @@ static void veth_handle_int(struct VethL break; case VethEventTypeFramesAck: spin_lock_irqsave(&cnx->lock, flags); + for (i = 0; i < VETH_MAX_ACKS_PER_MSG; ++i) { u16 msgnum = event->u.frames_ack_data.token[i]; - if (msgnum < VETH_NUMBUFFERS) + if (msgnum < VETH_NUMBUFFERS) { veth_recycle_msg(cnx, cnx->msgs + msgnum); + cnx->outstanding_tx--; + acked++; + } } + + if (acked > 0) + cnx->last_contact = jiffies; + spin_unlock_irqrestore(&cnx->lock, flags); + veth_flush_pending(cnx); break; case VethEventTypeFrames: @@ -447,8 +462,6 @@ static void veth_statemachine(void *p) restart: if (cnx->state & VETH_STATE_RESET) { - int i; - if (cnx->state & VETH_STATE_OPEN) HvCallEvent_closeLpEventPath(cnx->remote_lp, HvLpEvent_Type_VirtualLan); @@ -467,15 +480,20 @@ static void veth_statemachine(void *p) | VETH_STATE_SENTCAPACK | VETH_STATE_READY); /* Clean up any leftover messages */ - if (cnx->msgs) + if (cnx->msgs) { + int i; for (i = 0; i < VETH_NUMBUFFERS; ++i) veth_recycle_msg(cnx, cnx->msgs + i); + } + cnx->outstanding_tx = 0; /* Drop the lock so we can do stuff that might sleep or * take other locks. 
*/ spin_unlock_irq(&cnx->lock); del_timer_sync(&cnx->ack_timer); + del_timer_sync(&cnx->reset_timer); + veth_flush_pending(cnx); spin_lock_irq(&cnx->lock); @@ -628,9 +646,16 @@ static int veth_init_connection(u8 rlp) cnx->remote_lp = rlp; spin_lock_init(&cnx->lock); INIT_WORK(&cnx->statemachine_wq, veth_statemachine, cnx); + init_timer(&cnx->ack_timer); cnx->ack_timer.function = veth_timed_ack; cnx->ack_timer.data = (unsigned long) cnx; + + init_timer(&cnx->reset_timer); + cnx->reset_timer.function = veth_timed_reset; + cnx->reset_timer.data = (unsigned long) cnx; + cnx->reset_timeout = 5 * HZ * (VETH_ACKTIMEOUT / 1000000); + memset(&cnx->pending_acks, 0xff, sizeof (cnx->pending_acks)); veth_cnx[rlp] = cnx; @@ -937,6 +962,13 @@ static int veth_transmit_to_one(struct s if (rc != HvLpEvent_Rc_Good) goto recycle_and_drop; + /* If the timer's not already running, start it now. */ + if (0 == cnx->outstanding_tx) + mod_timer(&cnx->reset_timer, jiffies + cnx->reset_timeout); + + cnx->last_contact = jiffies; + cnx->outstanding_tx++; + spin_unlock_irqrestore(&cnx->lock, flags); return 0; @@ -1081,6 +1113,37 @@ static void veth_flush_pending(struct ve } } +static void veth_timed_reset(unsigned long ptr) +{ + struct veth_lpar_connection *cnx = (struct veth_lpar_connection *)ptr; + unsigned long trigger_time, flags; + + /* FIXME is it possible this fires after veth_stop_connection()? + * That would reschedule the statemachine for 5 seconds and probably + * execute it after the module's been unloaded. Hmm. 
*/ + + spin_lock_irqsave(&cnx->lock, flags); + + if (cnx->outstanding_tx > 0) { + trigger_time = cnx->last_contact + cnx->reset_timeout; + + if (time_after(jiffies, trigger_time)) { + cnx->state |= VETH_STATE_RESET; + veth_kick_statemachine(cnx); + veth_error("%d packets not acked by LPAR %d within %d " + "seconds, resetting.\n", + cnx->outstanding_tx, cnx->remote_lp, + cnx->reset_timeout / HZ); + } else { + /* Reschedule the timer */ + trigger_time = jiffies + cnx->reset_timeout; + mod_timer(&cnx->reset_timer, trigger_time); + } + } + + spin_unlock_irqrestore(&cnx->lock, flags); +} + /* * Rx path */ From michael@ellerman.id.au Thu Jun 30 03:22:20 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 03:22:37 -0700 (PDT) Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.com [202.81.18.186]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UAMGH9001255 for ; Thu, 30 Jun 2005 03:22:18 -0700 Received: from sd0208e0.au.ibm.com (d23rh904.au.ibm.com [202.81.18.202]) by ausmtp01.au.ibm.com (8.12.10/8.12.10) with ESMTP id j5UANJes302396 for ; Thu, 30 Jun 2005 20:23:19 +1000 Received: from d23av03.au.ibm.com (d23av03.au.ibm.com [9.190.250.244]) by sd0208e0.au.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j5UANbdu136560 for ; Thu, 30 Jun 2005 20:23:37 +1000 Received: from d23av03.au.ibm.com (loopback [127.0.0.1]) by d23av03.au.ibm.com (8.12.11/8.13.3) with ESMTP id j5UAKgHG030683 for ; Thu, 30 Jun 2005 20:20:42 +1000 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av03.au.ibm.com (8.12.11/8.12.11) with ESMTP id j5UAKglf030662; Thu, 30 Jun 2005 20:20:42 +1000 Received: from concordia.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82]) by ozlabs.au.ibm.com (Postfix) with SMTP id 7418D736CB; Thu, 30 Jun 2005 20:21:03 +1000 (EST) Received: by concordia.ozlabs.ibm.com (sSMTP sendmail emulation); Thu, 30 Jun 2005 20:20:39 +1000 Date: Thu, 30 Jun 2005 20:20:39 +1000 To: linuxppc64-dev@ozlabs.org, netdev@oss.sgi.com, 
linux-kernel@vger.kernel.org From: Michael Ellerman Subject: [PATCH 5/12] iseries_veth: Try to avoid pathological reset behaviour In-Reply-To: <200506302016.55125.michael@ellerman.id.au> Message-Id: <1120126839.441162.530324669503.qpatch@concordia> X-archive-position: 2574 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: michael@ellerman.id.au Precedence: bulk X-list: netdev Content-Length: 3108 Lines: 81 The iseries_veth driver contains a state machine which is used to manage how connections are set up and negotiated between LPARs. If one side of a connection resets for some reason, the two LPARs can get stuck in a race to re-setup the connection. This can lead to the connection being declared dead by one or both ends. In practice this happens ~8/10 times a connection is reset, although it's rare for connections to be reset. (an example here: http://michael.ellerman.id.au/files/misc/veth-trace.html) The core of the problem is that the end that resets the connection doesn't wait for the other end to become aware of the reset. So the resetting end starts setting the connection back up, and then receives a reset from the other end (which is the response to the initial reset). And so on. We're severely limited in what we can do to fix this. The protocol between LPARs is essentially fixed, as we have to interoperate with both OS/400 and old Linux drivers. Which also means we need a fix that only changes the code on one end. The only fix I've found given that, is to just blindly sleep for a bit when resetting the connection, in the hope that the other end will get itself sorted. Needless to say I'd love it if someone has a better idea. This does work, I've so far been unable to get it to break, whereas without the fix a reset of one end will lead to a dead connection ~8/10 times. 
--- drivers/net/iseries_veth.c | 23 +++++++++++++++++++++-- 1 files changed, 21 insertions(+), 2 deletions(-) Index: veth-dev/drivers/net/iseries_veth.c =================================================================== --- veth-dev.orig/drivers/net/iseries_veth.c +++ veth-dev/drivers/net/iseries_veth.c @@ -324,8 +324,12 @@ static void veth_take_monitor_ack(struct spin_lock_irqsave(&cnx->lock, flags); veth_debug("cnx %d: lost connection.\n", cnx->remote_lp); - cnx->state |= VETH_STATE_RESET; - veth_kick_statemachine(cnx); + /* Avoid kicking the statemachine once we're shutdown. + * It's unnecessary and it could break veth_stop_connection(). */ + if (! (cnx->state & VETH_STATE_SHUTDOWN)) { + cnx->state |= VETH_STATE_RESET; + veth_kick_statemachine(cnx); + } spin_unlock_irqrestore(&cnx->lock, flags); } @@ -483,6 +487,12 @@ static void veth_statemachine(void *p) if (cnx->state & VETH_STATE_RESET) goto restart; + + /* Hack, wait for the other end to reset itself. */ + if (! (cnx->state & VETH_STATE_SHUTDOWN)) { + schedule_delayed_work(&cnx->statemachine_wq, 5 * HZ); + goto out; + } } if (cnx->state & VETH_STATE_SHUTDOWN) @@ -667,6 +677,15 @@ static void veth_stop_connection(u8 rlp) veth_kick_statemachine(cnx); spin_unlock_irq(&cnx->lock); + /* There's a slim chance the reset code has just queued the + * statemachine to run in five seconds. If so we need to cancel + * that and requeue the work to run now. */ + if (cancel_delayed_work(&cnx->statemachine_wq)) { + spin_lock_irq(&cnx->lock); + veth_kick_statemachine(cnx); + spin_unlock_irq(&cnx->lock); + } + /* Wait for the state machine to run. 
*/ flush_scheduled_work(); } From jgarzik@pobox.com Thu Jun 30 07:42:50 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 07:42:53 -0700 (PDT) Received: from mail.dvmed.net (mail.dvmed.net [216.237.124.58]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j5UEgnH9004335 for ; Thu, 30 Jun 2005 07:42:50 -0700 Received: from cpe-065-184-065-144.nc.res.rr.com ([65.184.65.144] helo=[10.10.10.88]) by mail.dvmed.net with esmtpsa (Exim 4.51 #1 (Red Hat Linux)) id 1Do0Dz-0001Ji-6J; Thu, 30 Jun 2005 14:41:08 +0000 Message-ID: <42C4047F.1000108@pobox.com> Date: Thu, 30 Jun 2005 10:41:03 -0400 From: Jeff Garzik User-Agent: Mozilla Thunderbird 1.0.2-6 (X11/20050513) X-Accept-Language: en-us, en MIME-Version: 1.0 To: michael@ellerman.id.au CC: PPC64-dev , netdev@oss.sgi.com, LKML Subject: Re: [RFC/PATCH 0/12] Updates & bug fixes for iseries_veth network driver References: <200506302016.55125.michael@ellerman.id.au> In-Reply-To: <200506302016.55125.michael@ellerman.id.au> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2580 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: jgarzik@pobox.com Precedence: bulk X-list: netdev Content-Length: 343 Lines: 15 Michael Ellerman wrote: > Hi y'all, > > The following is a series of patches for the iseries_veth driver. > > They're not ready for merging yet, as we need to do more extensive testing. > However any feedback you have will be greatly appreciated. Note, make sure to CC me, and also the new netdev list (netdev@vger.kernel.org). 
Jeff From kaber@trash.net Thu Jun 30 17:44:44 2005 Received: with ECARTIS (v1.0.0; list netdev); Thu, 30 Jun 2005 17:44:46 -0700 (PDT) Received: from kaber.coreworks.de ([62.206.217.67]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j610ihH9021695 for ; Thu, 30 Jun 2005 17:44:43 -0700 Received: from localhost ([127.0.0.1]) by kaber.coreworks.de with esmtp (Exim 4.51) id 1Do9cY-00060o-Gk; Fri, 01 Jul 2005 02:43:06 +0200 Message-ID: <42C4919A.5000009@trash.net> Date: Fri, 01 Jul 2005 02:43:06 +0200 From: Patrick McHardy User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050514 Debian/1.7.8-1 X-Accept-Language: en MIME-Version: 1.0 To: Patrick Jenkins CC: linux-kernel@vger.kernel.org, Maillist netdev Subject: Re: [PATCH] multipath routing algorithm, better patch References: In-Reply-To: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-archive-position: 2583 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kaber@trash.net Precedence: bulk X-list: netdev Content-Length: 952 Lines: 30 Patrick Jenkins wrote: > Hi, > > The last patch wont work, this should. > > This patch assigns the multipath routing algorithm into the fib_info > struct's fib_mp_alg variable. Previously, the algorithm was always set to > IP_MP_ALG_NONE which was incorrect. This patch corrects the problem by > assigning the correct value when a fib_info is initialized. > > This patch was tested against kernel 2.6.12.1 for all multipath routing > algorithms (none, round robin, interface round robin, random, weighted > random). Multiple algorithms can be compiled in at once, so this patch is wrong. mp_alg is supplied by userspace: if (rta->rta_mp_alg) { mp_alg = *rta->rta_mp_alg; if (mp_alg < IP_MP_ALG_NONE || mp_alg > IP_MP_ALG_MAX) goto err_inval; } If it isn't set correctly it's an iproute problem. Did you actually experience any problems? Regards Patrick
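[Editorial sketch, not part of the original message.] The check quoted in the reply above can be illustrated outside the kernel roughly as follows. This is a minimal userspace sketch under stated assumptions: the EX_MP_ALG_* constants and ex_pick_mp_alg() are hypothetical stand-ins invented for illustration, not the kernel's IP_MP_ALG_* definitions or any real fib function. It only shows the pattern the reply describes: the algorithm comes from an optional userspace-supplied attribute, defaults to "none" when absent, and is rejected when out of range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's IP_MP_ALG_* constants; the
 * real values live in the kernel headers and are not reproduced here. */
enum {
	EX_MP_ALG_NONE = 0,
	EX_MP_ALG_RR,
	EX_MP_ALG_DRR,
	EX_MP_ALG_RANDOM,
	EX_MP_ALG_WRANDOM,
	EX_MP_ALG_MAX = EX_MP_ALG_WRANDOM
};

/*
 * Resolve the algorithm for a new route entry (hypothetical helper).
 * 'attr' models an optional attribute supplied by userspace; NULL means
 * the attribute was not supplied.
 * Returns 0 and stores the choice in *out, or -1 if the supplied value
 * is out of range (in which case *out is left untouched).
 */
int ex_pick_mp_alg(const int *attr, int *out)
{
	int alg = EX_MP_ALG_NONE;	/* default when userspace says nothing */

	assert(out != NULL);

	if (attr) {
		alg = *attr;
		if (alg < EX_MP_ALG_NONE || alg > EX_MP_ALG_MAX)
			return -1;	/* mirrors the "goto err_inval" path */
	}

	*out = alg;
	return 0;
}
```

With this shape, a route created without the attribute ends up with the "none" algorithm, which would be consistent with the reply's point that a missing value is a matter for the userspace tool rather than the kernel.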