From owner-netdev@oss.sgi.com Fri Jun 1 06:09:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51D9lA16932 for netdev-outgoing; Fri, 1 Jun 2001 06:09:47 -0700 Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51D9fh16926 for ; Fri, 1 Jun 2001 06:09:42 -0700 Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id XAA30991; Fri, 1 Jun 2001 23:09:21 +1000 X-Authentication-Warning: blackbird.intercode.com.au: jmorris owned process doing -bs Date: Fri, 1 Jun 2001 23:09:21 +1000 (EST) From: James Morris To: , cc: Wilmer van der Gaast Subject: [PATCH] ip_queue - netlink message handling oops fix Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk The patch below fixes a problem with the ip_queue module, where certain malformed-length netlink messages from userspace could cause a kernel oops during error reporting via netlink error messages. Any netlink messages arriving at the module are now silently dropped if they fail length validation. Thanks to Wilmer van der Gaast for discovering and reporting the problem. 
- James
-- 
James Morris

diff -urN linux-2.4.5.orig/net/ipv4/netfilter/ip_queue.c linux/net/ipv4/netfilter/ip_queue.c
--- linux-2.4.5.orig/net/ipv4/netfilter/ip_queue.c	Tue Dec 12 07:37:04 2000
+++ linux/net/ipv4/netfilter/ip_queue.c	Fri Jun 1 22:25:17 2001
@@ -431,10 +431,15 @@
 	int status, type;
 	struct nlmsghdr *nlh;
 
+	if (skb->len < sizeof(struct nlmsghdr))
+		return;
+
 	nlh = (struct nlmsghdr *)skb->data;
-	if (nlh->nlmsg_len < sizeof(*nlh)
-	    || skb->len < nlh->nlmsg_len
-	    || nlh->nlmsg_pid <= 0
+	if (nlh->nlmsg_len < sizeof(struct nlmsghdr)
+	    || skb->len < nlh->nlmsg_len)
+		return;
+
+	if(nlh->nlmsg_pid <= 0
 	    || !(nlh->nlmsg_flags & NLM_F_REQUEST)
 	    || nlh->nlmsg_flags & NLM_F_MULTI)
 		RCV_SKB_FAIL(-EINVAL);

From owner-netdev@oss.sgi.com Fri Jun 1 06:15:45 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51DFjY17516 for netdev-outgoing; Fri, 1 Jun 2001 06:15:45 -0700
Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51DFhh17508 for ; Fri, 1 Jun 2001 06:15:43 -0700
Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 0BBCC1F70; Fri, 1 Jun 2001 09:15:37 -0400 (EDT)
Message-ID: <3B179579.F9C9C721@mandrakesoft.com>
Date: Fri, 01 Jun 2001 09:15:37 -0400
From: Jeff Garzik
Organization: MandrakeSoft
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Bogdan Costescu
Cc: Alan Cox , Pete Zaitcev , Linux Kernel Mailing List , netdev@oss.sgi.com
Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Bogdan Costescu wrote:
> No way! If I implement a HA application which depends on link status, I
> want the info to be accurate, I don't want to know that 30 seconds ago I
> had good link.
To tangent a little bit, and add netdev to the CC... The loss and regain of link status should be proactively signalled to userspace using netlink or something similar. Currently we have netif_carrier_{on,off,ok} but it is only passively checked. netif_carrier_{on,off} should probably schedule_task() to fire off a netlink message... For your HA application specifically, right now, I would suggest making sure your net driver calls netif_carrier_xxx correctly, then checking for IFF_RUNNING interface flag. IFF_RUNNING will disappear if the interface is up, but there is no carrier [as according to netif_carrier_ok]. -- Jeff Garzik | Disbelief, that's why you fail. Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Fri Jun 1 06:20:11 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51DKBV18331 for netdev-outgoing; Fri, 1 Jun 2001 06:20:11 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51DK8h18325 for ; Fri, 1 Jun 2001 06:20:08 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id GAA02305; Fri, 1 Jun 2001 06:19:41 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15127.38509.495537.405210@pizda.ninka.net> Date: Fri, 1 Jun 2001 06:19:41 -0700 (PDT) To: Jeff Garzik Cc: Bogdan Costescu , Alan Cox , Pete Zaitcev , Linux Kernel Mailing List , netdev@oss.sgi.com Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis In-Reply-To: <3B179579.F9C9C721@mandrakesoft.com> References: <3B179579.F9C9C721@mandrakesoft.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Jeff Garzik writes: > For your HA application specifically, right now, I would suggest making > sure your net driver calls netif_carrier_xxx correctly, then checking > for IFF_RUNNING interface flag. 
IFF_RUNNING will disappear if the > interface is up, but there is no carrier [as according to > netif_carrier_ok]. Don't such HA apps need to run as root anyways? Regardless, I agree that, long term, the way to do this is via netlink. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Jun 1 06:28:11 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51DSBT19453 for netdev-outgoing; Fri, 1 Jun 2001 06:28:11 -0700 Received: from auemail1.firewall.lucent.com (auemail1.lucent.com [192.11.223.161]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51DS6h19445 for ; Fri, 1 Jun 2001 06:28:06 -0700 Received: from auemail1.firewall.lucent.com (localhost [127.0.0.1]) by auemail1.firewall.lucent.com (Switch-2.1.3/Switch-2.1.0) with ESMTP id f51DS5J07466 for ; Fri, 1 Jun 2001 09:28:05 -0400 (EDT) Received: from nc8220exchange.ral.lucent.com (h135-92-100-21.lucent.com [135.92.100.21]) by auemail1.firewall.lucent.com (Switch-2.1.3/Switch-2.1.0) with ESMTP id f51DS4e07436 for ; Fri, 1 Jun 2001 09:28:04 -0400 (EDT) Content-Class: urn:content-classes:message Subject: RE: regarding Redundancy in TCP / IP Stack MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Date: Fri, 1 Jun 2001 09:28:04 -0400 Message-ID: X-MimeOLE: Produced By Microsoft Exchange V6.0.4418.65 X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: regarding Redundancy in TCP / IP Stack Thread-Index: AcDqaLudwNMt7CXVRsut3DEGlXY1MAANR2pw From: "Gregory Parrott" To: , Cc: , Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f51DS7h19447 Sender: owner-netdev@oss.sgi.com Precedence: bulk Bjorn is correct. You have to do a lot of work on your own. I posted questions regarding what I think is an exciting project - a hardware TCP/IP stack in silicon that has to play nicely with the existing Linux kernel stack - and received no replies from the gurus. 
(Jim, don't mean to slight you any; your comments have been valuable!) One book that I have found to help in regards to the networking layer is Linux Kernel Internals, Second Edition by Beck. The ISBN is 0-201-33143-8. See chapter 8. I plan on following up on Bjorn's links below. Greg Parrott Lucent Technologies -----Original Message----- From: Bjorn Hammarberg [mailto:Bjorn.Hammarberg@signal.uu.se] Sent: Friday, June 01, 2001 2:59 AM To: sakalra@hss.hns.com Cc: sndtrn27@hss.hns.com; netdev@oss.sgi.com Subject: Re: regarding Redundancy in TCP / IP Stack Although David's response was somewhat harsh you have to understand that this list consists mostly of people that do this on their spare time. Moreover, Linux is based on people doing things freely and, therefore, other people (newbies if you want) must be prepared to do a lot of work themselves or pay for it (unless it's a project that is so exciting that these gurus want to do just for the fun of it... ). Many people are certainly prepared to do others work if they get a fair compensation for their lost spare time. However, as a recent newbie myself I agree that it could be quite difficult to get the basics of the networking stack. Besides the actual source code these links have helped me a lot http://kernelnewbies.org/documents/ipnetworking/linuxipnetworking.html http://www.linuxdoc.org/LDP/khg/HyperNews/get/net/net-intro.html http://www.gnumonks.org/ftp/pub/doc/packet-journey-2.4.html Perhaps other people could contribute other links. One that I certainly would like to have is the link to the linux-hacker central where one can get information of just about anything about linux hacking; perhaps there is none. My experience is that this information is spread all over the place and many "promising" links points to non-existent pages. 
From owner-netdev@oss.sgi.com Fri Jun 1 06:39:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51Dddp20345 for netdev-outgoing; Fri, 1 Jun 2001 06:39:39 -0700 Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51DdTh20336 for ; Fri, 1 Jun 2001 06:39:29 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f51DdQd13475; Fri, 1 Jun 2001 15:39:26 +0200 (MET DST) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id PAA18620; Fri, 1 Jun 2001 15:39:26 +0200 Date: Fri, 1 Jun 2001 15:39:26 +0200 (CEST) From: Bogdan Costescu To: Jeff Garzik cc: Alan Cox , Pete Zaitcev , Linux Kernel Mailing List , Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis In-Reply-To: <3B179579.F9C9C721@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, 1 Jun 2001, Jeff Garzik wrote: > The loss and regain of link status should be proactively signalled to > userspace using netlink or something similar. [ For the general discussion ] I fully agree, but I just wanted to give an example of legit use from user space of _current_ values from hardware. > Currently we have > netif_carrier_{on,off,ok} but it is only passively checked. > netif_carrier_{on,off} should probably schedule_task() to fire off a > netlink message... [ Link status details ] Just that not all NICs have hardware support (and/or not all drivers use these facilities) for link status change notification using interrupts. Right now, most drivers _poll_ for media status and based on the poll rate, netif_carrier routines are (or should be) called. 
We can't make the poll rate very small for the general case, as MII access is time consuming (same discussion was some months ago when the bonding driver was updated). However, for users who know that they need this info to be more accurate (at the expense of CPU time), polling through ioctl's is the only solution. [ Back to general discussion ] So far, to the problem of too often access to hardware, 2 solutions were proposed: 1. cache the values. You can then let the user shoot him-/her-self in the foot by making too many ioctl calls. But this prevent any legit use of current hardware state. 2. rate limiting. You don't let the user access the hardware too often (to be defined), so he/she can't shoot his-/her-self in the foot. Legit use of current hardware state is possible. IMHO, solution 2 is much better. Can you find situations when it's not ? -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Fri Jun 1 07:02:19 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51E2JQ22206 for netdev-outgoing; Fri, 1 Jun 2001 07:02:19 -0700 Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51E2Bh22189 for ; Fri, 1 Jun 2001 07:02:11 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f51E29d13911; Fri, 1 Jun 2001 16:02:09 +0200 (MET DST) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id QAA18753; Fri, 1 Jun 2001 16:02:09 +0200 Date: Fri, 1 Jun 2001 16:02:09 +0200 (CEST) From: Bogdan Costescu To: "David S. 
Miller" cc: Jeff Garzik , Alan Cox , Pete Zaitcev , Linux Kernel Mailing List , Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis In-Reply-To: <15127.38509.495537.405210@pizda.ninka.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, 1 Jun 2001, David S. Miller wrote: > Don't such HA apps need to run as root anyways? Not necessarily, but eventually you can let root (CAP_NET_ADMIN, anyway) go through without any limitations, root can bring down the system at will in other ways. In addition, the rate limiting solution allows a warning to be issued when the limit is exceeded, so that the poor sysadmin knows what hit him 8-) -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Fri Jun 1 07:16:29 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51EGTk23171 for netdev-outgoing; Fri, 1 Jun 2001 07:16:29 -0700 Received: from shell.cyberus.ca (shell.cyberus.ca [209.195.95.7]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51EGLh23163 for ; Fri, 1 Jun 2001 07:16:22 -0700 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id KAA11168; Fri, 1 Jun 2001 10:14:08 -0400 (EDT) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Fri, 1 Jun 2001 10:14:08 -0400 (EDT) From: jamal To: Bogdan Costescu cc: Jeff Garzik , Alan Cox , Pete Zaitcev , Linux Kernel Mailing List , Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Jeff, Thanks for copying netdev. Wish more people would do that. 
On Fri, 1 Jun 2001, Bogdan Costescu wrote:

> On Fri, 1 Jun 2001, Jeff Garzik wrote:
>
> > The loss and regain of link status should be proactively signalled to
> > userspace using netlink or something similar.
>
> [ For the general discussion ]
> I fully agree, but I just wanted to give an example of legit use from
> user space of _current_ values from hardware.
>
> > Currently we have
> > netif_carrier_{on,off,ok} but it is only passively checked.
> > netif_carrier_{on,off} should probably schedule_task() to fire off a
> > netlink message...
>
> [ Link status details ]
> Just that not all NICs have hardware support (and/or not all drivers use
> these facilities) for link status change notification using interrupts.
> Right now, most drivers _poll_ for media status and based on the poll
> rate, netif_carrier routines are (or should be) called. We can't make the
> poll rate very small for the general case, as MII access is time
> consuming (same discussion was some months ago when the bonding driver
> was updated). However, for users who know that they need this info to be
> more accurate (at the expense of CPU time), polling through ioctl's is the
> only solution.

Not really.

One idea I have been toying with is to maintain hysteresis or a threshold of
some form in dev_watchdog;
example: if the watchdog timer expires threshold times, you declare the link
dead and send a netif_carrier_off netlink message.
On recovery, you send netif_carrier_on.

Assumption:
If the tx path is blocked, more than likely the link is down.
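
A minimal userspace sketch of the threshold idea above; it is not kernel code. The struct, enum and function names are hypothetical, and the places marked "report" are where a driver would presumably call netif_carrier_off() or netif_carrier_on() and send the notification.

```c
#include <stdbool.h>

/* Hypothetical per-device state; none of these names exist in the kernel. */
struct link_monitor {
	unsigned int threshold;	/* consecutive timeouts before link is declared dead */
	unsigned int timeouts;	/* consecutive watchdog timeouts seen so far */
	bool carrier;		/* current belief about the link */
};

enum link_event { LINK_NO_CHANGE, LINK_WENT_DOWN, LINK_CAME_UP };

static void link_monitor_init(struct link_monitor *m, unsigned int threshold)
{
	m->threshold = threshold;
	m->timeouts = 0;
	m->carrier = true;
}

/* Call each time the transmit watchdog timer expires. */
static enum link_event link_monitor_timeout(struct link_monitor *m)
{
	if (m->timeouts < m->threshold)
		m->timeouts++;

	if (m->carrier && m->timeouts >= m->threshold) {
		m->carrier = false;
		return LINK_WENT_DOWN;	/* report: carrier off */
	}
	return LINK_NO_CHANGE;
}

/* Call when a transmit succeeds again. */
static enum link_event link_monitor_tx_ok(struct link_monitor *m)
{
	m->timeouts = 0;
	if (!m->carrier) {
		m->carrier = true;
		return LINK_CAME_UP;	/* report: carrier on */
	}
	return LINK_NO_CHANGE;
}
```

With a threshold of 3, the first two timeouts change nothing, the third reports the link as down, and the next successful transmit reports it as up again. A single stalled transmit therefore does not produce a spurious link-down event.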
cheers, jamal From owner-netdev@oss.sgi.com Fri Jun 1 07:30:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51EUqZ24214 for netdev-outgoing; Fri, 1 Jun 2001 07:30:52 -0700 Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51EUlh24209 for ; Fri, 1 Jun 2001 07:30:47 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f51EUgd14607; Fri, 1 Jun 2001 16:30:42 +0200 (MET DST) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id QAA19173; Fri, 1 Jun 2001 16:30:42 +0200 Date: Fri, 1 Jun 2001 16:30:42 +0200 (CEST) From: Bogdan Costescu To: jamal cc: Jeff Garzik , Alan Cox , Pete Zaitcev , Linux Kernel Mailing List , Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, 1 Jun 2001, jamal wrote: > Jeff, Thanks for copying netdev. Wish more people would do that. Shame on me, I should have thought of that too... I joined lkml only about 2 weeks ago because netdev related topics are sometimes discussed only there... > Not really. > > One idea i have been toying with is to maintain hysteris or threshold of > some form in dev_watchdog; AFAIK, dev_watchdog is right now used only for Tx (if I'm wrong, please correct me!). So how do you sense link loss if you expect only high Rx traffic ? > example: if watchdog timer expires threshold times, you declare the link > dead and send netif_carrier_off netlink message. > On recovery, you send netif_carrier_on I assume that you mean "on recovery" as in "first succesful hard_start_xmit". > Assumption: > If the tx path is blocked, more than likely the link is down. 
Yes, but is this a good approximation ? I'm not saying that it's not, I'm merely asking for counter-arguments. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Fri Jun 1 08:13:12 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51FDCN27455 for netdev-outgoing; Fri, 1 Jun 2001 08:13:12 -0700 Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51FDAh27449 for ; Fri, 1 Jun 2001 08:13:11 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f51FD9d15528; Fri, 1 Jun 2001 17:13:09 +0200 (MET DST) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id RAA19572; Fri, 1 Jun 2001 17:13:08 +0200 Date: Fri, 1 Jun 2001 17:13:08 +0200 (CEST) From: Bogdan Costescu To: Alan Cox cc: Jeff Garzik , Pete Zaitcev , Linux Kernel Mailing List , Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems (for In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk [ OK, this time I cc'ed netdev 8-) ] On Fri, 1 Jun 2001, Alan Cox wrote: > Please re-read your comment. Then think about it. Then tell me how rate > limiting differs from caching to the application. For caching, the kernel establishes the rate with which the info is updated. There's nothing wrong, but how is the application to know if the value is actual or cached (from when, until when) ? That means that a single application that needs data more often than the caching rate will get bogus data and not know about it. 
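
A minimal sketch of how the two policies look from the caller's side, assuming a hypothetical read_link_hw() that stands in for a slow hardware access. All names are made up for illustration. The rate-limited variant uses the "return an error" behaviour; blocking the caller would be the other option.

```c
#include <errno.h>

/* Hypothetical stand-in for a slow hardware (e.g. MII) register read. */
static int hw_reads;			/* counts real hardware accesses */

static int read_link_hw(void)
{
	hw_reads++;
	return 1;			/* pretend the link is always up */
}

struct link_state {
	long min_interval;		/* minimum ticks between hardware reads */
	long last_read;			/* tick of the last hardware read */
	int valid;			/* nonzero once value has been read */
	int value;			/* last value read from hardware */
};

static void refresh(struct link_state *s, long now)
{
	s->value = read_link_hw();
	s->last_read = now;
	s->valid = 1;
}

/* Caching: a call inside the interval silently returns the old value. */
static int cached_read(struct link_state *s, long now, int *out)
{
	if (!s->valid || now - s->last_read >= s->min_interval)
		refresh(s, now);
	*out = s->value;
	return 0;
}

/* Rate limiting: a call inside the interval fails visibly with -EAGAIN. */
static int limited_read(struct link_state *s, long now, int *out)
{
	if (s->valid && now - s->last_read < s->min_interval)
		return -EAGAIN;
	refresh(s, now);
	*out = s->value;
	return 0;
}
```

In both cases the hardware is touched at most once per interval. The difference is only that the caller of limited_read() can tell when it did not get a fresh value.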
With rate limiting, you always get new values, unless the limit is exceeded. When the limit is exceeded, you log and: - block any request until some timer is expired. The application can detect that it's been blocked and react. You can detect if there are several calls waiting and return the same value to all. - return error until some timer is expired. The application can again detect that. In both cases, the application is also capable of guessing the value of the delay. For one application which follows the rules (doesn't need data more often than the caching rate or doesn't exceed the rate limit) there is no difference, I agree. But when somebody is playing tricks while you need data, you have the chance of detecting this by using rate limits. And yes, I agree that either of them (cache or rate limit) should be modifiable through proc entry/ioctl/whatever. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Fri Jun 1 08:43:31 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51FhVO30343 for netdev-outgoing; Fri, 1 Jun 2001 08:43:31 -0700 Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51FhUh30338 for ; Fri, 1 Jun 2001 08:43:30 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.0/8.11.0) with ESMTP id f51FtKr30448; Fri, 1 Jun 2001 08:55:20 -0700 Message-ID: <3B17BAE8.8454DDC6@candelatech.com> Date: Fri, 01 Jun 2001 08:55:20 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686) X-Accept-Language: en MIME-Version: 1.0 To: sakalra@hss.hns.com CC: netdev@oss.sgi.com, sndtrn27@hss.hns.com Subject: Re: regarding Redundancy 
in TCP / IP Stack References: <65256A5E.001D9A58.00@sandesh.hss.hns.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk sakalra@hss.hns.com wrote: > > hi all , list > We am novice to the Linux TCP / IP stack arch, > At present i want to implement redundancy > at socket level in the Stack.. Can you please > help me with some docs, information in this regards What redundancy do you expect to get? What will you be able to do that we cannot do today? You mean keeping two stacks in sync across two different machines, so that you can hot-swap processes or something? I thought about this w/regard to building a VOIP box that could handle failover w/out dropping calls, but I decided that it was an intractable problem, and that there were probably other ways to get the functionality better. For example, on failover, grab packets right off the interface instead of letting them go up the stack and implement your own hacked up TCP/IP stack in user-space that is specifically designed to do what you want. This is pretty damn ugly, of course, but you might could keep the connections together. For VOIP in particular, most of your traffic is UDP anyway, so your problem is much more easily solved... > > we want to know > 1. The Data structures that are kept by the system for maintaining the > Connection. > 2. Kernel related data structures that are part of the TCP / IP stack. > 3. Any Documents, Links that can help us in getting with the procedure .as to > how it can be implemeted efficiently. > 4. Pros & cons in implementing such redundancy. > 5. kernel related other information as to which modules are interdependent to > this (If any). > 6. If any work is going in this regards, then what is the present status. & for > more detail whom shall then we refer to. 
> > regards > Sandeep , Rajiv -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Fri Jun 1 10:41:09 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51Hf9T03622 for netdev-outgoing; Fri, 1 Jun 2001 10:41:09 -0700 Received: from tinuviel.compendium.net.ar (usat2-00222.usateleport.com [208.248.183.222]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51Hf4h03619 for ; Fri, 1 Jun 2001 10:41:05 -0700 Received: by tinuviel.compendium.net.ar (Postfix, from userid 1000) id 236CF1967FF; Fri, 1 Jun 2001 14:40:51 -0300 (ART) Date: Fri, 1 Jun 2001 14:40:51 -0300 From: horape@tinuviel.compendium.net.ar To: netdev@oss.sgi.com Subject: why cannot bind to someipaddress:port when something else has *:port bound? Message-ID: <20010601144051.D16600@tinuviel.compendium.net.ar> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline User-Agent: Mutt/1.3.18i x-attribution: HoraPe Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f51Hf8h03620 Sender: owner-netdev@oss.sgi.com Precedence: bulk ¡Hola! The following program binds *:1000 to a socket, and then tries to bind 200.47.36.254:1000 to another socket, the error i gets is "Address already in use". Why? I am not asking for a "you're not allowed to do that", I know. I don't ask for a "why are you trying to do that?", I amn't trying. But I need to know why that's not permited. I know vaguely but i need a more sound explanation. A pointer to a mailing list/usenet archive where the subject was discussed in the past would be great. Just another time, i am asking for the theory about why that shouldn't be allowed. Not the fact that it's not allowed. 
Lots of thanks,
HoraPe

The code is:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Print the reason for the last failure; the program keeps running. */
void die(void)
{
	printf("die %s\n", strerror(errno));
}

/* Listen on *:1000 (a privileged port, so this needs root). */
void l4(void)
{
	int listenfd;
	struct sockaddr_in servaddr;

	listenfd = socket(AF_INET, SOCK_STREAM, 0);
	if (listenfd < 0) {
		die();
		return;
	}

	memset(&servaddr, 0, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
	servaddr.sin_port = htons(1000);

	if (bind(listenfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) != 0) {
		die();
		return;
	}
	if (listen(listenfd, 10) != 0)
		die();
}

/* Listen on 200.47.36.254:1000 while *:1000 is already bound. */
void l4esp(void)
{
	int listenfd;
	struct sockaddr_in servaddr;

	listenfd = socket(AF_INET, SOCK_STREAM, 0);
	if (listenfd < 0) {
		die();
		return;
	}

	memset(&servaddr, 0, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	servaddr.sin_addr.s_addr = htonl(0xc82f24fe);	/* 200.47.36.254 */
	servaddr.sin_port = htons(1000);

	if (bind(listenfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) != 0) {
		die();
		return;
	}
	if (listen(listenfd, 10) != 0)
		die();
}

int main(void)
{
	l4();
	l4esp();
	select(0, NULL, NULL, NULL, NULL);	/* sleep forever */
	return 0;
}

From owner-netdev@oss.sgi.com Fri Jun 1 11:10:29 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51IAT105235 for netdev-outgoing; Fri, 1 Jun 2001 11:10:29 -0700
Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51IARh05228 for ; Fri, 1 Jun 2001 11:10:28 -0700
Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.22 #1) id 155tNZ-0005IZ-00 for netdev@oss.sgi.com; Fri, 01 Jun 2001 20:10:33 +0200
Received: from laforge by obroa-skai.gnumonks.org with local (Exim 3.22 #1) id
155tp3-0007jn-00; Fri, 01 Jun 2001 15:38:57 -0300 Date: Fri, 1 Jun 2001 15:38:56 -0300 From: Harald Welte To: sakalra@hss.hns.com Cc: netdev@oss.sgi.com Subject: Re: regarding Redundancy in TCP / IP Stack Message-ID: <20010601153856.M29571@obroa-skai.gnumonks.org> References: <65256A5E.001FB868.00@sandesh.hss.hns.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <65256A5E.001FB868.00@sandesh.hss.hns.com>; from sakalra@hss.hns.com on Fri, Jun 01, 2001 at 11:31:07AM +0530 X-Operating-System: Linux obroa-skai.gnumonks.org 2.4.2 X-Date: Today is Boomtime, the 6th day of Confusion in the YOLD 3167 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, Jun 01, 2001 at 11:31:07AM +0530, sakalra@hss.hns.com wrote: > > > hi david > it is not that way, we have this project...nothing to start with ...just a pc > with Linux 6.2 installed > how to go for that ... i told u .. we am totally novice to the world of linux > and also time constain is also there > we are with just 1 months time . then don't do the project. You may get this to work with a experienced linux kernel-level networking hacker, but even then one month is unrealistic. You cannot just assume to learn everything about linux and linux networking in one month. Forget it. Most linux kernel hackers (network hackers) have spent years of unix administration/user-space programming, before they start trying to understand (and hack) the kernel. And yes, there is no documentation. There is the sourcecode, and you can almost conclude anything from it. > yes, the information i asked for was almost all that is needed to implement > that > but i dont know as to what extend ppl here can help us . they will assist you in particular questions where you don't understand a particular piece of sourcecode. 
-- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Fri Jun 1 11:10:29 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51IATY05241 for netdev-outgoing; Fri, 1 Jun 2001 11:10:29 -0700 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51IARh05229 for ; Fri, 1 Jun 2001 11:10:28 -0700 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.22 #1) id 155tNZ-0005Id-00 for netdev@oss.sgi.com; Fri, 01 Jun 2001 20:10:33 +0200 Received: from laforge by obroa-skai.gnumonks.org with local (Exim 3.22 #1) id 155tsb-0007k2-00; Fri, 01 Jun 2001 15:42:37 -0300 Date: Fri, 1 Jun 2001 15:42:37 -0300 From: Harald Welte To: "Serge Maandag" Cc: netdev@oss.sgi.com Subject: Re: removing ip aliases Message-ID: <20010601154237.N29571@obroa-skai.gnumonks.org> References: <1C48875BDE7ED0469485A5FD49925C4ADECDD7@zmx.staff.zeelandnet.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <1C48875BDE7ED0469485A5FD49925C4ADECDD7@zmx.staff.zeelandnet.nl>; from serge.maandag@staff.zeelandnet.nl on Wed, May 23, 2001 at 09:52:26PM +0200 X-Operating-System: Linux obroa-skai.gnumonks.org 2.4.2 X-Date: Today is Boomtime, the 6th day of Confusion in the YOLD 3167 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Wed, May 23, 2001 at 09:52:26PM +0200, Serge Maandag wrote: > Does anybody know how I remove ip aliases on a redhat 6.0 machine? > ifconfig eth1:1 down downs eth1 and all it's aliases. > ifdown doesn't eat aliases. what about using the 'real' tool instead of the backwards-compatible ifconfig? 
like 'ip addr del xxx.xxx.xxx/xx dev eth1' ? > Serge. -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Fri Jun 1 17:28:20 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f520SKR19316 for netdev-outgoing; Fri, 1 Jun 2001 17:28:20 -0700 Received: from almesberger.net (IDENT:root@lsb-catv-1-p021.vtxnet.ch [212.147.5.21]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f520SIh19312 for ; Fri, 1 Jun 2001 17:28:18 -0700 Received: (from almesber@localhost) by almesberger.net (8.9.3/8.9.3) id CAA15738; Sat, 2 Jun 2001 02:27:58 +0200 Date: Sat, 2 Jun 2001 02:27:58 +0200 From: Werner Almesberger To: Harald Welte Cc: netdev@oss.sgi.com Subject: Re: regarding Redundancy in TCP / IP Stack Message-ID: <20010602022758.D14893@almesberger.net> References: <65256A5E.001FB868.00@sandesh.hss.hns.com> <20010601153856.M29571@obroa-skai.gnumonks.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20010601153856.M29571@obroa-skai.gnumonks.org>; from laforge@gnumonks.org on Fri, Jun 01, 2001 at 03:38:56PM -0300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Harald Welte wrote: > You cannot just assume to learn everything about linux and linux networking > in one month. Forget it. 
I think all those movies that begin by showing a boy longingly looking at the moon, and less than an hour later, we see the same character - perhaps a bit older in virtual time - navigate some spacecraft, really ought to be banned ;-) - Werner -- _________________________________________________________________________ / Werner Almesberger, Lausanne, CH wa@almesberger.net / /_http://icawww.epfl.ch/almesberger/_____________________________________/
From owner-netdev@oss.sgi.com Sat Jun 2 01:20:30 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f528KUs02322 for netdev-outgoing; Sat, 2 Jun 2001 01:20:30 -0700 Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f528KSh02317 for ; Sat, 2 Jun 2001 01:20:28 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f528KQd29402; Sat, 2 Jun 2001 10:20:27 +0200 (MET DST) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id KAA22240; Sat, 2 Jun 2001 10:20:26 +0200 Date: Sat, 2 Jun 2001 10:20:26 +0200 (CEST) From: Bogdan Costescu To: Alan Cox cc: Mark Frazer , Jeff Garzik , Pete Zaitcev , Linux Kernel Mailing List , Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems (for In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, 1 Jun 2001, Alan Cox wrote: > No the application gets back no data, ever, because a third party application > keeps beating it. You don't even need maliciousness for this, synchronization > effects and locking on the file will ensure it gets you in the end Sure, but as I already wrote, you can detect that something is wrong. Then shoot the person! 
> > With caching, you'd have to let the application know when the cached > > value was last read and how long it will be cached for. With rate > > fstat() mtime. That seems easy enough This only answered the first part of the question: when. How do you pass the "how long" info ? Does the same apply to the MII ioctl case ? Now let's talk about implementation issues. For the MII case, you have several registers that have to be read. The way it is generally done is for the ioctl to pass the MII address and register number and receive back the value. Caching means that the driver (I don't think that it can be done at higher levels) has to keep track of accesses to all MII interfaces (yes, there can be more than one on a NIC) and all of their registers. One solution is to read all registers at once and start the cache timer for each MII register access. Another solution is to have each register start its own cache timer. OTOH, ioctl rate limiting can be done at a higher level and you need only one timer per netdevice. So, it's done once and all net drivers benefit from it. Guess which one I prefer... 
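[Editorial sketch] The per-netdevice rate limiting argued for above can be illustrated in user-space C. This is only a sketch under stated assumptions: the structure and function names (mii_rate_limit, mii_access_allowed) are hypothetical and not taken from any driver, and a stored tick value stands in for the "one timer per netdevice" mentioned in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-netdevice limiter state; not an existing kernel structure. */
struct mii_rate_limit {
    uint64_t last_access;  /* tick (a jiffies-like counter) of the last granted access */
    uint64_t min_interval; /* minimum number of ticks between two hardware accesses */
    bool     primed;       /* false until the first access has been granted */
};

/*
 * Decide whether an ioctl arriving at tick `now` may touch the hardware.
 * Returns true and records the access if enough time has passed,
 * false if the caller should be refused or told to retry later.
 */
static bool mii_access_allowed(struct mii_rate_limit *rl, uint64_t now)
{
    assert(rl != NULL);
    if (rl->primed && now - rl->last_access < rl->min_interval)
        return false;
    rl->last_access = now;
    rl->primed = true;
    return true;
}
```

One such structure per device is enough regardless of how many MII interfaces or registers sit behind it, which is the economy the message is pointing at.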
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Sat Jun 2 02:03:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f5293RH10140 for netdev-outgoing; Sat, 2 Jun 2001 02:03:27 -0700 Received: from titan.bieringer.de (mail.bieringer.de [195.226.187.51]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f5293Ph10136 for ; Sat, 2 Jun 2001 02:03:26 -0700 Received: (qmail 14939 invoked from network); 2 Jun 2001 09:03:19 -0000 Received: from pd9502488.dip.t-dialin.net (HELO worker.muc.bieringer.de) (217.80.36.136) by mail.bieringer.de with SMTP; 2 Jun 2001 09:03:19 -0000 Date: Sat, 02 Jun 2001 11:03:24 +0200 From: Peter Bieringer To: Maillist netdev cc: Maillist linux-ipv6 , Maillist USAGI-users Subject: IPv6+2.4.x: ipv6_local_port_range implementation plans + netfilter6 Message-ID: <14800000.991472604@localhost> X-Mailer: Mulberry/2.0.8 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi all, are there any plans to implement "ipv6_local_port_range" in the future like on IPv4? BTW: for all the IPv6 freaks and kernel 2.4 users: I've already brought netfilter6 on my gateway server up to life (thanks to netfilter & ext3 developers) and add some hints in my HowTo relating to this: http://www.bieringer.de/linux/IPv6/IPv6-HOWTO/IPv6-HOWTO-8.html netfilter6 is currently not fullfeatured and work in progress, but packet filtering works and for security issues it's really better than nothing! 
I recommend to insert following rules on (tunnel) interface(s) to block incoming TCP connections requests like: ip6tables -I INPUT -i sit+ -p tcp --syn -j DROP ip6tables -I FORWARD -i sit+ -p tcp --syn -j DROP Modify the "-i" option, if your interface(s) to the global IPv6 network are named different. Peter From owner-netdev@oss.sgi.com Sat Jun 2 07:51:17 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f52EpHf02967 for netdev-outgoing; Sat, 2 Jun 2001 07:51:17 -0700 Received: from shell.cyberus.ca (shell.cyberus.ca [209.195.95.7]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f52Ep8h02937 for ; Sat, 2 Jun 2001 07:51:08 -0700 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id KAA13667; Sat, 2 Jun 2001 10:49:01 -0400 (EDT) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Sat, 2 Jun 2001 10:49:01 -0400 (EDT) From: jamal To: Bogdan Costescu cc: Jeff Garzik , Alan Cox , Pete Zaitcev , Linux Kernel Mailing List , Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, 1 Jun 2001, Bogdan Costescu wrote: > On Fri, 1 Jun 2001, jamal wrote: > > > One idea i have been toying with is to maintain hysteris or threshold of > > some form in dev_watchdog; > > AFAIK, dev_watchdog is right now used only for Tx (if I'm wrong, please > correct me!). So how do you sense link loss if you expect only high Rx > traffic ? > Good question. Makes me think. Thoughts further below. > > example: if watchdog timer expires threshold times, you declare the link > > dead and send netif_carrier_off netlink message. > > On recovery, you send netif_carrier_on > > I assume that you mean "on recovery" as in "first succesful hard_start_xmit". > right. 
> > Assumption: > > If the tx path is blocked, more than likely the link is down. > > Yes, but is this a good approximation ? I'm not saying that it's not, I'm > merely asking for counter-arguments. It is an indirect approximation. Note that if the system's traffic is very asymmetrical, as in the case you pointed out, notification will take a very long time. You need a plan B. Still, the tx watchdogs are a good source of fault detection when MII detection is not available, and even when it is. I hate making this more complex than it should be: since we already have a messaging system within the kernel and between user and kernel space, aka "netlink", one could easily add a protocol in user space which "dynamically heartbeats" the devices. Control should come from user space; it would be a great idea to avoid ioctls. "Dynamic" in the above sense means trying to totally avoid making it a synchronous poll. The poll rate is a function of how many packets go out that device per average measurement time. Basically, the period at which the user space app sends "hello" netlink packets to the kernel is a variable. 
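[Editorial sketch] The threshold idea quoted earlier in this message (declare the link dead after the tx watchdog has expired a given number of times, declare it alive again on the first successful transmit) can be written as a small state machine. This is a user-space illustration only: tx_link_monitor, on_watchdog_timeout and on_tx_success are hypothetical names, and the returned events merely stand in for the carrier on/off notifications discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Event the caller would translate into a carrier on/off notification. */
enum link_event { LINK_NO_CHANGE, LINK_WENT_DOWN, LINK_CAME_UP };

/* Hypothetical per-device state for the threshold scheme. */
struct tx_link_monitor {
    unsigned timeouts;   /* consecutive tx watchdog expirations seen so far */
    unsigned threshold;  /* expirations required before the link is declared dead */
    bool     carrier_ok; /* current belief about the link */
};

/* Called each time the tx watchdog expires. */
static enum link_event on_watchdog_timeout(struct tx_link_monitor *m)
{
    assert(m != NULL);
    if (!m->carrier_ok)
        return LINK_NO_CHANGE;      /* already reported as down */
    if (++m->timeouts < m->threshold)
        return LINK_NO_CHANGE;      /* not enough evidence yet */
    m->carrier_ok = false;
    return LINK_WENT_DOWN;
}

/* Called after a successful transmit. */
static enum link_event on_tx_success(struct tx_link_monitor *m)
{
    assert(m != NULL);
    m->timeouts = 0;                /* any success resets the count */
    if (m->carrier_ok)
        return LINK_NO_CHANGE;
    m->carrier_ok = true;
    return LINK_CAME_UP;
}
```

As the message notes, this only observes the transmit path, so a mostly receiving host still needs the "plan B" heartbeat.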
cheers, jamal From owner-netdev@oss.sgi.com Sat Jun 2 08:35:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f52FZR610683 for netdev-outgoing; Sat, 2 Jun 2001 08:35:27 -0700 Received: from penguin.engin.umich.edu (IDENT:wingc@penguin.engin.umich.edu [141.213.33.36]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f52FZNh10674 for ; Sat, 2 Jun 2001 08:35:23 -0700 Received: from localhost (wingc@localhost) by penguin.engin.umich.edu (8.9.3/8.9.3) with ESMTP id LAA09165; Sat, 2 Jun 2001 11:35:14 -0400 X-Authentication-Warning: penguin.engin.umich.edu: wingc owned process doing -bs Date: Sat, 2 Jun 2001 11:35:14 -0400 (EDT) From: Chris Wing To: Bogdan Costescu cc: netdev@oss.sgi.com Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Bogdan: > So far, to the problem of too often access to hardware, 2 solutions were > proposed: > 1. cache the values. You can then let the user shoot him-/her-self in the > foot by making too many ioctl calls. But this prevent any legit use of > current hardware state. > 2. rate limiting. You don't let the user access the hardware too often (to > be defined), so he/she can't shoot his-/her-self in the foot. Legit use > of current hardware state is possible. Why not provide cached data for unprivileged readers and only talk to the hardware when a process with the appropriate capability makes a request? All you'd need to do this is the memory required to store the cached data and a timestamp. (an unprivileged read would only update the cached data when it had exceeded a set age; this would provide rate limiting) You could do this entirely in user space too, just keep a daemon running that periodically makes queries and forwards the results. 
(and make it impossible for non-privileged users to ask the kernel at all) -Chris Wing wingc@engin.umich.edu From owner-netdev@oss.sgi.com Sat Jun 2 09:17:22 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f52GHMl16444 for netdev-outgoing; Sat, 2 Jun 2001 09:17:22 -0700 Received: from the-village.bc.nu (router-100M.swansea.linux.org.uk [194.168.151.17]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f52GHJh16441 for ; Sat, 2 Jun 2001 09:17:20 -0700 Received: from alan by the-village.bc.nu with local (Exim 3.22 #1) id 156E3K-0001sJ-00; Sat, 02 Jun 2001 17:15:02 +0100 Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems (for To: bogdan.costescu@iwr.uni-heidelberg.de (Bogdan Costescu) Date: Sat, 2 Jun 2001 17:15:02 +0100 (BST) Cc: alan@lxorguk.ukuu.org.uk (Alan Cox), mark@somanetworks.com (Mark Frazer), jgarzik@mandrakesoft.com (Jeff Garzik), zaitcev@redhat.com (Pete Zaitcev), linux-kernel@vger.kernel.org (Linux Kernel Mailing List), netdev@oss.sgi.com In-Reply-To: from "Bogdan Costescu" at Jun 02, 2001 10:20:26 AM X-Mailer: ELM [version 2.5 PL3] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: From: Alan Cox Sender: owner-netdev@oss.sgi.com Precedence: bulk > > keeps beating it. You don't even need maliciousness for this, synchronization > > effects and locking on the file will ensure it gets you in the end > > Sure, but as I already wrote, you can detect that something is wrong. Then > shoot the person! How does that solve the problem ? > > fstat() mtime. That seems easy enough > > This only answered the first part of the question: when. How do you pass > the "how long" info ? > Does the same applies for the MII ioctl case ? The mtime tells you exactly that. > Caching means that the driver (I don't think that it can be done at > higher levels) has to keep track of accesses to all MII interfaces (yes, > there can be more than one on a NIC) and all of their registers. 
One I disagree. A non priviledged app should not be able to poke around in MII registers anyway. So you only have to cache the generic state of the link. > each MII register access. Another solution is to have each register start > its own cache timer. You don't need timers. > OTOH, ioctl rate limiting can be done at higher level and you need only > one timer per netdevice. So, it's done once and all net drivers benefit > from it. You don't need any timers if you are caching. Zilch nada none. You know the last time a query came in. The mtime lets the app know the last time the value was modified. Alan From owner-netdev@oss.sgi.com Sat Jun 2 09:19:32 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f52GJWP16757 for netdev-outgoing; Sat, 2 Jun 2001 09:19:32 -0700 Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f52GJUh16754 for ; Sat, 2 Jun 2001 09:19:31 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f52GJTd03764; Sat, 2 Jun 2001 18:19:29 +0200 (MET DST) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id SAA22763; Sat, 2 Jun 2001 18:19:29 +0200 Date: Sat, 2 Jun 2001 18:19:29 +0200 (CEST) From: Bogdan Costescu To: Chris Wing cc: Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Sat, 2 Jun 2001, Chris Wing wrote: > Why not provide cached data for unprivileged readers and only talk to the > hardware when a process with the appropriate capability makes a request? Yes, I already proposed this in a reply to Dave Miller. > All you'd need to do this is the memory required to store the cached data > and a timestamp. 
(an unprivileged read would only update the cached data > when it had exceeded a set age; this would provide rate limiting) My last message expressed my opinion about the implementation issues - caching would need too many resources and/or changes. However, I might be wrong, I'm not an expert in kernel programming 8-) > You could do this entirely in user space too, just keep a daemon running > that periodically makes queries and forwards the results. (and make it > impossible for non-privileged users to ask the kernel at all) That was something that I also "envisioned" when I proposed the unlimited root (or CAP_NET_ADMIN) access. However, the initial problem was access not only to MII registers, but also to some other hardware. The discussion tried to be generic enough to cover all these cases and I don't think that a daemon to handle MII, battery status and others all at once would be a good idea; nor would having a separate daemon for each of these... However, if we decide that MII is a special case that could be solved this way, it's fine by me. 
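[Editorial sketch] The proposal quoted above (privileged callers always reach the hardware, unprivileged callers get cached data that is refreshed only once it exceeds a set age) needs little more than a value and a timestamp. The following user-space C sketch assumes hypothetical names throughout: link_cache, link_status_read and the fake_hw_read stand-in are illustrative and do not correspond to any existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry: one value plus the tick at which it was read. */
struct link_cache {
    int      value;   /* last value read from the hardware */
    uint64_t stamp;   /* tick of that read */
    uint64_t max_age; /* age at which an unprivileged read triggers a refresh */
    bool     valid;   /* false until the first hardware read */
};

typedef int (*hw_read_fn)(void *ctx);

/*
 * Privileged callers always hit the hardware. Unprivileged callers hit it
 * only when the cached value is missing or has reached max_age, which
 * bounds how often they can cause hardware accesses.
 */
static int link_status_read(struct link_cache *c, uint64_t now, bool privileged,
                            hw_read_fn read_hw, void *ctx)
{
    assert(c != NULL && read_hw != NULL);
    if (privileged || !c->valid || now - c->stamp >= c->max_age) {
        c->value = read_hw(ctx);
        c->stamp = now;
        c->valid = true;
    }
    return c->value;
}

/* Stand-in for a real register read: counts and returns the number of accesses. */
static int fake_hw_read(void *ctx)
{
    int *reads = ctx;
    return ++*reads;
}
```

No timer is involved: the decision is made from the stored timestamp at the moment a query arrives.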
Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Sat Jun 2 12:10:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f52JAql04914 for netdev-outgoing; Sat, 2 Jun 2001 12:10:52 -0700 Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f52JAoh04902 for ; Sat, 2 Jun 2001 12:10:50 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f52JAid05355; Sat, 2 Jun 2001 21:10:45 +0200 (MET DST) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id VAA23003; Sat, 2 Jun 2001 21:10:44 +0200 Date: Sat, 2 Jun 2001 21:10:44 +0200 (CEST) From: Bogdan Costescu To: Alan Cox cc: Mark Frazer , Jeff Garzik , Pete Zaitcev , Linux Kernel Mailing List , Subject: MII access (was [PATCH] support for Cobalt Networks (x86 only) systems) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk [ As this is becoming more and more MII specific, I changed the subject ] On Sat, 2 Jun 2001, Alan Cox wrote: > > This only answered the first part of the question: when. How do you pass > > the "how long" info ? > > Does the same applies for the MII ioctl case ? > > The mtime tells you exactly that. Alan, please consider this situation: One application needs to poll link status with 1 second resolution. On a system where caching is done with an unknown cache expiring time, this application is sometimes fed incorrect data. So, you need a way to tell for how long this situation lasts. 
If you have a proc/ioctl interface for setting the cache expiry time, this same interface can then be used for reading back this info. This application can then check that this value is lower than 1 second and if not, notify the user that it cannot run. As this thread started as a general hardware access problem, would only _one_ value for all these cases be sufficient ? Or should each case have its own timeout ? Anyway, for MII, accessing the status at sub-second intervals might be a legitimate use, so what measuring units should be used? > I disagree. A non priviledged app should not be able to poke around in MII > registers anyway. So you only have to cache the generic state of the link. At the beginning of this thread, Jeff said "calling the ioctls without priveleges is quite useful". Now if you say that there is no such case, the whole problem could simply be solved by checking for the appropriate privileges. I just realized another thing, important (IMHO) if a normal user is still allowed to access MII: the drivers (checked for 3c59x, eepro100, tulip) do not verify that the value passed for the register number is within the allowed range and use it as: int read_cmd = (0xf6 << 10) | ((phy_id & 0x1f) << 5) | location; (phy_id is the MII address and location is the register number). There is also no check that the MII address specified is actually in use by the driver, but this is used with mii-diag to query a MII which was not correctly identified (maybe this should be allowed for CAP_NET_ADMIN only ?) From one of Don Becker's pages: "MII transceivers have 32 management registers. The first 16 are reserved for standard-defined uses, and the remaining one are available for chip-specific features. Only the first seven registers are currently defined." Usually, the transceivers return garbage if you read from locations you are not supposed to (overwriting phy_ad). But if you begin overwriting the READ command (0xf6 above)... 
Something like this should do: int read_cmd = (0xf6 << 10) | ((phy_id & 0x1f) << 5) | (location & 0x1f); > You don't need timers. Too tired to think straight yesterday... You're right. And if you alloc 32*sizeof(int) (you want to keep jiffies, right ?) per netdevice, I think that it could even be done outside the driver. Hmm, most of my previous arguments are no longer valid 8-( -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Sat Jun 2 12:28:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f52JSGI09444 for netdev-outgoing; Sat, 2 Jun 2001 12:28:16 -0700 Received: from the-village.bc.nu (router-100M.swansea.linux.org.uk [194.168.151.17]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f52JSDh09433 for ; Sat, 2 Jun 2001 12:28:14 -0700 Received: from alan by the-village.bc.nu with local (Exim 3.22 #1) id 156H20-00023o-00; Sat, 02 Jun 2001 20:25:52 +0100 Subject: Re: MII access (was [PATCH] support for Cobalt Networks (x86 only) To: bogdan.costescu@iwr.uni-heidelberg.de (Bogdan Costescu) Date: Sat, 2 Jun 2001 20:25:52 +0100 (BST) Cc: alan@lxorguk.ukuu.org.uk (Alan Cox), mark@somanetworks.com (Mark Frazer), jgarzik@mandrakesoft.com (Jeff Garzik), zaitcev@redhat.com (Pete Zaitcev), linux-kernel@vger.kernel.org (Linux Kernel Mailing List), netdev@oss.sgi.com In-Reply-To: from "Bogdan Costescu" at Jun 02, 2001 09:10:44 PM X-Mailer: ELM [version 2.5 PL3] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: From: Alan Cox Sender: owner-netdev@oss.sgi.com Precedence: bulk > One application needs to poll link status with 1 second resolution. On a Then it needs to be privileged > for how long this situation lasts. 
If you have a proc/ioctl interface for > setting cache expiring time, this same interface can then be used for > reading back this info. This application can then check that this value is > lower than 1 second and if not, notify the user that it cannot run. And if the approach is to block until the time for the next read occurs is done then the program get stuck for 30 seconds, misses its deadline and kills the cluster - how is this better ?? > Usually, the transceivers return garbage if you read from locations you > are not supposed to (overwritting phy_ad). But if you begin overwritting > the READ command (0xf6 above)... Something like this should do: Some of them just hang. > Too tired to think straight yesterday... You're right. And if you alloc > 32*sizeof(int) (you want to keep jiffies, right ?) per netdevice, I think > that it could even be done outside the driver. Hmm, most of my > previous arguments are no longer valid 8-( Doing the MII monitoring somewhere centralised like the routing daemons would certainly let more inteillgent management and reporting get done From owner-netdev@oss.sgi.com Sat Jun 2 14:36:37 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f52Labu31431 for netdev-outgoing; Sat, 2 Jun 2001 14:36:37 -0700 Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f52LaZh31428 for ; Sat, 2 Jun 2001 14:36:35 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f52LaXd06921; Sat, 2 Jun 2001 23:36:33 +0200 (MET DST) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id XAA23172; Sat, 2 Jun 2001 23:36:33 +0200 Date: Sat, 2 Jun 2001 23:36:33 +0200 (CEST) From: Bogdan Costescu To: Alan Cox cc: Mark Frazer , Jeff Garzik , Pete Zaitcev , Linux Kernel Mailing List , Subject: Re: MII access 
(was [PATCH] support for Cobalt Networks (x86 only) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Sat, 2 Jun 2001, Alan Cox wrote: > > One application needs to poll link status with 1 second resolution. On a > > Then it needs to be privileged Fine. Can you think of a default value for the cache expiry ? > And if the approach is to block until the time for the next read occurs is > done then the program get stuck for 30 seconds, misses its deadline and kills > the cluster - how is this better ?? It is not better. Well, when somebody is playing against you, you're in trouble either way:
- rate limit:
  - blocking - as above
  - non-blocking - notify the user that you can't get the info and probably stop, or acquire elevated privileges and try to restart the network
- cache: get outdated info
But when an HA application runs, it's usually preferable to stop (and you notice it) than to continue with wrong data. Especially if you set the cache expiry to something like 30 seconds; think in terms of how many transactions/second today's hardware allows... > Doing the MII monitoring somewhere centralised like the routing daemons would > certainly let more inteillgent management and reporting get done I don't argue over this point, several people have already mentioned it. But I explained the present situation in a previous message: the MII info is normally read at a low rate and some applications need it more often. It doesn't matter whether it's delivered through ioctl, netlink or any other way, you have to read it from the hardware and deliver it to user space at user request. So the "doing the MII monitoring" is the tough part. 
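[Editorial sketch] The "stop rather than continue with wrong data" position can be made concrete on the application side. The sketch below assumes the application can learn two things discussed in this thread: the configured cache expiry (through some proc/ioctl interface that does not exist yet) and the time a cached value was last refreshed (the mtime mentioned earlier). The function names and the millisecond unit are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Startup check: with this cache expiry, can the application ever get
 * data at the resolution it needs? If not, it should refuse to run.
 */
static bool cache_expiry_acceptable(uint64_t cache_expiry_ms,
                                    uint64_t required_resolution_ms)
{
    return cache_expiry_ms < required_resolution_ms;
}

/*
 * Per-sample check: is a value last refreshed at sample_mtime_ms still
 * recent enough at now_ms? A false result is the cue to stop (or raise
 * an alarm) instead of acting on outdated link state.
 */
static bool sample_is_fresh(uint64_t sample_mtime_ms, uint64_t now_ms,
                            uint64_t required_resolution_ms)
{
    assert(now_ms >= sample_mtime_ms);
    return now_ms - sample_mtime_ms <= required_resolution_ms;
}
```

With a 30 second expiry and a 1 second requirement the startup check fails immediately, which matches the behaviour argued for above: the failure is visible instead of silent.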
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Sat Jun 2 14:39:21 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f52LdLY31783 for netdev-outgoing; Sat, 2 Jun 2001 14:39:21 -0700 Received: from the-village.bc.nu (router-100M.swansea.linux.org.uk [194.168.151.17]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f52LdJh31780 for ; Sat, 2 Jun 2001 14:39:20 -0700 Received: from alan by the-village.bc.nu with local (Exim 3.22 #1) id 156J4v-0002BA-00; Sat, 02 Jun 2001 22:37:01 +0100 Subject: Re: MII access (was [PATCH] support for Cobalt Networks (x86 only) To: bogdan.costescu@iwr.uni-heidelberg.de (Bogdan Costescu) Date: Sat, 2 Jun 2001 22:37:01 +0100 (BST) Cc: alan@lxorguk.ukuu.org.uk (Alan Cox), mark@somanetworks.com (Mark Frazer), jgarzik@mandrakesoft.com (Jeff Garzik), zaitcev@redhat.com (Pete Zaitcev), linux-kernel@vger.kernel.org (Linux Kernel Mailing List), netdev@oss.sgi.com In-Reply-To: from "Bogdan Costescu" at Jun 02, 2001 11:36:33 PM X-Mailer: ELM [version 2.5 PL3] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: From: Alan Cox Sender: owner-netdev@oss.sgi.com Precedence: bulk > > Then it needs to be privileged > > Fine. Can you think of a default value for expiring cache ? Yeah .. so long as its a default and tunable in /proc. 
> From owner-netdev@oss.sgi.com Sat Jun 2 18:36:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f531asl07739 for netdev-outgoing; Sat, 2 Jun 2001 18:36:54 -0700 Received: from exchange1.FalconStor.Net ([63.122.122.66]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f531arh07736 for ; Sat, 2 Jun 2001 18:36:53 -0700 Received: by exchange1.FalconStor.Net with Internet Mail Service (5.5.2653.19) id ; Sat, 2 Jun 2001 21:36:42 -0400 Message-ID: From: ReiJane Huai To: netdev@oss.sgi.com Subject: Date: Sat, 2 Jun 2001 21:36:37 -0400 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="big5" Sender: owner-netdev@oss.sgi.com Precedence: bulk From owner-netdev@oss.sgi.com Sun Jun 3 00:33:26 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f537XQh12544 for netdev-outgoing; Sun, 3 Jun 2001 00:33:26 -0700 Received: from marduk.litech.org (IDENT:mail@marduk.cs.cornell.edu [128.84.154.54]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f537XOh12540 for ; Sun, 3 Jun 2001 00:33:24 -0700 Received: from lutchann (helo=localhost) by marduk.litech.org with local-esmtp (Exim 3.22 #1) id 156SNs-0000jz-00; Sun, 03 Jun 2001 03:33:12 -0400 Date: Sun, 3 Jun 2001 03:33:01 -0400 (EDT) From: Nathan Lutchansky To: cc: Subject: Re: why cannot bind to someipaddress:port when something else has *:port bound? In-Reply-To: <20010601144051.D16600@tinuviel.compendium.net.ar> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 1 Jun 2001 horape@tinuviel.compendium.net.ar wrote: > The following program binds *:1000 to a socket, and then tries to bind > 200.47.36.254:1000 to another socket, the error i gets is "Address > already in use". Why? If this wasn't prevented, it would be a security hole. 
If the same application wants to do a wildcard bind and then a specific bind to the same port, that's all fine and good, but consider if it was two different applications. Imagine that I, as either a normal user or root, run a webserver that binds to *:8080. Now a different user attempts to bind to 10.1.1.1:8080. I will assume that if I connect to port 8080 on my server, I will connect to my webserver, but if I connect to the address 10.1.1.1 I will instead be connected to the other user's server. As you can see, this creates a huge security hole. Does this answer your question? I haven't looked at the code you attached to the message; I hope it doesn't change my answer. :-) -Nathan - -- +-------------------+---------------------+------------------------+ | Nathan Lutchansky | lutchann@litech.org | Lithium Technologies | +------------------------------------------------------------------+ | I dread success. To have succeeded is to have finished one's | | business on earth... I like a state of continual becoming, | | with a goal in front and not behind. 
- George Bernard Shaw | +------------------------------------------------------------------+ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.4 (GNU/Linux) Comment: pgpenvelope 2.10.2 - http://pgpenvelope.sourceforge.net/ iD8DBQE7Geg2TviDkW8mhycRAtKdAKCtCUM9jc79iT/3Dd9fjQktez+h5wCcDxAd 1aMaCqVkCsvZuFdBSlwhYto= =3Nad -----END PGP SIGNATURE----- From owner-netdev@oss.sgi.com Sun Jun 3 00:37:21 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f537bLQ13096 for netdev-outgoing; Sun, 3 Jun 2001 00:37:21 -0700 Received: from tinuviel.compendium.net.ar (usat2-00222.usateleport.com [208.248.183.222]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f537bJh13090 for ; Sun, 3 Jun 2001 00:37:19 -0700 Received: by tinuviel.compendium.net.ar (Postfix, from userid 1000) id 722AC196764; Sun, 3 Jun 2001 04:35:49 -0300 (ART) Date: Sun, 3 Jun 2001 04:35:49 -0300 From: horape@tinuviel.compendium.net.ar To: Nathan Lutchansky Cc: netdev@oss.sgi.com Subject: Re: why cannot bind to someipaddress:port when something else has *:port bound? Message-ID: <20010603043549.B4142@tinuviel.compendium.net.ar> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.3.18i x-attribution: HoraPe Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f537bLh13094 Sender: owner-netdev@oss.sgi.com Precedence: bulk ¡Hola! > > The following program binds *:1000 to a socket, and then tries to bind > > 200.47.36.254:1000 to another socket, the error i gets is "Address > > already in use". Why? > If this wasn't prevented, it would be a security hole. If the same > application wants to do a wildcard bind and then a specific bind to the > same port, that's all fine and good, but consider if it was two different > applications. Imagine that I, as either a normal user or root, run a > webserver that binds to *:8080. Now a different user attempts to bind to > 10.1.1.1:8080. 
> I will assume that if I connect to port 8080 on my server,
> I will connect to my webserver, but if I connect to the address 10.1.1.1 I
> will instead be connected to the other user's server. As you can see,
> this creates a huge security hole.
> Does this answer your question?

Yes, and no. Why not just allow binding to a "more specific" address if the new process wanting to do that binding is running with the same uid as the older one? (That's, AFAIK, how 4.4BSD worked; I want to know why that was changed.)

> I haven't looked at the code you attached to the message; I hope it
> doesn't change my answer. :-) -Nathan

No, the code was just a way to clarify my question.

Thanks,
HoraPe
---
Horacio J. Peña
horape@compendium.com.ar
horape@uninet.edu
bofh@puntoar.net.ar
horape@hcdn.gov.ar

From owner-netdev@oss.sgi.com Sun Jun 3 00:46:00 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f537k0814104 for netdev-outgoing; Sun, 3 Jun 2001 00:46:00 -0700
Received: from marduk.litech.org (IDENT:mail@marduk.cs.cornell.edu [128.84.154.54]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f537jxh14097 for ; Sun, 3 Jun 2001 00:45:59 -0700
Received: from lutchann (helo=localhost) by marduk.litech.org with local-esmtp (Exim 3.22 #1) id 156SaA-0003Id-00; Sun, 03 Jun 2001 03:45:54 -0400
Date: Sun, 3 Jun 2001 03:45:45 -0400 (EDT)
From: Nathan Lutchansky
To: "horape@tinuviel.compendium.net.ar"
cc: "netdev@oss.sgi.com"
Subject: Re: why cannot bind to someipaddress:port when something else has *:port bound?
In-Reply-To: <20010603043549.B4142@tinuviel.compendium.net.ar>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sun, 3 Jun 2001, horape@tinuviel.compendium.net.ar wrote:

> > > The following program binds *:1000 to a socket, and then tries to bind
> > > 200.47.36.254:1000 to another socket, the error i gets is "Address
> > > already in use".
Why? > > > If this wasn't prevented, it would be a security hole. If the same > > application wants to do a wildcard bind and then a specific bind to the > > same port, that's all fine and good, but consider if it was two different > > applications. > > Why won't just allow binding to a "more specific" address if the new > proccess wanting to do that binding is running with the same uid that > the older one? (that's afaik how the 4.4BSD worked, I want to know why > that was changed) I imagine there are issues with some types of network applications like FTP daemons that "hunt" for an open port by repeatedly trying to bind to specific port numbers within a range. If the hunting was done with specific IP addresses, it would be possible for a daemon hunting as root to tromp over a wildcard-bound daemon listening on a well-known port. This is just a guess though; there are probably other, better reasons and my guess may not even be accurate. ;-) I do remember this question came up on one of the IPv6 lists, possibly USAGI, in regard to IPv6/IPv4 binds. And I presume this is the reason why you're asking on netdev... -Nathan - -- +-------------------+---------------------+------------------------+ | Nathan Lutchansky | lutchann@litech.org | Lithium Technologies | +------------------------------------------------------------------+ | I dread success. To have succeeded is to have finished one's | | business on earth... I like a state of continual becoming, | | with a goal in front and not behind. 
- George Bernard Shaw |
+------------------------------------------------------------------+
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (GNU/Linux)
Comment: pgpenvelope 2.10.2 - http://pgpenvelope.sourceforge.net/

iD8DBQE7GesxTviDkW8mhycRAmHUAKCh4S8ZKjnGmJgXGIPyiSVMsx614gCgg+xz
1Wr7D5U6hGHZaXeEvw6vYvk=
=y8WC
-----END PGP SIGNATURE-----

From owner-netdev@oss.sgi.com Sun Jun 3 02:22:53 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f539Mri23209 for netdev-outgoing; Sun, 3 Jun 2001 02:22:53 -0700
Received: from sabre-wulf.nvg.ntnu.no (IDENT:root@sabre-wulf.nvg.ntnu.no [129.241.210.67]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f539Mqh23198 for ; Sun, 3 Jun 2001 02:22:52 -0700
Received: from tyrell.nvg.ntnu.no ([IPv6:::ffff:129.241.210.70]:3851 "EHLO tyrell.nvg.ntnu.no" ident: "root" whoson: "-unregistered-") by sabre-wulf.nvg.ntnu.no with ESMTP id ; Sun, 3 Jun 2001 11:22:44 +0200
Received: (from venaas@localhost) by tyrell.nvg.ntnu.no (8.9.3/8.8.4) id LAA08567; Sun, 3 Jun 2001 11:22:43 +0200
Date: Sun, 3 Jun 2001 11:22:43 +0200
From: Stig Venaas
To: Nathan Lutchansky
Cc: "horape@tinuviel.compendium.net.ar" , "netdev@oss.sgi.com"
Subject: Re: why cannot bind to someipaddress:port when something else has *:port bound?
Message-ID: <20010603112243.A8489@nvg.ntnu.no>
References: <20010603043549.B4142@tinuviel.compendium.net.ar>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: ; from lutchann@litech.org on Sun, Jun 03, 2001 at 03:45:45AM -0400
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

On Sun, Jun 03, 2001 at 03:45:45AM -0400, Nathan Lutchansky wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Sun, 3 Jun 2001, horape@tinuviel.compendium.net.ar wrote:
>
> > > > The following program binds *:1000 to a socket, and then tries to bind
> > > > 200.47.36.254:1000 to another socket, the error i gets is "Address
> > > > already in use". Why?
> >
> > > If this wasn't prevented, it would be a security hole. If the same
> > > application wants to do a wildcard bind and then a specific bind to the
> > > same port, that's all fine and good, but consider if it was two different
> > > applications.
> >
> > Why won't just allow binding to a "more specific" address if the new
> > proccess wanting to do that binding is running with the same uid that
> > the older one? (that's afaik how the 4.4BSD worked, I want to know why
> > that was changed)

Yes, I think that's normal BSD behavior.

> I imagine there are issues with some types of network applications like
> FTP daemons that "hunt" for an open port by repeatedly trying to bind to
> specific port numbers within a range. If the hunting was done with
> specific IP addresses, it would be possible for a daemon hunting as root
> to tromp over a wildcard-bound daemon listening on a well-known port.
>
> This is just a guess though; there are probably other, better reasons and
> my guess may not even be accurate. ;-)

SO_REUSEADDR lets you do, more or less, what you ask. There is one interesting problem, though: if you have used bind(2) on two such sockets with SO_REUSEADDR set and then call listen(2) on both, the second listen fails. I find this odd; I've only seen this on Linux so far.
Stig From owner-netdev@oss.sgi.com Sun Jun 3 04:20:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f53BKlX09217 for netdev-outgoing; Sun, 3 Jun 2001 04:20:47 -0700 Received: from zero.aec.at (qmailr@zero.aec.at [195.3.98.22]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f53BKih09202 for ; Sun, 3 Jun 2001 04:20:45 -0700 Received: (qmail 24633 invoked by uid 99); 3 Jun 2001 11:20:41 -0000 Received: from unknown (HELO fred.muc.de) (unknown) by unknown with SMTP; 3 Jun 2001 11:20:41 -0000 Received: by fred.muc.de (Postfix, from userid 500) id 6BED2E2D4D; Sun, 3 Jun 2001 13:29:42 +0200 (CEST) Date: Sun, 3 Jun 2001 13:29:42 +0200 From: Andi Kleen To: Peter Bieringer Cc: Maillist netdev , Maillist linux-ipv6 , Maillist USAGI-users Subject: Re: IPv6+2.4.x: ipv6_local_port_range implementation plans + netfilter6 Message-ID: <20010603132942.A2582@fred.local> References: <14800000.991472604@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <14800000.991472604@localhost>; from pb@bieringer.de on Sat, Jun 02, 2001 at 11:03:24AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Sat, Jun 02, 2001 at 11:03:24AM +0200, Peter Bieringer wrote: > Hi all, > > are there any plans to implement "ipv6_local_port_range" in the > future like on IPv4? The IPv4 sysctl is shared between IPv4 and IPv6, because v4 and v6 share a common port space. 
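Because the range lives in one sysctl, a program can read it once and use it for sockets of either family. A minimal sketch, assuming the usual procfs path /proc/sys/net/ipv4/ip_local_port_range and its whitespace-separated "low high" format; the helper names parse_port_range() and read_local_port_range() are hypothetical, not an existing API:

```c
#include <assert.h>
#include <stdio.h>

/* Parse the "low high" pair used by net.ipv4.ip_local_port_range.
 * Returns 0 on success, -1 if the text is malformed or out of range. */
int parse_port_range(const char *text, int *low, int *high)
{
    int lo, hi;

    assert(text && low && high);
    if (sscanf(text, "%d %d", &lo, &hi) != 2)
        return -1;
    if (lo < 1 || hi > 65535 || lo > hi)
        return -1;
    *low = lo;
    *high = hi;
    return 0;
}

/* Read the local port range; the same values apply to IPv4 and IPv6
 * sockets because both families share one port space. */
int read_local_port_range(int *low, int *high)
{
    char buf[64];
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_port_range(buf, low, high);
}
```

Keeping the parsing separate from the file access makes the format handling easy to check without a running kernel.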
-Andi

From owner-netdev@oss.sgi.com Sun Jun 3 05:00:01 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f53C01N15305 for netdev-outgoing; Sun, 3 Jun 2001 05:00:01 -0700
Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f53Bxxh15274 for ; Sun, 3 Jun 2001 04:59:59 -0700
Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f53Bxvd15254; Sun, 3 Jun 2001 13:59:57 +0200 (MET DST)
Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id NAA31067; Sun, 3 Jun 2001 13:59:57 +0200
Date: Sun, 3 Jun 2001 13:59:57 +0200 (CEST)
From: Bogdan Costescu
To: Alan Cox
cc: Mark Frazer , Jeff Garzik , Pete Zaitcev , Linux Kernel Mailing List ,
Subject: Re: MII access (was [PATCH] support for Cobalt Networks (x86 only)
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

On Sat, 2 Jun 2001, Alan Cox wrote:

> > > Then it needs to be privileged
> >
> > Fine. Can you think of a default value for expiring cache ?
>
> Yeah .. so long as its a default and tunable in /proc.

New day, new idea. The original problem was that unprivileged users can access it too often. How about exposing the MII registers as /dev entries? Then you can have normal access rights for them and no need to worry about frequency of access. The default would probably be mode 600, owned by root, and for HA applications a user or a group can get read (or even write) access. It's up to the sysadmin to allow it, but it has to be renewed after each boot.

But I guess that this is not something to be applied to 2.2 and 2.4...

With a clearer mind, I have to make a correction to one of the previous messages: the problem of not range-checking the arguments does not apply to 3c59x, whose ioctl function applies '& 0x1f' to both the transceiver number and the register number. However, eepro100 and tulip don't do that. (I'm checking now with 2.4.3 from Mandrake 8, but I don't think that there were recent changes in these areas.)

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De

From owner-netdev@oss.sgi.com Sun Jun 3 05:09:05 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f53C95016604 for netdev-outgoing; Sun, 3 Jun 2001 05:09:05 -0700
Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f53C93h16587 for ; Sun, 3 Jun 2001 05:09:04 -0700
Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f53C92d15332; Sun, 3 Jun 2001 14:09:02 +0200 (MET DST)
Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id OAA31080; Sun, 3 Jun 2001 14:09:01 +0200
Date: Sun, 3 Jun 2001 14:09:01 +0200 (CEST)
From: Bogdan Costescu
To: jamal
cc: Jeff Garzik , Alan Cox , Pete Zaitcev , Linux Kernel Mailing List ,
Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

On Sat, 2 Jun 2001, jamal wrote:

> Still, the tx watchdogs are a good source of fault detection in the case
> of non-availabilty of MII detection and even with the presence of MII.

Agreed.
But my question was a bit different: is there any legit situation where Tx timeouts can happen in a row _without_ having a link loss ? In this situation, we'd have false positives... > "Dynamic" in the above sense means trying to totaly avoid making it a > synchronous poll. The poll rate is a function of how many packets go out > that device per average measurement time. Basically, the period that the > user space app dumps "hello" netlink packets to the kernel is a variable. Sounds nice, but could this be implemented light enough ? -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Sun Jun 3 05:11:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f53CBq117200 for netdev-outgoing; Sun, 3 Jun 2001 05:11:52 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f53CBph17197 for ; Sun, 3 Jun 2001 05:11:51 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 6BB261F6C; Sun, 3 Jun 2001 08:11:50 -0400 (EDT) Message-ID: <3B1A2982.C53B159C@mandrakesoft.com> Date: Sun, 03 Jun 2001 08:11:46 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i686) X-Accept-Language: en MIME-Version: 1.0 To: Bogdan Costescu Cc: Alan Cox , Mark Frazer , Pete Zaitcev , Linux Kernel Mailing List , netdev@oss.sgi.com Subject: Re: MII access (was [PATCH] support for Cobalt Networks (x86 only) References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Bogdan Costescu wrote: > With clearer mind, I have to make some a correction to one of the previous > messages: the problem 
of not checking arguments range does not apply to > 3c59x which has in the ioctl function '& 0x1f' for both transceiver number > and register number. However, eepro100 and tulip don't do that. (I'm > checking now with 2.4.3 from Mandrake 8, but I don't think that there were > recent changes in these areas). half right -- tulip does this for the phy id but not the MII register number. I'll fix that up. Please bug Andrey about fixing up eepro100... -- Jeff Garzik | Echelon words of the day, from The Register: Building 1024 | FRU Lebed HALO Spetznaz Al Amn al-Askari Glock 26 MandrakeSoft | Steak Knife Kill the President anarchy echelon | nuclear assassinate Roswell Waco World Trade Center From owner-netdev@oss.sgi.com Sun Jun 3 07:14:17 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f53EEHw26485 for netdev-outgoing; Sun, 3 Jun 2001 07:14:17 -0700 Received: from lox.sandelman.ottawa.on.ca (IDENT:root@lox.sandelman.ottawa.on.ca [209.151.24.2]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f53EEGh26482 for ; Sun, 3 Jun 2001 07:14:16 -0700 Received: from nox.sandelman.ottawa.on.ca (nox.sandelman.ottawa.on.ca [209.151.24.6]) by lox.sandelman.ottawa.on.ca (8.8.7/8.8.8) with ESMTP id KAA09348; Sun, 3 Jun 2001 10:14:08 -0400 (EDT) Received: from marajade.sandelman.ottawa.on.ca ([3ffe:1ce1:0:fe50:2a0:24ff:feac:5c52]) by nox.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f53EH5w13105 (using TLSv1/SSLv3 with cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified OK); Sun, 3 Jun 2001 10:17:06 -0400 (EDT) Received: from marajade.sandelman.ottawa.on.ca (localhost [[UNIX: localhost]]) by marajade.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f53ECiO26985; Sun, 3 Jun 2001 10:12:45 -0400 (EDT) Message-Id: <200106031412.f53ECiO26985@marajade.sandelman.ottawa.on.ca> To: horape@tinuviel.compendium.net.ar cc: netdev@oss.sgi.com Subject: Re: why cannot bind to someipaddress:port when something else has *:port bound? 
In-reply-to: Your message of "Sun, 03 Jun 2001 04:35:49 -0300." <20010603043549.B4142@tinuviel.compendium.net.ar>
Mime-Version: 1.0 (generated by tm-edit 7.108)
Content-Type: text/plain; charset=US-ASCII
Date: Sun, 03 Jun 2001 10:12:44 -0400
From: Michael Richardson
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

>>>>> "HoraPe" == horape writes:

    HoraPe> Yes, and no. Why won't just allow binding to a "more specific"
    HoraPe> address if the new proccess wanting to do that binding is running
    HoraPe> with the same uid that the older one? (that's afaik how the
    HoraPe> 4.4BSD worked, I want to know why that was changed)

BSD never quite worked that way. NetBSD, at least, has the same behaviour now. (I don't have another *BSD system handy at the moment)

The same program can not bind both the wildcard and a specific one unless, on the first bind(), it does SO_REUSEPORT.

NetBSD 1.4's bind(2) says:

SECURITY CONSIDERATIONS
     bind() was changed in NetBSD 1.4 to prevent the binding of a socket to
     the same port as an existing socket when all of the following is true:
     o   either of the existing or new addresses is INADDR_ANY,
     o   the uid of the new socket is not root, and the uids of the creators
         of the sockets are different,
     o   the address is not a multicast address, and
     o   both sockets are not bound to INADDR_ANY with SO_REUSEPORT set.

     This prevents an attack where a user could bind to a port with the
     host's IP address (after setting SO_REUSEADDR) and `steal' packets
     destined for a server that bound to the same port with INADDR_ANY.
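The four conditions quoted above can be read as a single predicate. The following is only a sketch of that reading, not NetBSD's implementation: it assumes both sockets already use the same port, takes "the address" in the multicast condition to mean the new socket's address, and uses a hypothetical struct bound_sock and function bind_refused() invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADDR_ANY 0u  /* stand-in for INADDR_ANY, host byte order */

/* Hypothetical description of a socket bound (or about to be bound)
 * to the port in question. */
struct bound_sock {
    uint32_t addr;   /* IPv4 address, host byte order */
    unsigned uid;    /* uid of the socket's creator */
    bool reuseport;  /* SO_REUSEPORT was set before bind() */
};

static bool is_multicast(uint32_t a)
{
    return (a & 0xf0000000u) == 0xe0000000u;  /* 224.0.0.0/4 */
}

/* True when the quoted rule would refuse the new bind: all four
 * conditions from the manual page excerpt must hold at once. */
bool bind_refused(const struct bound_sock *existing,
                  const struct bound_sock *incoming)
{
    assert(existing && incoming);

    bool either_wild = existing->addr == ADDR_ANY ||
                       incoming->addr == ADDR_ANY;
    bool foreign_nonroot = incoming->uid != 0 &&
                           incoming->uid != existing->uid;
    bool unicast = !is_multicast(incoming->addr);
    bool both_wild_reuseport =
        existing->addr == ADDR_ANY && incoming->addr == ADDR_ANY &&
        existing->reuseport && incoming->reuseport;

    return either_wild && foreign_nonroot && unicast &&
           !both_wild_reuseport;
}
```

Under this reading, a second user binding 10.1.1.1 on the port of someone else's wildcard server is refused, while the same uid (or root) is allowed through.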
Canadian Commuter Challenge Project -- GNU Potato Caboose
Michael Richardson, Sandelman Software Works, Ottawa, ON
EMAIL: mcr@commuterchallenge.net for help, email or page at 1-866-231-8608

From owner-netdev@oss.sgi.com Sun Jun 3 12:54:27 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f53JsRj24521 for netdev-outgoing; Sun, 3 Jun 2001 12:54:27 -0700
Received: from colorfullife.com (colorfullife.com [216.156.138.34]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f53JsOh24506 for ; Sun, 3 Jun 2001 12:54:24 -0700
Received: from dbl.localdomain (localhost [127.0.0.1]) by colorfullife.com (8.11.2/8.11.2) with ESMTP id f53JuYq17499; Sun, 3 Jun 2001 15:56:34 -0400
Received: from colorfullife.com (IDENT:manfred@clmsdev.localdomain [172.17.4.1]) by dbl.localdomain (8.11.2/8.11.2) with ESMTP id f53Js1v12125; Sun, 3 Jun 2001 21:54:01 +0200
Message-ID: <3B1A9558.2DBAECE7@colorfullife.com>
Date: Sun, 03 Jun 2001 21:51:52 +0200
From: Manfred Spraul
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.5-ac6 i686)
X-Accept-Language: en, de
MIME-Version: 1.0
To: linux-kernel@vger.kernel.org, netdev@oss.sgi.com
Subject: multicast hash incorrect on big endian archs
Content-Type: multipart/mixed; boundary="------------8788A1E78D3711A276170233"
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

This is a multi-part message in MIME format.
--------------8788A1E78D3711A276170233
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I noticed that the multicast hash calculations assumed little endian byte ordering in the winbond-840 driver, and it seems that several other drivers are also affected:

8139too, epic100, fealnx, pci-skeleton, sis900, starfire, sundance, via-rhine, yellowfin
perhaps drivers/net/pcmcia/xircom_tulip_cb

I've attached an untested patch.

It's possible that the nic performs another byte swap if configured for big endian support, but I've never seen a nic that performs byte swaps on register writes, only on memory reads.
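To make the failure mode concrete, here is a small user-space sketch (not driver code). It assumes that the generic set_bit() on the affected big-endian ports treats the bitmap as an array of unsigned long, so the byte a given bit lands in depends on the word size and byte order. byte_of_bit() is a hypothetical helper written for this illustration; set_bit_32() follows the idea of the attached patch but uses an unsigned constant so that bit 31 does not shift into the sign bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value-level bit set on an array of 32-bit words: bit `offset` always
 * ends up as bit (offset % 32) of word (offset / 32), whatever the
 * host byte order is. */
void set_bit_32(unsigned offset, uint32_t *data)
{
    data[offset >> 5] |= 1u << (offset & 0x1f);
}

/* Byte offset inside a bitmap at which bit `nr` is stored when the
 * bitmap is addressed as an array of `word_bytes`-byte words kept in
 * the given byte order. */
size_t byte_of_bit(unsigned nr, size_t word_bytes, int big_endian)
{
    assert(word_bytes > 0);

    size_t bits_per_word = word_bytes * 8;
    size_t word = nr / bits_per_word;
    size_t byte_in_word = (nr % bits_per_word) / 8;  /* from the LSB */

    if (big_endian)
        byte_in_word = word_bytes - 1 - byte_in_word;
    return word * word_bytes + byte_in_word;
}
```

With 8-byte big-endian words, bit 5 lands in byte 7 of the bitmap, which is the second 32-bit filter word, whereas a little-endian host stores it in byte 0. Writing each filter word by value, as set_bit_32() does, removes that dependency.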
Please cc me, I'm not subscribed to the mailing lists.

--
Manfred
--------------8788A1E78D3711A276170233
Content-Type: text/plain; charset=us-ascii; name="patch-ask"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="patch-ask"

diff -u 2.4/drivers/net/8139too.c build-2.4/drivers/net/8139too.c
--- 2.4/drivers/net/8139too.c	Sat Jun 2 14:19:44 2001
+++ build-2.4/drivers/net/8139too.c	Sun Jun 3 19:46:05 2001
@@ -2248,6 +2248,10 @@
 	return crc;
 }
 
+static void set_bit_32(int offset, u32 * data)
+{
+	data[offset >> 5] |= (1 << (offset & 0x1f));
+}
 
 static void rtl8139_set_rx_mode (struct net_device *dev)
 {
@@ -2283,7 +2287,7 @@
 		mc_filter[1] = mc_filter[0] = 0;
 		for (i = 0, mclist = dev->mc_list; mclist && i < dev->mc_count;
 		     i++, mclist = mclist->next)
-			set_bit (ether_crc (ETH_ALEN, mclist->dmi_addr) >> 26,
+			set_bit_32(ether_crc (ETH_ALEN, mclist->dmi_addr) >> 26,
 				mc_filter);
 	}
 
diff -u 2.4/drivers/net/epic100.c build-2.4/drivers/net/epic100.c
--- 2.4/drivers/net/epic100.c	Sat May 26 10:06:26 2001
+++ build-2.4/drivers/net/epic100.c	Sun Jun 3 19:48:44 2001
@@ -1305,11 +1305,16 @@
 	return crc;
 }
 
+static void set_bit_16(int offset, u16 *data)
+{
+	data[offset >> 4] |= (1<<(offset & 0xf));
+}
+
 static void set_rx_mode(struct net_device *dev)
 {
 	long ioaddr = dev->base_addr;
 	struct epic_private *ep = dev->priv;
-	unsigned char mc_filter[8];	/* Multicast hash filter */
+	u16 mc_filter[8];	/* Multicast hash filter */
 	int i;
 
 	if (dev->flags & IFF_PROMISC) {	/* Set promiscuous. */
@@ -1332,13 +1337,13 @@
 		memset(mc_filter, 0, sizeof(mc_filter));
 		for (i = 0, mclist = dev->mc_list; mclist && i < dev->mc_count;
 		     i++, mclist = mclist->next)
-			set_bit(ether_crc_le(ETH_ALEN, mclist->dmi_addr) & 0x3f,
+			set_bit_16(ether_crc_le(ETH_ALEN, mclist->dmi_addr) & 0x3f,
 				mc_filter);
 	}
 	/* ToDo: perhaps we need to stop the Tx and Rx process here? */
 	if (memcmp(mc_filter, ep->mc_filter, sizeof(mc_filter))) {
 		for (i = 0; i < 4; i++)
-			outw(((u16 *)mc_filter)[i], ioaddr + MC0 + i*4);
+			outw(mc_filter[i], ioaddr + MC0 + i*4);
 		memcpy(ep->mc_filter, mc_filter, sizeof(mc_filter));
 	}
 	return;
diff -u 2.4/drivers/net/fealnx.c build-2.4/drivers/net/fealnx.c
--- 2.4/drivers/net/fealnx.c	Sat May 26 10:06:26 2001
+++ build-2.4/drivers/net/fealnx.c	Sun Jun 3 19:49:45 2001
@@ -1642,6 +1642,10 @@
 	return crc;
 }
 
+static void set_bit_32(int offset, u32 * data)
+{
+	data[offset >> 5] |= (1 << (offset & 0x1f));
+}
 
 static void set_rx_mode(struct net_device *dev)
 {
@@ -1667,7 +1671,7 @@
 		memset(mc_filter, 0, sizeof(mc_filter));
 		for (i = 0, mclist = dev->mc_list; mclist && i < dev->mc_count;
 		     i++, mclist = mclist->next) {
-			set_bit((ether_crc(ETH_ALEN, mclist->dmi_addr) >> 26) ^ 0x3F,
+			set_bit_32((ether_crc(ETH_ALEN, mclist->dmi_addr) >> 26) ^ 0x3F,
 				mc_filter);
 		}
 		rx_mode = AB | AM;
diff -u 2.4/drivers/net/pci-skeleton.c build-2.4/drivers/net/pci-skeleton.c
--- 2.4/drivers/net/pci-skeleton.c	Fri Apr 20 20:54:23 2001
+++ build-2.4/drivers/net/pci-skeleton.c	Sun Jun 3 20:01:52 2001
@@ -1862,6 +1862,10 @@
 	return crc;
 }
 
+static void set_bit_32(int offset, u32 * data)
+{
+	data[offset >> 5] |= (1 << (offset & 0x1f));
+}
 
 static void netdrv_set_rx_mode (struct net_device *dev)
 {
@@ -1896,7 +1900,7 @@
 		mc_filter[1] = mc_filter[0] = 0;
 		for (i = 0, mclist = dev->mc_list; mclist && i < dev->mc_count;
 		     i++, mclist = mclist->next)
-			set_bit (ether_crc (ETH_ALEN, mclist->dmi_addr) >> 26,
+			set_bit_32 (ether_crc (ETH_ALEN, mclist->dmi_addr) >> 26,
 				mc_filter);
 	}
 
diff -u 2.4/drivers/net/sis900.c build-2.4/drivers/net/sis900.c
--- 2.4/drivers/net/sis900.c	Sat Jun 2 14:19:44 2001
+++ build-2.4/drivers/net/sis900.c	Sun Jun 3 19:51:38 2001
@@ -1870,6 +1870,11 @@
  * Multicast hash table changes from 128 to 256 bits for 635M/B & 900B0.
  */
 
+static void set_bit_16(int offset, u16 *data)
+{
+	data[offset >> 4] |= (1<<(offset & 0xf));
+}
+
 static void set_rx_mode(struct net_device *net_dev)
 {
 	long ioaddr = net_dev->base_addr;
@@ -1904,7 +1909,7 @@
 		rx_mode = RFAAB;
 		for (i = 0, mclist = net_dev->mc_list; mclist && i < net_dev->mc_count;
 		     i++, mclist = mclist->next)
-			set_bit(sis900_compute_hashtable_index(mclist->dmi_addr, revision),
+			set_bit_16(sis900_compute_hashtable_index(mclist->dmi_addr, revision),
 				mc_filter);
 	}
 
diff -u 2.4/drivers/net/starfire.c build-2.4/drivers/net/starfire.c
--- 2.4/drivers/net/starfire.c	Fri Apr 20 20:54:23 2001
+++ build-2.4/drivers/net/starfire.c	Sun Jun 3 19:53:07 2001
@@ -1185,6 +1185,12 @@
 	return crc;
 }
 
+static void set_bit_16(int offset, u16 *data)
+{
+	data[offset >> 4] |= (1<<(offset & 0xf));
+}
+
+
 static void set_rx_mode(struct net_device *dev)
 {
 	long ioaddr = dev->base_addr;
@@ -1219,12 +1225,12 @@
 	} else {
 		/* Must use a multicast hash table. */
 		long filter_addr;
-		u16 mc_filter[32] __attribute__ ((aligned(sizeof(long))));	/* Multicast hash filter */
+		u16 mc_filter[32];	/* Multicast hash filter */
 
 		memset(mc_filter, 0, sizeof(mc_filter));
 		for (i = 0, mclist = dev->mc_list; mclist && i < dev->mc_count;
 		     i++, mclist = mclist->next) {
-			set_bit(ether_crc_le(ETH_ALEN, mclist->dmi_addr) >> 23, mc_filter);
+			set_bit_16(ether_crc_le(ETH_ALEN, mclist->dmi_addr) >> 23, mc_filter);
 		}
 		/* Clear the perfect filter list. */
 		filter_addr = ioaddr + 0x56000 + 1*16;
diff -u 2.4/drivers/net/sundance.c build-2.4/drivers/net/sundance.c
--- 2.4/drivers/net/sundance.c	Fri Apr 20 20:54:22 2001
+++ build-2.4/drivers/net/sundance.c	Sun Jun 3 19:53:45 2001
@@ -1121,6 +1121,11 @@
 	return crc;
 }
 
+static void set_bit_16(int offset, u16 *data)
+{
+	data[offset >> 4] |= (1<<(offset & 0xf));
+}
+
 static void set_rx_mode(struct net_device *dev)
 {
 	long ioaddr = dev->base_addr;
@@ -1143,7 +1148,7 @@
 		memset(mc_filter, 0, sizeof(mc_filter));
 		for (i = 0, mclist = dev->mc_list; mclist && i < dev->mc_count;
 		     i++, mclist = mclist->next) {
-			set_bit(ether_crc_le(ETH_ALEN, mclist->dmi_addr) & 0x3f,
+			set_bit_16(ether_crc_le(ETH_ALEN, mclist->dmi_addr) & 0x3f,
 				mc_filter);
 		}
 		rx_mode = AcceptBroadcast | AcceptMultiHash | AcceptMyPhys;
diff -u 2.4/drivers/net/via-rhine.c build-2.4/drivers/net/via-rhine.c
--- 2.4/drivers/net/via-rhine.c	Fri Apr 20 20:54:23 2001
+++ build-2.4/drivers/net/via-rhine.c	Sun Jun 3 19:54:28 2001
@@ -1365,6 +1365,11 @@
 	return crc;
 }
 
+static void set_bit_32(int offset, u32 * data)
+{
+	data[offset >> 5] |= (1 << (offset & 0x1f));
+}
+
 static void via_rhine_set_rx_mode(struct net_device *dev)
 {
 	struct netdev_private *np = dev->priv;
@@ -1388,7 +1393,7 @@
 		memset(mc_filter, 0, sizeof(mc_filter));
 		for (i = 0, mclist = dev->mc_list; mclist && i < dev->mc_count;
 		     i++, mclist = mclist->next) {
-			set_bit(ether_crc(ETH_ALEN, mclist->dmi_addr) >> 26, mc_filter);
+			set_bit_32(ether_crc(ETH_ALEN, mclist->dmi_addr) >> 26, mc_filter);
 		}
 		writel(mc_filter[0], ioaddr + MulticastFilter0);
 		writel(mc_filter[1], ioaddr + MulticastFilter1);
diff -u 2.4/drivers/net/yellowfin.c build-2.4/drivers/net/yellowfin.c
--- 2.4/drivers/net/yellowfin.c	Sat May 26 10:06:26 2001
+++ build-2.4/drivers/net/yellowfin.c	Sun Jun 3 19:55:38 2001
@@ -1283,6 +1283,10 @@
 	return crc;
 }
 
+static void set_bit_16(int offset, u16 *data)
+{
+	data[offset >> 4] |= (1<<(offset & 0xf));
+}
 
 static void set_rx_mode(struct net_device *dev)
 {
@@ -1309,14 +1313,14 @@
 			/* Due to a bug in the early chip versions, multiple filter
 			   slots must be set for each address. */
 			if (yp->drv_flags & HasMulticastBug) {
-				set_bit((ether_crc_le(3, mclist->dmi_addr) >> 3) & 0x3f,
+				set_bit_16((ether_crc_le(3, mclist->dmi_addr) >> 3) & 0x3f,
 					hash_table);
-				set_bit((ether_crc_le(4, mclist->dmi_addr) >> 3) & 0x3f,
+				set_bit_16((ether_crc_le(4, mclist->dmi_addr) >> 3) & 0x3f,
 					hash_table);
-				set_bit((ether_crc_le(5, mclist->dmi_addr) >> 3) & 0x3f,
+				set_bit_16((ether_crc_le(5, mclist->dmi_addr) >> 3) & 0x3f,
 					hash_table);
 			}
-			set_bit((ether_crc_le(6, mclist->dmi_addr) >> 3) & 0x3f,
+			set_bit_16((ether_crc_le(6, mclist->dmi_addr) >> 3) & 0x3f,
 				hash_table);
 		}
 		/* Copy the hash table to the chip. */

--------------8788A1E78D3711A276170233--

From owner-netdev@oss.sgi.com Sun Jun 3 19:24:27 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f542ORJ03434 for netdev-outgoing; Sun, 3 Jun 2001 19:24:27 -0700
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f542OQh03423 for ; Sun, 3 Jun 2001 19:24:26 -0700
Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id TAA11717; Sun, 3 Jun 2001 19:24:18 -0700
From: "David S.
Miller"
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <15130.61778.471925.245018@pizda.ninka.net>
Date: Sun, 3 Jun 2001 19:24:18 -0700 (PDT)
To: Manfred Spraul
Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com
Subject: Re: multicast hash incorrect on big endian archs
In-Reply-To: <3B1A9558.2DBAECE7@colorfullife.com>
References: <3B1A9558.2DBAECE7@colorfullife.com>
X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Manfred Spraul writes:
> I noticed that the multicast hash calculations assumed little endian
> byte ordering in the winbond-840 driver, and it seems that several other
> drivers are also affected:
>
> 8139too, epic100, fealnx, pci-skeleton, sis900, starfile, sundance,
> via-rhine, yellowfin
> perhaps drivers/net/pcmcia/xircom_tulip_cb

Many big-endian systems already need to provide little-endian bitops, for ext2's sake for example.

We should formalize this, with {set,clear,change,test}_le_bit which technically every port has implemented in some form or another already.

Later,
David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Sun Jun 3 21:33:46 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f544Xka23037 for netdev-outgoing; Sun, 3 Jun 2001 21:33:46 -0700 Received: from hindon.hss.co.in ([202.54.26.202]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f544Xhh23018 for ; Sun, 3 Jun 2001 21:33:43 -0700 Received: from sandesh.hss.hns.com (localhost [127.0.0.1]) by hindon.hss.co.in (8.10.0/8.10.0) with SMTP id f544YAT21176; Mon, 4 Jun 2001 10:04:15 +0530 (IST) Received: by sandesh.hss.hns.com(Lotus SMTP MTA v4.6.3 (733.2 10-16-1998)) id 65256A61.00179D0D ; Mon, 4 Jun 2001 09:47:55 +0530 X-Lotus-FromDomain: HSS From: sakalra@hss.hns.com To: Ben Greear cc: netdev@oss.sgi.com Message-ID: <65256A61.00179AF8.00@sandesh.hss.hns.com> Date: Mon, 4 Jun 2001 10:02:29 +0530 Subject: Re: regarding Redundancy in TCP / IP Stack Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2955 Lines: 94 hi, BEN> What redundancy do you expect to get? What will you BEN> be able to do that we cannot do today? You mean BEN> keeping two stacks in sync across two different machines, so BEN> that you can hot-swap processes or something? San> Yes, to be more precise, this will be a part of a big VoIP project San> and thats the world i belong to San> the idea of the project is to have a redundant VoIP entity (may be MG /Switch /or MGControllers) San> where there will be a Single floating IP, and the whole entity will be a single San> unit to the outside world. For UDP there is no problem implementing this , but with TCP San> the stack has to be modified or what i think other ideas like San> a net-filter or going for hacking packets at the interface rather than San> allowing them to float to stack are also good. 
Sandeep kalra Ben Greear on 06/01/2001 09:25:20 PM To: Sandeep Kalra/HSS@HSS cc: netdev@oss.sgi.com, Rajiv Roy/HSS@HSS Subject: Re: regarding Redundancy in TCP / IP Stack sakalra@hss.hns.com wrote: > > hi all , list > We am novice to the Linux TCP / IP stack arch, > At present i want to implement redundancy > at socket level in the Stack.. Can you please > help me with some docs, information in this regards What redundancy do you expect to get? What will you be able to do that we cannot do today? You mean keeping two stacks in sync across two different machines, so that you can hot-swap processes or something? I thought about this w/regard to building a VOIP box that could handle failover w/out dropping calls, but I decided that it was an intractable problem, and that there were probably other ways to get the functionality better. For example, on failover, grab packets right off the interface instead of letting them go up the stack and implement your own hacked up TCP/IP stack in user-space that is specifically designed to do what you want. This is pretty damn ugly, of course, but you might could keep the connections together. For VOIP in particular, most of your traffic is UDP anyway, so your problem is much more easily solved... > > we want to know > 1. The Data structures that are kept by the system for maintaining the > Connection. > 2. Kernel related data structures that are part of the TCP / IP stack. > 3. Any Documents, Links that can help us in getting with the procedure .as to > how it can be implemeted efficiently. > 4. Pros & cons in implementing such redundancy. > 5. kernel related other information as to which modules are interdependent to > this (If any). > 6. If any work is going in this regards, then what is the present status. & for > more detail whom shall then we refer to. 
> > regards > Sandeep , Rajiv -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Sun Jun 3 23:33:38 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f546Xcb07776 for netdev-outgoing; Sun, 3 Jun 2001 23:33:38 -0700 Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f546Xah07766 for ; Sun, 3 Jun 2001 23:33:37 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.2/8.11.2) with ESMTP id f5478fM21999; Mon, 4 Jun 2001 00:08:41 -0700 Message-ID: <3B1B33F9.B83F062@candelatech.com> Date: Mon, 04 Jun 2001 00:08:41 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: sakalra@hss.hns.com CC: netdev@oss.sgi.com Subject: Re: regarding Redundancy in TCP / IP Stack References: <65256A61.00179AF8.00@sandesh.hss.hns.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2193 Lines: 45 sakalra@hss.hns.com wrote: > > hi, > > BEN> What redundancy do you expect to get? What will you > BEN> be able to do that we cannot do today? You mean > BEN> keeping two stacks in sync across two different machines, so > BEN> that you can hot-swap processes or something? > > San> Yes, to be more precise, this will be a part of a big VoIP project > San> and thats the world i belong to > San> the idea of the project is to have a redundant VoIP entity (may be MG > /Switch /or MGControllers) > San> where there will be a Single floating IP, and the whole entity will be a > single > San> unit to the outside world. 
For UDP there is no problem implementing this , > but with TCP > San> the stack has to be modified or what i think other ideas like > San> a net-filter or going for hacking packets at the interface rather than > San> allowing them to float to stack are also good. If you start modifying the stack, you will likely have to modify everything above it (i.e. the VOIP program). This will get nasty, because the client on the other end of the TCP connection will have its own idea of sequence numbers, etc. As nasty as it sounds, I think that just bypassing the TCP/IP stack altogether may be the cleanest route. This will mean re-implementing most of the stack in user space (i.e. in your VOIP program), but maybe that is OK. Depending on how forgiving TCP/IP is about sequence numbers (and it should be able to handle duplicate and dropped ones, so it might just recover OK), it may be possible to use a modified version of the existing TCP stack, but you may spend 99% of your time chasing hard-to-catch border cases... Have you considered just finding a hardware/software combo that meets 5-9's, or whatever you are shooting for? Motorola offers a 5-9's compact-PCI shelf based on Linux, for example. Also, since the RTP protocol is based on UDP (which is admittedly easy to fail over), what is running over TCP that is so important?
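To make the sequence-number concern above concrete, here is a minimal sketch in C of the bookkeeping a standby would have to keep in step with the peer. Everything in it is an assumption for illustration: `struct conn_mirror`, `classify_segment` and `accept_in_order` are hypothetical names, not kernel structures, and the classification looks only at a segment's starting sequence number (partial overlaps are ignored).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection state a standby would have to mirror.
 * Field names are illustrative; they are not the kernel's. */
struct conn_mirror {
    uint32_t rcv_nxt;   /* next sequence number expected from the peer */
    uint32_t snd_nxt;   /* next sequence number we would send */
};

enum seg_class { SEG_DUPLICATE, SEG_IN_ORDER, SEG_GAP };

/* Wraparound-safe "a is before b" for 32-bit sequence numbers. */
int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Classify an incoming segment by its starting sequence number only. */
enum seg_class classify_segment(const struct conn_mirror *c, uint32_t seq)
{
    assert(c);
    if (seq == c->rcv_nxt)
        return SEG_IN_ORDER;
    if (seq_before(seq, c->rcv_nxt))
        return SEG_DUPLICATE;   /* already seen; the peer retransmitted */
    return SEG_GAP;             /* something in between went missing */
}

/* Advance the expected sequence number after accepting len in-order bytes. */
void accept_in_order(struct conn_mirror *c, uint32_t len)
{
    assert(c);
    c->rcv_nxt += len;          /* unsigned arithmetic wraps modulo 2^32 */
}
```

If the standby's `rcv_nxt` lags or leads the value the failed node had reached, in-order data from the peer is misclassified as a gap or a duplicate, which is one way to read the "hard-to-catch border cases" warning.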
-- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Sun Jun 3 23:58:04 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f546w4U10741 for netdev-outgoing; Sun, 3 Jun 2001 23:58:04 -0700 Received: from hindon.hss.co.in ([202.54.26.202]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f546vxh10726 for ; Sun, 3 Jun 2001 23:58:00 -0700 Received: from sandesh.hss.hns.com (localhost [127.0.0.1]) by hindon.hss.co.in (8.10.0/8.10.0) with SMTP id f546wQb28540; Mon, 4 Jun 2001 12:28:27 +0530 (IST) Received: by sandesh.hss.hns.com(Lotus SMTP MTA v4.6.3 (733.2 10-16-1998)) id 65256A61.0024D187 ; Mon, 4 Jun 2001 12:12:09 +0530 X-Lotus-FromDomain: HSS From: sakalra@hss.hns.com To: Ben Greear cc: netdev@oss.sgi.com Message-ID: <65256A61.0024CEFA.00@sandesh.hss.hns.com> Date: Mon, 4 Jun 2001 12:26:42 +0530 Subject: Re: regarding Redundancy in TCP / IP Stack Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 3259 Lines: 95 You can take the idea with an example: say that you have a "single framework" to interact with the outer world, whatever the transport mechanism (whether it be UDP, TCP or message queues). This transport sits behind a general library like: void *p_msg; /* Data */ Is_active id; /* will hold the id of the entity that is active at that moment */ enum transport_mechanism { TCP, UDP, M_Q }; /* Prototypes of the general library */ boolean send_to_world(void *p_msg, enum transport_mechanism mechanism, ...); boolean recv_from_world(void *p_msg, Is_active id, ...); BEN> If you start modifying the stack, you will likely have to modify BEN> everything above it (ie the VOIP program). This way, I think the VoIP program does not have to be modified every time.
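The interface outlined in the message above could be made compilable roughly as follows. This is only a sketch under assumptions: the enum constants, the `send_fn` backend type and the three stub backends are hypothetical, and a length parameter has been added because a bare `void *` carries no size. A real library would put the TCP, UDP and message-queue code where the stubs are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Transport choices named in the message; constant names are illustrative. */
enum transport_mechanism {
    TRANSPORT_TCP,
    TRANSPORT_UDP,
    TRANSPORT_MSGQ,
    TRANSPORT_COUNT
};

/* Hypothetical backend signature: true when the message was handed off. */
typedef bool (*send_fn)(const void *msg, size_t len);

/* Stub backends so the sketch compiles; each one only counts its calls. */
unsigned sent_count[TRANSPORT_COUNT];

static bool tcp_send_stub(const void *msg, size_t len)
{
    (void)msg; (void)len;
    sent_count[TRANSPORT_TCP]++;
    return true;
}

static bool udp_send_stub(const void *msg, size_t len)
{
    (void)msg; (void)len;
    sent_count[TRANSPORT_UDP]++;
    return true;
}

static bool msgq_send_stub(const void *msg, size_t len)
{
    (void)msg; (void)len;
    sent_count[TRANSPORT_MSGQ]++;
    return true;
}

static const send_fn backends[TRANSPORT_COUNT] = {
    [TRANSPORT_TCP]  = tcp_send_stub,
    [TRANSPORT_UDP]  = udp_send_stub,
    [TRANSPORT_MSGQ] = msgq_send_stub,
};

/* Single entry point: the caller names a transport, the library dispatches. */
bool send_to_world(const void *p_msg, size_t len, enum transport_mechanism how)
{
    if (p_msg == NULL || (unsigned)how >= (unsigned)TRANSPORT_COUNT)
        return false;
    return backends[how](p_msg, len);
}
```

The application only ever calls `send_to_world`, so swapping what sits behind a given transport (including any failover logic) would not touch the VoIP program, which is the point the message is making.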
Also to the outer world the entty which interact is a singleton , whereas inside the entity the redundancy of the VoIP box will only be cared ab't sandeep kalra
From owner-netdev@oss.sgi.com Mon Jun 4 00:02:12 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f5472CZ11767 for netdev-outgoing; Mon, 4 Jun 2001 00:02:12 -0700 Received: from colorfullife.com (colorfullife.com [216.156.138.34]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f54728h11751 for ; Mon, 4 Jun 2001 00:02:08 -0700 Received: from dbl.localdomain (localhost [127.0.0.1]) by colorfullife.com (8.11.2/8.11.2) with ESMTP id f5474bq22500; Mon, 4 Jun 2001 03:04:38 -0400 Received: from colorfullife.com (gw.cat5.localdomain [172.17.0.1]) by dbl.localdomain (8.11.2/8.11.2) with ESMTP id f54720v12732; Mon, 4 Jun 2001 09:02:00 +0200 Message-ID: <3B1B3268.2A02D2C@colorfullife.com> Date: Mon, 04 Jun 2001 09:02:00 +0200 From: Manfred Spraul X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: multicast hash incorrect on big endian archs References: <3B1A9558.2DBAECE7@colorfullife.com> <15130.61778.471925.245018@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1068 Lines: 36 "David S.
Miller" wrote: > > Manfred Spraul writes: > > I noticed that the multicast hash calculations assumed little endian > > byte ordering in the winbond-840 driver, and it seems that several other > > drivers are also affected: > > > > 8139too, epic100, fealnx, pci-skeleton, sis900, starfile, sundance, > > via-rhine, yellowfin > > perhaps drivers/net/pcmcia/xircom_tulip_cb > > Many big-endian systems already need to provide little-endian bitops, > for ext2's sake for example. > > We should formalize this, with {set,clear,change,test}_le_bit which > technically every port has implemented in some for or another already. > The multicast hash is written into a nic register with set_bit(crc(...),mc_list); ... out{b,w,l}(mc_list[i],ioaddr); set_bit_le only helps for outb. My patch uses set_bit_16 and set_bit_32. Another option would be set_bit_le(crc(...),mc_list) ... out{w,l}(le{16,32}_to_cpu(mc_list[i]),ioaddr); but I think set_bit_{8,16,32,64} are the better solution. Obviously we could move them into a header file. -- Manfred From owner-netdev@oss.sgi.com Mon Jun 4 02:35:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f549ZGD27043 for netdev-outgoing; Mon, 4 Jun 2001 02:35:16 -0700 Received: from colorfullife.com (colorfullife.com [216.156.138.34]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f549ZFh27039 for ; Mon, 4 Jun 2001 02:35:15 -0700 Received: from dbl.localdomain (localhost [127.0.0.1]) by colorfullife.com (8.11.2/8.11.2) with ESMTP id f549bjq23817; Mon, 4 Jun 2001 05:37:49 -0400 Received: from colorfullife.com (IDENT:manfred@clmsdev.localdomain [172.17.4.1]) by dbl.localdomain (8.11.2/8.11.2) with ESMTP id f549ZAv12891; Mon, 4 Jun 2001 11:35:10 +0200 Message-ID: <3B1B564E.D83A741A@colorfullife.com> Date: Mon, 04 Jun 2001 11:35:10 +0200 From: Manfred Spraul X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.5-ac6 i686) X-Accept-Language: en, de MIME-Version: 1.0 To: "David S. 
Miller" CC: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: multicast hash incorrect on big endian archs References: <3B1A9558.2DBAECE7@colorfullife.com> <15130.61778.471925.245018@pizda.ninka.net> <3B1B3268.2A02D2C@colorfullife.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 885 Lines: 31 Manfred Spraul wrote: > > "David S. Miller" wrote: > > > > Many big-endian systems already need to provide little-endian bitops, > > for ext2's sake for example. > > > > We should formalize this, with {set,clear,change,test}_le_bit which > > technically every port has implemented in some for or another already. > > That could cause alignment problems. <<< from starfire.c { long filter_addr; u16 mc_filter[32] __attribute__ ((aligned(sizeof(long)))); <<< set_bit requires word alignment, but without the __attibute__ the compiler would only guarantee 16-bit alignment. IMHO ugly. Should I add __set_bit_{8,16,32} into , overridable with __HAVE_ARCH_SET_BIT_n? Default implementation for the nonatomic __set_bit could be added into , too. Btw, the correct name would be __set_bit_n: the function don't guarantee atomicity. -- Manfred From owner-netdev@oss.sgi.com Mon Jun 4 03:54:45 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f54Asj707745 for netdev-outgoing; Mon, 4 Jun 2001 03:54:45 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f54Asih07742 for ; Mon, 4 Jun 2001 03:54:44 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id DAA12320; Mon, 4 Jun 2001 03:54:42 -0700 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15131.26866.794001.525719@pizda.ninka.net> Date: Mon, 4 Jun 2001 03:54:42 -0700 (PDT) To: Manfred Spraul Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: multicast hash incorrect on big endian archs In-Reply-To: <3B1B564E.D83A741A@colorfullife.com> References: <3B1A9558.2DBAECE7@colorfullife.com> <15130.61778.471925.245018@pizda.ninka.net> <3B1B3268.2A02D2C@colorfullife.com> <3B1B564E.D83A741A@colorfullife.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 449 Lines: 17 Manfred Spraul writes: > That could cause alignment problems. > <<< from starfire.c > { > long filter_addr; > u16 mc_filter[32] __attribute__ ((aligned(sizeof(long)))); > <<< > set_bit requires word alignment, but without the __attibute__ the > compiler would only guarantee 16-bit alignment. IMHO ugly. Correction, it requires "long" alignment and that is 64-bits on several platforms. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Jun 4 11:43:19 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f54IhJW15915 for netdev-outgoing; Mon, 4 Jun 2001 11:43:19 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f54IhIh15908 for ; Mon, 4 Jun 2001 11:43:18 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 47BA31F74; Mon, 4 Jun 2001 14:43:11 -0400 (EDT) Message-ID: <3B1BD6C0.9F54047E@mandrakesoft.com> Date: Mon, 04 Jun 2001 14:43:12 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i686) X-Accept-Language: en MIME-Version: 1.0 To: Bogdan Costescu Cc: jamal , Alan Cox , Pete Zaitcev , Linux Kernel Mailing List , netdev@oss.sgi.com Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 697 Lines: 18 Bogdan Costescu wrote: > > On Sat, 2 Jun 2001, jamal wrote: > > > Still, the tx watchdogs are a good source of fault detection in the case > > of non-availabilty of MII detection and even with the presence of MII. > > Agreed. But my question was a bit different: is there any legit situation > where Tx timeouts can happen in a row _without_ having a link loss ? In > this situation, we'd have false positives... 
yes -- Jeff Garzik | Echelon words of the day, from The Register: Building 1024 | FRU Lebed HALO Spetznaz Al Amn al-Askari Glock 26 MandrakeSoft | Steak Knife Kill the President anarchy echelon | nuclear assassinate Roswell Waco World Trade Center From owner-netdev@oss.sgi.com Mon Jun 4 12:10:50 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f54JAoJ20976 for netdev-outgoing; Mon, 4 Jun 2001 12:10:50 -0700 Received: from shell.cyberus.ca (shell.cyberus.ca [209.195.95.7]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f54JAnh20969 for ; Mon, 4 Jun 2001 12:10:49 -0700 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id PAA19669; Mon, 4 Jun 2001 15:08:51 -0400 (EDT) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Mon, 4 Jun 2001 15:08:51 -0400 (EDT) From: jamal To: Bogdan Costescu cc: Jeff Garzik , Alan Cox , Pete Zaitcev , Linux Kernel Mailing List , Subject: Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1146 Lines: 37 On Sun, 3 Jun 2001, Bogdan Costescu wrote: > On Sat, 2 Jun 2001, jamal wrote: > > > Still, the tx watchdogs are a good source of fault detection in the case > > of non-availabilty of MII detection and even with the presence of MII. > > Agreed. But my question was a bit different: is there any legit situation > where Tx timeouts can happen in a row _without_ having a link loss ? In > this situation, we'd have false positives... Two places: 1) no MII indicators 2) shaky hardware and MII bounces. Is it on, is it off? What is going on? You could use them to "probe" to make sure that infact the MII indicators are not false positives. Your mileage may vary. > > > "Dynamic" in the above sense means trying to totaly avoid making it a > > synchronous poll. 
The poll rate is a function of how many packets go out > > that device per average measurement time. Basically, the period that the > > user space app dumps "hello" netlink packets to the kernel is a variable. > > Sounds nice, but could this be implemented light enough ? > Not as simple as synchronous polls. Note, however, simple/light does not imply the best. cheers, jamal From owner-netdev@oss.sgi.com Tue Jun 5 02:07:44 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f5597i929675 for netdev-outgoing; Tue, 5 Jun 2001 02:07:44 -0700 Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f5597ah29650 for ; Tue, 5 Jun 2001 02:07:36 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f5597Cd12186; Tue, 5 Jun 2001 11:07:16 +0200 (MET DST) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id LAA05380; Tue, 5 Jun 2001 11:07:06 +0200 Date: Tue, 5 Jun 2001 11:07:06 +0200 (CEST) From: Bogdan Costescu To: Jeff Garzik cc: Alan Cox , Mark Frazer , Pete Zaitcev , Linux Kernel Mailing List , , Subject: Re: MII access (was [PATCH] support for Cobalt Networks (x86 only) In-Reply-To: <3B1A2982.C53B159C@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1057 Lines: 26 On Sun, 3 Jun 2001, Jeff Garzik wrote: > Bogdan Costescu wrote: > > With clearer mind, I have to make some a correction to one of the previous > > messages: the problem of not checking arguments range does not apply to > > 3c59x which has in the ioctl function '& 0x1f' for both transceiver number > > and register number. However, eepro100 and tulip don't do that. 
(I'm > > checking now with 2.4.3 from Mandrake 8, but I don't think that there were > > recent changes in these areas). > > half right -- tulip does this for the phy id but not the MII register > number. I'll fix that up. Please bug Andrey about fixing up > eepro100... OK, Andrey is now CC-ed. However, I only checked the 3 mentioned drivers, while MII ioctl's are used in many more... I was hoping that the mantainers would jump in! -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Tue Jun 5 07:13:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f55EDYe19044 for netdev-outgoing; Tue, 5 Jun 2001 07:13:34 -0700 Received: from hindon.hss.co.in ([202.54.26.202]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f55EDVh19017 for ; Tue, 5 Jun 2001 07:13:32 -0700 Received: from sandesh.hss.hns.com (localhost [127.0.0.1]) by hindon.hss.co.in (8.10.0/8.10.0) with SMTP id f55EEdS27259 for ; Tue, 5 Jun 2001 19:44:40 +0530 (IST) Received: by sandesh.hss.hns.com(Lotus SMTP MTA v4.6.3 (733.2 10-16-1998)) id 65256A62.004CBFFD ; Tue, 5 Jun 2001 19:28:18 +0530 X-Lotus-FromDomain: HSS From: sndtrn27@hss.hns.com To: netdev@oss.sgi.com Message-ID: <65256A62.004CA258.00@sandesh.hss.hns.com> Date: Tue, 5 Jun 2001 19:24:18 +0530 Subject: Where can i find 4.x BSD Source Code Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 800 Lines: 25 hi all, I am looking for 4.x BSD Source Code At present i want to implement redundancy at socket level in the TCP Stack.. Can you please help me with some docs, information in this regards any additional help in these feild will be appriciated. i want to know 1. 
The Data structures that are kept by the system for maintaining the Connection. 2. Kernel related data structures that are part of the TCP / IP stack. 3. Any Documents, Links that can help us in getting with the procedure .as to how it can be implemeted efficiently. 4. Pros & cons in implementing such redundancy. 5. kernel related other information as to which modules are interdependent to this (If any). 6. If any work is going in this regards, then what is the present status. & for more detail whom shall then we refer to. From owner-netdev@oss.sgi.com Tue Jun 5 11:06:12 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f55I6CY12009 for netdev-outgoing; Tue, 5 Jun 2001 11:06:12 -0700 Received: from titan.bieringer.de (mail.bieringer.de [195.226.187.51]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f55I6Ah12003 for ; Tue, 5 Jun 2001 11:06:10 -0700 Received: (qmail 29813 invoked from network); 5 Jun 2001 18:06:04 -0000 Received: from pd9502492.dip.t-dialin.net (HELO worker.muc.bieringer.de) (217.80.36.146) by mail.bieringer.de with SMTP; 5 Jun 2001 18:06:04 -0000 Date: Tue, 05 Jun 2001 20:06:12 +0200 From: Peter Bieringer To: Maillist netdev cc: Andi Kleen Subject: Re: IPv6+2.4.x: ipv6_local_port_range implementation plans + netfilter6 Message-ID: <12580000.991764372@localhost> In-Reply-To: <20010603132942.A2582@fred.local> X-Mailer: Mulberry/2.0.8 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 843 Lines: 27 --On Sunday, June 03, 2001 01:29:42 PM +0200 Andi Kleen wrote: > On Sat, Jun 02, 2001 at 11:03:24AM +0200, Peter Bieringer wrote: >> Hi all, >> >> are there any plans to implement "ipv6_local_port_range" in the >> future like on IPv4? > > The IPv4 sysctl is shared between IPv4 and IPv6, because v4 and v6 > share a common port space. Thanks for reply. 
Two more questions: 1) exists there any documentation beside the source code itself which "/proc/sys/net/ipv4" values will be also used for IPv6? 2) are there any plans for 2.5 or later to split off common used proc switches to another directory like "/proc/sys/net/ip"? There was a thread sometimes ago relating 'howto make IPv4 as module' which can be take advantage of such split off (I'm thinking about IPv6 only clients with Linux network stack)... TIA, Peter From owner-netdev@oss.sgi.com Tue Jun 5 12:01:04 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f55J14f18101 for netdev-outgoing; Tue, 5 Jun 2001 12:01:04 -0700 Received: from infoserve.nmt.edu (admin.nmt.edu [129.138.18.18]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f55J11h18096 for ; Tue, 5 Jun 2001 12:01:02 -0700 Received: by admin.nmt.edu with Internet Mail Service (5.5.2653.19) id ; Tue, 5 Jun 2001 13:00:56 -0600 Message-ID: From: "Snyder, Ryan" To: "'netdev@oss.sgi.com'" Subject: arp cache issue Date: Tue, 5 Jun 2001 13:00:48 -0600 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 676 Lines: 23 Hello, I was wondering if any one can help me, I received this email address from Alan Cox. I am running CheckPoint Firewall under Linux 2.2.19. The Firewall is working fine, but on the interface that is connected to the Internet via a Cisco router has over 950 entries in the arp cache. I understand this is normal, but since there is only one route to the Internet, is there a way to not have Linux to an arp cache lookup, or even a setting to make the cache size much bigger? I have looked into running arpd, but I am kinda fuzzy about running daemon stuff in userspace; espically on a firewall. Any help is greatly appreciated. 
Thanks, Ryan Snyder Socorro, NM From owner-netdev@oss.sgi.com Tue Jun 5 12:14:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f55JE3F19456 for netdev-outgoing; Tue, 5 Jun 2001 12:14:03 -0700 Received: from titan.bieringer.de (mail.bieringer.de [195.226.187.51]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f55JE2h19452 for ; Tue, 5 Jun 2001 12:14:02 -0700 Received: (qmail 30632 invoked from network); 5 Jun 2001 19:13:53 -0000 Received: from pd9502492.dip.t-dialin.net (HELO worker.muc.bieringer.de) (217.80.36.146) by mail.bieringer.de with SMTP; 5 Jun 2001 19:13:53 -0000 Date: Tue, 05 Jun 2001 21:14:00 +0200 From: Peter Bieringer To: "Snyder, Ryan" cc: "'netdev@oss.sgi.com'" Subject: Re: arp cache issue Message-ID: <37080000.991768440@localhost> In-Reply-To: X-Mailer: Mulberry/2.0.8 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 949 Lines: 30 Hi, --On Tuesday, June 05, 2001 01:00:48 PM -0600 "Snyder, Ryan" wrote: > Hello, > I was wondering if any one can help me, I received this email > address from Alan Cox. > > I am running CheckPoint Firewall under Linux 2.2.19. Me too on a customers site. > The > Firewall is working fine, > but on the interface that is connected to the Internet via a Cisco > router has over 950 > entries in the arp cache. I understand this is normal, but since > there is only one > route to the Internet, is there a way to not have Linux to an arp > cache lookup, or even > a setting to make the cache size much bigger? Do you see any reason why you have 950 entries? A short look into the arp cache on a CP on Linux shows me only one entry, the default gateway (also here) Cisco router, nothing else (ok, some permanent for NAT issues, too). What IP addresses are shown in the ARP chache? Worldwide? Internal ones? 
Peter From owner-netdev@oss.sgi.com Tue Jun 5 12:46:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f55JkY322419 for netdev-outgoing; Tue, 5 Jun 2001 12:46:34 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f55JkWh22412 for ; Tue, 5 Jun 2001 12:46:33 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f55JkLr23109; Tue, 5 Jun 2001 22:46:21 +0300 Date: Tue, 5 Jun 2001 22:46:21 +0300 (EEST) From: Pekka Savola To: "Snyder, Ryan" cc: "'netdev@oss.sgi.com'" Subject: Re: arp cache issue In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 950 Lines: 23 On Tue, 5 Jun 2001, Snyder, Ryan wrote: > I am running CheckPoint Firewall under Linux 2.2.19. The Firewall is > working fine, > but on the interface that is connected to the Internet via a Cisco router > has over 950 > entries in the arp cache. I understand this is normal, but since there is > only one > route to the Internet, is there a way to not have Linux to an arp cache > lookup, or even > a setting to make the cache size much bigger? The arp entries should definitely not be on the Cisco interface, if everything is set up right. If you understand the consequences, add 'no ip proxy-arp' on Cisco interface configuration. If proxy arp is required, your network is probably designed badly. Sometimes it is really needed though. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Tue Jun 5 13:28:51 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f55KSpV26031 for netdev-outgoing; Tue, 5 Jun 2001 13:28:51 -0700 Received: from circuit.moureaux.com (IDENT:root@m201-3-p47.warwick.net [208.242.201.152]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f55KSnh26025 for ; Tue, 5 Jun 2001 13:28:49 -0700 Received: from localhost (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.11.2/8.11.2) with ESMTP id f55KRSk01494; Tue, 5 Jun 2001 16:27:28 -0400 Date: Tue, 5 Jun 2001 16:27:28 -0400 (EDT) From: Statux X-X-Sender: To: cc: Subject: Re: Where can i find 4.x BSD Source Code In-Reply-To: <65256A62.004CA258.00@sandesh.hss.hns.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1255 Lines: 37 I don't know exactly where.. but you might want to start with something like www.bsd.org. Now.. the implementations like FreeBSD, NetBSD, OpenBSD, etc, are based on 4.4 BSD Lite (which was like an x86 version of 4.4 BSD, etc). There might be links in that area to it.. other than that, I'd snoop around Berkeley's site, since that is who made BSD ;) On Tue, 5 Jun 2001 sndtrn27@hss.hns.com wrote: > > > > hi all, > > I am looking for 4.x BSD Source Code > > At present i want to implement redundancy > at socket level in the TCP Stack.. Can you please > help me with some docs, information in this regards > > any additional help in these feild will be appriciated. > i want to know > 1. The Data structures that are kept by the system for maintaining the > Connection. > 2. Kernel related data structures that are part of the TCP / IP stack. > 3. Any Documents, Links that can help us in getting with the procedure .as to > how it can be implemeted efficiently. > 4. Pros & cons in implementing such redundancy. > 5. 
kernel related other information as to which modules are interdependent to > this (If any). > 6. If any work is going in this regards, then what is the present status. & for > more detail whom shall then we refer to. > > -- -Statux From owner-netdev@oss.sgi.com Wed Jun 6 04:55:33 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f56BtXP27823 for netdev-outgoing; Wed, 6 Jun 2001 04:55:33 -0700 Received: from luna.tlmat.unican.es (luna.tlmat.unican.es [193.144.186.2]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f56BtCh27749 for ; Wed, 6 Jun 2001 04:55:30 -0700 Received: from centauro (centauro.tlmat.unican.es [193.144.186.27]) by luna.tlmat.unican.es with SMTP (8.7.6/8.7.1) id OAA26078 for ; Wed, 6 Jun 2001 14:19:14 +0200 (METDST) Message-ID: <006701c0ee7f$892f2740$1bba90c1@tlmat.unican.es> From: =?iso-8859-1?B?UmFt824gQWf8ZXJv?= To: Subject: TCP and SACK retransmissions Date: Wed, 6 Jun 2001 13:55:08 +0200 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0064_01C0EE90.4C2A0F20" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2314.1300 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 3044 Lines: 83 This is a multi-part message in MIME format. ------=_NextPart_000_0064_01C0EE90.4C2A0F20 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Dear all, I'm doing a quite exhaustive analysis of the TCP implementation = within 2.2.x kernel, and I have found an specific issue that I'm not = able to solve by myself. I know that now you will spend your effort in = the 2.4.x kernel. However I would like to share my doubt with you and I = would really appreciate any piece of help with it. When a Duplicate ACK arrives, the tcp_fast_retrans function is = called. Aparentely, this function does not trigger any retransmission = unless tp->dup_acks =3D=3D 3 or tp->fackets_out > 3. 
On some occasions this is the behaviour I see (in tcpdump captures), but in other cases the first dupack triggers a retransmission, although the number of SACKed segments is only two. I have tried to see why this retransmission is triggered, but I can't find the reason. Can anybody shed some light on this :-) ? Thanks in advance and regards. Ramón PS: I would like to be personally CC'ed on any answers or comments posted to the list in response to my posting.
------=_NextPart_000_0064_01C0EE90.4C2A0F20-- From owner-netdev@oss.sgi.com Wed Jun 6 10:42:18 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f56HgIi07941 for netdev-outgoing; Wed, 6 Jun 2001 10:42:18 -0700 Received: from localhost.localdomain (cpu2747.adsl.bellglobal.com [207.236.55.216]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f56HgHh07936 for ; Wed, 6 Jun 2001 10:42:17 -0700 Received: (from rgb@localhost) by localhost.localdomain (8.12.0.Beta5/8.11.1) id f56Hpnrt000492; Wed, 6 Jun 2001 13:51:49 -0400 Date: Wed, 6 Jun 2001 13:51:49 -0400 From: Richard Guy Briggs To: netdev@oss.sgi.com Subject: dst cache cleared on netdev down? Message-ID: <20010606135149.I31244@grendel.conscoop.ottawa.on.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2525 Lines: 54 Hi all, I'm seeing oopses possibly coming from the attempted use of a dst cache entry after a device has been downed. Can someone confirm that when a device goes down, it takes out all the routing table entries for that device and it also takes out all the dst cache entries for that device? The problem stems from the current kludgy way that FreeS/WAN gets packets. FreeS/WAN currently "attaches" a physical device to an ipsec virtual device, so that when a packet is routed to that virtual device it eventually comes out after encryption, being sent to the physical device. This was actually the way that it happened in 2.0 kernels, but with the advent of the dst cache, it now does a routing table lookup again, attempting to use the physical device if a valid route exists. When that physical device goes down for any reason, we would simply take down the corresponding virtual device. This would have the effect of clearing all the routes that had been used to direct packets through the ipsec device. 
When the physical device came back up (in this case, ppp, using the roaring penguin userspace driver) packets for which secure tunnels had been set up were now being sent in the clear. The code was changed so that if the physical device went down, the virtual device would stay up but simply drop the packets until the physical device was re-attached, ensuring that packets were dropped rather than being sent in the clear. I am now getting oopses in neigh_connected_output() at 6f/b0, which could be dev->hard_header() or neigh->ops->queue_xmit(). In fact, I suspect neigh->ha, but don't know for certain. Is it possible that neigh->ha is bogus when it tries to evaluate it before calling dev->hard_header? I assume that the three assignments in the variable declarations are protected by the compiler and don't need to be in the body of the code to be checked before assignment? If any one of the variables from which they point is null, it will not cause an oops? Is this a bug in neigh_connected_output(), the way we are using it, or the way we are attempting to clean up after the physical device goes down? slainte mhath, RGB -- Richard Guy Briggs -- PGP key available Auto-Free Ottawa! Canada Prevent Internet Wiretapping! -- FreeS/WAN: Thanks for voting Green! -- Marillion: From owner-netdev@oss.sgi.com Wed Jun 6 11:01:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f56I1R709817 for netdev-outgoing; Wed, 6 Jun 2001 11:01:27 -0700 Received: from localhost.localdomain (cpu2747.adsl.bellglobal.com [207.236.55.216]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f56I1Qh09814 for ; Wed, 6 Jun 2001 11:01:26 -0700 Received: (from rgb@localhost) by localhost.localdomain (8.12.0.Beta5/8.11.1) id f56IBjWI000578; Wed, 6 Jun 2001 14:11:45 -0400 Date: Wed, 6 Jun 2001 14:11:45 -0400 From: Richard Guy Briggs To: netdev@oss.sgi.com Subject: skb_pull, etc. panics. 
Message-ID: <20010606141145.L31244@grendel.conscoop.ottawa.on.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 518 Lines: 14 Hi again, If this is an FAQ, can someone point me to the reasons that skb_push() and skb_put() panic rather than dropping the skb and complaining in the log? If not, why does it do that? slainte mhath, RGB -- Richard Guy Briggs -- PGP key available Auto-Free Ottawa! Canada Prevent Internet Wiretapping! -- FreeS/WAN: Thanks for voting Green! -- Marillion: From owner-netdev@oss.sgi.com Thu Jun 7 04:16:55 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f57BGtJ18397 for netdev-outgoing; Thu, 7 Jun 2001 04:16:55 -0700 Received: from hindon.hss.co.in ([202.54.26.202]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f57BGnh18390 for ; Thu, 7 Jun 2001 04:16:50 -0700 Received: from sandesh.hss.hns.com (localhost [127.0.0.1]) by hindon.hss.co.in (8.10.0/8.10.0) with SMTP id f57BHtu15284 for ; Thu, 7 Jun 2001 16:47:55 +0530 (IST) Received: by sandesh.hss.hns.com(Lotus SMTP MTA v4.6.3 (733.2 10-16-1998)) id 65256A64.003C775F ; Thu, 7 Jun 2001 16:30:27 +0530 X-Lotus-FromDomain: HSS From: sndtrn27@hss.hns.com To: netdev@oss.sgi.com Message-ID: <65256A64.003C757E.00@sandesh.hss.hns.com> Date: Thu, 7 Jun 2001 16:34:49 +0530 Subject: to extract tcpcb datastructure from the kernel Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 688 Lines: 22 hi all, A connection is specified in the OPEN call by the local port and foreign socket arguments. In return, the TCP supplies a (short) local connection name by which the user refers to the connection in subsequent calls. There are several things that must be remembered about a connection. 
To store this information there is a data structure called a Transmission Control Block (TCB). The TCB contains information about the connection state, its associated local process, and feedback parameters about the connection's transmission properties. The TCB is maintained on a per-connection basis. i want to know how can i extract this datastructure from the kernel.???? rajiv From owner-netdev@oss.sgi.com Thu Jun 7 05:30:21 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f57CUL528187 for netdev-outgoing; Thu, 7 Jun 2001 05:30:21 -0700 Received: from zmailer.org (mail.zmailer.org [194.252.70.162]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f57CUKh28184 for ; Thu, 7 Jun 2001 05:30:20 -0700 Received: (mea@zmailer.org) by mail.zmailer.org id ; Thu, 7 Jun 2001 15:30:09 +0300 Date: Thu, 7 Jun 2001 15:30:09 +0300 From: Matti Aarnio To: sndtrn27@hss.hns.com Cc: netdev@oss.sgi.com Subject: Re: to extract tcpcb datastructure from the kernel Message-ID: <20010607153009.I5947@mea-ext.zmailer.org> References: <65256A64.003C757E.00@sandesh.hss.hns.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <65256A64.003C757E.00@sandesh.hss.hns.com>; from sndtrn27@hss.hns.com on Thu, Jun 07, 2001 at 04:34:49PM +0530 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1093 Lines: 27 On Thu, Jun 07, 2001 at 04:34:49PM +0530, sndtrn27@hss.hns.com wrote: > hi all, I am sorry, but you have been reading BSD TCP protocol stack descriptions, and (un?)fortunately Linux and BSD have entirely separate network stack sources. > A connection is specified in the OPEN call by the local port and > foreign socket arguments. In return, the TCP supplies a (short) > local connection name by which the user refers to the connection > in subsequent calls. There are several things that must be remembered > about a connection. 
To store this information there is > a data structure called a Transmission Control Block (TCB). > > The TCB contains information about the connection state, its associated > local process, and feedback parameters about the connection's transmission > properties. The TCB is maintained on a per-connection basis. > > i want to know how can i extract this datastructure from the kernel.???? Browse thru how the Linux kernel implements /proc/net/tcp and you will find those socket state objects deep inside the system. > rajiv /Matti Aarnio From owner-netdev@oss.sgi.com Thu Jun 7 17:53:59 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f580rx104981 for netdev-outgoing; Thu, 7 Jun 2001 17:53:59 -0700 Received: from saw.sw.com.sg (cc625987-a.ewndsr1.nj.home.com [24.180.76.171] (may be forged)) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f580rwh04976 for ; Thu, 7 Jun 2001 17:53:58 -0700 Received: (qmail 26420 invoked by uid 577); 8 Jun 2001 00:45:56 -0000 Message-ID: <20010607204556.A26392@saw.sw.com.sg> Date: Thu, 7 Jun 2001 20:45:56 -0400 From: Andrey Savochkin To: Julian Anastasov Cc: "David S. Miller" , ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: arpfilter merged, next part? References: <20010524175734.A23528@saw.sw.com.sg> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: ; from "Julian Anastasov" on Sun, May 27, 2001 at 11:34:19PM Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2139 Lines: 61 Hi, Sorry that it took so long, I was moving... On Sun, May 27, 2001 at 11:34:19PM +0000, Julian Anastasov wrote: > > On Thu, 24 May 2001, Andrey Savochkin wrote: > > > You may specify any you IP for rt_src for outgoing routes, just be setting > > pref_src for them. > > If you put your shared IP (or omit pref_src), you'll certainly end up with a > > non-working configuration. > > Just set pref_src to an address that you want to appear in communications. 
> > No, the pref src from the original skb->dst appears as source > in our probes which is bad in my case. Of course, the preferred > address to the target (after ip_route_output) is our last chance. pref_src is an attribute of output routes only. We get it through rt_src of skb->dst only if skb->dst is the output route. It was you who pointed out first that the check if it's the output route is mandatory! > In my case, sometimes we try to probe for the nexthop when > sending IP packets, so in arp_solicit we can see these shared > addresses in the output route in skb->dst. This is the reason > I'm adding symmetric flag for the output routes. The scenario is: > > 192.168.0.100: shared > 10.0.0.1: world client (reachable through nexthop router, i.e. not > direct client) > 192.168.0.1: our nexthop router > > The packet: > 10.0.0.1 -> 192.168.0.100 > > Our probe: > who-has 192.168.0.1 tell 192.168.0.100 > ^^^^^^^^^^^^^ > > 192.168.0.100 appears as pref src in skb->dst, bad in my case, i.e. > if we use rt_src from skb->dst. So, we must fallback to the > preferred src in the route to the target (ip_route_output). We do fall back in `arp-filter-src2.patch' after the check if skb->dst is output route. [snip] > > Take a peek at > > ftp://ftp.sw.com.sg/pub/Linux/people/saw/kernel/v2.4/route.generic > > (it's against 2.4.0) > > Very good. I looked in the new version. I see that you always > fallback to the preferred source address to the target in arp_solicit. > This is good for me :) Not sure what we break, though :) It's an old code. The current ideas and the patch that I sent are better. We need to call ip_route_output sometimes. 
Andrey From owner-netdev@oss.sgi.com Thu Jun 7 18:29:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f581T1i06639 for netdev-outgoing; Thu, 7 Jun 2001 18:29:01 -0700 Received: from saw.sw.com.sg (cc625987-a.ewndsr1.nj.home.com [24.180.76.171] (may be forged)) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f581T0h06636 for ; Thu, 7 Jun 2001 18:29:00 -0700 Received: (qmail 26445 invoked by uid 577); 8 Jun 2001 01:27:35 -0000 Message-ID: <20010607212735.B26392@saw.sw.com.sg> Date: Thu, 7 Jun 2001 21:27:35 -0400 From: Andrey Savochkin To: Julian Anastasov Cc: "David S. Miller" , ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: per-route arp control References: <20010524175734.A23528@saw.sw.com.sg> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: ; from "Julian Anastasov" on Sun, May 27, 2001 at 11:41:59PM Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1032 Lines: 28 Hi, > 2. Part two: source address selection for our probes > > - semantic: "in our probes announce the preferred source address for > the target if the original route in the skb is marked noarp" I consider it superfluous, and I don't see any symmetry here :-) If you maintain the proper configuration, i.e. specify preferred source for output routes, you should be fine. > Sometimes we don't want to announce particular addresses, for > example, if they are marked hidden/noarp in local routes - the symmetry. > In this case we add noarp route and then fallback to the preferred > source address no matter it is marked as hidden. The RTCF_NOARP flag > in the output routes is checked in this case (arp_solicit): > > ip rule from ... > ip route add ... noarp > > Andrey, I see that in your current route.generic version > arp_solicit always fallbacks to the preferred source address to the As I've said that code wasn't completely correct. For the remaining of your code and RTCF_NOARP additions look fine for me. 
Andrey From owner-netdev@linux-xfs.sgi.com Fri Jun 8 11:24:24 2001 Received: (from mail@localhost) by linux-xfs.sgi.com (8.12.0.Beta5/8.12.0.Beta5) id f58IOOdU007338 for netdev-outgoing; Fri, 8 Jun 2001 11:24:24 -0700 X-Authentication-Warning: linux-xfs.sgi.com: mail set sender to owner-netdev@oss.sgi.com using -f Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by linux-xfs.sgi.com (8.12.0.Beta5/8.12.0.Beta5) with SMTP id f58IOL3D007335 for ; Fri, 8 Jun 2001 11:24:22 -0700 Received: from l.himel.bg (unamed.infotel.bg [212.39.68.18] (may be forged)) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id BAA04884 for ; Fri, 8 Jun 2001 01:41:29 -0700 (PDT) mail_from (ja@ssi.bg) Received: from linux.himel.bg (IDENT:ja@linux.himel.bg [127.0.0.1]) by l.himel.bg (8.9.3/8.9.3) with ESMTP id LAA02402; Fri, 8 Jun 2001 11:40:21 +0300 Date: Fri, 8 Jun 2001 11:40:21 +0300 (EEST) From: Julian Anastasov X-Sender: ja@l To: Andrey Savochkin cc: "David S. Miller" , ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: per-route arp control In-Reply-To: <20010607212735.B26392@saw.sw.com.sg> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 4097 Lines: 89 Hello, On Thu, 7 Jun 2001, Andrey Savochkin wrote: > Hi, > > > 2. Part two: source address selection for our probes > > > > - semantic: "in our probes announce the preferred source address for > > the target if the original route in the skb is marked noarp" > > I consider it superfluous, and I don't see any symmetry here :-) > If you maintain the proper configuration, i.e. specify preferred source for > output routes, you should be fine. 
But I'm not fine :) I made tests and they show that my shared address is announced as source in our probes because it is present in skb->dst as pref_src :) You know, any local address can be in rt_src in these routes because the higher protocols fill skb->dst by using ip_route_output(&rt, world_ip, local_ip,...) while when we use ip_route_output(&rt, arp_target, 0,...) in arp_solicit we load rt_src with the preferred source to this target. So, my shared address is present in rt_src and I need a way to fallback to the second ip_route_output. For this, I can add ip rule + noarp route to mark the output route as "not useful" for our ARP probes. The symmetry is here: when we don't reply for one IP => we don't announce it in our probes. While other setups don't talk with such IP (I'm not sure why they simply not remove this IP) in my setup I still need to talk IP with this shared IP and this always leads the shared IP to be present in pref_src, bad for ARP. > > Sometimes we don't want to announce particular addresses, for > > example, if they are marked hidden/noarp in local routes - the symmetry. > > In this case we add noarp route and then fallback to the preferred > > source address no matter it is marked as hidden. The RTCF_NOARP flag > > in the output routes is checked in this case (arp_solicit): > > > > ip rule from ... > > ip route add ... noarp > > > > Andrey, I see that in your current route.generic version > > arp_solicit always fallbacks to the preferred source address to the > > As I've said that code wasn't completely correct. No, when noarp is set for one route this flag does not control the both directions because one route is not used both to check the remote probes and in the announcement. For my setup I need to set this flag explicitly to two routes, not only to one. Other users may be will need only to set it only to one route (the local route). I remember that Jerome Etienne has something similar for VRRP (IFA_F_NO_NDISC). 
He will need to set noarp for a local route only, i.e. only to drop the replies. But now there is nothing that prevents this address to be announced in our probes (if any). I don't know whether this hurts him. For my setup the announcement is a problem because I talk IP with this ip. For my setup I have to add second rule+route to change the announcement too. This second unicast route must not allow announcement of the same shared addresses blocked by the first noarp local route. So, the symmetry is user defined. RTCF_NOARP does not work in the both directions, may be only when you want to ignore the probes for some unicast routes (proxy_arp?). Then for the announcement the preferred source to the arp target will be used. May be this is the only case where it is possible one route to be checked for both the ARP probes and replies. But you always can add more specific ip rules that will point to two routes, one with noarp flag and another without such flag. We are in the world of the "routes" :))) Yes, my example is hard to setup with this noarp flag (the second part, the announcement). May be there is alternative to control the announcement with routes? I'll tell it again: for my setup I'm not interested in the old route in skb->dst and I prefer always to fallback to ip_route_output but I'm not sure who relies on the pref_src in skb->dst. But in the current implementation (when we use skb->dst) I need a way to control the announcement too. May be we can find a better solution/semantic. > For the remaining of your code and RTCF_NOARP additions look fine for me. 
> > Andrey Regards -- Julian Anastasov From owner-netdev@linux-xfs.sgi.com Fri Jun 8 13:05:08 2001 Received: (from mail@localhost) by linux-xfs.sgi.com (8.12.0.Beta5/8.12.0.Beta5) id f58K58gS019022 for netdev-outgoing; Fri, 8 Jun 2001 13:05:08 -0700 X-Authentication-Warning: linux-xfs.sgi.com: mail set sender to owner-netdev@oss.sgi.com using -f Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by linux-xfs.sgi.com (8.12.0.Beta5/8.12.0.Beta5) with SMTP id f58K4a3D018952 for ; Fri, 8 Jun 2001 13:05:02 -0700 Received: from l.himel.bg (unamed.infotel.bg [212.39.68.18] (may be forged)) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id AAA03058 for ; Fri, 8 Jun 2001 00:58:25 -0700 (PDT) mail_from (ja@ssi.bg) Received: from linux.himel.bg (IDENT:ja@linux.himel.bg [127.0.0.1]) by l.himel.bg (8.9.3/8.9.3) with ESMTP id KAA01553; Fri, 8 Jun 2001 10:57:27 +0300 Date: Fri, 8 Jun 2001 10:57:27 +0300 (EEST) From: Julian Anastasov X-Sender: ja@l To: Andrey Savochkin cc: "David S. Miller" , ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: arpfilter merged, next part? In-Reply-To: <20010607204556.A26392@saw.sw.com.sg> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2161 Lines: 68 Hello, On Thu, 7 Jun 2001, Andrey Savochkin wrote: > > No, the pref src from the original skb->dst appears as source > > in our probes which is bad in my case. Of course, the preferred > > address to the target (after ip_route_output) is our last chance. > > pref_src is an attribute of output routes only. > We get it through rt_src of skb->dst only if skb->dst is the output route. > It was you who pointed out first that the check if it's the output route is > mandatory! Yes, but in my case I don't like sometimes the pref_src in skb->dst. 
> > In my case, sometimes we try to probe for the nexthop when > > sending IP packets, so in arp_solicit we can see these shared > > addresses in the output route in skb->dst. This is the reason > > I'm adding symmetric flag for the output routes. The scenario is: > > > > 192.168.0.100: shared > > 10.0.0.1: world client (reachable through nexthop router, i.e. not > > direct client) > > 192.168.0.1: our nexthop router > > > > The packet: > > 10.0.0.1 -> 192.168.0.100 > > > > Our probe: > > who-has 192.168.0.1 tell 192.168.0.100 > > ^^^^^^^^^^^^^ > > > > 192.168.0.100 appears as pref src in skb->dst, bad in my case, i.e. > > if we use rt_src from skb->dst. So, we must fallback to the > > preferred src in the route to the target (ip_route_output). > > We do fall back in `arp-filter-src2.patch' > after the check if skb->dst is output route. Yes, this patch does not like all input routes but my setup does not like some of the output routes in skb->dst too :) > [snip] > > > Take a peek at > > > ftp://ftp.sw.com.sg/pub/Linux/people/saw/kernel/v2.4/route.generic > > > (it's against 2.4.0) > > > > Very good. I looked in the new version. I see that you always > > fallback to the preferred source address to the target in arp_solicit. > > This is good for me :) Not sure what we break, though :) > > It's an old code. > The current ideas and the patch that I sent are better. > We need to call ip_route_output sometimes. And I want to control this too, i.e. to distinguish the output routes in skb->dst and to change the announced IP. 
> Andrey Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Sat Jun 9 19:09:41 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5A29fo08854 for netdev-outgoing; Sat, 9 Jun 2001 19:09:41 -0700 Received: from saw.sw.com.sg (cc1074780-a.ewndsr1.nj.home.com [24.180.76.171]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5A29cV08851 for ; Sat, 9 Jun 2001 19:09:39 -0700 Received: (qmail 3996 invoked by uid 577); 10 Jun 2001 02:07:59 -0000 Message-ID: <20010609220759.A3978@saw.sw.com.sg> Date: Sat, 9 Jun 2001 22:07:59 -0400 From: Andrey Savochkin To: torvalds@transmeta.com Cc: Bogdan Costescu , Jeff Garzik , Alan Cox , Mark Frazer , Pete Zaitcev , Linux Kernel Mailing List , netdev@oss.sgi.com Subject: eepro100 security fix [was: Re: MII access] References: <3B1A2982.C53B159C@mandrakesoft.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="k+w/mQv8wyuph6w0" X-Mailer: Mutt 0.93.2i In-Reply-To: ; from "Bogdan Costescu" on Tue, Jun 05, 2001 at 11:07:06AM Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1627 Lines: 45 --k+w/mQv8wyuph6w0 Content-Type: text/plain; charset=us-ascii Linus, Please apply the attached patch. It fixes a security problem of user-controlled access to the card ports from a non-privileged ioctl which should have read-only semantics. Best regards Andrey On Tue, Jun 05, 2001 at 11:07:06AM +0200, Bogdan Costescu wrote: > On Sun, 3 Jun 2001, Jeff Garzik wrote: > > > Bogdan Costescu wrote: > > > With clearer mind, I have to make some a correction to one of the previous > > > messages: the problem of not checking arguments range does not apply to > > > 3c59x which has in the ioctl function '& 0x1f' for both transceiver number > > > and register number. However, eepro100 and tulip don't do that. (I'm > > > checking now with 2.4.3 from Mandrake 8, but I don't think that there were > > > recent changes in these areas). 
> > > > half right -- tulip does this for the phy id but not the MII register > > number. I'll fix that up. Please bug Andrey about fixing up > > eepro100... --k+w/mQv8wyuph6w0 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=mii-access1 --- drivers/net/eepro100.c.prev Sat Jan 27 05:07:13 2001 +++ drivers/net/eepro100.c Wed Jun 6 22:26:03 2001 @@ -1913,7 +1913,7 @@ timer routine. 2000/05/09 SAW */ saved_acpi = pci_set_power_state(sp->pdev, 0); t = del_timer_sync(&sp->timer); - data[3] = mdio_read(ioaddr, data[0], data[1]); + data[3] = mdio_read(ioaddr, data[0] & 0x1f, data[1] & 0x1f); if (t) add_timer(&sp->timer); /* may be set to the past --SAW */ pci_set_power_state(sp->pdev, saved_acpi); --k+w/mQv8wyuph6w0-- From owner-netdev@oss.sgi.com Sat Jun 9 22:44:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5A5iPB12243 for netdev-outgoing; Sat, 9 Jun 2001 22:44:25 -0700 Received: from g96069.scn-net.ne.jp (g96069.scn-net.ne.jp [210.231.96.69]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5A5iMV12239 for ; Sat, 9 Jun 2001 22:44:23 -0700 Received: from deisui.bug.org (localhost [127.0.0.1]) by localhost (8.12.0.Beta10/8.12.0.Beta10/Debian 8.12.0.Beta10) with ESMTP id f5A5iHX7004546 for ; Sun, 10 Jun 2001 14:44:20 +0900 To: netdev@oss.sgi.com Subject: PATCH: Path MTU discovery fix MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII From: Daiki Ueno Date: 10 Jun 2001 14:44:17 +0900 Message-ID: User-Agent: T-gnus/6.15.4 (based on Oort Gnus v0.04) (revision 01) Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1180 Lines: 36 Since 2.4.4, Path MTU discovery doesn't work for me. In the function icmpv6_rcv, ICMPv6 header is extracted from skb->data, rather from skb->h.raw, so the value of hdr->icmp6_mtu is not intended and the call of the function rt6_pmtu_discovery makes no sense here. The attached patch reverts to the same behavior of 2.4.3. 
Regards, -- Daiki Ueno --- linux/net/ipv6/icmp.c~ Sun Jun 10 13:40:11 2001 +++ linux/net/ipv6/icmp.c Sun Jun 10 13:41:15 2001 @@ -559,18 +559,16 @@ case ICMPV6_PKT_TOOBIG: /* BUGGG_FUTURE: if packet contains rthdr, we cannot update standard destination cache. Seems, only "advanced" destination cache will allow to solve this problem --ANK (980726) */ - if (!pskb_may_pull(skb, sizeof(struct ipv6hdr))) - goto discard_it; - hdr = (struct icmp6hdr *) skb->data; orig_hdr = (struct ipv6hdr *) (hdr + 1); - rt6_pmtu_discovery(&orig_hdr->daddr, &orig_hdr->saddr, dev, - ntohl(hdr->icmp6_mtu)); + if (pskb_may_pull(skb, sizeof(struct ipv6hdr))) + rt6_pmtu_discovery(&orig_hdr->daddr, &orig_hdr->saddr, dev, + ntohl(hdr->icmp6_mtu)); /* * Drop through to notify */ case ICMPV6_DEST_UNREACH: From owner-netdev@oss.sgi.com Sat Jun 9 23:58:46 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5A6wkK12977 for netdev-outgoing; Sat, 9 Jun 2001 23:58:46 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5A6wiV12974 for ; Sat, 9 Jun 2001 23:58:44 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id XAA11148; Sat, 9 Jun 2001 23:58:29 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15139.6804.985808.413675@pizda.ninka.net> Date: Sat, 9 Jun 2001 23:58:28 -0700 (PDT) To: Daiki Ueno Cc: netdev@oss.sgi.com Subject: Re: PATCH: Path MTU discovery fix In-Reply-To: References: X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 723 Lines: 21 Daiki Ueno writes: > Since 2.4.4, Path MTU discovery doesn't work for me. 
> > In the function icmpv6_rcv, ICMPv6 header is extracted from skb->data, > rather from skb->h.raw, so the value of hdr->icmp6_mtu is not intended > and the call of the function rt6_pmtu_discovery makes no sense here. > > The attached patch reverts to the same behavior of 2.4.3. It may revert to the behavior of 2.4.3 but it is a broken change. pskb_may_pull() can change all of the header pointers of the skb on you, so it has to reload skb->h.raw or whatever to be correct. It also must jump to discard if pskb_may_pull() fails, none of the other code below that call is valid if it fails. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Sun Jun 10 00:54:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5A7sdx13802 for netdev-outgoing; Sun, 10 Jun 2001 00:54:39 -0700 Received: from g96069.scn-net.ne.jp (g96069.scn-net.ne.jp [210.231.96.69]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5A7sbV13799 for ; Sun, 10 Jun 2001 00:54:38 -0700 Received: from deisui.bug.org (localhost [127.0.0.1]) by localhost (8.12.0.Beta10/8.12.0.Beta10/Debian 8.12.0.Beta10) with ESMTP id f5A7sRWB000828; Sun, 10 Jun 2001 16:54:29 +0900 To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: Re: PATCH: Path MTU discovery fix References: <15139.6804.985808.413675@pizda.ninka.net> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII From: Daiki Ueno Date: 10 Jun 2001 16:54:27 +0900 In-Reply-To: <15139.6804.985808.413675@pizda.ninka.net> (David S. Miller's message of "Sat, 9 Jun 2001 23:58:28 -0700 (PDT)") Message-ID: User-Agent: T-gnus/6.15.4 (based on Oort Gnus v0.04) (revision 01) Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1045 Lines: 29 >>>>> In <15139.6804.985808.413675@pizda.ninka.net> >>>>> "David S. Miller" wrote: > It may revert to the behavior of 2.4.3 but it is a broken > change. 
pskb_may_pull() can change all of the header > pointers of the skb on you, so it has to reload skb->h.raw > or whatever to be correct. > It also must jump to discard if pskb_may_pull() fails, none of > the other code below that call is valid if it fails. I understand. I've applied the attached patch which simply regards the value of skb->h.raw rather than skb->data and confirmed working. Thank you for reviewing the patch. -- Daiki Ueno --- linux/net/ipv6/icmp.c~ Sun Jun 10 16:07:53 2001 +++ linux/net/ipv6/icmp.c Sun Jun 10 16:08:44 2001 @@ -564,7 +564,7 @@ */ if (!pskb_may_pull(skb, sizeof(struct ipv6hdr))) goto discard_it; - hdr = (struct icmp6hdr *) skb->data; + hdr = (struct icmp6hdr *) skb->h.raw; orig_hdr = (struct ipv6hdr *) (hdr + 1); rt6_pmtu_discovery(&orig_hdr->daddr, &orig_hdr->saddr, dev, ntohl(hdr->icmp6_mtu)); From owner-netdev@oss.sgi.com Sun Jun 10 02:20:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5A9Kdr15052 for netdev-outgoing; Sun, 10 Jun 2001 02:20:39 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5A9KcV15049 for ; Sun, 10 Jun 2001 02:20:38 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id CAA11434; Sun, 10 Jun 2001 02:20:36 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15139.15332.228341.675553@pizda.ninka.net> Date: Sun, 10 Jun 2001 02:20:36 -0700 (PDT) To: Daiki Ueno Cc: netdev@oss.sgi.com Subject: Re: PATCH: Path MTU discovery fix In-Reply-To: References: <15139.6804.985808.413675@pizda.ninka.net> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 222 Lines: 10 Daiki Ueno writes: > I understand. I've applied the attached patch which simply regards the > value of skb->h.raw rather than skb->data and confirmed working. 
Applied, thanks. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Sun Jun 10 02:29:55 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5A9TtJ15307 for netdev-outgoing; Sun, 10 Jun 2001 02:29:55 -0700 Received: from u.domain.uli ([212.95.166.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5A9TrV15304 for ; Sun, 10 Jun 2001 02:29:53 -0700 Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.0/8.11.0) with ESMTP id f5ACTE403644; Sun, 10 Jun 2001 12:29:14 GMT Date: Sun, 10 Jun 2001 12:29:14 +0000 (GMT) From: Julian Anastasov X-Sender: To: Andrey Savochkin cc: "David S. Miller" , , Alexey Kuznetsov , Subject: Re: per-route arp control In-Reply-To: <20010608111211.B29098@saw.sw.com.sg> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 430 Lines: 19 Hello, On Fri, 8 Jun 2001, Andrey Savochkin wrote: > Ok, I see what you mean. > rt_src is not always pref_src, it is the address that is used for this > particular flow. And that creating special exceptions for flows that has the > shared IP as the local source looks a bit inconvenient for you. > > Let me think a bit... ok, I'll make more tests but for now nothing new from me. 
Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Sun Jun 10 08:55:04 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5AFt4C26243 for netdev-outgoing; Sun, 10 Jun 2001 08:55:04 -0700 Received: from colorfullife.com (colorfullife.com [216.156.138.34]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5AFt2V26237 for ; Sun, 10 Jun 2001 08:55:02 -0700 Received: from dbl.localdomain (localhost [127.0.0.1]) by colorfullife.com (8.11.2/8.11.2) with ESMTP id f5AFw3q31165 for ; Sun, 10 Jun 2001 11:58:03 -0400 Received: from colorfullife.com (IDENT:manfred@clmsdev.localdomain [172.17.4.1]) by dbl.localdomain (8.11.2/8.11.2) with ESMTP id f5AExEi11728 for ; Sun, 10 Jun 2001 16:59:15 +0200 Message-ID: <3B238B31.38F6D3ED@colorfullife.com> Date: Sun, 10 Jun 2001 16:58:57 +0200 From: Manfred Spraul X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.5-ac12 i686) X-Accept-Language: en, de MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Q: (ab)using zerocopy for drivers with alignment contraints Content-Type: multipart/mixed; boundary="------------0FC597B8E0032B8DBCFA6DC2" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 3554 Lines: 131 This is a multi-part message in MIME format. --------------0FC597B8E0032B8DBCFA6DC2 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Several cheap busmaster nics only accept tx buffers that are 32-bit aligned. Currently they memcpy into transfer buffers. What about replacing that memcpy with csum_copy_partial_nocheck and enabling NETIF_F_{SG,HW_CSUM}? I've attached a beta patch against the 8139too driver. szc_copy_csum() isn't driver specific, perhaps move it to linux/net/core/skbuff.c? 
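The idea of folding the checksum into the bounce-buffer copy can be sketched in userspace C as follows. This is a minimal sketch, not the kernel routine: `needs_bounce()`, `copy_and_csum()` and `csum_fold16()` are hypothetical helpers written for illustration, the sum is the plain 16-bit one's-complement Internet checksum over big-endian words, and the 32-bit accumulator is assumed to be fed at most one 64 KB packet. Note that the alignment test is fully parenthesized before negation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if ptr violates the alignment mask (mask 3 = 32-bit aligned). */
static int needs_bounce(const void *ptr, uintptr_t mask)
{
	return ((uintptr_t)ptr & mask) != 0;
}

/* Copy len bytes from src to dst and add them to a running
 * one's-complement sum in the same pass. A trailing odd byte is
 * padded with zero, so only the last chunk may have odd length. */
static uint32_t copy_and_csum(void *dst, const void *src, size_t len,
			      uint32_t sum)
{
	const uint8_t *s = src;
	uint8_t *d = dst;
	size_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i + 1 < len; i += 2) {
		d[i] = s[i];
		d[i + 1] = s[i + 1];
		sum += ((uint32_t)s[i] << 8) | s[i + 1];
	}
	if (i < len) {
		d[i] = s[i];
		sum += (uint32_t)s[i] << 8;
	}
	return sum;
}

/* Fold the 32-bit accumulator to 16 bits and complement it. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A driver-style caller would test `needs_bounce(data, 3)` first and only take the copying path when the buffer is misaligned or a checksum still has to be computed; the single pass then costs roughly what the plain `memcpy` did.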
-- Manfred --------------0FC597B8E0032B8DBCFA6DC2 Content-Type: text/plain; charset=us-ascii; name="patch-8139too-zc" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="patch-8139too-zc" --- 2.4/drivers/net/8139too.c Sun Jun 10 12:54:37 2001 +++ build-2.4/drivers/net/8139too.c Sun Jun 10 15:17:13 2001 @@ -656,6 +656,69 @@ #endif /* USE_IO_OPS */ +/***************************************************************************/ +/* zerocopy support */ +#include +#include +#include + +#ifdef MAX_SKB_FRAGS +#define ENABLE_SZC +#endif + +#ifdef ENABLE_SZC +#define SZC_FEATURES (NETIF_F_SG|NETIF_F_HW_CSUM) + +static int szc_copy_csum(void *tbuf, struct sk_buff *skb, int mask) +{ + int i, t, csum, csstart; + + if (( !((unsigned long)skb->data & mask)) && + (skb->ip_summed != CHECKSUM_HW) && + (skb_shinfo(skb)->nr_frags == 0) ) + return 0; + + t = skb->len - skb->data_len; + if (skb->ip_summed == CHECKSUM_HW) + csstart = skb->h.raw - skb->data; + else + csstart = t; + if (csstart > t) BUG(); + memcpy(tbuf, skb->data, csstart); + if (t != csstart) + csum = csum_partial_copy_nocheck(skb->data+csstart, tbuf+csstart, t-csstart, 0); + else + csum = 0; + for (i=0;i<skb_shinfo(skb)->nr_frags;i++) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; + void *ptr = kmap_skb_frag(frag); + if (skb->ip_summed == CHECKSUM_HW) + csum = csum_partial_copy_nocheck(ptr+frag->page_offset, tbuf+t, frag->size, csum); + else + memcpy(tbuf+t, ptr+frag->page_offset, frag->size); + kunmap_skb_frag(ptr); + t += frag->size; + } + if(skb->ip_summed == CHECKSUM_HW) { + int csstuff = csstart + skb->csum; + *((unsigned short*)(tbuf+csstuff)) = csum_fold(csum); + } + return 1; +} +#else +#define SZC_FEATURES (0) + +static int szc_copy_csum(void *tbuf, struct sk_buff *skb, int mask) +{ + if (!((unsigned long)skb->data & mask)) + return 0; + + memcpy(tbuf, skb->data, skb->len); + return 1; +} +#endif +/* END */ +/***************************************************************************/ static const u16
rtl8139_intr_mask = PCIErr | PCSTimeout | RxUnderrun | RxOverflow | RxFIFOOver | @@ -927,6 +990,7 @@ dev->do_ioctl = mii_ioctl; dev->tx_timeout = rtl8139_tx_timeout; dev->watchdog_timeo = TX_TIMEOUT; + dev->features |= SZC_FEATURES; dev->irq = pdev->irq; @@ -1662,8 +1726,6 @@ netif_wake_queue (dev); } - - static int rtl8139_start_xmit (struct sk_buff *skb, struct net_device *dev) { struct rtl8139_private *tp = dev->priv; @@ -1677,9 +1739,9 @@ assert (tp->tx_info[entry].mapping == 0); tp->tx_info[entry].skb = skb; - if ((long) skb->data & 3) { /* Must use alignment buffer. */ + if (szc_copy_csum(tp->tx_buf[entry], skb, 3)) { + /* Using alignment buffer. */ /* tp->tx_info[entry].mapping = 0; */ - memcpy (tp->tx_buf[entry], skb->data, skb->len); RTL_W32 (TxAddr0 + (entry * 4), tp->tx_bufs_dma + (tp->tx_buf[entry] - tp->tx_bufs)); } else { --------------0FC597B8E0032B8DBCFA6DC2-- From owner-netdev@oss.sgi.com Sun Jun 10 09:48:08 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5AGm8W27242 for netdev-outgoing; Sun, 10 Jun 2001 09:48:08 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5AGm7V27239 for ; Sun, 10 Jun 2001 09:48:07 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 954D21F65; Sun, 10 Jun 2001 12:48:04 -0400 (EDT) Message-ID: <3B23A4BB.7B4567A3@mandrakesoft.com> Date: Sun, 10 Jun 2001 12:47:55 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre1 i686) X-Accept-Language: en MIME-Version: 1.0 To: Russell King Cc: Ben LaHaise , Andrew Morton , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 3C905b partial lockup in 2.4.5-pre5 and up to 2.4.6-pre1 References: <20010610093838.A13074@flint.arm.linux.org.uk> <20010610173419.B13164@flint.arm.linux.org.uk> Content-Type: text/plain; charset=us-ascii 
Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1471 Lines: 40 Russell King wrote: > Indeed. However, I don't believe user space should _rely_ on the flag. > The reason is that there are network cards out there where the only way > to get the link status _is_ to transmit a packet, even on 10baseT. > > PCNET is one example - the "oh my god my link is down" status bit is in > the transmit ring headers, not in an easily accessible register. > > The only interpretation user space can place on IFF_RUNNING for these > cards is that if its not set, packets will get dropped by the interface. > If its set, packets _may_ be dropped by the interface. These are the exception not the rule, though, so I don't think we should design primarily for them. On most decent cards, we can not only ask for link status from a register, but also get interrupts when link change occurs [though we may still need a timer for certain link states]. > [note I've not found anything in 2.4.5 where netif_carrier_ok prevents > the net layers queueing packets for an interface, and forwarding them > on for transmission]. we want netif_carrier_{on,off} to emit netlink messages. I don't know how DaveM would feel about such getting implemented in 2.4.x though, even if well tested. Note we went over netif_carrier_xxx and related issues not a week ago, IIRC Jeff P.S. netdev@oss.sgi.com added to cc. please cc there on net interface/driver issues... -- Jeff Garzik | Andre the Giant has a posse. 
Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Sun Jun 10 09:56:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5AGuRH27533 for netdev-outgoing; Sun, 10 Jun 2001 09:56:27 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5AGuQV27530 for ; Sun, 10 Jun 2001 09:56:26 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id A61241F65; Sun, 10 Jun 2001 12:56:24 -0400 (EDT) Message-ID: <3B23A6AF.302F36F1@mandrakesoft.com> Date: Sun, 10 Jun 2001 12:56:15 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre1 i686) X-Accept-Language: en MIME-Version: 1.0 To: Manfred Spraul Cc: netdev@oss.sgi.com Subject: Re: Q: (ab)using zerocopy for drivers with alignment contraints References: <3B238B31.38F6D3ED@colorfullife.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 647 Lines: 22 Manfred Spraul wrote: > > Several cheap busmaster nics only accept tx buffers that are 32-bit > aligned. > > Currently they memcpy into transfer buffers. What about replacing that > memcpy with csum_copy_partial_nocheck and enabling NETIF_F_{SG,HW_CSUM}? > > I've attached a beta patch against the 8139too driver. > > szc_copy_csum() isn't driver specific, perhaps move it to > linux/net/core/skbuff.c? I'm definitely for reducing copies, so this patch looks nice. I would prefer to call it "single-copy" or something other than zero-copy, though... -- Jeff Garzik | Andre the Giant has a posse. 
Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Sun Jun 10 10:35:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5AHZGJ29513 for netdev-outgoing; Sun, 10 Jun 2001 10:35:16 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5AHZBV29479 for ; Sun, 10 Jun 2001 10:35:11 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id B05691F65; Sun, 10 Jun 2001 13:35:09 -0400 (EDT) Message-ID: <3B23AFC3.71CE2FD2@mandrakesoft.com> Date: Sun, 10 Jun 2001 13:34:59 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre1 i686) X-Accept-Language: en MIME-Version: 1.0 To: Linux Kernel Mailing List , netdev@oss.sgi.com Cc: "David S. Miller" Subject: PATCH: ethtool MII helpers Content-Type: multipart/mixed; boundary="------------004E02343E95CC6BA4690791" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 13285 Lines: 452 This is a multi-part message in MIME format. --------------004E02343E95CC6BA4690791 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Initial draft of a helper which uses generic elements present in several net drivers to implement ethtool ioctl support in a minimum amount of code. I have included a sample implementation in the epic100 driver, to illustrate how these helpers may be used. This should make it easier to implement support across 10/100 hardware which uses primarily an MII phy. Comments appreciated. -- Jeff Garzik | Andre the Giant has a posse. 
Building 1024 | MandrakeSoft | --------------004E02343E95CC6BA4690791 Content-Type: text/plain; charset=us-ascii; name="mii.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="mii.patch" Index: linux_2_4/include/linux/ethtool.h diff -u linux_2_4/include/linux/ethtool.h:1.1.1.4 linux_2_4/include/linux/ethtool.h:1.1.1.4.84.1 --- linux_2_4/include/linux/ethtool.h:1.1.1.4 Thu Apr 19 17:55:36 2001 +++ linux_2_4/include/linux/ethtool.h Fri Jun 8 21:16:58 2001 @@ -1,4 +1,4 @@ -/* $Id: ethtool.h,v 1.1.1.4 2001/04/20 00:55:36 jgarzik Exp $ +/* $Id: ethtool.h,v 1.1.1.4.84.1 2001/06/09 04:16:58 jgarzik Exp $ * ethtool.h: Defines for Linux ethtool. * * Copyright (C) 1998 David S. Miller (davem@redhat.com) @@ -34,13 +34,15 @@ char bus_info[32]; /* Bus info for this interface. For PCI * devices, use pci_dev->slot_name. */ char reserved1[32]; - char reserved2[32]; + char reserved2[28]; + u32 regdump_len; /* Amount of data from ETHTOOL_GREGS */ }; /* CMDs currently supported */ #define ETHTOOL_GSET 0x00000001 /* Get settings. */ #define ETHTOOL_SSET 0x00000002 /* Set settings, privileged. */ #define ETHTOOL_GDRVINFO 0x00000003 /* Get driver info. */ +#define ETHTOOL_GREGS 0x00000004 /* Get NIC registers, privileged. */ /* compatibility with older code */ #define SPARC_ETH_GSET ETHTOOL_GSET Index: linux_2_4/include/linux/mii.h diff -u linux_2_4/include/linux/mii.h:1.1.1.1 linux_2_4/include/linux/mii.h:1.1.1.1.52.1 --- linux_2_4/include/linux/mii.h:1.1.1.1 Fri May 11 16:54:44 2001 +++ linux_2_4/include/linux/mii.h Sun Jun 10 10:26:44 2001 @@ -126,6 +126,33 @@ #define CSCONFIG_RESV4 0x4000 /* Unused... */ #define CSCONFIG_NDISABLE 0x8000 /* Disable NRZI */ + + +struct ethtool_mii_info { + struct net_device *dev; /* our net interface */ + void *useraddr; /* userspace addr to which we put data */ + + int phy_id; /* PHY we are addressing */ + + int bmcr; /* cached MII register values. 
*/ + int bmsr; /* -1 means 'undefined', which usually */ + int advertising; /* means the implementation should read */ + int lpa; /* the values from hardware instead. */ + + int autoneg; /* 0 (disabled), 1 (enabled), -1 (ask hw) */ + unsigned int ignore; /* mask of medias we never support, */ + /* such as 100baseT4 */ + int speed; /* 10, 100, 1000 or -1 (ask hw) */ + int full_duplex; /* 0 (no), 1 (yes), -1 (ask hw) */ + unsigned int port; /* PORT_xxx from linux/ethtool.h */ + + int (*mdio_read) (struct net_device *dev, int phy_id, int location); + void (*mdio_write) (struct net_device *dev, int phy_id, int location, int val); +}; + +int mii_ethtool_gset (struct ethtool_mii_info *mii); +int mii_ethtool_sset (struct ethtool_mii_info *mii); + /** * mii_nway_result * @negotiated: value of MII ANAR and'd with ANLPAR Index: linux_2_4/drivers/net/mii.c diff -u /dev/null linux_2_4/drivers/net/mii.c:1.1.2.1 --- /dev/null Sun Jun 10 10:28:01 2001 +++ linux_2_4/drivers/net/mii.c Sun Jun 10 10:26:44 2001 @@ -0,0 +1,212 @@ +/* + * linux/drivers/net/mii.c + * Copyright 2001 Jeff Garzik + * + * This software may be used and distributed according to the terms + * of the GNU General Public License, incorporated herein by reference. 
+ */ + +#include +#include +#include +#include +#include +#include + + +static void mii_fill_ethtool_cmd (struct net_device *dev, + struct ethtool_mii_info *mii, + struct ethtool_cmd *ecmd) +{ + unsigned int bmsr, bmcr, v, autoneg, advertising, lpa; + unsigned int negotiated, full_duplex, speed; + + memset(ecmd, 0, sizeof(*ecmd)); + + ecmd->cmd = ETHTOOL_GSET; + + if (mii->bmcr < 0) + bmcr = mii->bmcr = mii->mdio_read(dev, mii->phy_id, MII_BMCR); + else bmcr = mii->bmcr; + + if (mii->bmsr < 0) + bmsr = mii->bmsr = mii->mdio_read(dev, mii->phy_id, MII_BMSR); + else bmsr = mii->bmsr; + + if (mii->advertising < 0) + advertising = mii->advertising = + mii->mdio_read(dev, mii->phy_id, MII_ADVERTISE); + else advertising = mii->advertising; + + if (mii->lpa < 0) + lpa = mii->lpa = mii->mdio_read(dev, mii->phy_id, MII_LPA); + else lpa = mii->lpa; + + negotiated = advertising & lpa; + + if (mii->autoneg < 0) + autoneg = mii->autoneg = (bmcr & BMCR_ANENABLE) ? 1 : 0; + else autoneg = mii->autoneg; + + if (mii->full_duplex < 0) + full_duplex = mii->full_duplex = + mii_nway_result(negotiated) & LPA_DUPLEX; + else full_duplex = mii->full_duplex; + + if (mii->speed < 0) { + if (negotiated & LPA_100) + speed = mii->speed = 100; + else + speed = mii->speed = 10; + } else + speed = mii->speed; + + ecmd->supported = SUPPORTED_MII; + v = bmsr & ~mii->ignore; + if (v & BMSR_10HALF) + ecmd->supported |= SUPPORTED_10baseT_Half; + if (v & BMSR_10FULL) + ecmd->supported |= SUPPORTED_10baseT_Full; + if (v & BMSR_100HALF) + ecmd->supported |= SUPPORTED_100baseT_Half; + if (v & BMSR_100FULL) + ecmd->supported |= SUPPORTED_100baseT_Full; + if (bmsr & BMSR_ANEGCAPABLE) + ecmd->supported |= SUPPORTED_Autoneg; + else + autoneg = mii->autoneg = 0; + + ecmd->advertising = ADVERTISED_MII; + v = advertising & ~mii->ignore; + if (v & BMSR_10HALF) + ecmd->advertising |= ADVERTISED_10baseT_Half; + if (v & BMSR_10FULL) + ecmd->advertising |= ADVERTISED_10baseT_Full; + if (v & BMSR_100HALF) + 
ecmd->advertising |= ADVERTISED_100baseT_Half; + if (v & BMSR_100FULL) + ecmd->advertising |= ADVERTISED_100baseT_Full; + if (autoneg) { + ecmd->advertising |= ADVERTISED_Autoneg; + ecmd->autoneg = AUTONEG_ENABLE; + } else + ecmd->autoneg = AUTONEG_DISABLE; + + ecmd->speed = speed == 100 ? SPEED_100 : SPEED_10; + ecmd->duplex = full_duplex ? DUPLEX_FULL : DUPLEX_HALF; + ecmd->port = PORT_MII; + ecmd->phy_address = mii->phy_id; + ecmd->transceiver = XCVR_INTERNAL; +} + +int mii_ethtool_gset (struct ethtool_mii_info *mii) +{ + struct ethtool_cmd ecmd; + + if (mii->port != PORT_MII) + return -EOPNOTSUPP; + + mii_fill_ethtool_cmd(mii->dev, mii, &ecmd); + + if (copy_to_user(mii->useraddr, &ecmd, sizeof(ecmd))) + return -EFAULT; + + return 0; +} + +int mii_ethtool_sset (struct ethtool_mii_info *mii) +{ + struct net_device *dev = mii->dev; + struct ethtool_cmd in, out; + unsigned int advert, bmcr; + + if (copy_from_user (&in, mii->useraddr, sizeof (in))) + return -EFAULT; + mii_fill_ethtool_cmd (dev, mii, &out); + + if (in.port != out.port) { + if (copy_to_user(mii->useraddr, &in, sizeof(in))) + return -EFAULT; + mii->port = in.port; + return 0; + } + + /* we don't support changing phy address, tranceiver, + * or the interrupt mitigation stuff. 
+ */ + if ((in.phy_address != out.phy_address) || + (in.transceiver != XCVR_INTERNAL) || + (in.maxtxpkt != out.maxtxpkt) || + (in.maxrxpkt != out.maxrxpkt)) + return -EOPNOTSUPP; + + advert = mii->advertising & ~ADVERTISE_ALL; + + /* NWAY autonegotiation enabled */ + if (in.autoneg == AUTONEG_ENABLE) { + bmcr = mii->bmcr | BMCR_ANENABLE; + + if (in.advertising & ADVERTISED_10baseT_Half) + advert |= ADVERTISE_10HALF; + if (in.advertising & ADVERTISED_10baseT_Full) + advert |= ADVERTISE_10FULL; + if (in.advertising & ADVERTISED_100baseT_Half) + advert |= ADVERTISE_100HALF; + if (in.advertising & ADVERTISED_100baseT_Full) + advert |= ADVERTISE_100FULL; + if (advert == (mii->advertising & ~ADVERTISE_ALL)) + return -EINVAL; + } + + /* NWAY autonegotiation disabled */ + else { + bmcr = mii->bmcr & ~BMCR_ANENABLE; + + if (in.speed == SPEED_100) + bmcr |= BMCR_SPEED100; + else bmcr &= ~BMCR_SPEED100; + + if (in.duplex == DUPLEX_FULL) + bmcr |= BMCR_FULLDPLX; + else bmcr &= ~BMCR_FULLDPLX; + + if (mii->bmsr & BMSR_10HALF) + advert |= ADVERTISE_10HALF; + if (mii->bmsr & BMSR_10FULL) + advert |= ADVERTISE_10FULL; + if (mii->bmsr & BMSR_100HALF) + advert |= ADVERTISE_100HALF; + if (mii->bmsr & BMSR_100FULL) + advert |= ADVERTISE_100FULL; + } + + if (advert != mii->advertising) { + bmcr |= BMCR_ANRESTART; + mii->mdio_write(dev, mii->phy_id, MII_ADVERTISE, advert); + mii->advertising = advert; + } + + /* some phys need autoneg dis/enabled separately from other settings */ + if ((bmcr & BMCR_ANENABLE) && (!(mii->bmcr & BMCR_ANENABLE))) { + mii->mdio_write(dev, mii->phy_id, MII_BMCR, + mii->bmcr | BMCR_ANENABLE | BMCR_ANRESTART); + bmcr &= ~BMCR_ANRESTART; + } else if ((!(bmcr & BMCR_ANENABLE)) && (mii->bmcr & BMCR_ANENABLE)) { + mii->mdio_write(dev, mii->phy_id, MII_BMCR, + mii->bmcr & ~BMCR_ANENABLE); + } + + if (bmcr != mii->bmcr) { + mii->mdio_write(dev, mii->phy_id, MII_BMCR, bmcr); + bmcr &= ~BMCR_ANRESTART; + mii->bmcr = bmcr; + } + + if (copy_to_user(mii->useraddr, &out, 
sizeof(out))) + return -EFAULT; + + return 0; +} + +EXPORT_SYMBOL(mii_ethtool_gset); +EXPORT_SYMBOL(mii_ethtool_sset); Index: linux_2_4/drivers/net/epic100.c diff -u linux_2_4/drivers/net/epic100.c:1.1.1.35 linux_2_4/drivers/net/epic100.c:1.1.1.35.42.3 --- linux_2_4/drivers/net/epic100.c:1.1.1.35 Sat May 19 18:56:00 2001 +++ linux_2_4/drivers/net/epic100.c Sun Jun 10 10:26:44 2001 @@ -45,13 +45,16 @@ * { fill me in } LK1.1.8: - * ethtool support (jgarzik) + * ethtool driver info support (jgarzik) + LK1.1.9: + * ethtool media get/set support (jgarzik) + */ #define DRV_NAME "epic100" -#define DRV_VERSION "1.11+LK1.1.8" -#define DRV_RELDATE "May 18, 2001" +#define DRV_VERSION "1.11+LK1.1.9" +#define DRV_RELDATE "June 10, 2001" /* The user-configurable values. @@ -116,6 +119,7 @@ #include #include #include +#include #include #include #include @@ -135,6 +139,11 @@ MODULE_PARM(rx_copybreak, "i"); MODULE_PARM(options, "1-" __MODULE_STRING(MAX_UNITS) "i"); MODULE_PARM(full_duplex, "1-" __MODULE_STRING(MAX_UNITS) "i"); +MODULE_PARM_DESC(debug, "EPIC/100 debug level (0-5)"); +MODULE_PARM_DESC(max_interrupt_work, "EPIC/100 maximum events handled per interrupt"); +MODULE_PARM_DESC(options, "EPIC/100: Bits 0-3: media type, bit 4: full duplex"); +MODULE_PARM_DESC(rx_copybreak, "EPIC/100 copy breakpoint for copy-only-tiny-frames"); +MODULE_PARM_DESC(full_duplex, "EPIC/100 full duplex setting(s) (1)"); /* Theory of Operation @@ -1169,7 +1178,7 @@ if (pkt_len > PKT_BUF_SZ - 4) { printk(KERN_ERR "%s: Oversized Ethernet frame, status %x " "%d bytes.\n", - dev->name, pkt_len, status); + dev->name, status, pkt_len); pkt_len = 1514; } /* Check if the packet is long enough to accept without copying @@ -1344,27 +1353,72 @@ return; } -static int netdev_ethtool_ioctl(struct net_device *dev, void *useraddr) +static int netdev_ethtool_ioctl (struct net_device *dev, void *useraddr) { struct epic_private *np = dev->priv; u32 ethcmd; - - if (copy_from_user(&ethcmd, useraddr, sizeof(ethcmd))) + + if
(copy_from_user (&ethcmd, useraddr, sizeof (ethcmd))) return -EFAULT; + + switch (ethcmd) { + case ETHTOOL_GDRVINFO: + { + struct ethtool_drvinfo info = { ETHTOOL_GDRVINFO }; + strcpy (info.driver, DRV_NAME); + strcpy (info.version, DRV_VERSION); + strcpy (info.bus_info, np->pci_dev->slot_name); + if (copy_to_user (useraddr, &info, sizeof (info))) + return -EFAULT; + return 0; + } - switch (ethcmd) { - case ETHTOOL_GDRVINFO: { - struct ethtool_drvinfo info = {ETHTOOL_GDRVINFO}; - strcpy(info.driver, DRV_NAME); - strcpy(info.version, DRV_VERSION); - strcpy(info.bus_info, np->pci_dev->slot_name); - if (copy_to_user(useraddr, &info, sizeof(info))) - return -EFAULT; - return 0; + case ETHTOOL_GSET: + case ETHTOOL_SSET: + { + struct ethtool_mii_info info = { + dev: dev, + useraddr: useraddr, + phy_id: np->phys[0], + bmcr: -1, + bmsr: -1, + lpa: -1, + advertising: np->advertising, + autoneg: -1, + ignore: ADVERTISE_100BASE4, + speed: -1, + full_duplex: np->full_duplex, + port: PORT_MII, + mdio_read: mdio_read, + mdio_write: mdio_write, + }; + int rc; + unsigned int changed = 0; + + if (ethcmd == ETHTOOL_GSET) + rc = mii_ethtool_gset (&info); + else + rc = mii_ethtool_sset (&info); + + if (np->advertising != info.advertising) { + np->advertising = info.advertising; + changed = 1; + } + if (np->full_duplex != info.full_duplex) { + np->full_duplex = info.full_duplex; + changed = 1; + } + + if (changed) + check_media (dev); + + return rc; + } + + default: + break; } - } - return -EOPNOTSUPP; } --------------004E02343E95CC6BA4690791-- From owner-netdev@oss.sgi.com Sun Jun 10 15:23:45 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5AMNji08759 for netdev-outgoing; Sun, 10 Jun 2001 15:23:45 -0700 Received: from lacrosse.corp.redhat.com (host154.207-175-42.redhat.com [207.175.42.154]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5AMNhV08748 for ; Sun, 10 Jun 2001 15:23:43 -0700 Received: from toomuch.toronto.redhat.com
(IDENT:bcrl@toomuch.toronto.redhat.com [172.16.14.22]) by lacrosse.corp.redhat.com (8.9.3/8.9.3) with ESMTP id SAA28109; Sun, 10 Jun 2001 18:23:33 -0400 Date: Sun, 10 Jun 2001 18:23:33 -0400 (EDT) From: Ben LaHaise X-X-Sender: To: Jeff Garzik cc: Russell King , Andrew Morton , , Subject: Re: 3C905b partial lockup in 2.4.5-pre5 and up to 2.4.6-pre1 In-Reply-To: <3B23A4BB.7B4567A3@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 720 Lines: 18 On Sun, 10 Jun 2001, Jeff Garzik wrote: > rmk wrote: > > [note I've not found anything in 2.4.5 where netif_carrier_ok prevents > > the net layers queueing packets for an interface, and forwarding them > > on for transmission]. > > we want netif_carrier_{on,off} to emit netlink messages. I don't know > how DaveM would feel about such getting implemented in 2.4.x though, > even if well tested. There are a lot of places that make the assumption that packets transmitted after ifconfig eth0 .... up returns will hit the wire, and yes, anything that does make that assumption is indeed broken. That said, adding an extra 30s of DNS timeouts and a few more seconds of rpc timeouts there to bootup is painful. -ben From owner-netdev@oss.sgi.com Sun Jun 10 17:48:48 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5B0mmq32403 for netdev-outgoing; Sun, 10 Jun 2001 17:48:48 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5B0mlV32381 for ; Sun, 10 Jun 2001 17:48:47 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id RAA14060; Sun, 10 Jun 2001 17:48:34 -0700 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15140.5474.324005.550559@pizda.ninka.net> Date: Sun, 10 Jun 2001 17:48:34 -0700 (PDT) To: Manfred Spraul Cc: netdev@oss.sgi.com Subject: Re: Q: (ab)using zerocopy for drivers with alignment contraints In-Reply-To: <3B238B31.38F6D3ED@colorfullife.com> References: <3B238B31.38F6D3ED@colorfullife.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 717 Lines: 22 Manfred Spraul writes: > Several cheap busmaster nics only accept tx buffers that are 32-bit > aligned. I'm going to assume that it is safe to bet that such cards cannot take multiple buffers for a TX packet too. Because if they could, then we could do something like copy the header forward a few bytes to get it aligned, and set up two buffer pointers into the packet such that the 32-bit alignment requirement is met. There'd be some difficulty with SKB sharing... BTW, a routine exists already doing what you propose, only to user space. Make skb_copy_datagram{,_iovec}_kernel(), export these routines to modules, and I'd be more than happy to accept that patch. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Sun Jun 10 17:50:22 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5B0oM832687 for netdev-outgoing; Sun, 10 Jun 2001 17:50:22 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5B0oLV32684 for ; Sun, 10 Jun 2001 17:50:21 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id RAA14065; Sun, 10 Jun 2001 17:50:15 -0700 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15140.5575.348371.147149@pizda.ninka.net> Date: Sun, 10 Jun 2001 17:50:15 -0700 (PDT) To: Jeff Garzik Cc: Manfred Spraul , netdev@oss.sgi.com Subject: Re: Q: (ab)using zerocopy for drivers with alignment contraints In-Reply-To: <3B23A6AF.302F36F1@mandrakesoft.com> References: <3B238B31.38F6D3ED@colorfullife.com> <3B23A6AF.302F36F1@mandrakesoft.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 220 Lines: 11 Jeff Garzik writes: > I would prefer to call it "single-copy" or something other than > zero-copy, though... Call it skb_copy_and_csum_iovec_kernel() which is what it is :-) Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Sun Jun 10 17:53:24 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5B0rOY00882 for netdev-outgoing; Sun, 10 Jun 2001 17:53:24 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5B0rNV00879 for ; Sun, 10 Jun 2001 17:53:23 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id RAA14084; Sun, 10 Jun 2001 17:53:22 -0700 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15140.5762.589629.252904@pizda.ninka.net> Date: Sun, 10 Jun 2001 17:53:22 -0700 (PDT) To: Jeff Garzik Cc: Russell King , Ben LaHaise , Andrew Morton , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 3C905b partial lockup in 2.4.5-pre5 and up to 2.4.6-pre1 In-Reply-To: <3B23A4BB.7B4567A3@mandrakesoft.com> References: <20010610093838.A13074@flint.arm.linux.org.uk> <20010610173419.B13164@flint.arm.linux.org.uk> <3B23A4BB.7B4567A3@mandrakesoft.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 628 Lines: 18 Jeff Garzik writes: > > [note I've not found anything in 2.4.5 where netif_carrier_ok prevents > > the net layers queueing packets for an interface, and forwarding them > > on for transmission]. > > we want netif_carrier_{on,off} to emit netlink messages. I don't know > how DaveM would feel about such getting implemented in 2.4.x though, > even if well tested. If someone sent me patches which did this (and minded the restrictions, if any, this adds to the execution contexts in which the carrier on/off stuff can be invoked) I would consider the patch seriously for 2.4.x Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Sun Jun 10 18:07:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5B173h02935 for netdev-outgoing; Sun, 10 Jun 2001 18:07:03 -0700 Received: from vaio.greennet (adsl-151-196-235-239.baltmd.adsl.bellatlantic.net [151.196.235.239]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5B172V02930 for ; Sun, 10 Jun 2001 18:07:02 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id VAA30159; Sun, 10 Jun 2001 21:14:52 -0400 Date: Sun, 10 Jun 2001 21:13:30 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: "David S. 
Miller" cc: netdev@oss.sgi.com Subject: Re: Q: (ab)using zerocopy for drivers with alignment contraints In-Reply-To: <15140.5474.324005.550559@pizda.ninka.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1214 Lines: 30 On Sun, 10 Jun 2001, David S. Miller wrote: > Manfred Spraul writes: > > Several cheap busmaster nics only accept tx buffers that are 32-bit > > aligned. > > I'm going to assume that it is safe to bet that such cards cannot take > multiple buffers for a TX packet too. Incorrect assumption. > Because if they could, then we could do something like copy the header > forward a few bytes to get it aligned, and set up two buffer pointers > into the packet such that the 32-bit alignment requirement is met. If it were that easy, don't you think the device driver writer would have used the same technique to avoid the bulk copy? There wouldn't be an alignment requirement if it were not built into the hardware. The usual reason for the limitation is that the Tx FIFO is 32 bits wide, and doesn't have byte rotate circuitry. The byte alignment will always remain the same. In software this appears as a four byte rounding of the Tx fragment length. Luckily, Tx alignment is required less commonly than Rx buffer alignment. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. 
Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From owner-netdev@oss.sgi.com Sun Jun 10 20:18:02 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5B3I2i13960 for netdev-outgoing; Sun, 10 Jun 2001 20:18:02 -0700 Received: from mta4.rcsntx.swbell.net (mta4.rcsntx.swbell.net [151.164.30.28]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5B3I0V13957 for ; Sun, 10 Jun 2001 20:18:00 -0700 Received: from hofmann1 ([64.216.141.121]) by mta4.rcsntx.swbell.net (Sun Internet Mail Server sims.3.5.2000.03.23.18.03.p10) with ESMTP id <0GEQ008FXX62VJ@mta4.rcsntx.swbell.net> for netdev@oss.sgi.com; Sun, 10 Jun 2001 22:18:03 -0500 (CDT) Date: Sun, 10 Jun 2001 22:17:46 -0500 From: "Glenn C. Hofmann" Subject: Re: 3C905b partial lockup in 2.4.5-pre5 and up to 2.4.6-pre1 In-reply-to: <15140.5762.589629.252904@pizda.ninka.net> To: Jeff Garzik , "David S. Miller" Cc: Russell King , Ben LaHaise , Andrew Morton , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Reply-to: ghofmann@pair.com Message-id: <3B23F20A.22574.10AD93@localhost> MIME-version: 1.0 X-Mailer: Pegasus Mail for Win32 (v3.12c) Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: 7BIT References: <3B23A4BB.7B4567A3@mandrakesoft.com> Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1257 Lines: 36 I have, as was suggested, built as a module, and get unresolved symbol do_softirq, so this appears to be another problem in this driver with 2.4.6-pre2. If I can help in any way, please let me know, although I am by no means a programmer, just a tester. Thanks. Glenn C. Hofmann On 10 Jun 2001, at 17:53 David S. Miller wrote: > > Jeff Garzik writes: > > > [note I've not found anything in 2.4.5 where netif_carrier_ok prevents > > > the net layers queueing packets for an interface, and forwarding them > > > on for transmission]. > > > > we want netif_carrier_{on,off} to emit netlink messages. 
I don't know > > how DaveM would feel about such getting implemented in 2.4.x though, > > even if well tested. > > If someone sent me patches which did this (and minded the > restrictions, if any, this adds to the execution contexts in > which the carrier on/off stuff can be invoked) I would consider > the patch seriously for 2.4.x > > Later, > David S. Miller > davem@redhat.com > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ From owner-netdev@oss.sgi.com Sun Jun 10 20:25:53 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5B3Pra14575 for netdev-outgoing; Sun, 10 Jun 2001 20:25:53 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5B3PqV14572 for ; Sun, 10 Jun 2001 20:25:53 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 425831F6B; Sun, 10 Jun 2001 23:25:51 -0400 (EDT) Message-ID: <3B243A33.8B32FCD6@mandrakesoft.com> Date: Sun, 10 Jun 2001 23:25:39 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre1 i686) X-Accept-Language: en MIME-Version: 1.0 To: ghofmann@pair.com Cc: "David S. Miller" , Russell King , Ben LaHaise , Andrew Morton , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 3C905b partial lockup in 2.4.5-pre5 and up to 2.4.6-pre1 References: <3B23A4BB.7B4567A3@mandrakesoft.com> <3B23F20A.22574.10AD93@localhost> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 552 Lines: 19 "Glenn C. 
Hofmann" wrote: > > I have, as was suggested, built as a module, and get unresolved symbol do_softirq, so > this appears to be another problem in this driver with 2.4.6-pre2. If I can help in any > way, please let me know, although I am by no means a programmer, just a tester. edit kernel/ksyms.c: -EXPORT_SYMBOL(do_softirq); +EXPORT_SYMBOL_NOVERS(do_softirq); and see if that helps. Errors about do_softirq are unrelated to a specific driver. -- Jeff Garzik | Andre the Giant has a posse. Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Sun Jun 10 22:59:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5B5xPC30495 for netdev-outgoing; Sun, 10 Jun 2001 22:59:25 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5B5xMV30491 for ; Sun, 10 Jun 2001 22:59:22 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f5B5xB323303; Mon, 11 Jun 2001 08:59:11 +0300 Date: Mon, 11 Jun 2001 08:59:10 +0300 (EEST) From: Pekka Savola To: Jeff Garzik cc: Linux Kernel Mailing List , , "David S. Miller" Subject: Re: PATCH: ethtool MII helpers In-Reply-To: <3B23AFC3.71CE2FD2@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1198 Lines: 30 On Sun, 10 Jun 2001, Jeff Garzik wrote: > Initial draft of a helper which uses generic elements present in several > net drivers to implement ethtool ioctl support in a minimum amount of > code. 
In the patch there is: @@ -135,6 +139,11 @@ MODULE_PARM(rx_copybreak, "i"); MODULE_PARM(options, "1-" __MODULE_STRING(MAX_UNITS) "i"); MODULE_PARM(full_duplex, "1-" __MODULE_STRING(MAX_UNITS) "i"); +MODULE_PARM_DESC(debug, "EPIC/100 debug level (0-5)"); +MODULE_PARM_DESC(max_interrupt_work, "EPIC/100 maximum events handled per interrupt"); +MODULE_PARM_DESC(options, "EPIC/100: Bits 0-3: media type, bit 4: full duplex"); +MODULE_PARM_DESC(rx_copybreak, "EPIC/100 copy breakpoint for copy-only-tiny-frames"); +MODULE_PARM_DESC(full_duplex, "EPIC/100 full duplex setting(s) (1)"); I recall some discussion on a list (can't find it now) that driver specific comment like "EPIC/100" here notification on all _DESC's would be removed to a separate MODULE_ to make the comments more generic? -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Sun Jun 10 23:09:38 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5B69cK31466 for netdev-outgoing; Sun, 10 Jun 2001 23:09:38 -0700 Received: from ocs4.ocs-net (firewall.ocs.com.au [203.34.97.9]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5B69ZV31454 for ; Sun, 10 Jun 2001 23:09:35 -0700 Received: from ocs4.ocs-net (kaos@localhost) by ocs4.ocs-net (8.11.2/8.11.2) with ESMTP id f5B6AIV12460; Mon, 11 Jun 2001 16:10:18 +1000 X-Authentication-Warning: ocs4.ocs-net: kaos owned process doing -bs X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 From: Keith Owens To: Pekka Savola cc: Jeff Garzik , Linux Kernel Mailing List , netdev@oss.sgi.com, "David S. Miller" Subject: Re: PATCH: ethtool MII helpers In-reply-to: Your message of "Mon, 11 Jun 2001 08:59:10 +0300." 
Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 11 Jun 2001 16:10:18 +1000 Message-ID: <12459.992239818@ocs4.ocs-net> Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 500 Lines: 10 On Mon, 11 Jun 2001 08:59:10 +0300 (EEST), Pekka Savola wrote: >+MODULE_PARM_DESC(debug, "EPIC/100 debug level (0-5)"); >+MODULE_PARM_DESC(max_interrupt_work, "EPIC/100 maximum events handled per interrupt"); >I recall some discussion on a list (can't find it now) that driver >specific comment like "EPIC/100" here notification on all _DESC's would be >removed to a separate MODULE_ to make the comments more generic? MODULE_DESCRIPTION("EPIC/100 some text") would be better. From owner-netdev@oss.sgi.com Mon Jun 11 06:10:51 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5BDApu20328 for netdev-outgoing; Mon, 11 Jun 2001 06:10:51 -0700 Received: from horus.its.uow.edu.au (horus.its.uow.edu.au [130.130.68.25]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5BDAmV20324 for ; Mon, 11 Jun 2001 06:10:48 -0700 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id XAA08998; Mon, 11 Jun 2001 23:09:57 +1000 (EST) Message-ID: <3B24C185.824EBBE0@uow.edu.au> Date: Mon, 11 Jun 2001 23:03:01 +1000 From: Andrew Morton X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.5-pre6 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. 
Miller" CC: Jeff Garzik , Russell King , Ben LaHaise , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 3C905b partial lockup in 2.4.5-pre5 and up to 2.4.6-pre1 References: <3B23A4BB.7B4567A3@mandrakesoft.com>, <20010610093838.A13074@flint.arm.linux.org.uk> <20010610173419.B13164@flint.arm.linux.org.uk> <3B23A4BB.7B4567A3@mandrakesoft.com> <15140.5762.589629.252904@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1287 Lines: 32 "David S. Miller" wrote: > > Jeff Garzik writes: > > > [note I've not found anything in 2.4.5 where netif_carrier_ok prevents > > > the net layers queueing packets for an interface, and forwarding them > > > on for transmission]. > > > > we want netif_carrier_{on,off} to emit netlink messages. I don't know > > how DaveM would feel about such getting implemented in 2.4.x though, > > even if well tested. > > If someone sent me patches which did this (and minded the > restrictions, if any, this adds to the execution contexts in > which the carrier on/off stuff can be invoked) I would consider > the patch seriously for 2.4.x It'd need to be callable from interrupt context - otherwise each device/driver which has link status change interrupts will need to implement some form of interrupt->process context trick. On the DHCP/DNS issue which Ben raised - MII-based NICs can take up to 3.5 seconds before they start sending packets, *after* their open() has returned success. This is within the letter of the law (ethernet can drop packets) but it'd be nicer to userspace if we were to not return from the open until the interface was actually usable. Jamal has a patch somewhere which does the netlink status notification. If he cares to share it I'll take a look. 
- From owner-netdev@oss.sgi.com Mon Jun 11 06:27:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5BDRl621529 for netdev-outgoing; Mon, 11 Jun 2001 06:27:47 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5BDRkV21525 for ; Mon, 11 Jun 2001 06:27:46 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id GAA16651; Mon, 11 Jun 2001 06:27:39 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15140.51018.942446.320621@pizda.ninka.net> Date: Mon, 11 Jun 2001 06:27:38 -0700 (PDT) To: Andrew Morton Cc: Jeff Garzik , Russell King , Ben LaHaise , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 3C905b partial lockup in 2.4.5-pre5 and up to 2.4.6-pre1 In-Reply-To: <3B24C185.824EBBE0@uow.edu.au> References: <3B23A4BB.7B4567A3@mandrakesoft.com> <20010610093838.A13074@flint.arm.linux.org.uk> <20010610173419.B13164@flint.arm.linux.org.uk> <15140.5762.589629.252904@pizda.ninka.net> <3B24C185.824EBBE0@uow.edu.au> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 408 Lines: 15 Andrew Morton writes: > It'd need to be callable from interrupt context - otherwise > each device/driver which has link status change interrupts > will need to implement some form of interrupt->process context > trick. Well, we could make the netif_carrier_*() implementation do the "interrupt->process context" trick. Jamal can feel free to post what he has. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Jun 11 06:32:08 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5BDW8P21875 for netdev-outgoing; Mon, 11 Jun 2001 06:32:08 -0700 Received: from hindon.hss.co.in ([202.54.26.202]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5BDW6V21871 for ; Mon, 11 Jun 2001 06:32:06 -0700 Received: from sandesh.hss.hns.com (localhost [127.0.0.1]) by hindon.hss.co.in (8.10.0/8.10.0) with SMTP id f5BDXGg12926 for ; Mon, 11 Jun 2001 19:03:16 +0530 (IST) Received: by sandesh.hss.hns.com(Lotus SMTP MTA v4.6.3 (733.2 10-16-1998)) id 65256A68.0048F15C ; Mon, 11 Jun 2001 18:46:43 +0530 X-Lotus-FromDomain: HSS From: sndtrn27@hss.hns.com To: netdev@oss.sgi.com Message-ID: <65256A68.0048EF45.00@sandesh.hss.hns.com> Date: Mon, 11 Jun 2001 18:46:34 +0530 Subject: arp packets over raw sockets in solaris Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 149 Lines: 12 hi all, is it possible to send arp packets over raw sockets in sun solaris as in linux. ? any help in regard will be appreciated. thankx rajiv From owner-netdev@oss.sgi.com Mon Jun 11 06:50:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5BDo6T23116 for netdev-outgoing; Mon, 11 Jun 2001 06:50:06 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5BDo5V23113 for ; Mon, 11 Jun 2001 06:50:05 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 4888D1F6C; Mon, 11 Jun 2001 09:50:03 -0400 (EDT) Message-ID: <3B24CC80.D880510@mandrakesoft.com> Date: Mon, 11 Jun 2001 09:49:52 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre2 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. 
Miller" Cc: Andrew Morton , Russell King , Ben LaHaise , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 3C905b partial lockup in 2.4.5-pre5 and up to 2.4.6-pre1 References: <3B23A4BB.7B4567A3@mandrakesoft.com> <20010610093838.A13074@flint.arm.linux.org.uk> <20010610173419.B13164@flint.arm.linux.org.uk> <15140.5762.589629.252904@pizda.ninka.net> <3B24C185.824EBBE0@uow.edu.au> <15140.51018.942446.320621@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 577 Lines: 19 "David S. Miller" wrote: > > Andrew Morton writes: > > It'd need to be callable from interrupt context - otherwise > > each device/driver which has link status change interrupts > > will need to implement some form of interrupt->process context > > trick. > > Well, we could make the netif_carrier_*() implementation do the > "interrupt->process context" trick. > > Jamal can feel free to post what he has. If we have any problems with context we can always use schedule_task() -- Jeff Garzik | Andre the Giant has a posse. Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Mon Jun 11 07:29:02 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5BET2t28526 for netdev-outgoing; Mon, 11 Jun 2001 07:29:02 -0700 Received: from horus.its.uow.edu.au (horus.its.uow.edu.au [130.130.68.25]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5BESxV28520 for ; Mon, 11 Jun 2001 07:29:00 -0700 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id AAA21512; Tue, 12 Jun 2001 00:28:31 +1000 (EST) Message-ID: <3B24D3F0.F2B6DA76@uow.edu.au> Date: Tue, 12 Jun 2001 00:21:36 +1000 From: Andrew Morton X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-ac13 i686) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: "David S. 
Miller" , Russell King , Ben LaHaise , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 3C905b partial lockup in 2.4.5-pre5 and up to 2.4.6-pre1 References: <3B23A4BB.7B4567A3@mandrakesoft.com> <20010610093838.A13074@flint.arm.linux.org.uk> <20010610173419.B13164@flint.arm.linux.org.uk> <15140.5762.589629.252904@pizda.ninka.net> <3B24C185.824EBBE0@uow.edu.au> <15140.51018.942446.320621@pizda.ninka.net> <3B24CC80.D880510@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 758 Lines: 21 Jeff Garzik wrote: > > "David S. Miller" wrote: > > > > Andrew Morton writes: > > > It'd need to be callable from interrupt context - otherwise > > > each device/driver which has link status change interrupts > > > will need to implement some form of interrupt->process context > > > trick. > > > > Well, we could make the netif_carrier_*() implementation do the > > "interrupt->process context" trick. > > > > Jamal can feel free to post what he has. > > If we have any problems with context we can always use schedule_task() Yep. With dev_hold() and dev_put() to avoid module removal races. One would also have to be sure that the right things happen if the interface is downed between the interrupt and execution of the schedule_task() callback. 
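[Editorial sketch, not part of the original thread.] The scheme discussed above (defer the carrier notification out of interrupt context, pin the device while the event is queued, and re-check the interface state when the callback finally runs) can be modelled in plain userspace C. This is only a toy model under stated assumptions: every sim_* name is hypothetical, the reference count merely stands in for dev_hold()/dev_put(), and sim_run_pending() stands in for whatever a schedule_task() callback would do.

```c
/* Userspace model only: all sim_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct sim_dev {
    int  refcnt;        /* stands in for the dev_hold()/dev_put() count */
    bool up;            /* is the interface administratively up? */
    bool carrier;       /* last carrier state seen by the "interrupt" */
    bool event_pending; /* a deferred notification is queued */
    int  notified;      /* number of notifications actually emitted */
};

static void sim_dev_hold(struct sim_dev *d) { d->refcnt++; }

static void sim_dev_put(struct sim_dev *d)
{
    assert(d->refcnt > 0);
    d->refcnt--;
}

/* "Interrupt" side: record the state, pin the device, queue the work. */
static void sim_carrier_change(struct sim_dev *d, bool on)
{
    d->carrier = on;
    if (d->event_pending)
        return;             /* the queued event already holds a reference */
    sim_dev_hold(d);
    d->event_pending = true;
}

/* "Process context" side: what a deferred callback might do. */
static void sim_run_pending(struct sim_dev *d)
{
    if (!d->event_pending)
        return;
    d->event_pending = false;
    if (d->up)              /* the interface may have been downed meanwhile */
        d->notified++;      /* a real version would emit a netlink message */
    sim_dev_put(d);         /* always drop the reference taken at queue time */
}
```

In this model the reference is dropped whether or not a notification is emitted, so an interface that goes down between the interrupt and the callback neither leaks a reference nor reports stale state.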
From owner-netdev@oss.sgi.com Mon Jun 11 09:05:21 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5BG5LG10495 for netdev-outgoing; Mon, 11 Jun 2001 09:05:21 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5BG5KV10492 for ; Mon, 11 Jun 2001 09:05:20 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id F17DD1F6A; Mon, 11 Jun 2001 12:05:14 -0400 (EDT) Message-ID: <3B24EC2F.175B088A@mandrakesoft.com> Date: Mon, 11 Jun 2001 12:05:03 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andrew Morton Cc: "David S. Miller" , Russell King , Ben LaHaise , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: 3C905b partial lockup in 2.4.5-pre5 and up to 2.4.6-pre1 References: <3B23A4BB.7B4567A3@mandrakesoft.com> <20010610093838.A13074@flint.arm.linux.org.uk> <20010610173419.B13164@flint.arm.linux.org.uk> <15140.5762.589629.252904@pizda.ninka.net> <3B24C185.824EBBE0@uow.edu.au> <15140.51018.942446.320621@pizda.ninka.net> <3B24CC80.D880510@mandrakesoft.com> <3B24D3F0.F2B6DA76@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1275 Lines: 37 Andrew Morton wrote: > > Jeff Garzik wrote: > > > > "David S. Miller" wrote: > > > > > > Andrew Morton writes: > > > > It'd need to be callable from interrupt context - otherwise > > > > each device/driver which has link status change interrupts > > > > will need to implement some form of interrupt->process context > > > > trick. > > > > > > Well, we could make the netif_carrier_*() implementation do the > > > "interrupt->process context" trick. > > > > > > Jamal can feel free to post what he has. 
> > > > If we have any problems with context we can always use schedule_task() > > Yep. With dev_hold() and dev_put() to avoid module removal > races. One would also have to be sure that the right things > happen if the interface is downed between the interrupt and > execution of the schedule_task() callback. Why not call MOD_INC_USE_COUNT and MOD_DEC_USE_COUNT? It makes it much more obvious you are closing a race related to modules, and it goes away when the module is built into the kernel. (as a tangent, I have run into cases where it would be nice to always have a module ref count, whether or not you were built into the kernel. this would be ok with me...) -- Jeff Garzik | Andre the Giant has a posse. Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Mon Jun 11 09:18:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5BGI3o10880 for netdev-outgoing; Mon, 11 Jun 2001 09:18:03 -0700 Received: from horus.its.uow.edu.au (horus.its.uow.edu.au [130.130.68.25]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5BGI1V10877 for ; Mon, 11 Jun 2001 09:18:01 -0700 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id CAA05378; Tue, 12 Jun 2001 02:17:54 +1000 (EST) Message-ID: <3B24ED91.D08EE69B@uow.edu.au> Date: Tue, 12 Jun 2001 02:10:57 +1000 From: Andrew Morton X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-ac13 i686) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: netdev@oss.sgi.com Subject: Re: 3C905b partial lockup in 2.4.5-pre5 and up to 2.4.6-pre1 References: <3B23A4BB.7B4567A3@mandrakesoft.com> <20010610093838.A13074@flint.arm.linux.org.uk> <20010610173419.B13164@flint.arm.linux.org.uk> <15140.5762.589629.252904@pizda.ninka.net> <3B24C185.824EBBE0@uow.edu.au> <15140.51018.942446.320621@pizda.ninka.net> <3B24CC80.D880510@mandrakesoft.com> <3B24D3F0.F2B6DA76@uow.edu.au> <3B24EC2F.175B088A@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii 
Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1487 Lines: 40 [ List trimmed ] Jeff Garzik wrote: > > Andrew Morton wrote: > > > > Jeff Garzik wrote: > > > > > > "David S. Miller" wrote: > > > > > > > > Andrew Morton writes: > > > > > It'd need to be callable from interrupt context - otherwise > > > > > each device/driver which has link status change interrupts > > > > > will need to implement some form of interrupt->process context > > > > > trick. > > > > > > > > Well, we could make the netif_carrier_*() implementation do the > > > > "interrupt->process context" trick. > > > > > > > > Jamal can feel free to post what he has. > > > > > > If we have any problems with context we can always use schedule_task() > > > > Yep. With dev_hold() and dev_put() to avoid module removal > > races. One would also have to be sure that the right things > > happen if the interface is downed between the interrupt and > > execution of the schedule_task() callback. > > Why not call MOD_INC_USE_COUNT and MOD_DEC_USE_COUNT? It makes it much > more obvious you are closing a race related to modules, and it goes away > when the module is built into the kernel. It'd best be done inside netif_carrier_*(). So there one could use try_inc_mod_count(dev->owner). But that means an rmmod would fail if there was an event outstanding. With dev_hold(), which appears to be Alexey's answer to the module horrors, the rmmod caller will simply block until the callback has completed and will then see success, which is what we'd prefer to have happen. 
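[Editorial sketch, not part of the original thread.] The difference described above between the two schemes can be shown with a small userspace model: with a module use count, an unload attempt is refused while an event is outstanding; with a device reference, the unregister path waits for the reference to drain and then succeeds. All sim_* names are hypothetical, and the "wait" is modelled by simply running the pending events in place, which is an assumption made to keep the example single-threaded.

```c
/* Userspace model only: all sim_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <errno.h>

struct sim_mod {
    int use_count;   /* module use count, as with try_inc_mod_count() */
    int dev_refcnt;  /* references held by queued events, as with dev_hold() */
    int pending;     /* deferred carrier events not yet run */
};

/* Run one deferred event and drop whichever reference it was holding. */
static void sim_run_one(struct sim_mod *m, int holds_module)
{
    assert(m->pending > 0);
    m->pending--;
    if (holds_module)
        m->use_count--;
    else
        m->dev_refcnt--;
}

/* Module-count scheme: unload is refused while an event is outstanding. */
static int sim_rmmod_modcount(const struct sim_mod *m)
{
    return m->use_count > 0 ? -EBUSY : 0;
}

/* Device-reference scheme: unregister waits for the references to drain
 * and then succeeds. The wait is modelled by running the events here. */
static int sim_rmmod_devhold(struct sim_mod *m)
{
    while (m->dev_refcnt > 0)
        sim_run_one(m, 0);
    return 0;
}
```

The first scheme makes the caller retry; the second makes the caller block briefly and then see success, which is the behaviour the message above prefers.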
From owner-netdev@oss.sgi.com Mon Jun 11 09:47:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5BGl1r11812 for netdev-outgoing; Mon, 11 Jun 2001 09:47:01 -0700 Received: from colorfullife.com (colorfullife.com [216.156.138.34]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5BGl0V11805 for ; Mon, 11 Jun 2001 09:47:00 -0700 Received: from dbl.localdomain (localhost [127.0.0.1]) by colorfullife.com (8.11.2/8.11.2) with ESMTP id f5BGo6q10557; Mon, 11 Jun 2001 12:50:07 -0400 Received: from colorfullife.com (gw.cat5.localdomain [172.17.0.1]) by dbl.localdomain (8.11.2/8.11.2) with ESMTP id f5BGkvi12887; Mon, 11 Jun 2001 18:46:57 +0200 Message-ID: <3B24F601.D732B35E@colorfullife.com> Date: Mon, 11 Jun 2001 18:46:57 +0200 From: Manfred Spraul X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com Subject: Re: Q: (ab)using zerocopy for drivers with alignment contraints References: <3B238B31.38F6D3ED@colorfullife.com> <15140.5474.324005.550559@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1235 Lines: 33 "David S. Miller" wrote: > > Manfred Spraul writes: > > Several cheap busmaster nics only accept tx buffers that are 32-bit > > aligned. > > I'm going to assume that it is safe to bet that such cards cannot take > multiple buffers for a TX packet too. > > Because if they could, then we could do something like copy the header > forward a few bytes to get it aligned, and set up two buffer pointers > into the packet such that the 32-bit alignment requirement is met. > > There'd be some difficulty with SKB sharing... > > BTW, a routine exists already doing what you propose, only to user > space. Make skb_copy_datagram{,_iovec}_kernel(), export these > routines to modules, and I'd be more than happy to accept that patch. 
> skb_copy_datagram & friends follow the fragment list. My function doesn't/mustn't follow skb_shinfo(skb)->frag_list. Should I still call it skb_copy_datagram{,_iovec}_kernel? I don't like functions with similar names and subtle differences. What about int skb_dblbuf_copy(void *buf, const struct sk_buff *skb, unsigned long addr_mask); * return value: 1 -> skb copied. * 0 -> copy unnecessary, data accessible with skb->data * addr_mask: bitmask describing the required alignment. -- Manfred From owner-netdev@oss.sgi.com Mon Jun 11 20:15:43 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5C3FhU26810 for netdev-outgoing; Mon, 11 Jun 2001 20:15:43 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5C3FgV26807 for ; Mon, 11 Jun 2001 20:15:42 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id UAA19313; Mon, 11 Jun 2001 20:14:41 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15141.35105.583572.505567@pizda.ninka.net> Date: Mon, 11 Jun 2001 20:14:41 -0700 (PDT) To: Manfred Spraul Cc: netdev@oss.sgi.com Subject: Re: Q: (ab)using zerocopy for drivers with alignment contraints In-Reply-To: <3B24F601.D732B35E@colorfullife.com> References: <3B238B31.38F6D3ED@colorfullife.com> <15140.5474.324005.550559@pizda.ninka.net> <3B24F601.D732B35E@colorfullife.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 749 Lines: 21 Manfred Spraul writes: > skb_copy_datagram & friends follow the fragment list. My function > doesn't/mustn't follow skb_shinfo(skb)->frag_list. Should I still call > it skb_copy_datagram{,_iovec}_kernel? I don't like functions with > similar names and subtle differences. Why "mustn't it" follow the frag list? 
I think it would be "absolutely fantastic" if it did follow the frag list! Then we could optimize the forwarding of fragmented packets. There is no subtle difference, make it do _exactly_ what skb_copy_datagram{,_iovec}() does to userspace and name it how I've asked you to name it. What are you trying to avoid by not walking the frag_list? A single NULL pointer check? Get real :-) Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Jun 11 20:18:24 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5C3IO027073 for netdev-outgoing; Mon, 11 Jun 2001 20:18:24 -0700 Received: from mta4.rcsntx.swbell.net (mta4.rcsntx.swbell.net [151.164.30.28]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5C3INV27068 for ; Mon, 11 Jun 2001 20:18:23 -0700 Received: from hofmann1 ([64.216.141.121]) by mta4.rcsntx.swbell.net (Sun Internet Mail Server sims.3.5.2000.03.23.18.03.p10) with ESMTP id <0GES006C1RUIDN@mta4.rcsntx.swbell.net> for netdev@oss.sgi.com; Mon, 11 Jun 2001 22:18:19 -0500 (CDT) Date: Mon, 11 Jun 2001 22:17:59 -0500 From: "Glenn C. Hofmann" Subject: Re: 3C905b partial lockup in 2.4.5-pre5 and up to 2.4.6-pre1 In-reply-to: <3B243A33.8B32FCD6@mandrakesoft.com> To: ghofmann@pair.com, Jeff Garzik Cc: "David S. Miller" , Russell King , Ben LaHaise , Andrew Morton , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Reply-to: ghofmann@pair.com Message-id: <3B254397.16198.EB2CF@localhost> MIME-version: 1.0 X-Mailer: Pegasus Mail for Win32 (v3.12c) Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: 7BIT Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 999 Lines: 32 Yes, thank you, that makes it load and it also works with ifconfig and dhcp as a module. Glenn C. Hofmann On 10 Jun 2001, at 23:25 Jeff Garzik wrote: > "Glenn C. Hofmann" wrote: > > > > I have, as was suggested, built as a module, and get unresolved symbol do_softirq, so > > this appears to be another problem in this driver with 2.4.6-pre2. 
If I can help in any > > way, please let me know, although I am by no means a programmer, just a tester. > > edit kernel/ksyms.c: > > -EXPORT_SYMBOL(do_softirq); > +EXPORT_SYMBOL_NOVERS(do_softirq); > > and see if that helps. > > Errors about do_softirq are unrelated to a specific driver. > > -- > Jeff Garzik | Andre the Giant has a posse. > Building 1024 | > MandrakeSoft | > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ From owner-netdev@oss.sgi.com Tue Jun 12 02:07:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5C97Dw19256 for netdev-outgoing; Tue, 12 Jun 2001 02:07:13 -0700 Received: from motgate4.mot.com (motgate4.mot.com [144.189.100.102]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5C97CV19253 for ; Tue, 12 Jun 2001 02:07:12 -0700 Received: [from pobox.mot.com (pobox.mot.com [129.188.137.100]) by motgate4.mot.com (motgate4 2.1) with ESMTP id CAA25393 for ; Tue, 12 Jun 2001 02:07:11 -0700 (MST)] Received: [from m-il06-r3.mot.com (m-il06-r3.mot.com [129.188.137.194]) by pobox.mot.com (MOT-pobox 2.0) with ESMTP id CAA12105 for ; Tue, 12 Jun 2001 02:07:11 -0700 (MST)] Received: from [140.101.173.9] by m-il06-r3.mot.com with ESMTP for netdev@oss.sgi.com; Tue, 12 Jun 2001 04:07:00 -0500 Received: (from root@localhost) by zorglub.crm.mot.com (8.8.8/8.8.8/crm-1.6) id LAA06676 for netdev@oss.sgi.com.DELIVER; Tue, 12 Jun 2001 11:07:07 +0200 (METDST) Received: from crm.mot.com (varagnat@riri.crm.mot.com [140.101.173.128]) by zorglub.crm.mot.com (8.8.8/8.8.8/crm-1.6) with ESMTP id LAA06570 for ; Tue, 12 Jun 2001 11:07:05 +0200 (METDST) Message-Id: <3B25DB45.565F8D96@crm.mot.com> Date: Tue, 12 Jun 2001 11:05:09 +0200 From: Emmanuel Varagnat Organization: Motorola X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.3 i686) X-Accept-Language: en 
MIME-Version: 1.0 To: "netdev@oss.sgi.com" Subject: sk_buff allocation Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 726 Lines: 19 I'm writing a module that is able to modify outgoing packets. This is done by registering a new entry in ptype_all. But my problem is that in dev_queue_xmit_nit the sk_buff is cloned and my function gets this clone. So my modification of skb->data isn't taken into account by the ethernet driver. My idea was to do my modifications and then copy all my data starting at skb->data so that nothing in the sk_buff is modified. But what should I do if the buffer doesn't have enough room to hold the new/modified data? skb_cow or skb_copy_expand, for example, will return a new sk_buff with a new buffer, but how could I tell the system that it must "replace" the old sk_buff with this one? Thanks -Emmanuel Varagnat From owner-netdev@oss.sgi.com Tue Jun 12 10:09:44 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5CH9iJ31258 for netdev-outgoing; Tue, 12 Jun 2001 10:09:44 -0700 Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5CH9fV31255 for ; Tue, 12 Jun 2001 10:09:42 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f5CH9Pg06496; Tue, 12 Jun 2001 19:09:25 +0200 (MET DST) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id TAA24623; Tue, 12 Jun 2001 19:09:25 +0200 Date: Tue, 12 Jun 2001 19:09:25 +0200 (CEST) From: Bogdan Costescu To: Jeff Garzik cc: Linux Kernel Mailing List , , "David S. 
Miller" Subject: Re: PATCH: ethtool MII helpers In-Reply-To: <3B23AFC3.71CE2FD2@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 3594 Lines: 106 On Sun, 10 Jun 2001, Jeff Garzik wrote: > Comments appreciated. Some general comments first, the others are spread through the code. - I don't know what the long-term plan is about ethtool vs. MII ioctl's. If you do plan to replace completely the MII ioctl's, there should be a way to access _all_ MII registers provided by the PHY, even if you do this in a restricted way (i.e. for CAP_NET_ADMIN only). There is also useful info in other registers than the 4 you have in your implementation. - You are proposing some caching for the MII registers. I suppose that you would like to have this code also working with whatever caching will be done for MII access that was recently discussed. Wouldn't this produce double caching under some circumstances ? + int speed; /* 10, 100, 1000 or -1 (ask hw) */ Please note that the comment specifies 1000, while the code in several places assumes only 2 possibilities: 10 and 100. + if (mii->autoneg < 0) + autoneg = mii->autoneg = (bmcr & BMCR_ANENABLE) ? 1 : 0; + else autoneg = mii->autoneg; You don't read anything from the hardware at this point. Why do you want caching ? Not related: I know that this comes from David Miller's older work, but wouldn't be possible to have a more uniform naming scheme ? You have BMCR_ANENABLE, but you have BMSR_ANEGCAPABLE... + if (mii->full_duplex < 0) + full_duplex = mii->full_duplex = + mii_nway_result(negotiated) & LPA_DUPLEX; + else full_duplex = mii->full_duplex; If autoneg. is disabled, I don't think that you always get useful info in 'negotiated'. Applies to the next chunk, too. 
+ if (mii->speed < 0) { + if (negotiated & LPA_100) + speed = mii->speed = 100; + else + speed = mii->speed = 10; + } else + speed = mii->speed; That's one of the places where you don't have 1000... + ecmd->speed = speed == 100 ? SPEED_100 : SPEED_10; ... and that's the second. + ecmd->transceiver = XCVR_INTERNAL; I didn't understand what XCVR_INTERNAL should mean as opposed to XCVR_EXTERNAL or whatever. For example: some older 3Com cards use external transceivers (not on the chip), while newer ones have NWAY capable MII transceivers on the chip. So, you can have: 1. chip + MII 2. NWAY-chip 3. NWAY-chip + MII All MII accesses are done through the serial mdio_* protocol. How should be this handled w.r.t. XCVR_* or is it completely orthogonal? + if ((in.phy_address != out.phy_address) || + (in.transceiver != XCVR_INTERNAL) || + (in.maxtxpkt != out.maxtxpkt) || + (in.maxrxpkt != out.maxrxpkt)) + return -EOPNOTSUPP; ... and here too. + if (advert != mii->advertising) { + bmcr |= BMCR_ANRESTART; + mii->mdio_write(dev, mii->phy_id, MII_ADVERTISE, advert); + mii->advertising = advert; + } + + /* some phys need autoneg dis/enabled separately from other settings */ + if ((bmcr & BMCR_ANENABLE) && (!(mii->bmcr & BMCR_ANENABLE))) { + mii->mdio_write(dev, mii->phy_id, MII_BMCR, + mii->bmcr | BMCR_ANENABLE | BMCR_ANRESTART); + bmcr &= ~BMCR_ANRESTART; + } else if ((!(bmcr & BMCR_ANENABLE)) && (mii->bmcr & BMCR_ANENABLE)) { + mii->mdio_write(dev, mii->phy_id, MII_BMCR, + mii->bmcr & ~BMCR_ANENABLE); + } This is nice, but I would like to able to restart autonegotiation even without changing any of the advertised capabilities. If I missed this possibility, please point me to it... Nice work! 
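The point about 'negotiated' being meaningless with autonegotiation disabled can be sketched as follows. This is only an illustration, not code from the patch: resolve_link() and struct link_mode are hypothetical names, and the register bit values are copied from include/linux/mii.h. When BMCR_ANENABLE is clear, the sketch takes speed and duplex from the forced BMCR bits instead of from advertising & lpa.

```c
/* Register bit values as defined in include/linux/mii.h. */
#define BMCR_FULLDPLX  0x0100	/* forced full duplex */
#define BMCR_ANENABLE  0x1000	/* autonegotiation enabled */
#define BMCR_SPEED100  0x2000	/* forced 100Mbps */
#define LPA_10HALF     0x0020
#define LPA_10FULL     0x0040
#define LPA_100HALF    0x0080
#define LPA_100FULL    0x0100
#define LPA_100BASE4   0x0200

/* Hypothetical result type, not part of the patch. */
struct link_mode {
	int speed;		/* 10 or 100 */
	int full_duplex;	/* 0 or 1 */
};

/* Hypothetical helper: pick the link mode from raw register values. */
static struct link_mode resolve_link(unsigned int bmcr,
				     unsigned int advertising,
				     unsigned int lpa)
{
	struct link_mode m;

	if (bmcr & BMCR_ANENABLE) {
		/* Autoneg on: use the common abilities, in the same
		 * priority order as mii_nway_result(). */
		unsigned int negotiated = advertising & lpa;

		if (negotiated & LPA_100FULL) {
			m.speed = 100;
			m.full_duplex = 1;
		} else if (negotiated & (LPA_100BASE4 | LPA_100HALF)) {
			m.speed = 100;
			m.full_duplex = 0;
		} else if (negotiated & LPA_10FULL) {
			m.speed = 10;
			m.full_duplex = 1;
		} else {
			m.speed = 10;
			m.full_duplex = 0;
		}
	} else {
		/* Autoneg off: 'negotiated' carries no information,
		 * so report the forced settings from BMCR. */
		m.speed = (bmcr & BMCR_SPEED100) ? 100 : 10;
		m.full_duplex = (bmcr & BMCR_FULLDPLX) ? 1 : 0;
	}
	return m;
}
```

Like the patch, the sketch only knows about 10 and 100; gigabit would need additional registers that are not covered here.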
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Tue Jun 12 10:40:58 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5CHewC32292 for netdev-outgoing; Tue, 12 Jun 2001 10:40:58 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5CHeuV32288 for ; Tue, 12 Jun 2001 10:40:56 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 8F1341F65; Tue, 12 Jun 2001 13:40:50 -0400 (EDT) Message-ID: <3B265416.58941C3C@mandrakesoft.com> Date: Tue, 12 Jun 2001 13:40:38 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Bogdan Costescu Cc: Linux Kernel Mailing List , netdev@oss.sgi.com, "David S. Miller" , Linus Torvalds Subject: Re: PATCH: ethtool MII helpers References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 6499 Lines: 153 Bogdan Costescu wrote: > On Sun, 10 Jun 2001, Jeff Garzik wrote: > - I don't know what the long-term plan is about ethtool vs. MII ioctl's. > If you do plan to replace completely the MII ioctl's, there should be a > way to access _all_ MII registers provided by the PHY, even if you do this > in a restricted way (i.e. for CAP_NET_ADMIN only). There is also useful > info in other registers than the 4 you have in your implementation. What are you doing that you need to access all registers from userspace? On to your larger question, ethtool versus MII ioctls. That is an issue that weighs heavily on me. 
Right now we have quite a bit of deployed code using MII ioctls, and there is a gigabit MII standard; so, Becker's argument is that each driver should provide a set of MII ioctls, emulating behavior when hardware isn't exactly per spec. (yes, right now they are SIOCDEVPRIVATE, but that can be easily changed to SIOCDEVMIIxxx) David's argument is for ethtool, which originally comes out of the sparc port (see include/asm-sparc/ethtool.h in older trees), and has been around for a while, but doesn't enjoy the massive deployment that the MII ioctls enjoy. We have control over the ethtool API, and we can correct its deficiencies, whereas any MII spec deficiencies must be worked out inside the driver. Further, there is the question of "how much MII to implement" -- currently the MII-ioctl-based net drivers all implement -basic- MII, but I guarantee that you will find per-driver(per-chip) differences in the MII implementation... which is a flaw in the MII ioctl implementation in the driver, regardless of how the chip is designed. There are completeness flaws in more than one MII ioctl implementation. Several drivers will return zeroes for the MII id registers, for example. The ethtool API doesn't have that problem. For 2.4, my conclusion is: for drivers that already implement MII ioctls as SIOCDEVPRIVATE, continue to support those ioctls. In addition, support ethtool. For drivers without support for either, just add ethtool support. The patch being discussed will cut down on duplicate code for both cases. Further, for the userland ethtool program, support for MII ioctls will be added soon, so that there will be no need for additional mii-tool or mii-diag tools. For 2.5? I don't know. I am not a visionary. I defer that to Linus and David and Donald and Jamal and Alexey and... I am mainly a maintainer and merge monkey, only implementing new APIs when the needs are blindingly obvious. > - You are proposing some caching for the MII registers. 
I suppose that you > would like to have this code also working with whatever caching will be > done for MII access that was recently discussed. Wouldn't this produce > double caching under some circumstances ? You misunderstood the code. The "caching" here is whatever is -already- being done by the driver. Many Becker-style drivers cache the advertising value. If such a driver uses the ethtool MII code, that is one less MII read that needs to occur. If the driver author wishes to cache more stuff, they have to do so in the obvious way. struct ethtool_mii_info is only used for helper functions which are only used inside netdrv_ioctl(). > + int speed; /* 10, 100, 1000 or -1 (ask hw) */ > > Please note that the comment specifies 1000, while the code in several > places assumes only 2 possibilities: 10 and 100. planning for the future :) Yes, the code only supports 10/100, as I mentioned in my introductory message. > + if (mii->autoneg < 0) > + autoneg = mii->autoneg = (bmcr & BMCR_ANENABLE) ? 1 : 0; > + else autoneg = mii->autoneg; > > You don't read anything from the hardware at this point. Why do you want > caching ? I don't understand your question. Of course we have read BMCR from the hardware at that point, read the code... > Not related: I know that this comes from David Miller's older work, but > wouldn't be possible to have a more uniform naming scheme ? You have > BMCR_ANENABLE, but you have BMSR_ANEGCAPABLE... capable != enable.. I prefer them different, so I am therefore unmotivated to change anything ;-) > + if (mii->full_duplex < 0) > + full_duplex = mii->full_duplex = > + mii_nway_result(negotiated) & LPA_DUPLEX; > + else full_duplex = mii->full_duplex; > > If autoneg. is disabled, I don't think that you always get useful info in > 'negotiated'. Applies to the next chunk, too. 
> > + if (mii->speed < 0) { > + if (negotiated & LPA_100) > + speed = mii->speed = 100; > + else > + speed = mii->speed = 10; > + } else > + speed = mii->speed; interesting point, thanks. > + ecmd->transceiver = XCVR_INTERNAL; > > I didn't understand what XCVR_INTERNAL should mean as opposed to > XCVR_EXTERNAL or whatever. It is really up to interpretation of the individual driver author (or in this case mii.c author), because the net core doesn't know nor care about XCVR_xxx. > + if (advert != mii->advertising) { > + bmcr |= BMCR_ANRESTART; > + mii->mdio_write(dev, mii->phy_id, MII_ADVERTISE, advert); > + mii->advertising = advert; > + } > + > + /* some phys need autoneg dis/enabled separately from other settings */ > + if ((bmcr & BMCR_ANENABLE) && (!(mii->bmcr & BMCR_ANENABLE))) { > + mii->mdio_write(dev, mii->phy_id, MII_BMCR, > + mii->bmcr | BMCR_ANENABLE | BMCR_ANRESTART); > + bmcr &= ~BMCR_ANRESTART; > + } else if ((!(bmcr & BMCR_ANENABLE)) && (mii->bmcr & BMCR_ANENABLE)) { > + mii->mdio_write(dev, mii->phy_id, MII_BMCR, > + mii->bmcr & ~BMCR_ANENABLE); > + } > > This is nice, but I would like to able to restart autonegotiation even > without changing any of the advertised capabilities. If I missed this > possibility, please point me to it... no, that is a capability which needs to be added to ethtool. ETHTOOL_RENEG or ETHTOOL_ANRESTART or something. Basically kick the link state machine, whether such a state machine is in the driver or in the MII phy. That's the one big thing that mii-tool can do that ethtool cannot, AFAICS. Jeff -- Jeff Garzik | Andre the Giant has a posse. 
Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Tue Jun 12 11:41:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5CIf3R01865 for netdev-outgoing; Tue, 12 Jun 2001 11:41:03 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5CIexV01862 for ; Tue, 12 Jun 2001 11:40:59 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 5D1481F65; Tue, 12 Jun 2001 14:40:58 -0400 (EDT) Message-ID: <3B26622A.BD0852FE@mandrakesoft.com> Date: Tue, 12 Jun 2001 14:40:42 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Linux Kernel Mailing List , netdev@oss.sgi.com Cc: Bogdan Costescu , "David S. Miller" , Linus Torvalds Subject: PATCH: ethtool MII helpers (vers 2) References: <3B265416.58941C3C@mandrakesoft.com> Content-Type: multipart/mixed; boundary="------------75A73EB022F069CDE8124CDE" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 13494 Lines: 463 This is a multi-part message in MIME format. --------------75A73EB022F069CDE8124CDE Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Here is an updated version of the ethtool generic MII patch. It fixes a few bugs, and adds the capability to restart autonegotiation by passing AUTONEG_RESTART constant into the existing ETHTOOL_SSET. Note this isn't implemented in the code, just added to the ethtool header. Do not apply, still just for comment and testing. -- Jeff Garzik | Andre the Giant has a posse. 
Building 1024 | MandrakeSoft | --------------75A73EB022F069CDE8124CDE Content-Type: text/plain; charset=us-ascii; name="mii.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="mii.patch" Index: linux_2_4/include/linux/mii.h diff -u linux_2_4/include/linux/mii.h:1.1.1.1 linux_2_4/include/linux/mii.h:1.1.1.1.52.1 --- linux_2_4/include/linux/mii.h:1.1.1.1 Fri May 11 16:54:44 2001 +++ linux_2_4/include/linux/mii.h Sun Jun 10 10:26:44 2001 @@ -126,6 +126,33 @@ #define CSCONFIG_RESV4 0x4000 /* Unused... */ #define CSCONFIG_NDISABLE 0x8000 /* Disable NRZI */ + + +struct ethtool_mii_info { + struct net_device *dev; /* our net interface */ + void *useraddr; /* userspace addr to which we put data */ + + int phy_id; /* PHY we are addressing */ + + int bmcr; /* cached MII register values. */ + int bmsr; /* -1 means 'undefined', which usually */ + int advertising; /* means the implementation should read */ + int lpa; /* the values from hardware instead. */ + + int autoneg; /* 0 (disabled), 1 (enabled), -1 (ask hw) */ + unsigned int ignore; /* mask of medias we never support, */ + /* such as 100baseT4 */ + int speed; /* 10, 100, 1000 or -1 (ask hw) */ + int full_duplex; /* 0 (no), 1 (yes), -1 (ask hw) */ + unsigned int port; /* PORT_xxx from linux/ethtool.h */ + + int (*mdio_read) (struct net_device *dev, int phy_id, int location); + void (*mdio_write) (struct net_device *dev, int phy_id, int location, int val); +}; + +int mii_ethtool_gset (struct ethtool_mii_info *mii); +int mii_ethtool_sset (struct ethtool_mii_info *mii); + /** * mii_nway_result * @negotiated: value of MII ANAR and'd with ANLPAR Index: linux_2_4/include/linux/ethtool.h diff -u linux_2_4/include/linux/ethtool.h:1.1.1.4 linux_2_4/include/linux/ethtool.h:1.1.1.4.84.2 --- linux_2_4/include/linux/ethtool.h:1.1.1.4 Thu Apr 19 17:55:36 2001 +++ linux_2_4/include/linux/ethtool.h Sun Jun 10 10:56:26 2001 @@ -1,4 +1,4 @@ -/* $Id: ethtool.h,v 1.1.1.4 2001/04/20 00:55:36 jgarzik Exp $ +/* $Id: 
ethtool.h,v 1.1.1.4.84.2 2001/06/10 17:56:26 jgarzik Exp $ * ethtool.h: Defines for Linux ethtool. * * Copyright (C) 1998 David S. Miller (davem@redhat.com) @@ -34,13 +34,15 @@ char bus_info[32]; /* Bus info for this interface. For PCI * devices, use pci_dev->slot_name. */ char reserved1[32]; - char reserved2[32]; + char reserved2[28]; + u32 regdump_len; /* Amount of data from ETHTOOL_GREGS */ }; /* CMDs currently supported */ #define ETHTOOL_GSET 0x00000001 /* Get settings. */ #define ETHTOOL_SSET 0x00000002 /* Set settings, privileged. */ #define ETHTOOL_GDRVINFO 0x00000003 /* Get driver info. */ +#define ETHTOOL_GREGS 0x00000004 /* Get NIC registers, privileged. */ /* compatibility with older code */ #define SPARC_ETH_GSET ETHTOOL_GSET @@ -106,5 +108,6 @@ */ #define AUTONEG_DISABLE 0x00 #define AUTONEG_ENABLE 0x01 +#define AUTONEG_RESTART 0x02 /* implies AUTONEG_ENABLE */ #endif /* _LINUX_ETHTOOL_H */ Index: linux_2_4/drivers/net/mii.c diff -u /dev/null linux_2_4/drivers/net/mii.c:1.1.2.4 --- /dev/null Tue Jun 12 11:03:26 2001 +++ linux_2_4/drivers/net/mii.c Sun Jun 10 12:19:43 2001 @@ -0,0 +1,228 @@ +/* + * linux/drivers/net/mii.c + * Copyright 2001 Jeff Garzik + * + * This software may be used and distributed according to the terms + * of the GNU General Public License, incorporated herein by reference. 
+ */ + +#include +#include +#include +#include +#include +#include + + +static int mii_fill_ethtool_cmd (struct net_device *dev, + struct ethtool_mii_info *mii, + struct ethtool_cmd *ecmd) +{ + unsigned int bmsr, bmcr, v, autoneg, advertising, lpa; + unsigned int negotiated, full_duplex, speed; + + memset(ecmd, 0, sizeof(*ecmd)); + + ecmd->cmd = ETHTOOL_GSET; + + if (mii->bmcr < 0) + bmcr = mii->bmcr = mii->mdio_read(dev, mii->phy_id, MII_BMCR); + else bmcr = mii->bmcr; + if (bmcr == 0xffff) + return -EIO; + + if (mii->bmsr < 0) + bmsr = mii->bmsr = mii->mdio_read(dev, mii->phy_id, MII_BMSR); + else bmsr = mii->bmsr; + if (bmsr == 0xffff) + return -EIO; + + if (mii->advertising < 0) + advertising = mii->advertising = + mii->mdio_read(dev, mii->phy_id, MII_ADVERTISE); + else advertising = mii->advertising; + if (advertising == 0xffff) + return -EIO; + + if (mii->lpa < 0) + lpa = mii->lpa = mii->mdio_read(dev, mii->phy_id, MII_LPA); + else lpa = mii->lpa; + if (lpa == 0xffff) + return -EIO; + + negotiated = advertising & lpa; + + if (mii->autoneg < 0) + autoneg = mii->autoneg = (bmcr & BMCR_ANENABLE) ? 1 : 0; + else autoneg = mii->autoneg; + + if (mii->full_duplex < 0) + full_duplex = mii->full_duplex = + (mii_nway_result(negotiated) & LPA_DUPLEX) ? 
1 : 0; + else full_duplex = mii->full_duplex; + + if (mii->speed < 0) { + if (negotiated & LPA_100) + speed = mii->speed = 100; + else + speed = mii->speed = 10; + } else + speed = mii->speed; + + ecmd->supported = SUPPORTED_MII; + v = bmsr & ~mii->ignore; + if (v & BMSR_10HALF) + ecmd->supported |= SUPPORTED_10baseT_Half; + if (v & BMSR_10FULL) + ecmd->supported |= SUPPORTED_10baseT_Full; + if (v & BMSR_100HALF) + ecmd->supported |= SUPPORTED_100baseT_Half; + if (v & BMSR_100FULL) + ecmd->supported |= SUPPORTED_100baseT_Full; + if (bmsr & BMSR_ANEGCAPABLE) + ecmd->supported |= SUPPORTED_Autoneg; + else + autoneg = mii->autoneg = 0; + + ecmd->advertising = ADVERTISED_MII; + v = advertising & ~mii->ignore; + if (v & ADVERTISE_10HALF) + ecmd->advertising |= ADVERTISED_10baseT_Half; + if (v & ADVERTISE_10FULL) + ecmd->advertising |= ADVERTISED_10baseT_Full; + if (v & ADVERTISE_100HALF) + ecmd->advertising |= ADVERTISED_100baseT_Half; + if (v & ADVERTISE_100FULL) + ecmd->advertising |= ADVERTISED_100baseT_Full; + if (autoneg) { + ecmd->advertising |= ADVERTISED_Autoneg; + ecmd->autoneg = AUTONEG_ENABLE; + } else + ecmd->autoneg = AUTONEG_DISABLE; + + ecmd->speed = speed == 100 ? SPEED_100 : SPEED_10; + ecmd->duplex = full_duplex ? 
DUPLEX_FULL : DUPLEX_HALF; + ecmd->port = PORT_MII; + ecmd->phy_address = mii->phy_id; + ecmd->transceiver = XCVR_INTERNAL; + + return 0; +} + +int mii_ethtool_gset (struct ethtool_mii_info *mii) +{ + struct ethtool_cmd ecmd; + int rc; + + if (mii->port != PORT_MII) + return -EOPNOTSUPP; + + rc = mii_fill_ethtool_cmd(mii->dev, mii, &ecmd); + if (rc) + return rc; + + if (copy_to_user(mii->useraddr, &ecmd, sizeof(ecmd))) + return -EFAULT; + + return 0; +} + +int mii_ethtool_sset (struct ethtool_mii_info *mii) +{ + struct net_device *dev = mii->dev; + struct ethtool_cmd in, out; + unsigned int advert, bmcr; + int rc; + + if (copy_from_user (&in, mii->useraddr, sizeof (in))) + return -EFAULT; + rc = mii_fill_ethtool_cmd (dev, mii, &out); + if (rc) + return rc; + + if (in.port != out.port) { + if (copy_to_user(mii->useraddr, &in, sizeof(in))) + return -EFAULT; + mii->port = in.port; + return 0; + } + + /* we don't support changing phy address, tranceiver, + * or the interrupt mitigation stuff. 
+ */ + if ((in.phy_address != out.phy_address) || + (in.transceiver != XCVR_INTERNAL) || + (in.maxtxpkt != out.maxtxpkt) || + (in.maxrxpkt != out.maxrxpkt)) + return -EOPNOTSUPP; + + advert = mii->advertising & ~ADVERTISE_ALL; + + /* NWAY autonegotiation enabled */ + if (in.autoneg == AUTONEG_ENABLE) { + bmcr = mii->bmcr | BMCR_ANENABLE; + + if (in.advertising & ADVERTISED_10baseT_Half) + advert |= ADVERTISE_10HALF; + if (in.advertising & ADVERTISED_10baseT_Full) + advert |= ADVERTISE_10FULL; + if (in.advertising & ADVERTISED_100baseT_Half) + advert |= ADVERTISE_100HALF; + if (in.advertising & ADVERTISED_100baseT_Full) + advert |= ADVERTISE_100FULL; + if (advert == (mii->advertising & ~ADVERTISE_ALL)) + return -EINVAL; + } + + /* NWAY autonegotiation disabled */ + else { + bmcr = mii->bmcr & ~BMCR_ANENABLE; + + if (in.speed == SPEED_100) + bmcr |= BMCR_SPEED100; + else bmcr &= ~BMCR_SPEED100; + + if (in.duplex == DUPLEX_FULL) + bmcr |= BMCR_FULLDPLX; + else bmcr &= ~BMCR_FULLDPLX; + + if (mii->bmsr & BMSR_10HALF) + advert |= ADVERTISE_10HALF; + if (mii->bmsr & BMSR_10FULL) + advert |= ADVERTISE_10FULL; + if (mii->bmsr & BMSR_100HALF) + advert |= ADVERTISE_100HALF; + if (mii->bmsr & BMSR_100FULL) + advert |= ADVERTISE_100FULL; + } + + if (advert != mii->advertising) { + bmcr |= BMCR_ANRESTART; + mii->mdio_write(dev, mii->phy_id, MII_ADVERTISE, advert); + mii->advertising = advert; + } + + /* some phys need autoneg dis/enabled separately from other settings */ + if ((bmcr & BMCR_ANENABLE) && (!(mii->bmcr & BMCR_ANENABLE))) { + mii->mdio_write(dev, mii->phy_id, MII_BMCR, + mii->bmcr | BMCR_ANENABLE | BMCR_ANRESTART); + bmcr &= ~BMCR_ANRESTART; + } else if ((!(bmcr & BMCR_ANENABLE)) && (mii->bmcr & BMCR_ANENABLE)) { + mii->mdio_write(dev, mii->phy_id, MII_BMCR, + mii->bmcr & ~BMCR_ANENABLE); + } + + if (bmcr != mii->bmcr) { + mii->mdio_write(dev, mii->phy_id, MII_BMCR, bmcr); + bmcr &= ~BMCR_ANRESTART; + mii->bmcr = bmcr; + } + + if (copy_to_user(mii->useraddr, &out, 
sizeof(out))) + return -EFAULT; + + return 0; +} + +EXPORT_SYMBOL(mii_ethtool_gset); +EXPORT_SYMBOL(mii_ethtool_sset); Index: linux_2_4/drivers/net/epic100.c diff -u linux_2_4/drivers/net/epic100.c:1.1.1.35 linux_2_4/drivers/net/epic100.c:1.1.1.35.42.4 --- linux_2_4/drivers/net/epic100.c:1.1.1.35 Sat May 19 18:56:00 2001 +++ linux_2_4/drivers/net/epic100.c Sun Jun 10 12:42:37 2001 @@ -45,13 +45,16 @@ * { fill me in } LK1.1.8: - * ethtool support (jgarzik) + * ethtool driver info support (jgarzik) + LK1.1.9: + * ethtool media get/set support (jgarzik) + */ #define DRV_NAME "epic100" -#define DRV_VERSION "1.11+LK1.1.8" -#define DRV_RELDATE "May 18, 2001" +#define DRV_VERSION "1.11+LK1.1.9" +#define DRV_RELDATE "June 10, 2001" /* The user-configurable values. @@ -116,6 +119,7 @@ #include #include #include +#include #include #include #include @@ -135,6 +139,11 @@ MODULE_PARM(rx_copybreak, "i"); MODULE_PARM(options, "1-" __MODULE_STRING(MAX_UNITS) "i"); MODULE_PARM(full_duplex, "1-" __MODULE_STRING(MAX_UNITS) "i"); +MODULE_PARM_DESC(debug, "EPIC/100 debug level (0-5)"); +MODULE_PARM_DESC(max_interrupt_work, "EPIC/100 maximum events handled per interrupt"); +MODULE_PARM_DESC(options, "EPIC/100: Bits 0-3: media type, bit 4: full duplex"); +MODULE_PARM_DESC(rx_copybreak, "EPIC/100 copy breakpoint for copy-only-tiny-frames"); +MODULE_PARM_DESC(full_duplex, "EPIC/100 full duplex setting(s) (1)"); /* Theory of Operation @@ -1169,7 +1178,7 @@ if (pkt_len > PKT_BUF_SZ - 4) { printk(KERN_ERR "%s: Oversized Ethernet frame, status %x " "%d bytes.\n", - dev->name, pkt_len, status); + dev->name, status, pkt_len); pkt_len = 1514; } /* Check if the packet is long enough to accept without copying @@ -1344,27 +1353,64 @@ return; } -static int netdev_ethtool_ioctl(struct net_device *dev, void *useraddr) +static int netdev_ethtool_ioctl (struct net_device *dev, void *useraddr) { struct epic_private *np = dev->priv; u32 ethcmd; - - if (copy_from_user(&ethcmd, useraddr, sizeof(ethcmd))) + + if
(copy_from_user (&ethcmd, useraddr, sizeof (ethcmd))) return -EFAULT; + + switch (ethcmd) { + case ETHTOOL_GDRVINFO: + { + struct ethtool_drvinfo info = { ETHTOOL_GDRVINFO }; + strcpy (info.driver, DRV_NAME); + strcpy (info.version, DRV_VERSION); + strcpy (info.bus_info, np->pci_dev->slot_name); + if (copy_to_user (useraddr, &info, sizeof (info))) + return -EFAULT; + return 0; + } + + case ETHTOOL_GSET: + case ETHTOOL_SSET: + { + struct ethtool_mii_info info = { + dev: dev, + useraddr: useraddr, + phy_id: np->phys[0], + bmcr: -1, + bmsr: -1, + lpa: -1, + advertising: np->advertising, + autoneg: -1, + ignore: ADVERTISE_100BASE4, + speed: -1, + full_duplex: np->full_duplex, + port: PORT_MII, + mdio_read: mdio_read, + mdio_write: mdio_write, + }; + int rc; + + if (ethcmd == ETHTOOL_GSET) + rc = mii_ethtool_gset (&info); + else + rc = mii_ethtool_sset (&info); - switch (ethcmd) { - case ETHTOOL_GDRVINFO: { - struct ethtool_drvinfo info = {ETHTOOL_GDRVINFO}; - strcpy(info.driver, DRV_NAME); - strcpy(info.version, DRV_VERSION); - strcpy(info.bus_info, np->pci_dev->slot_name); - if (copy_to_user(useraddr, &info, sizeof(info))) - return -EFAULT; - return 0; + np->advertising = info.advertising; + np->full_duplex = info.full_duplex; + + check_media (dev); + + return rc; + } + + default: + break; } - } - return -EOPNOTSUPP; } --------------75A73EB022F069CDE8124CDE-- From owner-netdev@oss.sgi.com Tue Jun 12 17:55:14 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5D0tEr05938 for netdev-outgoing; Tue, 12 Jun 2001 17:55:14 -0700 Received: from zero.aec.at (qmailr@zero.aec.at [195.3.98.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5D0tBP05932 for ; Tue, 12 Jun 2001 17:55:12 -0700 Received: (qmail 18959 invoked by uid 99); 13 Jun 2001 00:55:07 -0000 Received: from unknown (HELO fred.muc.de) (unknown) by unknown with SMTP; 13 Jun 2001 00:55:07 -0000 Received: by fred.muc.de (Postfix, from userid 500) id E857DE2D4F; Wed, 13 Jun 2001 02:21:21 +0200
(CEST) Date: Wed, 13 Jun 2001 02:21:21 +0200 From: Andi Kleen To: Peter Bieringer Cc: Maillist netdev , Andi Kleen Subject: Re: IPv6+2.4.x: ipv6_local_port_range implementation plans + netfilter6 Message-ID: <20010613022121.C3926@fred.local> References: <20010603132942.A2582@fred.local> <12580000.991764372@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <12580000.991764372@localhost>; from pb@bieringer.de on Tue, Jun 05, 2001 at 08:06:12PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1061 Lines: 33 [Just back from vacation, sorry for late answer] On Tue, Jun 05, 2001 at 08:06:12PM +0200, Peter Bieringer wrote: > --On Sunday, June 03, 2001 01:29:42 PM +0200 Andi Kleen > wrote: > > > On Sat, Jun 02, 2001 at 11:03:24AM +0200, Peter Bieringer wrote: > >> Hi all, > >> > >> are there any plans to implement "ipv6_local_port_range" in the > >> future like on IPv4? > > > > The IPv4 sysctl is shared between IPv4 and IPv6, because v4 and v6 > > share a common port space. > Thanks for reply. > > Two more questions: > > 1) exists there any documentation beside the source code itself which > "/proc/sys/net/ipv4" values will be also used for IPv6? No, you have to do RTFS currently. > > 2) are there any plans for 2.5 or later to split off common used proc > switches to another directory like "/proc/sys/net/ip"? There was a > thread sometimes ago relating 'howto make IPv4 as module' which can > be take advantage of such split off (I'm thinking about IPv6 only > clients with Linux network stack)... I know of no such plans. 
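Since the port range is shared between the two families, a program that wants the range used for IPv6 sockets can read the IPv4 sysctl file. A minimal sketch, assuming the file holds two whitespace-separated decimal numbers; parse_port_range() and read_local_port_range() are hypothetical helper names, not an existing API:

```c
#include <stdio.h>

/* Hypothetical helper: parse "low high" as found in
 * /proc/sys/net/ipv4/ip_local_port_range.
 * Returns 0 on success, -1 on malformed or implausible input. */
static int parse_port_range(const char *text, int *low, int *high)
{
	if (sscanf(text, "%d %d", low, high) != 2)
		return -1;
	if (*low < 1 || *high > 65535 || *low > *high)
		return -1;
	return 0;
}

/* Hypothetical helper: read the range that applies to both
 * IPv4 and IPv6 local ports. Returns 0 on success, -1 on error. */
static int read_local_port_range(int *low, int *high)
{
	char line[64];
	int rc = -1;
	FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");

	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		rc = parse_port_range(line, low, high);
	fclose(f);
	return rc;
}
```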
-Andi From owner-netdev@oss.sgi.com Tue Jun 12 18:30:00 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5D1U0D09921 for netdev-outgoing; Tue, 12 Jun 2001 18:30:00 -0700 Received: from zero.aec.at (qmailr@zero.aec.at [195.3.98.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5D1TwP09907 for ; Tue, 12 Jun 2001 18:29:59 -0700 Received: (qmail 19053 invoked by uid 99); 13 Jun 2001 01:29:57 -0000 Received: from unknown (HELO fred.muc.de) (unknown) by unknown with SMTP; 13 Jun 2001 01:29:57 -0000 Received: by fred.muc.de (Postfix, from userid 500) id 07A2FE2D4C; Wed, 13 Jun 2001 03:19:30 +0200 (CEST) Date: Wed, 13 Jun 2001 03:19:29 +0200 From: Andi Kleen To: Manfred Spraul Cc: netdev@oss.sgi.com Subject: Re: Q: (ab)using zerocopy for drivers with alignment contraints Message-ID: <20010613031929.A5323@fred.local> References: <3B238B31.38F6D3ED@colorfullife.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <3B238B31.38F6D3ED@colorfullife.com>; from manfred@colorfullife.com on Sun, Jun 10, 2001 at 04:58:57PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 431 Lines: 11 On Sun, Jun 10, 2001 at 04:58:57PM +0200, Manfred Spraul wrote: > Several cheap busmaster nics only accept tx buffers that are 32-bit > aligned. > > Currently they memcpy into transfer buffers. What about replacing that > memcpy with csum_copy_partial_nocheck and enabling NETIF_F_{SG,HW_CSUM}? [...] It'll probably not give you much gain in 2.4 anymore. Both TCP and UDP do csum and copy to user in most fast path cases.
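For readers unfamiliar with the "csum and copy" idea: the data is checksummed in the same pass that copies it, so the bytes are only touched once. The following is a plain userspace sketch of that technique using the 16-bit one's-complement Internet checksum; copy_and_csum() is a hypothetical name, and this is not the kernel's optimized routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy len bytes from src to dst and return the
 * folded 16-bit one's-complement sum of the data (not inverted),
 * treating the buffer as big-endian 16-bit words. */
static uint16_t copy_and_csum(void *dst, const void *src, size_t len)
{
	const uint8_t *s = src;
	uint8_t *d = dst;
	uint64_t sum = 0;
	size_t i;

	/* Copy and accumulate one 16-bit word per iteration. */
	for (i = 0; i + 1 < len; i += 2) {
		d[i] = s[i];
		d[i + 1] = s[i + 1];
		sum += ((uint32_t)s[i] << 8) | s[i + 1];
	}

	/* Odd trailing byte is padded with a zero low byte. */
	if (i < len) {
		d[i] = s[i];
		sum += (uint32_t)s[i] << 8;
	}

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}
```

A sender would store the bitwise complement of the returned value in the protocol header. If the stack has already done this work while copying from user space, repeating it in the driver's bounce-buffer copy saves nothing, which is the objection raised above.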
-Andi From owner-netdev@oss.sgi.com Tue Jun 12 18:30:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5D1U1f09934 for netdev-outgoing; Tue, 12 Jun 2001 18:30:01 -0700 Received: from zero.aec.at (qmailr@zero.aec.at [195.3.98.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5D1U0P09918 for ; Tue, 12 Jun 2001 18:30:00 -0700 Received: (qmail 19055 invoked by uid 99); 13 Jun 2001 01:29:57 -0000 Received: from unknown (HELO fred.muc.de) (unknown) by unknown with SMTP; 13 Jun 2001 01:29:57 -0000 Received: by fred.muc.de (Postfix, from userid 500) id 8C92DE2D50; Wed, 13 Jun 2001 03:38:32 +0200 (CEST) Date: Wed, 13 Jun 2001 03:38:32 +0200 From: Andi Kleen To: =?iso-8859-1?Q?Ram=F3n_Ag=FCero?= Cc: netdev@oss.sgi.com Subject: Re: TCP and SACK retransmissions Message-ID: <20010613033832.D5323@fred.local> References: <006701c0ee7f$892f2740$1bba90c1@tlmat.unican.es> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Mailer: Mutt 1.0.1i In-Reply-To: <006701c0ee7f$892f2740$1bba90c1@tlmat.unican.es>; from ramon@tlmat.unican.es on Wed, Jun 06, 2001 at 01:55:08PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 895 Lines: 12 [Please use line breaks every 80 characters when writing email] On Wed, Jun 06, 2001 at 01:55:08PM +0200, Ramón Agüero wrote: > When a Duplicate ACK arrives, the tcp_fast_retrans function is called. Aparentely, this function does not trigger any retransmission unless tp->dup_acks == 3 or tp->fackets_out > 3. In some ocassions this is the behaviour I see (by tcpdump captures), but in other cases, the first dupack triggers a retransmission, although the number of sacked segments is only two. I have tried to see why this retransmission is trigerred, but I can't find it. Can anybody put some light in this tunnel :-) ? You're probably seeing the Hoe heuristic, which extends Fast Retransmit to fix more than a single lost packet per RTT. 
See Janey Hoe, "Improving the startup behavior of a congestion control scheme for tcp". It is a variant of NewReno as described in RFC2582. -Andi From owner-netdev@oss.sgi.com Tue Jun 12 18:30:02 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5D1U2B09938 for netdev-outgoing; Tue, 12 Jun 2001 18:30:02 -0700 Received: from zero.aec.at (qmailr@zero.aec.at [195.3.98.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5D1U0P09919 for ; Tue, 12 Jun 2001 18:30:00 -0700 Received: (qmail 19056 invoked by uid 99); 13 Jun 2001 01:29:57 -0000 Received: from unknown (HELO fred.muc.de) (unknown) by unknown with SMTP; 13 Jun 2001 01:29:57 -0000 Received: by fred.muc.de (Postfix, from userid 500) id 7F11BE2D4E; Wed, 13 Jun 2001 03:21:26 +0200 (CEST) Date: Wed, 13 Jun 2001 03:21:26 +0200 From: Andi Kleen To: Richard Guy Briggs Cc: netdev@oss.sgi.com Subject: Re: skb_pull, etc. panics. Message-ID: <20010613032126.B5323@fred.local> References: <20010606141145.L31244@grendel.conscoop.ottawa.on.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <20010606141145.L31244@grendel.conscoop.ottawa.on.ca>; from rgb@conscoop.ottawa.on.ca on Wed, Jun 06, 2001 at 08:11:45PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 513 Lines: 18 On Wed, Jun 06, 2001 at 08:11:45PM +0200, Richard Guy Briggs wrote: > Hi again, > > If this is an FAQ, can someone point me to the reasons that skb_push() > and skb_put() panic rather than dropping the skb and complaining in the > log? > > If not, why does it do that? Because an overflow or underflow is always a bug in the code. If you're not sure if the skb has enough room you have to use *_expand_headroom and friends. -Andi -- Life would be so much easier if we could just look at the source code. 
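The skb_push()/skb_put() answer above comes down to "check the room before you move the pointers". A simplified userspace model of that check is sketched below. struct buf and the buf_* functions are hypothetical stand-ins for struct sk_buff and its helpers; in kernel code the corresponding queries are skb_headroom() and skb_tailroom().

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the four sk_buff pointers. */
struct buf {
	unsigned char *head;	/* start of the allocated area */
	unsigned char *data;	/* start of valid data */
	unsigned char *tail;	/* end of valid data */
	unsigned char *end;	/* end of the allocated area */
};

/* Free bytes in front of the data. */
static size_t buf_headroom(const struct buf *b)
{
	return (size_t)(b->data - b->head);
}

/* Free bytes behind the data. */
static size_t buf_tailroom(const struct buf *b)
{
	return (size_t)(b->end - b->tail);
}

/* Prepend len bytes. Returns the new data pointer, or NULL when the
 * headroom is too small (where the real skb_push() would panic). */
static unsigned char *buf_push_checked(struct buf *b, size_t len)
{
	if (len > buf_headroom(b))
		return NULL;
	b->data -= len;
	return b->data;
}

/* Append len bytes. Returns the old tail (start of the new area), or
 * NULL when the tailroom is too small (where skb_put() would panic). */
static unsigned char *buf_put_checked(struct buf *b, size_t len)
{
	unsigned char *old_tail = b->tail;

	if (len > buf_tailroom(b))
		return NULL;
	b->tail += len;
	return old_tail;
}
```

In a driver or protocol module, a failed check is the point at which one would reallocate with more room or drop the packet, rather than calling the unchecked primitive.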
From owner-netdev@oss.sgi.com Tue Jun 12 18:30:02 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5D1U2109945 for netdev-outgoing; Tue, 12 Jun 2001 18:30:02 -0700 Received: from zero.aec.at (qmailr@zero.aec.at [195.3.98.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5D1U0P09920 for ; Tue, 12 Jun 2001 18:30:00 -0700 Received: (qmail 19057 invoked by uid 99); 13 Jun 2001 01:29:57 -0000 Received: from unknown (HELO fred.muc.de) (unknown) by unknown with SMTP; 13 Jun 2001 01:29:57 -0000 Received: by fred.muc.de (Postfix, from userid 500) id 83ABAE2D4F; Wed, 13 Jun 2001 03:30:19 +0200 (CEST) Date: Wed, 13 Jun 2001 03:30:19 +0200 From: Andi Kleen To: Richard Guy Briggs Cc: netdev@oss.sgi.com Subject: Re: dst cache cleared on netdev down? Message-ID: <20010613033019.C5323@fred.local> References: <20010606135149.I31244@grendel.conscoop.ottawa.on.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <20010606135149.I31244@grendel.conscoop.ottawa.on.ca>; from rgb@conscoop.ottawa.on.ca on Wed, Jun 06, 2001 at 07:51:49PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1775 Lines: 43 On Wed, Jun 06, 2001 at 07:51:49PM +0200, Richard Guy Briggs wrote: > Hi all, > > I'm seeing oopses possibly coming from the attempted use of a dst cache > entry after a device has been downed. > > Can someone affirm that when a device goes down, it takes out all the > routing table entries for that device and it also takes out all the dst > cache entries for that device? When an IP address is deleted the routing cache is flushed after some delay. This will remove all dst_entries in it that have a zero reference count. When you're relying on a dst_entry with zero reference count that's probably a bug. > I am now getting oopses > in > neigh_connected_output() at 6f/b0, which could be dev->hard_header() or > neigh->ops->queue_xmit(). In fact, I suspect neigh->ha, but don't know > for certain.
Is it possible that neigh->ha is bogus when it tries to > evaluate it before calling dev->hard_header? I assume that the three > assignments in the variable declarations are protected by the compiler > and don't need to be in the body of the code to be checked before > assignment? If any one of the variables from which they point is null, > it will not cause an oops? When a physical device goes down its neighbours with zero reference count get deleted. When you have a virtual interface the neighbours it sees should be for the virtual interface though. > > > Is this a bug in neigh_connected_output(), the way we are using it, or > the way we are attempting to clean up after the physical device goes > down? I guess you're messing up reference counts somewhere, so data structures get deleted under you. -Andi -- Life would be so much easier if we could just look at the source code. From owner-netdev@oss.sgi.com Tue Jun 12 18:30:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5D1U6M09958 for netdev-outgoing; Tue, 12 Jun 2001 18:30:06 -0700 Received: from zero.aec.at (qmailr@zero.aec.at [195.3.98.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5D1TxP09909 for ; Tue, 12 Jun 2001 18:29:59 -0700 Received: (qmail 19054 invoked by uid 99); 13 Jun 2001 01:29:57 -0000 Received: from unknown (HELO fred.muc.de) (unknown) by unknown with SMTP; 13 Jun 2001 01:29:57 -0000 Received: by fred.muc.de (Postfix, from userid 500) id 3A9DEE2D51; Wed, 13 Jun 2001 03:40:22 +0200 (CEST) Date: Wed, 13 Jun 2001 03:40:22 +0200 From: Andi Kleen To: "Snyder, Ryan" Cc: "'netdev@oss.sgi.com'" Subject: Re: arp cache issue Message-ID: <20010613034022.E5323@fred.local> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: ; from RSnyder@admin.nmt.edu on Tue, Jun 05, 2001 at 09:00:48PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 923 Lines: 29 On Tue, Jun 05, 2001 at 09:00:48PM +0200,
Snyder, Ryan wrote: > Hello, > I was wondering if anyone can help me, I received this email address from > Alan Cox. > > I am running CheckPoint Firewall under Linux 2.2.19. The Firewall is > working fine, > but the interface that is connected to the Internet via a Cisco router > has over 950 > entries in the arp cache. I understand this is normal, but since there is > only one > route to the Internet, is there a way to not have Linux do an arp cache > lookup, or even > a setting to make the cache size much bigger? > > I have looked into running arpd, but I am kinda fuzzy about running daemon > stuff in > userspace; especially on a firewall. > > Any help is greatly appreciated. You can tune the ARP cache size using the appropriate sysctls. See arp(7) for more information. -Andi -- Life would be so much easier if we could just look at the source code. From owner-netdev@oss.sgi.com Tue Jun 12 23:28:45 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5D6Sjr16136 for netdev-outgoing; Tue, 12 Jun 2001 23:28:45 -0700 Received: from colorfullife.com (colorfullife.com [216.156.138.34]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5D6SiP16132 for ; Tue, 12 Jun 2001 23:28:44 -0700 Received: from dbl.localdomain (localhost [127.0.0.1]) by colorfullife.com (8.11.2/8.11.2) with ESMTP id f5D6Vvq28052; Wed, 13 Jun 2001 02:31:58 -0400 Received: from colorfullife.com (gw.cat5.localdomain [172.17.0.1]) by dbl.localdomain (8.11.2/8.11.2) with ESMTP id f5D6Sji14966; Wed, 13 Jun 2001 08:28:45 +0200 Message-ID: <3B27081D.80DE795@colorfullife.com> Date: Wed, 13 Jun 2001 08:28:45 +0200 From: Manfred Spraul X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andi Kleen CC: netdev@oss.sgi.com Subject: Re: Q: (ab)using zerocopy for drivers with alignment contraints References: <3B238B31.38F6D3ED@colorfullife.com> <20010613031929.A5323@fred.local> Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 941 Lines: 26 Andi Kleen wrote: > > On Sun, Jun 10, 2001 at 04:58:57PM +0200, Manfred Spraul wrote: > > Several cheap busmaster nics only accept tx buffers that are 32-bit ^^^^^^^^^^^^ > > aligned. > > > > Currently they memcpy into transfer buffers. What about replacing that > > memcpy with csum_partial_copy_nocheck and enabling NETIF_F_{SG,HW_CSUM}? > > [...] It'll probably not give you much gain in 2.4 anymore. Both TCP and UDP > do csum and copy to user in most fast path cases. > It's an improvement for the tx codepath: If an application uses sendfile with an 8139too [or via-rhine,...] nic then currently 2 copies are made: 1) copy_and_csum into skb->data 2) memcpy from skb->data to an aligned transfer buffer with skb_copy_datagram_kernel() only one copy is needed * networking core gives a fragmented & uncsumed packet to the driver. * copy_and_csum into aligned transfer buffer. -- Manfred From owner-netdev@oss.sgi.com Wed Jun 13 08:28:22 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DFSMu10232 for netdev-outgoing; Wed, 13 Jun 2001 08:28:22 -0700 Received: from shell.cyberus.ca (shell.cyberus.ca [209.195.95.7]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DFSLP10229 for ; Wed, 13 Jun 2001 08:28:21 -0700 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id LAA19829; Wed, 13 Jun 2001 11:26:32 -0400 (EDT) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Wed, 13 Jun 2001 11:26:31 -0400 (EDT) From: jamal To: cc: Subject: FYI: ECN approved as Standard Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 165 Lines: 10 The IESG approved ECN as a proposed standard on the 12th of June. That means as of now, anyone blocking ECN bits is considered to be blaspheming.
cheers, jamal From owner-netdev@oss.sgi.com Wed Jun 13 08:30:14 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DFUEj10430 for netdev-outgoing; Wed, 13 Jun 2001 08:30:14 -0700 Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DFUAP10423 for ; Wed, 13 Jun 2001 08:30:10 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f5DFTug26287; Wed, 13 Jun 2001 17:29:56 +0200 (MET DST) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id RAA31014; Wed, 13 Jun 2001 17:29:55 +0200 Date: Wed, 13 Jun 2001 17:29:55 +0200 (CEST) From: Bogdan Costescu To: Jeff Garzik cc: Bogdan Costescu , Linux Kernel Mailing List , , "David S. Miller" , Linus Torvalds Subject: Re: PATCH: ethtool MII helpers In-Reply-To: <3B265416.58941C3C@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 5357 Lines: 127 On Tue, 12 Jun 2001, Jeff Garzik wrote: > What are you doing that you need to access all registers from userspace? The main purpose is debugging. Some registers give you more detailed info about what's going on; you don't need this for normal functioning, but when you're looking for "what's wrong" it might give you additional details (next page, transceiver manufacturer/model for hardware bugs, etc.). That's why I proposed restricted access, a normal user shouldn't need this info, but the sysadmin might. Probably the same info can be taken from the card inside the kernel, convert it to some user parsable form and give it to user through /proc or something similar. But I think that the implementation would be much more complicated than allowing direct access to MII registers. 
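As a rough illustration of the debugging use described above, the sketch below decodes raw values of the two basic MII registers (control, register 0, and status, register 1) into a few flags. It assumes the raw 16-bit values have already been obtained through whatever register access the driver offers; the bit masks are the standard basic-mode bits, while struct mii_summary and mii_decode() are hypothetical names invented for this example.

```c
#include <stdint.h>

/* Basic-mode bits of MII register 0 (control) and register 1 (status). */
#define SK_BMCR_FULLDPLX     0x0100u /* full duplex selected */
#define SK_BMCR_ANENABLE     0x1000u /* autonegotiation enabled */
#define SK_BMCR_SPEED100     0x2000u /* 100 Mbit/s selected */
#define SK_BMSR_LSTATUS      0x0004u /* link is up */
#define SK_BMSR_ANEGCOMPLETE 0x0020u /* autonegotiation finished */

/* Hypothetical summary type, not part of any kernel or tool API. */
struct mii_summary {
    int autoneg_enabled;
    int autoneg_complete;
    int link_up;
    int forced_100;  /* only meaningful when autoneg is disabled */
    int forced_full; /* only meaningful when autoneg is disabled */
};

/* Turn raw register values into flags a sysadmin can read at a glance. */
static struct mii_summary mii_decode(uint16_t bmcr, uint16_t bmsr)
{
    struct mii_summary s;

    s.autoneg_enabled  = (bmcr & SK_BMCR_ANENABLE) != 0;
    s.autoneg_complete = (bmsr & SK_BMSR_ANEGCOMPLETE) != 0;
    s.link_up          = (bmsr & SK_BMSR_LSTATUS) != 0;
    s.forced_100       = (bmcr & SK_BMCR_SPEED100) != 0;
    s.forced_full      = (bmcr & SK_BMCR_FULLDPLX) != 0;
    return s;
}
```

Only the raw values cross the kernel boundary here; the interpretation stays in user space, which is the property that makes direct register access attractive for debugging.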
One other argument: mii-diag currently allows this. ethtool would mean a step backwards 8-) > Right now we have quite a bit of > deployed code using MII ioctls, and there is a gigabit MII standard; so, > Becker's argument is that each driver should provide a set of MII > ioctls, emulating behavior when hardware isn't exactly per spec. I'm more-or-less supporting this opinion. But I use Don's mii-diag heavily in debugging media-related problems for 3c59x, so I might be biased 8-) > We have control over the ethtool API, and we can > correct its deficiencies, whereas any MII spec deficiencies must be > worked out inside the driver. I agree that this is a problem. Even more, if you start emulating MII for non-MII NICs, you might get into trouble when presenting the available info in a MII-compatible way (e.g. how would you emulate "link partner ability" ?). But the NIC specific deficiencies should be worked out inside the driver, isn't this always the case with a common API ? > Further, there is the question of "how much MII to implement" -- > currently the MII-ioctl-based net drivers all implement -basic- MII, but > I guarantee that you will find per-driver(per-chip) differences in the > MII implementation... which is a flaw in the MII ioctl implementation in > the driver, regardless of how the chip is designed. I take that you mean by implementing basic MII that the drivers don't take advantage of the additional info when dealing with media settings. The drivers do allow user access to all MII registers when available. Per-driver or per-chip differences mean that the driver author didn't do a good job at emulating MII 8-) > There are completeness flaws in more than one MII ioctl implementation. IMHO, this is only because there is no agreed-upon standard. But this can be corrected. What prevents ethtool implementations from being flawed ? > The ethtool API doesn't have that problem. Well, IMHO you can't directly compare the two.
MII has hardware support, so for a MII-capable NIC, you usually just handle access to the registers. ethtool is software only and you emulate everything; if you would also partly emulate MII (where you need to), you are in the same situation. > For drivers without support for either, just add ethtool support. Well, that's my point. You need to write code in both cases. So why do you choose ethtool ? > For 2.5? I don't know. I am not a visionary. I defer that to Linus > and David and Donald and Jamal and Alexey and... I am mainly a > maintainer and merge monkey, only implementing new APIs when the needs > are blindingly obvious. I don't want to push anything. But when opinions start to diverge, there will always be (from all sides!) something like: "my version can do this, but yours can't". So I'm all for _one_ way of doing things. > You misunderstood the code. The "caching" here is whatever is -already- > being done by the driver. Many Becker-style drivers cache the > advertising value. If such a driver uses the ethtool MII code, that is > one less MII read that needs to occur. No, I was talking mainly about 'bmcr' and 'bmsr'. I'm not aware of any driver that caches these values currently. > > + if (mii->autoneg < 0) > > + autoneg = mii->autoneg = (bmcr & BMCR_ANENABLE) ? 1 : 0; > > + else autoneg = mii->autoneg; > > > > You don't read anything from the hardware at this point. Why do you want > > caching ? > > I don't understand your question. Of course we have read BMCR from the > hardware at that point, read the code... My question was directly related to caching of 'autoneg'. You need to read 'bmcr' before, sure, but why not directly "computing" autoneg instead of the "if" ? What do you achieve by setting autoneg to potentially something else than the actual BMCR setting ? > It is really up to interpretation of the individual driver author (or in > this case mii.c author), because the net core doesn't know nor care > about XCVR_xxx.
Yes, but it might make a difference for debugging too. For the example that I gave, it really helps knowing which of the 2 MII transceivers on the card is used. So, this info might need to be propagated as exactly as possible even to user space. And probably needs to be driver-specific, not in mii.c, anyway. Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Wed Jun 13 08:39:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DFd6H11488 for netdev-outgoing; Wed, 13 Jun 2001 08:39:06 -0700 Received: from localhost.localdomain (cpu2747.adsl.bellglobal.com [207.236.55.216]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DFd5P11472 for ; Wed, 13 Jun 2001 08:39:05 -0700 Received: (from rgb@localhost) by localhost.localdomain (8.12.0.Beta5/8.11.1) id f5DFwmhj019900; Wed, 13 Jun 2001 11:58:48 -0400 Date: Wed, 13 Jun 2001 11:58:48 -0400 From: Richard Guy Briggs To: Andi Kleen Cc: Richard Guy Briggs , netdev@oss.sgi.com Subject: Re: skb_pull, etc. panics. Message-ID: <20010613115848.A19894@grendel.conscoop.ottawa.on.ca> References: <20010606141145.L31244@grendel.conscoop.ottawa.on.ca> <20010613032126.B5323@fred.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010613032126.B5323@fred.local>; from ak@muc.de on Wed, Jun 13, 2001 at 03:21:26AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1045 Lines: 28 On Wed, Jun 13, 2001 at 03:21:26AM +0200, Andi Kleen wrote: > On Wed, Jun 06, 2001 at 08:11:45PM +0200, Richard Guy Briggs wrote: > > Hi again, > > > > If this is an FAQ, can someone point me to the reasons that skb_push() > > and skb_put() panic rather than dropping the skb and complaining in the > > log? 
> > > > If not, why does it do that? > > Because an overflow or underflow is always a bug in the code. If you're > not sure if the skb has enough room you have to use *_expand_headroom and > friends. ...so checking for it is considered debugging code and thus, overhead that need not be in a production system? > -Andi > > -- > Life would be so much easier if we could just look at the source code. slainte mhath, RGB -- Richard Guy Briggs -- PGP key available Auto-Free Ottawa! Canada Prevent Internet Wiretapping! -- FreeS/WAN: Thanks for voting Green! -- Marillion: From owner-netdev@oss.sgi.com Wed Jun 13 08:41:18 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DFfIi11840 for netdev-outgoing; Wed, 13 Jun 2001 08:41:18 -0700 Received: from localhost.localdomain (cpu2747.adsl.bellglobal.com [207.236.55.216]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DFfHP11833 for ; Wed, 13 Jun 2001 08:41:17 -0700 Received: (from rgb@localhost) by localhost.localdomain (8.12.0.Beta5/8.11.1) id f5DG10uh019922; Wed, 13 Jun 2001 12:01:00 -0400 Date: Wed, 13 Jun 2001 12:01:00 -0400 From: Richard Guy Briggs To: Andi Kleen Cc: Richard Guy Briggs , netdev@oss.sgi.com Subject: Re: dst cache cleared on netdev down? Message-ID: <20010613120100.B19894@grendel.conscoop.ottawa.on.ca> References: <20010606135149.I31244@grendel.conscoop.ottawa.on.ca> <20010613033019.C5323@fred.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010613033019.C5323@fred.local>; from ak@muc.de on Wed, Jun 13, 2001 at 03:30:19AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2442 Lines: 57 On Wed, Jun 13, 2001 at 03:30:19AM +0200, Andi Kleen wrote: > On Wed, Jun 06, 2001 at 07:51:49PM +0200, Richard Guy Briggs wrote: > > Hi all, > > > > I'm seeing oopses possibly coming from the attempted use of a dst chache > > entry after a device has been downed. 
> > > > Can someone affirm that when a device goes down, it takes out all the > > routing table entries for that device and it also takes out all the dst > > cache entries for that device? > > When an IP address is deleted the routing cache is flushed after some delay. > This will remove all dst_entries in it that do have a zero reference count. > When you're relying on a dst_entry with zero reference count that's probably > a bug. Understood, thanks. > > I am now getting oopses > > in > > neigh_connected_output() at 6f/b0, which could be dev->hard_header() or > > neigh->ops->queue_xmit(). If fact, I suspect neigh->ha, but don't know > > for certain. Is it possible that neigh->ha is bugus when it tries to > > evaluate it before calling dev->hard_header? I assume that the three > > assignments in the variable declarations are protected by the compiler > > and don't need to be in the body of the code to be checked before > > assignment? If any one of the variables from which they point are null, > > it will not cause an oops? > > When a physical device goes down its neighbours with zero reference count > get deleted. When you have a virtual interface the neighbours it sees should > be for the virtual interface though. I'm not quite sure what the neighbours of a virtual device would be, since it passes the packet to a physical device before being stuffed out.... > > Is this a bug in neigh_connected_output(), the way we are using it, or > > the way we are attempting to clean up after the physical device goes > > down? > > I guess you're messing up reference counts somewhere, so data structures > get deleted under you. This sounds like the problem. Thanks. > -Andi > > -- > Life would be so much easier if we could just look at the source code. slainte mhath, RGB -- Richard Guy Briggs -- PGP key available Auto-Free Ottawa! Canada Prevent Internet Wiretapping! -- FreeS/WAN: Thanks for voting Green! 
-- Marillion: From owner-netdev@oss.sgi.com Wed Jun 13 08:52:28 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DFqSK13475 for netdev-outgoing; Wed, 13 Jun 2001 08:52:28 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DFqRP13471 for ; Wed, 13 Jun 2001 08:52:27 -0700 Received: by colin.muc.de id <140630-3>; Wed, 13 Jun 2001 17:52:42 +0200 Message-ID: <20010613175238.26016@colin.muc.de> Date: Wed, 13 Jun 2001 17:52:38 +0200 From: Andi Kleen To: Richard Guy Briggs Cc: Andi Kleen , netdev@oss.sgi.com Subject: Re: dst cache cleared on netdev down? References: <20010606135149.I31244@grendel.conscoop.ottawa.on.ca> <20010613033019.C5323@fred.local> <20010613120100.B19894@grendel.conscoop.ottawa.on.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <20010613120100.B19894@grendel.conscoop.ottawa.on.ca>; from Richard Guy Briggs on Wed, Jun 13, 2001 at 06:01:00PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 449 Lines: 12 On Wed, Jun 13, 2001 at 06:01:00PM +0200, Richard Guy Briggs wrote: > I'm not quite sure what the neighbours of a virtual device would be, > since it passes the packet to a physical device before being stuffed > out.... Just the kernel doesn't know that. When you have a hard header routine in your device you'll also get neighbours. What happens after your hard_start_xmit the stack doesn't care about anymore because you own the packet. 
-Andi From owner-netdev@oss.sgi.com Wed Jun 13 09:15:50 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DGFog14602 for netdev-outgoing; Wed, 13 Jun 2001 09:15:50 -0700 Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DGFmP14599 for ; Wed, 13 Jun 2001 09:15:49 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.2/8.11.2) with ESMTP id f5DGFe403550; Wed, 13 Jun 2001 09:15:40 -0700 Message-ID: <3B2791AC.7EAEC35A@candelatech.com> Date: Wed, 13 Jun 2001 09:15:40 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Richard Guy Briggs CC: Andi Kleen , netdev@oss.sgi.com Subject: Re: skb_pull, etc. panics. References: <20010606141145.L31244@grendel.conscoop.ottawa.on.ca> <20010613032126.B5323@fred.local> <20010613115848.A19894@grendel.conscoop.ottawa.on.ca> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1296 Lines: 32 Richard Guy Briggs wrote: > > On Wed, Jun 13, 2001 at 03:21:26AM +0200, Andi Kleen wrote: > > On Wed, Jun 06, 2001 at 08:11:45PM +0200, Richard Guy Briggs wrote: > > > Hi again, > > > > > > If this is an FAQ, can someone point me to the reasons that skb_push() > > > and skb_put() panic rather than dropping the skb and complaining in the > > > log? > > > > > > If not, why does it do that? > > > > Because an overflow or underflow is always a bug in the code. If you're > > not sure if the skb has enough room you have to use *_expand_headroom and > > friends. > > ...so checking for it is considered debugging code and thus, overhead > that need not be in a production system? In certain cases you may have to clone and grow the skb. 
But, since that is inefficient, you should only do that in weird boundary cases at most, and in general, you should make sure the skb has enough reserved space from the beginning. netdevice.h has some macro magic that figures out the max-header (i.e. what to reserve). If you can determine how much you need, you can change it there... Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Wed Jun 13 09:48:59 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DGmxp15806 for netdev-outgoing; Wed, 13 Jun 2001 09:48:59 -0700 Received: from vaio.greennet (battlejitney.wdhq.scyld.com [216.254.93.178]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DGmvP15803 for ; Wed, 13 Jun 2001 09:48:57 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id MAA10148; Wed, 13 Jun 2001 12:56:57 -0400 Date: Wed, 13 Jun 2001 12:56:57 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: Jeff Garzik cc: Bogdan Costescu , Linux Kernel Mailing List , netdev@oss.sgi.com, "David S. Miller" , Linus Torvalds Subject: Re: PATCH: ethtool MII helpers In-Reply-To: <3B265416.58941C3C@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 6313 Lines: 145 I was on vacation, and thus didn't have the opportunity to comment earlier. This message covers Why caching MII values doesn't work. Why extended MII values are useful. On Tue, 12 Jun 2001, Jeff Garzik wrote: > > - You are proposing some caching for the MII registers. I suppose that you > > would like to have this code also working with whatever caching will be > > done for MII access that was recently discussed. Wouldn't this produce > > double caching under some circumstances ? > > You misunderstood the code.
The "caching" here is whatever is -already- > being done by the driver. Many Becker-style drivers cache the > advertising value. If such a driver uses the ethtool MII code, that is > one less MII read that needs to occur. That's not the way I read the code. It appears to cache various MII management registers. Caching almost any MII register, except the ID registers, may be invalid. My drivers record values written to MII registers (note 1), but always do an actual read. Here is a quick summary of the basic mode registers
MII Reg  Function             When it changes
0        Control register  -- May return the current autonegotiated status.
1        Status register   -- Changes with link status and other events.
2&3      ID registers      -- Should never change
4        Advertised value  -- may change with a transceiver reset
5        Link partner      -- changes with negotiation, and "next page" feature
(1) The drivers record the autonegotiation advertised value, and recently have been updated to allow writes to the control register to force the speed and duplex. Caching and ioctl() rate-limiting are both a problem for a program I use frequently. It monitors the transceiver to report the timing and state transitions of autonegotiation. It internally handles polling rate limiting by backing off the poll rate when nothing is happening. But when something happens, it polls every timer tick for the next 30 ticks. > Bogdan Costescu wrote: > > On Sun, 10 Jun 2001, Jeff Garzik wrote: > > - I don't know what the long-term plan is about ethtool vs. MII ioctl's. > > If you do plan to replace completely the MII ioctl's, there should be a > > way to access _all_ MII registers provided by the PHY, even if you do this > > in a restricted way (i.e. for CAP_NET_ADMIN only). There is also useful > > info in other registers than the 4 you have in your implementation. > > What are you doing that you need to access all registers from userspace?
That's an easy one: "next page" information, diagnostics, status reports, and extended configuration. Much useful information is reported by certain MII transceivers. People that care select transceivers that provide the extended information.
Diagnostic reports
  The approximate distance to the first major impedance mismatch on the cable.
  Signal status reports.
  Signal level -- estimate if the cable is too long or flawed.
  Signal to noise -- estimate the reliability of the link.
  Near-end cross-talk level.
  Reversed receive polarity.
Operational errors
  Symbol coding error count
  Symbol sequence error count
  Decoder/PLL slip indication.
Some examples of extended configuration are
  Increasing or decreasing the transmit level.
  Setting a lowered receive threshold to allow marginal non-noisy links to work.
  Using symbol coding over fiber.
  Changing the information reported on the LED outputs
> Becker's argument is that each driver should provide a set of MII > ioctls, emulating behavior when hardware isn't exactly per spec. (yes, > right now they are SIOCDEVPRIVATE, but that can be easily changed to > SIOCDEVMIIxxx) My driver sources converted to using specific names, which are currently mapped as follows
#ifndef SIOCGMIIPHY
#define SIOCGMIIPHY (SIOCDEVPRIVATE) /* Get the PHY in use. */
#define SIOCGMIIREG (SIOCDEVPRIVATE+1) /* Read a PHY register. */
#define SIOCSMIIREG (SIOCDEVPRIVATE+2) /* Write a PHY register. */
#define SIOCGPARAMS (SIOCDEVPRIVATE+3) /* Read operational parameters. */
#define SIOCSPARAMS (SIOCDEVPRIVATE+4) /* Set operational parameters. */
> David's argument is for ethtool, which originally comes out of the sparc > port (see include/asm-sparc/ethtool.h in older trees), and has been > around for a while, but doesn't enjoy the massive deployment that the > MII ioctls enjoy. We have control over the ethtool API, and we can > correct its deficiencies, whereas any MII spec deficiencies must be > worked out inside the driver.
You should first understand what MII management registers provide before deciding that you can do better. There are some design uglinesses, but it was put together by people that lived and breathed transceivers. It has been proven over six or seven years of use with no incompatible changes to the original software interface definition. >... > the chip is designed. There are completeness flaws in more than one MII > ioctl implementation. Several drivers will return zeroes for the MII id > registers, for example. The ethtool API doesn't have that problem. Returning zeros for the MII ID registers is accepted industry practice for integrated transceivers. We could have the driver substitute a specific ID, but this isn't an actual problem. > Further, for the userland ethtool program, support for MII ioctls will > be added soon, so that there will be no need for additional mii-tool or > mii-diag tools. This could be easily reversed: the additional ethtool program was not needed in the first place. > > This is nice, but I would like to be able to restart autonegotiation even > > without changing any of the advertised capabilities. If I missed this > > possibility, please point me to it... > > no, that is a capability which needs to be added to ethtool. > ETHTOOL_RENEG or ETHTOOL_ANRESTART or something. Basically kick the > link state machine, whether such a state machine is in the driver or in > the MII phy. That's the one big thing that mii-tool can do that ethtool > cannot, AFAICS. An additional capability of the MII ioctl() is that it permits sending "next page" extended information to the link partner. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave.
Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From owner-netdev@oss.sgi.com Wed Jun 13 13:22:00 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DKM0N29668 for netdev-outgoing; Wed, 13 Jun 2001 13:22:00 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DKLxP29664 for ; Wed, 13 Jun 2001 13:21:59 -0700 Received: by colin.muc.de id <140651-2>; Wed, 13 Jun 2001 22:22:20 +0200 Message-ID: <20010613222217.48760@colin.muc.de> Date: Wed, 13 Jun 2001 22:22:17 +0200 From: Andi Kleen To: Richard Guy Briggs Cc: Andi Kleen , netdev@oss.sgi.com Subject: Re: skb_pull, etc. panics. References: <20010606141145.L31244@grendel.conscoop.ottawa.on.ca> <20010613032126.B5323@fred.local> <20010613115848.A19894@grendel.conscoop.ottawa.on.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <20010613115848.A19894@grendel.conscoop.ottawa.on.ca>; from Richard Guy Briggs on Wed, Jun 13, 2001 at 05:58:48PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 291 Lines: 9 On Wed, Jun 13, 2001 at 05:58:48PM +0200, Richard Guy Briggs wrote: > ...so checking for it is considered debugging code and thus, overhead > that need not be in a production system? In theory yes. It is a good sanity check to stop early when things go wrong though so it is kept. 
-Andi From owner-netdev@oss.sgi.com Wed Jun 13 13:24:58 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DKOwn30065 for netdev-outgoing; Wed, 13 Jun 2001 13:24:58 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DKOuP30061 for ; Wed, 13 Jun 2001 13:24:56 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 45D051F78; Wed, 13 Jun 2001 16:24:54 -0400 (EDT) Message-ID: <3B27CC15.2B92E71A@mandrakesoft.com> Date: Wed, 13 Jun 2001 16:24:53 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre3 i686) X-Accept-Language: en MIME-Version: 1.0 To: Donald Becker Cc: Bogdan Costescu , Linux Kernel Mailing List , netdev@oss.sgi.com, "David S. Miller" , Linus Torvalds Subject: Re: PATCH: ethtool MII helpers References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 5803 Lines: 138 Donald Becker wrote: > I was on vacation, and thus didn't have the opportunity to comment earlier. Thanks a bunch for your comments here. > On Tue, 12 Jun 2001, Jeff Garzik wrote: > > > > - You are proposing some caching for the MII registers. I suppose that you > > > would like to have this code also working with whatever caching will be > > > done for MII access that was recently discussed. Wouldn't this produce > > > double caching under some circumstances ? > > > > You misunderstood the code. The "caching" here is whatever is -already- > > being done by the driver. Many Becker-style drivers cache the > > advertising value. If such a driver uses the ethtool MII code, that is > > one less MII read that needs to occur. > > That's not the way I read the code. It appears to cache various MII > management registers. 
I still think there is a misunderstanding here, brought about by my short
explanation and lack of docs.

The key here is the lifetime of the cache. Without extra work on the
part of the driver author, the data in struct ethtool_mii_info only
exists for a single ioctl call. ethtool_mii_info is a container, not a
data cache.

So, if you already have an MII register cached somewhere, like advertising,
or you perform MII reads before calling ethtool_mii_[gs]set, then those
values are "cached" in the sense that mii.c will not re-read the
register values. Since MII reads are not the quickest operations in the
world, I preferred to be flexible in allowing what will occur before and
after the ethtool_mii_[gs]set call.

> Caching almost any MII register, except the ID registers, may be
> invalid.

Agreed. I even said this in an MII thread on lkml a couple weeks ago
;-)

> Caching and ioctl() rate-limiting are both a problem for a program I use
> frequently. It monitors the transceiver to report the timing and state
> transitions of autonegotiation. It internally handles polling rate
> limiting by backing off the poll rate when nothing is happening. But
> when something happens, it polls every timer tick for the next 30
> ticks.

Unfortunately that is at loggerheads with the potential for a bunch of
people to soak the system with unprivileged MII reads via ioctl. That
is the core problem, and caching or rate-limiting is only a suggested
solution.

I could forget about rate-limiting if we required CAP_NET_ADMIN and/or
CAP_RAW_IO for all these ioctls, but that might cause complaints too.

> > David's argument is for ethtool, which originally comes out of the sparc
> > port (see include/asm-sparc/ethtool.h in older trees), and has been
> > around for a while, but doesn't enjoy the massive deployment that the
> > MII ioctls enjoy. We have control over the ethtool API, and we can
> > correct its deficiencies, whereas any MII spec deficiencies must be
> > worked out inside the driver.
> > You should first understand what MII management registers provide before > deciding that you can do better. There are some design uglinesses, > but it was put together by people that lived and breathed transceivers. > It has been proven over six or seven years or use with no incompatible > changes to the original software interface definition. > > >... > > Further, for the userland ethtool program, support for MII ioctls will > > be added soon, so that there will be no need for additional mii-tool or > > mii-diag tools. > > This could be easily reversed: the additional ethtool program was not > needed in the first place. > > > > This is nice, but I would like to able to restart autonegotiation even > > > without changing any of the advertised capabilities. If I missed this > > > possibility, please point me to it... > > > > no, that is a capability which needs to be added to ethtool. > > ETHTOOL_RENEG or ETHTOOL_ANRESTART or something. Basically kick the > > link state machine, whether such a state machine is in the driver or in > > the MII phy. That's the one big thing that mii-tool can do that ethtool > > cannot, AFAICS. > > An additional capability of the MII ioctl() is that it permits sending > "next page" extended information to the link partner. [move this down here] > This message covers > Why caching MII values doesn't work. [responded above] > Why extended MII values are useful. Ok, thanks, agreed. About the larger issue of why ethtool exists, I wonder about things like: how do the MII ioctls cover things like switching transceivers? supporting aui/10b2? supporting sym phys? ethtool is not just about 10/100 media. It's a general software diagnostics utility and tuning tool for your ethernet driver. The same kernel interface and the same userland program will allow me to associate an ethernet interface with a driver and bus location, adjust media settings, adjust interrupt mitigation settings, or perhaps even perform a driver-specific duty. 
I am very much convinced that the extended MII ioctls are useful, and would even support codifying them in sockios.h, using the SIOCMII* names you are already using. However I see the MII ioctls as a tuning tool for a specific (though large) subset of hardware. I am still not comfortable with considering the MII ioctls as the standard for communication between the kernel and userland... Tangent, to close on a more concrete technical note. The MII ioctls in their current form are not completely portable. For DaveM and others doing 32-bit userland on 64-bit kernel, you have to pass through ioctl translation layer. Not only will the SIOCMIIxxx ioctls need to be made official, but the structure which has so far been implicitly defined (u16* data) in the ioctls would need to be explicitly defined, in some central location. Regards, Jeff -- Jeff Garzik | Andre the Giant has a posse. Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Wed Jun 13 14:46:19 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DLkJi06406 for netdev-outgoing; Wed, 13 Jun 2001 14:46:19 -0700 Received: from localhost.localdomain (cpu2747.adsl.bellglobal.com [207.236.55.216]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DLkHP06399 for ; Wed, 13 Jun 2001 14:46:17 -0700 Received: (from rgb@localhost) by localhost.localdomain (8.12.0.Beta5/8.11.1) id f5DM6JrA021112; Wed, 13 Jun 2001 18:06:19 -0400 Date: Wed, 13 Jun 2001 18:06:19 -0400 From: Richard Guy Briggs To: Andi Kleen Cc: Richard Guy Briggs , netdev@oss.sgi.com Subject: Re: skb_pull, etc. panics. 
Message-ID: <20010613180619.C19894@grendel.conscoop.ottawa.on.ca> References: <20010606141145.L31244@grendel.conscoop.ottawa.on.ca> <20010613032126.B5323@fred.local> <20010613115848.A19894@grendel.conscoop.ottawa.on.ca> <20010613222217.48760@colin.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010613222217.48760@colin.muc.de>; from ak@muc.de on Wed, Jun 13, 2001 at 10:22:17PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 995 Lines: 22 On Wed, Jun 13, 2001 at 10:22:17PM +0200, Andi Kleen wrote: > On Wed, Jun 13, 2001 at 05:58:48PM +0200, Richard Guy Briggs wrote: > > ...so checking for it is considered debugging code and thus, overhead > > that need not be in a production system? > > In theory yes. It is a good sanity check to stop early when things go > wrong though so it is kept. That's fine for my code that tries to call skb_push/pull directly, but it doesn't help when other parts of the system call it, assuming everything is hunky-dory and my code has changed something so that the assumptions of other parts of the system are no longer true... It is rather rude to debug... > -Andi slainte mhath, RGB -- Richard Guy Briggs -- PGP key available Auto-Free Ottawa! Canada Prevent Internet Wiretapping! -- FreeS/WAN: Thanks for voting Green! 
-- Marillion: From owner-netdev@oss.sgi.com Wed Jun 13 15:03:02 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DM32008360 for netdev-outgoing; Wed, 13 Jun 2001 15:03:02 -0700 Received: from localhost.localdomain (cpu2747.adsl.bellglobal.com [207.236.55.216]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DM30P08356 for ; Wed, 13 Jun 2001 15:03:00 -0700 Received: (from rgb@localhost) by localhost.localdomain (8.12.0.Beta5/8.11.1) id f5DMMtm9021189; Wed, 13 Jun 2001 18:22:55 -0400 Date: Wed, 13 Jun 2001 18:22:55 -0400 From: Richard Guy Briggs To: Ben Greear Cc: Richard Guy Briggs , Andi Kleen , netdev@oss.sgi.com Subject: Re: skb_pull, etc. panics. Message-ID: <20010613182255.D19894@grendel.conscoop.ottawa.on.ca> References: <20010606141145.L31244@grendel.conscoop.ottawa.on.ca> <20010613032126.B5323@fred.local> <20010613115848.A19894@grendel.conscoop.ottawa.on.ca> <3B2791AC.7EAEC35A@candelatech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3B2791AC.7EAEC35A@candelatech.com>; from greearb@candelatech.com on Wed, Jun 13, 2001 at 09:15:40AM -0700 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1812 Lines: 42 On Wed, Jun 13, 2001 at 09:15:40AM -0700, Ben Greear wrote: > Richard Guy Briggs wrote: > > > > On Wed, Jun 13, 2001 at 03:21:26AM +0200, Andi Kleen wrote: > > > On Wed, Jun 06, 2001 at 08:11:45PM +0200, Richard Guy Briggs wrote: > > > > Hi again, > > > > > > > > If this is an FAQ, can someone point me to the reasons that skb_push() > > > > and skb_put() panic rather than dropping the skb and complaining in the > > > > log? > > > > > > > > If not, why does it do that? > > > > > > Because an overflow or underflow is always a bug in the code. If you're > > > not sure if the skb has enough room you have to use *_expand_headroom and > > > friends. 
> > > > ...so checking for it is considered debugging code and thus, overhead > > that need not be in a production system? > > In certain cases you may have to clone and grow the skb. But, since that > is inneficient, you should only do that in wierd boundary cases at most, > and in general, you should make sure the skb has enough reserved space > from the beginning. netdevice.h has some macro magic that figures out > the max-header (ie what to reserve). If you can determine how much you > need, you can change it there... That doesn't help much if I need to add nested IPSec headers... > Ben > > -- > Ben Greear > President of Candela Technologies Inc http://www.candelatech.com > ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear slainte mhath, RGB -- Richard Guy Briggs -- PGP key available Auto-Free Ottawa! Canada Prevent Internet Wiretapping! -- FreeS/WAN: Thanks for voting Green! -- Marillion: From owner-netdev@oss.sgi.com Wed Jun 13 15:23:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DMNs709883 for netdev-outgoing; Wed, 13 Jun 2001 15:23:54 -0700 Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DMNqP09879 for ; Wed, 13 Jun 2001 15:23:52 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.2/8.11.2) with ESMTP id f5DMNm417505; Wed, 13 Jun 2001 15:23:48 -0700 Message-ID: <3B27E7F4.1D2BB21F@candelatech.com> Date: Wed, 13 Jun 2001 15:23:48 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Richard Guy Briggs CC: Andi Kleen , netdev@oss.sgi.com Subject: Re: skb_pull, etc. panics. 
References: <20010606141145.L31244@grendel.conscoop.ottawa.on.ca> <20010613032126.B5323@fred.local> <20010613115848.A19894@grendel.conscoop.ottawa.on.ca> <3B2791AC.7EAEC35A@candelatech.com> <20010613182255.D19894@grendel.conscoop.ottawa.on.ca> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1665 Lines: 36 Richard Guy Briggs wrote: > > On Wed, Jun 13, 2001 at 09:15:40AM -0700, Ben Greear wrote: > > In certain cases you may have to clone and grow the skb. But, since that > > is inneficient, you should only do that in wierd boundary cases at most, > > and in general, you should make sure the skb has enough reserved space > > from the beginning. netdevice.h has some macro magic that figures out > > the max-header (ie what to reserve). If you can determine how much you > > need, you can change it there... > > That doesn't help much if I need to add nested IPSec headers... The 802.1Q vlan code I wrote has logic to grow the skb when needed, you can look at it if you'd like: http://scry.wanfear.com/~greear/vlan.html I'm guessing that if the header was deterministic before you started, then you should be able to determine how much needed to be reserved for the lower layers. So, you can just make sure you have that much space before sending it to the lower layers. If you can determine a reserve space that works, say, 90% of the time or better, then you can change the netdevice.h macro accordingly, and then do copies to make the skb bigger the other 10% of the time. With memory as cheap as it is, it may be perfectly reasonable to 'waste' 512 bytes or whatever at the front of the skb in order to cut down on your copying for the normal case. This could be configured at compile time, ie only enable it when ipsec is enabled. 
Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Wed Jun 13 15:29:51 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DMTp910148 for netdev-outgoing; Wed, 13 Jun 2001 15:29:51 -0700 Received: from localhost.localdomain (cpu2747.adsl.bellglobal.com [207.236.55.216]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DMTnP10145 for ; Wed, 13 Jun 2001 15:29:50 -0700 Received: (from rgb@localhost) by localhost.localdomain (8.12.0.Beta5/8.11.1) id f5DMnq1S021317; Wed, 13 Jun 2001 18:49:52 -0400 Date: Wed, 13 Jun 2001 18:49:52 -0400 From: Richard Guy Briggs To: Ben Greear Cc: Richard Guy Briggs , Andi Kleen , netdev@oss.sgi.com Subject: Re: skb_pull, etc. panics. Message-ID: <20010613184952.P18366@grendel.conscoop.ottawa.on.ca> References: <20010606141145.L31244@grendel.conscoop.ottawa.on.ca> <20010613032126.B5323@fred.local> <20010613115848.A19894@grendel.conscoop.ottawa.on.ca> <3B2791AC.7EAEC35A@candelatech.com> <20010613182255.D19894@grendel.conscoop.ottawa.on.ca> <3B27E7F4.1D2BB21F@candelatech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3B27E7F4.1D2BB21F@candelatech.com>; from greearb@candelatech.com on Wed, Jun 13, 2001 at 03:23:48PM -0700 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2497 Lines: 53 On Wed, Jun 13, 2001 at 03:23:48PM -0700, Ben Greear wrote: > Richard Guy Briggs wrote: > > > > On Wed, Jun 13, 2001 at 09:15:40AM -0700, Ben Greear wrote: > > > > In certain cases you may have to clone and grow the skb. But, since that > > > is inneficient, you should only do that in wierd boundary cases at most, > > > and in general, you should make sure the skb has enough reserved space > > > from the beginning. netdevice.h has some macro magic that figures out > > > the max-header (ie what to reserve). 
If you can determine how much you
> > > need, you can change it there...
> >
> > That doesn't help much if I need to add nested IPSec headers...
>
> The 802.1Q vlan code I wrote has logic to grow the skb when needed,
> you can look at it if you'd like: http://scry.wanfear.com/~greear/vlan.html
>
> I'm guessing that if the header was deterministic before you started,
> then you should be able to determine how much needed to be reserved
> for the lower layers. So, you can just make sure you have that much
> space before sending it to the lower layers. If you can determine
> a reserve space that works, say, 90% of the time or better, then
> you can change the netdevice.h macro accordingly, and then do copies
> to make the skb bigger the other 10% of the time. With memory as
> cheap as it is, it may be perfectly reasonable to 'waste' 512 bytes
> or whatever at the front of the skb in order to cut down on your copying
> for the normal case.
>
> This could be configured at compile time, ie only enable it when ipsec is
> enabled.

It is not static. It depends on the number of nestings. It also
depends on combinations of separate AH, adding IPCOMP, tunnel mode,
etc...

I don't understand dst cache stuff well enough yet, but that may be the
sort of thing that could be stored in a dst cache entry per route. (I
say this, knowing that our current routing paradigm is going to change
with KLIPS2...)

> Ben
>
> --
> Ben Greear
> President of Candela Technologies Inc http://www.candelatech.com
> ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear

slainte mhath, RGB

-- Richard Guy Briggs -- PGP key available Auto-Free Ottawa! Canada Prevent Internet Wiretapping! -- FreeS/WAN: Thanks for voting Green!
-- Marillion: From owner-netdev@oss.sgi.com Wed Jun 13 15:42:32 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DMgWc11250 for netdev-outgoing; Wed, 13 Jun 2001 15:42:32 -0700 Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DMgWP11247 for ; Wed, 13 Jun 2001 15:42:32 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.2/8.11.2) with ESMTP id f5DMgO419826; Wed, 13 Jun 2001 15:42:24 -0700 Message-ID: <3B27EC50.DADCBE2A@candelatech.com> Date: Wed, 13 Jun 2001 15:42:24 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Richard Guy Briggs CC: Andi Kleen , netdev@oss.sgi.com Subject: Re: skb_pull, etc. panics. References: <20010606141145.L31244@grendel.conscoop.ottawa.on.ca> <20010613032126.B5323@fred.local> <20010613115848.A19894@grendel.conscoop.ottawa.on.ca> <3B2791AC.7EAEC35A@candelatech.com> <20010613182255.D19894@grendel.conscoop.ottawa.on.ca> <3B27E7F4.1D2BB21F@candelatech.com> <20010613184952.P18366@grendel.conscoop.ottawa.on.ca> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2351 Lines: 48 Richard Guy Briggs wrote: > > On Wed, Jun 13, 2001 at 03:23:48PM -0700, Ben Greear wrote: > > The 802.1Q vlan code I wrote has logic to grow the skb when needed, > > you can look at it if you'd like: http://scry.wanfear.com/~greear/vlan.html > > > > I'm guessing that if the header was deterministic before you started, > > then you should be able to determine how much needed to be reserved > > for the lower layers. So, you can just make sure you have that much > > space before sending it to the lower layers. 
If you can determine
> > a reserve space that works, say, 90% of the time or better, then
> > you can change the netdevice.h macro accordingly, and then do copies
> > to make the skb bigger the other 10% of the time. With memory as
> > cheap as it is, it may be perfectly reasonable to 'waste' 512 bytes
> > or whatever at the front of the skb in order to cut down on your copying
> > for the normal case.
> >
> > This could be configured at compile time, ie only enable it when ipsec is
> > enabled.
>
> It is not static. It depends on the number of nestings. It also
> depends on combinations of separate AH, adding IPCOMP, tunnel mode,
> etc...

Is it normally under 512 bytes of overhead? (please god let that be true!! :))
If so, reserve that in netdevice.h.

Then just do checks each time you add layers. If you have to copy/expand,
so be it (bump a counter so you can examine your performance tuning later).

At your outer-most wrapper/layer, you will be sending it down the stack,
right? At that point, ensure (again) that you have enough space on the
front of the header for all lower layers (those layers' max usage is known,
or the current code would break). If not, allocate more and copy again.

That at least should be correct, even if it is somewhat inefficient. As
you better understand your heuristics, you can optimize your code to
do fewer copies and/or waste less memory for the common case. I believe
that the limiting factor will usually be bandwidth anyway, so even a few
extra ram->ram copies will probably not kill you, especially with the ever
increasing processing power available.
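The check-then-grow strategy described above can be sketched outside the kernel. The following is only a user-space model under stated assumptions: struct pktbuf and the pktbuf_* functions are hypothetical names invented for illustration, not kernel API. In the kernel the corresponding tools would be skb_headroom() for the check and a helper such as skb_realloc_headroom() for the copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for an skb: one allocation with
 * reserved space ("headroom") in front of the packet data. */
struct pktbuf {
    unsigned char *head;        /* start of the allocation */
    unsigned char *data;        /* start of the packet data */
    size_t len;                 /* bytes of packet data */
    unsigned long expand_count; /* how often we had to copy and grow */
};

int pktbuf_init(struct pktbuf *b, size_t headroom, size_t payload_len)
{
    b->head = malloc(headroom + payload_len + 1);
    if (!b->head)
        return -1;
    b->data = b->head + headroom;
    b->len = payload_len;
    memset(b->data, 0, payload_len);
    b->expand_count = 0;
    return 0;
}

size_t pktbuf_headroom(const struct pktbuf *b)
{
    return (size_t)(b->data - b->head);
}

/* Prepend hdr_len bytes. Unlike a bare push, this never underflows:
 * if the reserved headroom is too small, the packet is copied into a
 * larger buffer first and the counter is bumped for later tuning. */
unsigned char *pktbuf_push_safe(struct pktbuf *b, size_t hdr_len)
{
    if (pktbuf_headroom(b) < hdr_len) {
        unsigned char *nh = malloc(hdr_len + b->len);
        if (!nh)
            return NULL;
        memcpy(nh + hdr_len, b->data, b->len);
        free(b->head);
        b->head = nh;
        b->data = nh + hdr_len;
        b->expand_count++;
    }
    b->data -= hdr_len;
    b->len += hdr_len;
    return b->data;
}

void pktbuf_free(struct pktbuf *b)
{
    free(b->head);
    b->head = b->data = NULL;
    b->len = 0;
}
```

With a reserve that covers the common case, expand_count stays at zero for most packets, and the copy path only runs for deeply nested headers.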
-- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Thu Jun 14 09:23:50 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5EGNot04000 for netdev-outgoing; Thu, 14 Jun 2001 09:23:50 -0700 Received: from web12301.mail.yahoo.com (web12301.mail.yahoo.com [216.136.173.99]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5EGNkP03997 for ; Thu, 14 Jun 2001 09:23:46 -0700 Message-ID: <20010614162346.66336.qmail@web12301.mail.yahoo.com> Received: from [63.222.137.34] by web12301.mail.yahoo.com; Thu, 14 Jun 2001 09:23:46 PDT Date: Thu, 14 Jun 2001 09:23:46 -0700 (PDT) From: houwu chen Subject: report problems To: torvalds@transmeta.com Cc: davem@redhat.com, netdev@oss.sgi.com, hwuchen@yahoo.com MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="0-1606224634-992535826=:63441" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 6150 Lines: 251 --0-1606224634-992535826=:63441 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi Linus and David, I am Houwu Chen, currently I am software engineer in networking and wireless area. I think I may find a problem in the Linux kernel source code as follows: Version: Linux-2.4.1 File: linux/net/socket.c The related source segments are below: /* Argument list sizes for sys_socketcall */ #define AL(x) ((x) * sizeof(unsigned long)) static unsigned char nargs[18]={AL(0),AL(3),AL(3),AL(3),AL(2),AL(3), AL(3),AL(3),AL(4),AL(4),AL(4),AL(6), AL(6),AL(2),AL(5),AL(5),AL(3),AL(3)}; #undef AL /* * System call vectors. * * Argument checking cleaned up. Saved 20% in size. * This function doesn't need to set the kernel lock because * it is set by the callees. */ asmlinkage long sys_socketcall(int call, unsigned long *args) { unsigned long a[6]; unsigned long a0,a1; int err; if(call<1||call>SYS_RECVMSG) return -EINVAL; /* copy_from_user should be SMP safe. 
*/ if (copy_from_user(a, args, nargs[call])) return -EFAULT; a0=a[0]; a1=a[1]; switch(call) { case SYS_SOCKET: err = sys_socket(a0,a1,a[2]); break; ............................. case SYS_SENDTO: err = sys_sendto(a0,(void *)a1, a[2], a[3], (struct sockaddr *)a[4], a[5]); break; case SYS_RECV: err = sys_recv(a0, (void *)a1, a[2], a[3]); break; ............................... return err; } The problems are: 1) in case SYS_SENDTO: the parameter size is defined as nargs[10] = AL(4) for function sys_sendto(..), but it has 6 parameters. 2) in case SYS_RECV: the parameter size is defined as nargs[11] = AL(6) for function sys_recv(..), but it has 4 parameters. The question is if they are bugs, why TCP/UDP sockets have been working fine. In the problem 2, the system gives the function sys_recv(..) two more spaces, that is fine, the system only lost two memory spaces. In the problem 1, the last two parameters (struct sockaddr *)a[4], a[5] will not be passed to function sys_sendto(..) I am guessing that because the bind(..) function is working fine, in the user program, the client side and server side are all run bind(..) function to bind each other, then in sendto(sockId,..) function, the last two parameters may not be used. I also check the linux-2.0.0 and linux-2.1.0, they have the same problems as linux-2.4.1. If they are not the problems, please let me know, and I am sorry to bother you. Regards Houwu Chen hwuchen@Yahoo.com (978)658-0298 (home) --0-1606224634-992535826=:63441--
From owner-netdev@oss.sgi.com Thu Jun 14 09:29:11 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5EGTB204236 for netdev-outgoing; Thu, 14 Jun 2001 09:29:11 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5EGTAP04233 for ; Thu, 14 Jun 2001 09:29:10 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id JAA11112; Thu, 14 Jun 2001 09:28:17 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15144.58913.832843.799792@pizda.ninka.net> Date: Thu, 14 Jun 2001 09:28:17 -0700 (PDT) To: houwu chen Cc: torvalds@transmeta.com, netdev@oss.sgi.com Subject: Re: report problems In-Reply-To: <20010614162346.66336.qmail@web12301.mail.yahoo.com> References: <20010614162346.66336.qmail@web12301.mail.yahoo.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 319 Lines: 17 houwu chen writes: > nargs[18] ... > 1) in case SYS_SENDTO: the parameter size is defined > as > nargs[10] = AL(4) for function sys_sendto(..), but it > has 6 > parameters. "nargs" means "number of arguments after past the first two" not "number of total arguments" Later, David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Thu Jun 14 11:31:19 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5EIVJk07868 for netdev-outgoing; Thu, 14 Jun 2001 11:31:19 -0700 Received: from zero.aec.at (qmailr@zero.aec.at [195.3.98.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5EIVFP07857 for ; Thu, 14 Jun 2001 11:31:16 -0700 Received: (qmail 24216 invoked by uid 99); 14 Jun 2001 18:31:05 -0000 Received: from unknown (HELO fred.muc.de) (unknown) by unknown with SMTP; 14 Jun 2001 18:31:05 -0000 Received: by fred.muc.de (Postfix, from userid 500) id F0ECFE2D4D; Thu, 14 Jun 2001 20:41:12 +0200 (CEST) Date: Thu, 14 Jun 2001 20:41:12 +0200 From: Andi Kleen To: houwu chen Cc: torvalds@transmeta.com, davem@redhat.com, netdev@oss.sgi.com Subject: Re: report problems Message-ID: <20010614204112.A7352@fred.local> References: <20010614162346.66336.qmail@web12301.mail.yahoo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <20010614162346.66336.qmail@web12301.mail.yahoo.com>; from hwuchen@yahoo.com on Thu, Jun 14, 2001 at 06:23:46PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 226 Lines: 9 > If they are not the problems, please let me know, and I am sorry > to bother you. SYS_SOCKET is 1, not 0, and the first element of nargs[] is a dummy. You were shifted by one when reading the array. So no problem. 
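The off-by-one reading can be checked mechanically. Below is a small stand-alone sketch, not kernel code: the nargs[] table is copied from the report, while the CALL_* enumerators and socketcall_arg_count() are hypothetical names chosen here. The enumerators are assumed to mirror the SYS_* call numbers in <linux/net.h>, which start at 1 with SYS_SOCKET.

```c
#include <stddef.h>

/* Hypothetical mirror of the SYS_* socketcall numbers; they start at
 * 1, so index 0 of nargs[] is never used for a real call. */
enum {
    CALL_SOCKET = 1, CALL_BIND, CALL_CONNECT, CALL_LISTEN, CALL_ACCEPT,
    CALL_GETSOCKNAME, CALL_GETPEERNAME, CALL_SOCKETPAIR, CALL_SEND,
    CALL_RECV, CALL_SENDTO, CALL_RECVFROM, CALL_SHUTDOWN,
    CALL_SETSOCKOPT, CALL_GETSOCKOPT, CALL_SENDMSG, CALL_RECVMSG
};

/* Argument list sizes, as quoted from net/socket.c in the report. */
#define AL(x) ((x) * sizeof(unsigned long))
static const unsigned char nargs[18] = {
    AL(0), AL(3), AL(3), AL(3), AL(2), AL(3),
    AL(3), AL(3), AL(4), AL(4), AL(4), AL(6),
    AL(6), AL(2), AL(5), AL(5), AL(3), AL(3)
};
#undef AL

/* Number of unsigned long arguments copied from user space for a
 * given call number, or -1 if the number is out of range. */
int socketcall_arg_count(int call)
{
    if (call < 1 || call > CALL_RECVMSG)
        return -1;
    return (int)(nargs[call] / sizeof(unsigned long));
}
```

Indexed this way, recv is call 10 and gets 4 arguments, and sendto is call 11 and gets 6, which matches the sys_recv() and sys_sendto() calls in the switch statement.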
-Andi From owner-netdev@oss.sgi.com Fri Jun 15 10:23:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5FHN6S21523 for netdev-outgoing; Fri, 15 Jun 2001 10:23:06 -0700 Received: from mail.iwr.uni-heidelberg.de (mail.iwr.uni-heidelberg.de [129.206.104.30]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5FHN4k21520 for ; Fri, 15 Jun 2001 10:23:04 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f5FHMvg02820; Fri, 15 Jun 2001 19:22:57 +0200 (MET DST) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id TAA14096; Fri, 15 Jun 2001 19:22:44 +0200 Date: Fri, 15 Jun 2001 19:22:44 +0200 (CEST) From: Bogdan Costescu To: Jeff Garzik cc: Donald Becker , Linux Kernel Mailing List , , "David S. Miller" , Linus Torvalds Subject: Re: PATCH: ethtool MII helpers In-Reply-To: <3B27CC15.2B92E71A@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 5133 Lines: 113 On Wed, 13 Jun 2001, Jeff Garzik wrote: > > Caching and ioctl() rate-limiting are both a problem for a program I use > > frequently. > > Unfortunately that is at loggerheads with the potential for a bunch of > people to soak the system with unpriveleged MII reads via ioctl. > > I could forget about rate-limiting if we required CAP_NET_ADMIN and/or > CAP_RAW_IO for all these ioctls, but that might cause complaints too.. In the last thread, I proposed that caching/rate-limiting should apply only to unpriviledged users. This way applications like Don's would still run (but require to be run as root) and normal users would not DoS it. I was thinking of something like this (caching applied to 3c59x): static int vortex_ioctl(struct net_device *dev, struct ifreq *rq, int cmd) { ... 
	switch(cmd) {
	case SIOCDEVPRIVATE:		/* Get the address of the PHY in use. */
		data[0] = phy;
		break;
	case SIOCDEVPRIVATE+1:		/* Read the specified MII register. */
		reg = data[1] & 0x1f;
		/* Privileged callers always read the hardware; everybody else
		 * gets the cached value until it is older than the timeout.
		 * time_after() keeps the test correct when jiffies wraps. */
		if (capable(CAP_NET_ADMIN) ||
		    time_after(jiffies, vp->cache.to[reg] + MII_CACHE_TIMEOUT)) {
			EL3WINDOW(4);
			data[3] = mdio_read(dev, data[0] & 0x1f, reg);
			vp->cache.val[reg] = data[3];
			vp->cache.to[reg] = jiffies;
		} else {
			data[3] = vp->cache.val[reg];
		}
		retval = 0;
		break;
	case ...

where MII_CACHE_TIMEOUT is a constant here, but should be something modifiable through /proc. I've chosen a resolution of HZ for cached reads as it's easily accessible. The example is obviously simplified: it only handles one transceiver; if more can be active at the same time, each should have its own cache.

> About the larger issue of why ethtool exists, I wonder about things
> like: how do the MII ioctls cover things like switching transceivers?
> supporting aui/10b2? supporting sym phys?

AFAIK, MII ioctls right now only allow access to MII registers. Switching transceivers is a tough job, that's why it's generally done in init()/open(). If by "switching" you mean just "set this transceiver for use", this might be possible to do, but if you want to check whether a transceiver is available and then "set for use", this can't be done easily. Some transceivers can't return any info and the general way of probing them is by trying to send/receive something - which opens the window for nice races with the Tx/Rx parts which can be active at that time. That's why my impression is that changing transceivers can only be safely done at init()/open().

AUI cannot be probed, so you have to send/receive something. Furthermore, I don't think that the AUI interface tells you anything about what's connected to it, so you have to blindly activate it and hope that it works. 10base2 might give link beat, but otherwise AUI considerations apply. AFAIK, Sym phys are easier to emulate (isn't tulip already doing this?).
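The expiry test in the cached MII read earlier in this message compares tick counters, and a counter like jiffies eventually wraps, so the comparison is safest when done on the signed difference. Below is a minimal user-space sketch of the same caching policy, assuming a 32-register cache per transceiver and a timeout of 100 ticks. All names here (mii_cache, tick_after, cached_mii_read, fake_hw_read) are hypothetical, and fake_hw_read only stands in for mdio_read().

```c
#include <stdbool.h>

#define MII_REGS          32
#define MII_CACHE_TIMEOUT 100UL  /* assumed: one second at HZ = 100 */

struct mii_cache {
    unsigned long stamp[MII_REGS]; /* tick of the last hardware read */
    int           val[MII_REGS];   /* value returned by that read    */
    bool          valid[MII_REGS]; /* false until the first read     */
};

/* True if tick a is later than tick b, even across counter wraparound. */
bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Stand-in for mdio_read(): counts how often the "hardware" is touched. */
int fake_reads;
int fake_hw_read(int reg)
{
    fake_reads++;
    return 0x7800 + reg;
}

/*
 * Privileged callers always read the hardware and refresh the cache.
 * Unprivileged callers get the cached value unless it is missing or
 * older than MII_CACHE_TIMEOUT ticks.
 */
int cached_mii_read(struct mii_cache *c, int reg, unsigned long now,
                    bool privileged, int (*hw_read)(int))
{
    reg &= 0x1f;
    if (privileged || !c->valid[reg] ||
        tick_after(now, c->stamp[reg] + MII_CACHE_TIMEOUT)) {
        c->val[reg]   = hw_read(reg);
        c->stamp[reg] = now;
        c->valid[reg] = true;
    }
    return c->val[reg];
}
```

Passing the current tick in as an argument keeps the policy easy to exercise outside the kernel; a driver would use jiffies and capable(CAP_NET_ADMIN) in place of now and privileged.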
> The same
> kernel interface and the same userland program will allow me to
> associate an ethernet interface with a driver and bus location, adjust
> media settings, adjust interrupt mitigation settings, or perhaps even
> perform a driver-specific duty.

So far, Don's work has proved very well thought out. There's a mii-diag which deals with media settings for MII-capable NICs and there are diag tools for each chipset (vortex-diag, tulip-diag, etc.) which deal with driver-specific duties. [ I don't have any relationship with Don. In fact, you can see on the vortex list that we disagreed many times. ]

I wouldn't object to making mii-diag able to deal more generally with media settings (probably by providing MII-like interfaces for NICs that don't have MII). But the other way around, trying to do driver-specific duties from _one_ tool, seems a bit too hard to me.

AFAIK, associating ethernet interfaces with drivers and bus locations has no standard right now, but I agree that there is a need. So, if everybody agrees, the ethtool way can be stated as the standard.

Again AFAIK, interrupt mitigation has no standard right now; in fact only a few drivers support it. So the fact that it is available from ethtool is not really relevant to me (before you ask, 3c59x supported hardware doesn't have hardware Rx interrupt mitigation). If Jamal's work for general interrupt mitigation is included, then I surely see the need for a tool to control it. As it will be a core functionality, one tool will do it; there's no driver dependency.

> However I see the MII ioctls as a tuning tool for a specific (though
> large) subset of hardware. I am still not comfortable with considering
> the MII ioctls as the standard for communication between the kernel and
> userland...

The low level stuff should be the same in both cases (MII-like and ethtool). It is what you add on top of it that makes it MII-like or ethtool. Or am I missing something?
Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Fri Jun 15 17:20:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5G0Ksj30626 for netdev-outgoing; Fri, 15 Jun 2001 17:20:54 -0700 Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5G0Kqk30620 for ; Fri, 15 Jun 2001 17:20:52 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.2/8.11.2) with ESMTP id f5G0Kp422259 for ; Fri, 15 Jun 2001 17:20:51 -0700 Message-ID: <3B2AA663.9D8EE4A0@candelatech.com> Date: Fri, 15 Jun 2001 17:20:51 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: "netdev@oss.sgi.com" Subject: LANforge traffic generator updated (can set Ethernet port speeds now) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1168 Lines: 28 I updated my LANforge traffic generator to be able to set the auto-negotiate, fixed-rate, and advertise flags in mii-diag compliant drivers (tested on the eepro driver (works), and tulip driver (doesn't currently work well)). It doesn't do anything that mii-diag can't, but it does give you a nice Java GUI and CLI that can control many machines at once. I have also written a perl script (lf_verify.pl) that will try to set many different combinations on a card to see if it handles it correctly. Most of LANforge is close sourced, but I'll be happy to give free licenses to open-source hackers if it will aid them in their network/driver development. 
If there is general interest for a graphical tool to set port speeds, then I can probably find time to rip out that functionality and open-source it. Sound useful? Please see our web page: http://www.candelatech.com or contact me if you are interested and would like some licenses. Thanks, Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Fri Jun 15 23:12:19 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5G6CJj01485 for netdev-outgoing; Fri, 15 Jun 2001 23:12:19 -0700 Received: from mta2.snfc21.pbi.net (mta2.snfc21.pbi.net [206.13.28.123]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5G6CIk01482 for ; Fri, 15 Jun 2001 23:12:18 -0700 Received: from mercury.snydernet.lan ([64.170.211.250]) by mta2.snfc21.pbi.net (Sun Internet Mail Server sims.3.5.2000.01.05.12.18.p9) with SMTP id <0GF0007V0EGHVX@mta2.snfc21.pbi.net> for netdev@oss.sgi.com; Fri, 15 Jun 2001 23:09:53 -0700 (PDT) Date: Fri, 15 Jun 2001 23:09:54 -0700 From: Steve Snyder Subject: Does ISA interrupt latency harm overall system performance? To: netdev@oss.sgi.com Reply-to: swsnyder@home.com Message-id: <01061523095400.01763@mercury.snydernet.lan> MIME-version: 1.0 X-Mailer: KMail [version 1.2] Content-type: text/plain; charset="iso-8859-1" Content-transfer-encoding: 8bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 957 Lines: 19 I understand that ISA devices have a greater interrupt latency than PCI devices. Does this greater latency deteriorate overall system performance or just the device whose interrupts are being serviced? I've got 2 10Mbps Ethernet cards, one is an ISA card and the other PCI. I use the ISA card as a secondary net device (eth1) which is connected to a cable modem. Given that the cable modem will never saturate a 10Mbps Ethernet card, the ISA/PCI question shouldn't be relevant to networking performance. 
I chose to use the ISA card because it leaves another PCI slot in my box (i686-based Linux 2.4.x) available for other uses. Let me add one more thing to this context. The PCI card is a 3Com 3C590 "Vortex" so, according to the Linux doc, this device has busmastering capabilities. Is this a factor in overall system performance? Any thoughts on this? Please cc me with any responses as I am not a subscriber to this list. Thank you. From owner-netdev@oss.sgi.com Sat Jun 16 02:03:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5G93du01415 for netdev-outgoing; Sat, 16 Jun 2001 02:03:39 -0700 Received: from circuit.moureaux.com (IDENT:root@m201-2-p35.warwick.net [208.242.201.90]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5G93aZ01412 for ; Sat, 16 Jun 2001 02:03:37 -0700 Received: from localhost (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.11.2/8.11.2) with ESMTP id f5G93gB02953; Sat, 16 Jun 2001 05:03:42 -0400 Date: Sat, 16 Jun 2001 05:03:42 -0400 (EDT) From: Statux X-X-Sender: To: Steve Snyder cc: Subject: Re: Does ISA interrupt latency harm overall system performance? In-Reply-To: <01061523095400.01763@mercury.snydernet.lan> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2905 Lines: 53 > I understand that ISA devices have a greater interrupt latency than PCI > devices. Does this greater latency deteriorate overall system performance > or just the device whose interrupts are being serviced? ISA cards are quite a bit slower interrupting, yes. Logically, it would slow a system down because of this, but possibly not enough for you to detect with your eye (or whatever) unless it's a card that interrupts a lot (like a primary use NIC for instance). Even certain ISA NICs which send only so much data in so much time can be a problem. 
My old ISA NIC had problems on this system on data xfers above 100KB (most of what I did, tho, was less than that since it's only used for the LAN interface here (2 or 3 systems).. mainly for SMB printing, etc. My new NIC is a PCI board (3c905B). Funny thing is, this one shows collisions (which is due to the 10BaseT hub).. only a small percentage of them though... so no prob. The ISA NIC never showed collisions after about the first time I had the hub installed.. I feel that collision detection/accounting was actually turned off on the thing (which isn't right for a half-duplex interface). Anyway... PCI cards are typically much faster than ISA, but only a few types of cards will actually show a performance increase (I'm not talking just NICs) due to design, drivers, or just simply because the type of card isn't a big interrupter. > I've got 2 10Mbps Ethernet cards, one is an ISA card and the other PCI. I > use the ISA card as a secondary net device (eth1) which is connected to a > cable modem. Given that the cable modem will never saturate a 10Mbps > Ethernet card, the ISA/PCI question shouldn't be relevant to networking > performance. I chose to use the ISA card because it leaves another PCI > slot in my box (i686-based Linux 2.4.x) available for other uses. How fast do cable modems run at anyway (I live in the woods.. no service here)? I know they don't do the 1.25MB that 10Mbps NICs can do... but I'm just curious. > Let me add one more thing to this context. The PCI card is a 3Com 3C590 > "Vortex" so, according to the Linux doc, this device has busmastering > capabilities. Is this a factor in overall system performance? Again, depending on the card. NICs are always good examples because they process a lot of data quite quickly. Busmastering, I think, is a way to keep things in sync (or at least moving at a proportional speed) with the CPU/RAM and whatnot. Who knows.. my answers are prolly not exact. 
I've grown up with computers over the better part of the last 15 years but times have changed, I'm sure. In short, it depends on the card. With a NIC.. quite possibly cause it ties up the CPU since the CPU has to finish an interrupt before switching to another task/process. So your run queue can get a little backed up... but in most cases, you'd never know the difference unless you timed everything. From owner-netdev@oss.sgi.com Sat Jun 16 07:44:46 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5GEikr04562 for netdev-outgoing; Sat, 16 Jun 2001 07:44:46 -0700 Received: from mta2.snfc21.pbi.net (mta2.snfc21.pbi.net [206.13.28.123]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5GEikZ04559 for ; Sat, 16 Jun 2001 07:44:46 -0700 Received: from mercury.snydernet.lan ([64.170.211.250]) by mta2.snfc21.pbi.net (Sun Internet Mail Server sims.3.5.2000.01.05.12.18.p9) with SMTP id <0GF100MPP23BQS@mta2.snfc21.pbi.net> for netdev@oss.sgi.com; Sat, 16 Jun 2001 07:40:23 -0700 (PDT) Date: Sat, 16 Jun 2001 07:40:22 -0700 From: Steve Snyder Subject: Re: Does ISA interrupt latency harm overall system performance? In-reply-to: To: Statux Cc: netdev@oss.sgi.com Reply-to: swsnyder@home.com Message-id: <01061607402200.02529@mercury.snydernet.lan> MIME-version: 1.0 X-Mailer: KMail [version 1.2] Content-type: text/plain; charset="iso-8859-1" Content-transfer-encoding: 8bit References: Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1074 Lines: 23 On Saturday 16 June 2001 02:03 am, Statux wrote: > > I've got 2 10Mbps Ethernet cards, one is an ISA card and the other PCI. > > I use the ISA card as a secondary net device (eth1) which is connected > > to a cable modem. Given that the cable modem will never saturate a > > 10Mbps Ethernet card, the ISA/PCI question shouldn't be relevant to > > networking performance. I chose to use the ISA card because it leaves > > another PCI slot in my box (i686-based Linux 2.4.x) available for other > > uses. 
> > How fast do cable modems run at anyway (I live in the woods.. no service > here)? I know they don't do the 1.25MB that 10Mbps NICs can do... but I'm > just curious. The throughput varies by ISP and cable modem - and by time of day since you're essentially talking on a party line. The best throughput I've seen on this 3Com cable modem is 100kbytes/sec. 3Com says (make that "said", since they are out of the business now) that the device runs up to 3Mbps. So far @Home has not seen fit to let me get that level of performance. Thanks for the response. From owner-netdev@oss.sgi.com Sun Jun 17 22:42:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5I5g1S02630 for netdev-outgoing; Sun, 17 Jun 2001 22:42:01 -0700 Received: from stsl.siemens.com.tw (stsl.siemens.com.tw [192.72.45.189]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5I5fwV02627 for ; Sun, 17 Jun 2001 22:41:59 -0700 Received: from stslex.siemens.com.tw (stslex [192.72.45.13]) by stsl.siemens.com.tw (8.11.4/8.11.4) with ESMTP id f5I5sWX04266 for ; Mon, 18 Jun 2001 13:54:33 +0800 (CST) Received: by stslex.siemens.com.tw with Internet Mail Service (5.5.2650.21) id ; Mon, 18 Jun 2001 13:40:32 +0800 Message-ID: <92C0C0AC8AE8D411864300105A835CBB50A621@stslex.siemens.com.tw> From: Ra Chen To: netdev@oss.sgi.com Subject: newbie question Date: Mon, 18 Jun 2001 13:40:28 +0800 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C0F7B9.2E6A1B10" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2930 Lines: 89 This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible. ------_=_NextPart_001_01C0F7B9.2E6A1B10 Content-Type: text/plain; charset="windows-1252" This may be off-topic but someone may be willing to help me with this small question here. Does Linux kernel aaccept a very very long timer? 
To be more exact, does mod_timer() accept an "expires" parameter whose value is somewhat greater than 0x80000000? Looking at the code (2.4.5, timer.c, line 143-145), it seems an expiration time that large is considered to be a special case. Could somebody shed some light on why it is handled this way? Thanks ================================== Ra Chen Siemens Telecommunication Systems Ltd. R&D Engineer E-mail: ra@stsl.siemens.com.tw Tel: 886-2-2518-6539 Fax: 886-2-2505-3866 ==================================
------_=_NextPart_001_01C0F7B9.2E6A1B10-- From owner-netdev@oss.sgi.com Mon Jun 18 02:11:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5I9BDM05361 for netdev-outgoing; Mon, 18 Jun 2001 02:11:13 -0700 Received: from hindon.hss.co.in ([202.54.26.202]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5I9BAV05352 for ; Mon, 18 Jun 2001 02:11:11 -0700 Received: from sandesh.hss.hns.com (localhost [127.0.0.1]) by hindon.hss.co.in (8.10.0/8.10.0) with SMTP id f5I6Aiw01727 for ; Mon, 18 Jun 2001 11:41:25 +0530 (IST) Received: by sandesh.hss.hns.com(Lotus SMTP MTA v4.6.3 (733.2 10-16-1998)) id 65256A6F.0022286E ; Mon, 18 Jun 2001 11:43:05 +0530 X-Lotus-FromDomain: HSS From: sndtrn27@hss.hns.com To: netdev@oss.sgi.com Message-ID: <65256A6F.002227AF.00@sandesh.hss.hns.com> Date: Mon, 18 Jun 2001 11:43:05 +0530 Subject: 2.2.17 kernel compilation problems Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 449 Lines: 15 hi all i have mandrake 7.2 installed (kernel ver: 2.2.17-21mdk). i had patched the kernel for TCP connection migration by a patch by snoeren@lcs.mit.edu from the site http://nms.lcs.mit.edu/software/migrate. the kernel compiles properply but when booted from this kernel the network does not come up. the error is that no modules.dep file is generated. if anybody has any solutions please help me out. 
thankx rajiv From owner-netdev@oss.sgi.com Mon Jun 18 02:59:18 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5I9xIS06155 for netdev-outgoing; Mon, 18 Jun 2001 02:59:18 -0700 Received: from circuit.moureaux.com (IDENT:root@m202-3-p11.warwick.net [208.242.202.116]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5I9xGV06152 for ; Mon, 18 Jun 2001 02:59:16 -0700 Received: from localhost (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.11.2/8.11.2) with ESMTP id f5I9xhC02133; Mon, 18 Jun 2001 05:59:44 -0400 Date: Mon, 18 Jun 2001 05:59:43 -0400 (EDT) From: Statux X-X-Sender: To: cc: Subject: Re: 2.2.17 kernel compilation problems In-Reply-To: <65256A6F.002227AF.00@sandesh.hss.hns.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 754 Lines: 25 to make a modules.dep, you typically run something like: depmod -a which will create one for the kernel running only, and put it where it needs to be. Also, double check your /etc/modules.conf file just to be safe. On Mon, 18 Jun 2001 sndtrn27@hss.hns.com wrote: > > > hi all > i have mandrake 7.2 installed (kernel ver: 2.2.17-21mdk). i had patched > the kernel for TCP connection migration by a patch by snoeren@lcs.mit.edu from > the site http://nms.lcs.mit.edu/software/migrate. > the kernel compiles properply but when booted from this kernel the network does > not come up. > the error is that no modules.dep > file is generated. > if anybody has any solutions please help me out. 
> thankx > rajiv > > -- -Statux From owner-netdev@oss.sgi.com Tue Jun 19 07:50:09 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5JEo9o20129 for netdev-outgoing; Tue, 19 Jun 2001 07:50:09 -0700 Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.140.186]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5JEo7V20122 for ; Tue, 19 Jun 2001 07:50:07 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id XAA25387; Tue, 19 Jun 2001 23:51:18 +0900 To: netdev@oss.sgi.com CC: usagi-users@linux-ipv6.org Subject: bug in dst / hbh tlv options parser X-Mailer: Mew version 1.94.2 on XEmacs 21.1 (Capitol Reef) X-URL: http://www.hongo.wide.ad.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010619235118W.yoshfuji@wide.ad.jp> Date: Tue, 19 Jun 2001 23:51:18 +0900 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 991025(IM133) Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 766 Lines: 27 Hi, Linux-2.4.4 and 2.4.5 have a bug; they do not parse tlv options in ipv6 dst / hbh extension headers. Here's patch. 
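For readers unfamiliar with the layout involved: a hop-by-hop or destination options header starts with two fixed bytes (Next Header, then Hdr Ext Len in 8-octet units, not counting the first 8 octets), and the TLV options begin only after them. The following is a minimal user-space sketch of such a walk, assuming the RFC 2460 layout; count_tlv_options is a hypothetical helper, not the kernel's parser, and it only counts options rather than acting on them.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_PAD1 0 /* one byte of padding, has no length field     */
#define OPT_PADN 1 /* padding with a length field, like any option */

/*
 * Walks the TLV options of one extension header held in hdr[0..buflen).
 * Returns the number of non-padding options, or -1 if malformed.
 */
int count_tlv_options(const uint8_t *hdr, size_t buflen)
{
    size_t len, off;
    int count = 0;

    if (buflen < 2)
        return -1;
    len = ((size_t)hdr[1] + 1) << 3; /* whole header, in bytes */
    if (len > buflen)
        return -1;

    off = 2; /* skip Next Header and Hdr Ext Len before the first option */
    while (off < len) {
        uint8_t type = hdr[off];
        size_t optlen;

        if (type == OPT_PAD1) {
            off += 1;
            continue;
        }
        if (off + 2 > len)
            return -1; /* no room for the option length byte */
        optlen = (size_t)hdr[off + 1] + 2; /* type + length + data */
        if (off + optlen > len)
            return -1; /* option runs past the header */
        if (type != OPT_PADN)
            count++;
        off += optlen;
    }
    return count;
}
```

If the walk started at offset 0 instead of 2, the Next Header and length bytes would themselves be read as an option type and option length, so the real options would be misparsed or skipped.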
Branch: bFIX_2_4_5-20010619
Fix: rFIX_2_4_5-20010619->tFIX_2_4_5-20010619_20010619

Index: net/ipv6/exthdrs.c
===================================================================
RCS file: /cvsroot/usagi/kernel/linux24/net/ipv6/exthdrs.c,v
retrieving revision 1.1.1.2
retrieving revision 1.1.1.2.2.1
diff -u -r1.1.1.2 -r1.1.1.2.2.1
--- net/ipv6/exthdrs.c	2001/05/01 09:41:01	1.1.1.2
+++ net/ipv6/exthdrs.c	2001/06/19 14:37:46	1.1.1.2.2.1
@@ -112,6 +112,7 @@
 	if ((skb->h.raw + len) - skb->data > skb_headlen(skb))
 		goto bad;
 
+	off += 2;
 	len -= 2;
 
 	while (len > 0) {

-- YOSHIFUJI Hideaki @ USAGI Project
From owner-netdev@oss.sgi.com Tue Jun 19 08:59:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5JFxGc21789 for netdev-outgoing; Tue, 19 Jun 2001 08:59:16 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5JFxEV21785 for ; Tue, 19 Jun 2001 08:59:15 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id IAA29559; Tue, 19 Jun 2001 08:59:06 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15151.30410.49853.260628@pizda.ninka.net> Date: Tue, 19 Jun 2001 08:59:06 -0700 (PDT) To: YOSHIFUJI.Hideaki/$B5HF#1QL@.sgi.com (B ) Cc: netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: Re: bug in dst / hbh tlv options parser In-Reply-To: <20010619235118W.yoshfuji@wide.ad.jp> References: <20010619235118W.yoshfuji@wide.ad.jp> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 190 Lines: 9

> Linux-2.4.4 and 2.4.5 have a bug; they do not parse tlv options in
> ipv6 dst / hbh extension headers. Here's patch.

Patch applied, thanks.

Later, David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Wed Jun 20 03:35:55 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5KAZtV29647 for netdev-outgoing; Wed, 20 Jun 2001 03:35:55 -0700 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5KAZsV29644 for ; Wed, 20 Jun 2001 03:35:54 -0700 Received: from ws2.piuha.net (ws2.piuha.net [195.165.196.2]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id PAA00480 for ; Sun, 17 Jun 2001 15:22:39 -0700 (PDT) mail_from (Tommi.Linnakangas@piuha.net) Received: from piuha.net (ws4.piuha.net [195.165.196.4]) by ws2.piuha.net (Postfix) with ESMTP id A42F96A901 for ; Sun, 17 Jun 2001 22:28:05 +0300 (EEST) Message-ID: <3B2D04C5.633E15F8@piuha.net> Date: Sun, 17 Jun 2001 22:28:05 +0300 From: Tommi Linnakangas X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.3-ipsec i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Networking symbol exports Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 929 Lines: 35 Hi, I'm not sure whether this is the right place to ask, but at least Linux kernel 2.4.3 MAINTAINERS list the address for general networking. So I try anyway. I have a question. Is it possible to add a few symbols to netsyms.c for export? We are doing some networking development with linux kernel modules, and would like to have a few symbols exported for modules in the standard kernel so that we don't need to patch the kernel. 
The symbols we'd like to have in netsyms.c are: #include EXPORT_SYMBOL(ip_forward_options); EXPORT_SYMBOL(ip6_route_input); EXPORT_SYMBOL(ip6_route_output); EXPORT_SYMBOL(ipv6_parse_exthdrs); EXPORT_SYMBOL(ipv6_statistics); I see that in IPv4 side at least ip_route_input and ip_route_output are exported. Could it also be possible for IPv6? Are there any rules for exported symbols? Yours, Tommi Linnakangas -- mailto:Tommi.Linnakangas@piuha.net cphone:+358405504139 From owner-netdev@oss.sgi.com Wed Jun 20 04:42:18 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5KBgIr08211 for netdev-outgoing; Wed, 20 Jun 2001 04:42:18 -0700 Received: from mailserver.wilnetonline.net ([202.164.96.4]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5KBgAV08201 for ; Wed, 20 Jun 2001 04:42:11 -0700 Received: from zombie ([202.164.97.26]) by mailserver.wilnetonline.net (Netscape Messaging Server 4.15) with SMTP id GF88LU04.L0Y for ; Wed, 20 Jun 2001 17:14:18 +0530 Message-ID: <001901c0f97d$b1bc26a0$1a61a4ca@zombie> From: ipatel@wilnetonline.net To: References: <3B2D04C5.633E15F8@piuha.net> Subject: Re: Networking symbol exports Date: Wed, 20 Jun 2001 17:09:36 +0530 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2014.211 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2014.211 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 755 Lines: 24 > I have a question. Is it possible to add a few symbols to netsyms.c for > export? I had the same problem sometimes back. I needed some routines which weren't exposed by the kernel to be used by modules. I also dunno what is the policy of adding symbols to netsyms.c....maybe the maintainers can explain... 
> #include > > EXPORT_SYMBOL(ip_forward_options); > EXPORT_SYMBOL(ip6_route_input); > EXPORT_SYMBOL(ip6_route_output); > EXPORT_SYMBOL(ipv6_parse_exthdrs); > EXPORT_SYMBOL(ipv6_statistics); > > I see that in IPv4 side at least ip_route_input and ip_route_output are > exported. Could it also be possible for IPv6? Strange! i also needed the same IPv6 route calculation functions to be exported for my module! regards, Imran From owner-netdev@oss.sgi.com Wed Jun 20 05:52:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5KCqPG18334 for netdev-outgoing; Wed, 20 Jun 2001 05:52:25 -0700 Received: from jamlikhet.polytechnique.org (jamlikhet.polytechnique.org [129.104.37.2]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5KCqNV18330 for ; Wed, 20 Jun 2001 05:52:23 -0700 Received: from alibaba.le-loarer.org (ppp178-net1-idf7-bas1.isdnet.net [195.154.54.178]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (No client certificate requested) by jamlikhet.polytechnique.org (Postfix) with ESMTP id EF76FF8C4 for ; Wed, 20 Jun 2001 14:52:08 +0200 (CEST) Received: by alibaba.le-loarer.org (Postfix, from userid 500) id C6D56D23F; Wed, 20 Jun 2001 14:52:13 +0200 (CEST) Date: Wed, 20 Jun 2001 14:52:13 +0200 From: Loic Le Loarer To: netdev@oss.sgi.com Subject: problem with ipv6 tunneling Message-ID: <20010620145213.A6185@alibaba.le-loarer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.3.15i X-Uptime: alibaba.le-loarer.org up for 2:46pm up 45 min Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2664 Lines: 61 Hello, I have a problem when setting up and ipv6 tunnel over ipv4. 
I am connected to the Internet with ADSL (pppoed):

[root@alibaba ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:10:5A:2D:D9:53
          inet6 addr: fe80::210:5aff:fe2d:d953/10 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1854 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2007 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:1460764 (1.3 Mb)  TX bytes:333600 (325.7 Kb)
          Interrupt:10 Base address:0xe400

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1830 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1830 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:160533 (156.7 Kb)  TX bytes:160533 (156.7 Kb)

ppp0      Link encap:Point-to-Point Protocol
          inet addr:195.154.54.178  P-t-P:195.154.54.129  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1492  Metric:1
          RX packets:1781 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1924 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:3
          RX bytes:682362 (666.3 Kb)  TX bytes:286164 (279.4 Kb)

The "ifconfig sit0 up" command works well and adds this to the ifconfig output:

sit0      Link encap:IPv6-in-IPv4
          inet6 addr: ::195.154.54.178/96 Scope:Compat
          inet6 addr: ::127.0.0.1/96 Scope:Unknown
          UP RUNNING NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

but when I try to configure this tunnel, I get:

[root@alibaba ~]# ifconfig sit0 tunnel ::206.123.31.102
SIOGIFINDEX: Bad file descriptor
[root@alibaba ~]# ifconfig eth0 add 3ffe:b00:c18:1fff:0:0:0:6df
SIOGIFINDEX: Bad file descriptor
[root@alibaba ~]# ifconfig sit1 up
sit1: unknown interface: No such device

Can you help me, please? Do you have an idea of the meaning of such an error?
-- Loïc mel : loic.le-loarer@polytechnique.org toile : http://www.le-loarer.org/ "heaven is not a place, it's a feeling" "Il y a 3 catégories de mathématiciens. Ceux qui savent compter, et ceux qui ne savent pas." "Le mois de l'année où le polytechnicien dit le moins de conneries, c'est le mois de février, parce qu'il n'y a que vingt-huit jours". (Coluche)
From owner-netdev@oss.sgi.com Wed Jun 20 06:19:10 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5KDJAD22104 for netdev-outgoing; Wed, 20 Jun 2001 06:19:10 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5KDJ8V22098 for ; Wed, 20 Jun 2001 06:19:09 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f5KDItW26555; Wed, 20 Jun 2001 16:18:55 +0300 Date: Wed, 20 Jun 2001 16:18:54 +0300 (EEST) From: Pekka Savola To: Loic Le Loarer cc: Subject: Re: problem with ipv6 tunneling In-Reply-To: <20010620145410.B6185@alibaba.le-loarer.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1147 Lines: 28 On Wed, 20 Jun 2001, Loic Le Loarer wrote: > The ifconfig sit0 up command works well and adds this to ifconfig result : > sit0 Link encap:IPv6-in-IPv4 > inet6 addr: ::195.154.54.178/96 Scope:Compat > inet6 addr: ::127.0.0.1/96 Scope:Unknown > UP RUNNING NOARP MTU:1480 Metric:1 > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:0 > RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) > > but when I try to configure this tunnel with I get : > [root@alibaba ~]# ifconfig sit0 tunnel ::206.123.31.102 > SIOGIFINDEX: Bad file descriptor These commands work just fine on my 2.4 kernel (RHL71 system). Which kernel are you using?
You could also try to use the equivalent command with /sbin/ip (it's a lot more reliable anyway), like: /sbin/ip tunnel add sit2 mode sit ttl 64 remote 206.123.31.102 -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Wed Jun 20 07:39:58 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5KEdwr00752 for netdev-outgoing; Wed, 20 Jun 2001 07:39:58 -0700 Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.140.186]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5KEdwV00749 for ; Wed, 20 Jun 2001 07:39:58 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id XAA27709 for ; Wed, 20 Jun 2001 23:41:11 +0900 To: netdev@oss.sgi.com Subject: ARPHRD_TUNNEL6? X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: http://www.hongo.wide.ad.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010620234111J.yoshfuji@wide.ad.jp> Date: Wed, 20 Jun 2001 23:41:11 +0900 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 991025(IM133) Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 170 Lines: 7 What is the purpose of ARPHRD_TUNNEL6, defined in ? Header says, it is for IPIP6 tunnel. What do you mean by IPIP6? IP over IPv6 tunnel? 
--yoshfuji From owner-netdev@oss.sgi.com Wed Jun 20 08:19:22 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5KFJMl06146 for netdev-outgoing; Wed, 20 Jun 2001 08:19:22 -0700 Received: from kepler.agaran.6bone.pl (postfix@[213.25.169.206]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5KFJJV06141 for ; Wed, 20 Jun 2001 08:19:19 -0700 Received: by kepler.agaran.6bone.pl (Postfix+IPv6, from userid 500) id 05C75C148; Wed, 20 Jun 2001 17:18:41 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by kepler.agaran.6bone.pl (Postfix+IPv6) with ESMTP id 31E6DC0A7 for ; Wed, 20 Jun 2001 17:18:40 +0200 (CEST) Date: Wed, 20 Jun 2001 17:18:39 +0200 (CEST) From: "Maciej 'Agaran' Pijanka" To: NetDevel List Subject: Linux without IP stack Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 392 Lines: 15 Hello is possible to disable ip stack and have only v6 stack working? i looked a bit in sources but first sit.o should be separed from ipv6.o (when ipv6 is in module) best regards agaran -- Maciej 'Agaran' Pijanka MAP2-6BONE i386, Linux 2.2, Pine, Mutt, Slrn, Vi(m), IPv6, Gdb, I do not fear computers. I fear the lack of them. 
-- Isaac Asimov From owner-netdev@oss.sgi.com Wed Jun 20 09:19:42 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5KGJg409016 for netdev-outgoing; Wed, 20 Jun 2001 09:19:42 -0700 Received: from jamlikhet.polytechnique.org (jamlikhet.polytechnique.org [129.104.37.2]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5KGJeV09013 for ; Wed, 20 Jun 2001 09:19:41 -0700 Received: from alibaba.le-loarer.org (ppp178-net1-idf7-bas1.isdnet.net [195.154.54.178]) (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)) (No client certificate requested) by jamlikhet.polytechnique.org (Postfix) with ESMTP id B01BBF8AA for ; Wed, 20 Jun 2001 18:19:23 +0200 (CEST) Received: by alibaba.le-loarer.org (Postfix, from userid 500) id 52826D23F; Wed, 20 Jun 2001 18:19:28 +0200 (CEST) Date: Wed, 20 Jun 2001 18:19:28 +0200 From: Loic Le Loarer To: netdev@oss.sgi.com Subject: Re: problem with ipv6 tunneling Message-ID: <20010620181928.E6185@alibaba.le-loarer.org> References: <20010620145410.B6185@alibaba.le-loarer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.3.15i In-Reply-To: ; from pekkas@netcore.fi on Wed, Jun 20, 2001 at 04:18:54PM +0300 X-Uptime: alibaba.le-loarer.org up for 6:14pm up 4:12 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1794 Lines: 41 Le mercredi 20 juin 2001 à 16:18:54 +0300, Pekka Savola a écrit: > On Wed, 20 Jun 2001, Loic Le Loarer wrote: > > The ifconfig sit0 up command works well and adds this to ifconfig result : > > sit0 Link encap:IPv6-in-IPv4 > > inet6 addr: ::195.154.54.178/96 Scope:Compat > > inet6 addr: ::127.0.0.1/96 Scope:Unknown > > UP RUNNING NOARP MTU:1480 Metric:1 > > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > > collisions:0 txqueuelen:0 > > RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) > > > > but when I try to configure this tunnel with I get : 
> > [root@alibaba ~]# ifconfig sit0 tunnel ::206.123.31.102
> > SIOGIFINDEX: Bad file descriptor
>
> These commands work just fine on my 2.4 kernel (RHL71 system).  Which
> kernel are you using?

I am using Mandrake 8.0 with an upgraded 2.4.5 kernel.

> You could also try to use the equivalent command with /sbin/ip (it's a lot
> more reliable anyway), like:
> /sbin/ip tunnel add sit2 mode sit ttl 64 remote 206.123.31.102

Well, this command works and replaces the
    ifconfig sit1 tunnel ::206.123.31.102
command, but it is not enough on its own. I also use
    ip addr add "my_ipv6_address" dev sit1
instead of
    ifconfig sit1 add "my_ipv6_address"

So I do not know why, but ifconfig is broken on Mandrake 8.0!

Thank you very much for the help! It works now.

--
Loïc
mel : loic.le-loarer@polytechnique.org toile : http://www.le-loarer.org/
"heaven is not a place, it's a feeling"
"Il y a 3 catégories de mathématiciens. Ceux qui savent compter, et ceux qui ne savent pas."
"Le mois de l'année où le polytechnicien dit le moins de conneries, c'est le mois de février, parce qu'il n'y a que vingt-huit jours".
(Coluche) From owner-netdev@oss.sgi.com Wed Jun 20 13:29:46 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5KKTkG16256 for netdev-outgoing; Wed, 20 Jun 2001 13:29:46 -0700 Received: from circuit.moureaux.com (IDENT:root@m203-1-p27.warwick.net [208.242.203.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5KKTiV16250 for ; Wed, 20 Jun 2001 13:29:44 -0700 Received: from localhost (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.11.2/8.11.2) with ESMTP id f5KKP5H08017; Wed, 20 Jun 2001 16:25:05 -0400 Date: Wed, 20 Jun 2001 16:25:05 -0400 (EDT) From: Statux X-X-Sender: To: "Maciej 'Agaran' Pijanka" cc: NetDevel List Subject: Re: Linux without IP stack In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 109 Lines: 4 > is possible to disable ip stack and have only v6 stack working? isn't that a contradictory statement? :) From owner-netdev@oss.sgi.com Wed Jun 20 13:32:11 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5KKWBw16542 for netdev-outgoing; Wed, 20 Jun 2001 13:32:11 -0700 Received: from kepler.agaran.6bone.pl (postfix@[213.25.169.206]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5KKW8V16536 for ; Wed, 20 Jun 2001 13:32:10 -0700 Received: by kepler.agaran.6bone.pl (Postfix+IPv6, from userid 500) id 6C707C148; Wed, 20 Jun 2001 22:31:54 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by kepler.agaran.6bone.pl (Postfix+IPv6) with ESMTP id 4B124C0A7; Wed, 20 Jun 2001 22:31:53 +0200 (CEST) Date: Wed, 20 Jun 2001 22:31:52 +0200 (CEST) From: "Maciej 'Agaran' Pijanka" To: Statux Cc: NetDevel List Subject: Re: Linux without IP stack In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 523 Lines: 18 On Wed, 20 Jun 2001, Statux wrote: > > is possible to disable ip stack and have only v6 
stack working? > > isn't that a contradictory statement? :) sorry.. poor english here i mean about kernel without IPv4 Stack and only IPv6 stack inside (no tunneling, no autotunnels and so on) maybe now i said that more clearly -- Maciej 'Agaran' Pijanka MAP2-6BONE i386, Linux 2.2, Pine, Mutt, Slrn, Vi(m), IPv6, Gdb, I do not fear computers. I fear the lack of them. -- Isaac Asimov From owner-netdev@oss.sgi.com Wed Jun 20 13:37:55 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5KKbti17196 for netdev-outgoing; Wed, 20 Jun 2001 13:37:55 -0700 Received: from zmailer.org (mail.zmailer.org [194.252.70.162]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5KKbsV17193 for ; Wed, 20 Jun 2001 13:37:54 -0700 Received: (mea@zmailer.org) by mail.zmailer.org id ; Wed, 20 Jun 2001 23:37:41 +0300 Date: Wed, 20 Jun 2001 23:37:41 +0300 From: Matti Aarnio To: "Maciej 'Agaran' Pijanka" Cc: NetDevel List Subject: Re: Linux without IP stack Message-ID: <20010620233741.S5947@mea-ext.zmailer.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: ; from agaran@agaran.6bone.pl on Wed, Jun 20, 2001 at 10:31:52PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 482 Lines: 16 On Wed, Jun 20, 2001 at 10:31:52PM +0200, Maciej 'Agaran' Pijanka wrote: > sorry.. poor english here > i mean about kernel without IPv4 Stack and only IPv6 stack inside (no > tunneling, no autotunnels and so on) Presently no. When 2.5 opens, perhaps I will redo the impossible, and split IPv4 into generic UDP/TCP, and IPv4 stuff. That definitely can be done, Solaris8 does it, for example. 
> -- > Maciej 'Agaran' Pijanka MAP2-6BONE /Matti Aarnio From owner-netdev@oss.sgi.com Wed Jun 20 13:45:22 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5KKjMe17965 for netdev-outgoing; Wed, 20 Jun 2001 13:45:22 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5KKjKV17959 for ; Wed, 20 Jun 2001 13:45:21 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f5KKfuY28798; Wed, 20 Jun 2001 23:41:56 +0300 Date: Wed, 20 Jun 2001 23:41:56 +0300 (EEST) From: Pekka Savola To: "Maciej 'Agaran' Pijanka" cc: NetDevel List Subject: Re: Linux without IP stack In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1253 Lines: 34 On Wed, 20 Jun 2001, Maciej 'Agaran' Pijanka wrote: > is possible to disable ip stack and have only v6 stack working? > i looked a bit in sources but first sit.o should be separed from ipv6.o > (when ipv6 is in module) Impossible at this point. A lot of code is shared in .. less than obvious ways (e.g. tcp, ip toggles in proc, tcp and udp behaviour in general, etc.). It might be interesting if someone came up with a list of dependencies (at least at some point in the future), but few people want to use _only_ ipv6 yet ;-) [From the other mail]: > sorry.. poor english here >i mean about kernel without IPv4 Stack and only IPv6 stack inside (no >tunneling, no autotunnels and so on) You didn't mean this, but triggered an important point: Now, being able to disable transitionary mechanisms would be a nice thing, something that people might want to actually use now, as there are some inherent security problems with trans. methods that cannot be avoided. Btw, I'm working on enhancing 6to4 security but it's progressing slowly.. :-) -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. 
Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Wed Jun 20 13:47:58 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5KKlwq18300 for netdev-outgoing; Wed, 20 Jun 2001 13:47:58 -0700 Received: from kepler.agaran.6bone.pl (postfix@[213.25.169.206]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5KKlMV18183 for ; Wed, 20 Jun 2001 13:47:23 -0700 Received: by kepler.agaran.6bone.pl (Postfix+IPv6, from userid 500) id CA8BBC148; Wed, 20 Jun 2001 22:47:17 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by kepler.agaran.6bone.pl (Postfix+IPv6) with ESMTP id E1ACCC0A7; Wed, 20 Jun 2001 22:47:16 +0200 (CEST) Date: Wed, 20 Jun 2001 22:47:14 +0200 (CEST) From: "Maciej 'Agaran' Pijanka" To: Matti Aarnio Cc: NetDevel List Subject: Re: Linux without IP stack In-Reply-To: <20010620233741.S5947@mea-ext.zmailer.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 800 Lines: 29 On Wed, 20 Jun 2001, Matti Aarnio wrote: > On Wed, Jun 20, 2001 at 10:31:52PM +0200, Maciej 'Agaran' Pijanka wrote: > > sorry.. poor english here > > i mean about kernel without IPv4 Stack and only IPv6 stack inside (no > > tunneling, no autotunnels and so on) > > Presently no. > > When 2.5 opens, perhaps I will redo the impossible, > and split IPv4 into generic UDP/TCP, and IPv4 stuff. > > That definitely can be done, Solaris8 does it, for example. i think one of *bsd splits that too > > > -- > > Maciej 'Agaran' Pijanka MAP2-6BONE > > /Matti Aarnio > -- Maciej 'Agaran' Pijanka MAP2-6BONE i386, Linux 2.2, Pine, Mutt, Slrn, Vi(m), IPv6, Gdb, I do not fear computers. I fear the lack of them. 
-- Isaac Asimov From owner-netdev@oss.sgi.com Wed Jun 20 13:50:56 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5KKouT18672 for netdev-outgoing; Wed, 20 Jun 2001 13:50:56 -0700 Received: from kepler.agaran.6bone.pl (postfix@[213.25.169.206]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5KKoUV18630 for ; Wed, 20 Jun 2001 13:50:33 -0700 Received: by kepler.agaran.6bone.pl (Postfix+IPv6, from userid 500) id 1BF74C148; Wed, 20 Jun 2001 22:50:19 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by kepler.agaran.6bone.pl (Postfix+IPv6) with ESMTP id 26F81C0A7; Wed, 20 Jun 2001 22:50:18 +0200 (CEST) Date: Wed, 20 Jun 2001 22:50:18 +0200 (CEST) From: "Maciej 'Agaran' Pijanka" To: Pekka Savola Cc: NetDevel List Subject: Re: Linux without IP stack In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1083 Lines: 29 On Wed, 20 Jun 2001, Pekka Savola wrote: > On Wed, 20 Jun 2001, Maciej 'Agaran' Pijanka wrote: > > is possible to disable ip stack and have only v6 stack working? > > i looked a bit in sources but first sit.o should be separed from ipv6.o > > (when ipv6 is in module) > > Impossible at this point. A lot of code is shared in .. less than obvious > ways (e.g. tcp, ip toggles in proc, tcp and udp behaviour in general, > etc.). > but splitting ipv6 into ipv6 and sit is possible now ? (ex i dont want to have sit at all..) > It might be interesting if someone came up with a list of dependencies (at > least at some point in the future), but few people want to use _only_ ipv6 > yet ;-) khmm first time i tried to have box without ip configured at all.. it worked.. now i think about disabling tunelling and trying to remove v4 now for test..maybe some day we need that.. -- Maciej 'Agaran' Pijanka MAP2-6BONE i386, Linux 2.2, Pine, Mutt, Slrn, Vi(m), IPv6, Gdb, I do not fear computers. I fear the lack of them. 
-- Isaac Asimov From owner-netdev@oss.sgi.com Wed Jun 20 19:55:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5L2tPC24376 for netdev-outgoing; Wed, 20 Jun 2001 19:55:25 -0700 Received: from cr416993-a.ym1.on.wave.home.com (cr416993-a.ym1.on.wave.home.com [24.112.193.232]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5L2tLV24373 for ; Wed, 20 Jun 2001 19:55:21 -0700 Received: from redshift.mimosa.com (IDENT:root@redshift.mimosa.com [192.139.70.107]) by cr416993-a.ym1.on.wave.home.com (8.9.3/8.9.3) with ESMTP id WAA07027; Wed, 20 Jun 2001 22:57:48 -0400 Received: from localhost (hugh@localhost) by redshift.mimosa.com (8.11.0/8.11.0) with ESMTP id f5L2vxE28799; Wed, 20 Jun 2001 22:57:59 -0400 X-Authentication-Warning: redshift.mimosa.com: hugh owned process doing -bs Date: Wed, 20 Jun 2001 22:57:59 -0400 (EDT) From: "D. Hugh Redelmeier" Reply-To: To: cc: Marco Berizzi Subject: select says I can read, but recvfrom hangs Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 3563 Lines: 103 I'm the maintainer of Pluto, the IKE daemon for the LINUX FreeS/WAN project. Some of our users have experienced situations where the Pluto process becomes unresponsive because it is waiting in a recvfrom. The thing that puzzles me is that recvfrom will not be executed unless select has indicated that there is something to read on that socket. I have no idea how that could happen. Do any of you have any ideas about what could be happening? Details: - A couple of people have noticed it happening, but not often. It may be happening without being noticed, but not at great frequency. - one user has been able to reproduce it fairly consistently. I've mutated the code to try to narrow down what is going on. Right now, there are three selects that say a message is ready, but the recvfrom still hangs. 
- this user's system is Slackware 7.1, with a kernel.org 2.2.19,
  patched by FreeS/WAN 1.91.  Richard Briggs, our kernel guy doesn't
  see a way that FreeS/WAN affects the input path for messages that
  are UDP (i.e. not ESP and not AH)

- the socket in question is bound to UDP, Port 500 with the IP address
  of the public interface.  The RFCs dictate this.  Socket options:
  SO_REUSEADDR and IP_RECVERR.  Hmm, I wonder if IP_RECVERR could be
  the problem -- I have evidence that not many folks have used it.

- I would not ask you to read the whole of Pluto to help me.  But if
  you wish to, it can be found through www.freeswan.org.  Here is the
  recvfrom that is hanging, and the preceding just-to-be-safe select:

    {
        fd_set nreadfds;
        int nndes;
        struct timeval tm;

        tm.tv_sec = 0;  /* don't wait at all */
        tm.tv_usec = 0;

        FD_ZERO(&nreadfds);
        FD_SET(ifp->fd, &nreadfds);
        do {
            nndes = select(ifp->fd + 1, &nreadfds, NULL, NULL, &tm);
        } while (nndes == -1 && errno == EINTR);
        if (nndes < 0)
        {
            log_errno((e, "re-select() failed in comm_handle"));
            return;
        }
        if (nndes == 0)
        {
            log("SURPRISE: re-select() in comm_handle finds %s no longer ready for input"
                , ifp->rname);
            return;
        }
        passert(nndes == 1 && FD_ISSET(ifp->fd, &nreadfds));
    }

    passert(select_found == ifp->fd);
    zero(&from.sa);
    packet_len = recvfrom(ifp->fd, bigbuffer, sizeof(bigbuffer), 0
        , &from.sa, &from_len);
    passert(select_found == ifp->fd);   /* true paranoia */
    select_found = NULL_FD;

- the only signal handlers simply set a sigatomic_t variable and
  return (SIGHUP, SIGTERM).  They are not firing.

- The file descriptor in question is not shared with another process.
  Locking prevents two copies of Pluto from running at once.

- the scenario that provokes the problem for the user goes as follows:

  + Pluto is running on a security gateway, with a Windows NT box
    behind it

  + he connects a second windows box, running PGPnet (an IPSEC
    implementation), through the internet, to the public interface
    of the security gateway.
This box negotiates a tunnel with the security gateway. + he disconnects the second windows box, and reconnects the same way but with a different IP address (the IP address is dynamically assigned whenever he connects this box to the internet). + the second box starts and completes IKE negotiation. + Pluto is tricked into hanging on a recvfrom. Is there any way to tell from the system whether the select is wrong (i.e. there is no message) or the recvfrom is wrong (i.e. there is a message, but it still hangs reading it)? Thanks, Hugh Redelmeier hugh@mimosa.com voice: +1 416 482-8253 From owner-netdev@oss.sgi.com Wed Jun 20 20:14:00 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5L3E0F24630 for netdev-outgoing; Wed, 20 Jun 2001 20:14:00 -0700 Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5L3DwV24627 for ; Wed, 20 Jun 2001 20:13:58 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.2/8.11.2) with ESMTP id f5L3DPa24307; Wed, 20 Jun 2001 20:13:25 -0700 Message-ID: <3B316655.75927BF8@candelatech.com> Date: Wed, 20 Jun 2001 20:13:25 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: hugh@mimosa.com CC: netdev@oss.sgi.com, Marco Berizzi Subject: Re: select says I can read, but recvfrom hangs References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 4491 Lines: 115 "D. Hugh Redelmeier" wrote: > > I'm the maintainer of Pluto, the IKE daemon for the LINUX FreeS/WAN > project. > > Some of our users have experienced situations where the Pluto process > becomes unresponsive because it is waiting in a recvfrom. 
The thing > that puzzles me is that recvfrom will not be executed unless select > has indicated that there is something to read on that socket. I have > no idea how that could happen. > > Do any of you have any ideas about what could be happening? > > Details: > > - A couple of people have noticed it happening, but not often. It may > be happening without being noticed, but not at great frequency. > > - one user has been able to reproduce it fairly consistently. I've > mutated the code to try to narrow down what is going on. Right > now, there are three selects that say a message is ready, but the > recvfrom still hangs. > > - this user's system is Slackware 7.1, with a kernel.org 2.2.19, > patched by FreeS/WAN 1.91. Richard Briggs, our kernel guy doesn't > see a way that FreeS/WAN affects the input path for messages that > are UDP (i.e. not ESP and not AH) > > - the socket in question is bound to UDP, Port 500 with the IP address > of the public interface. The RFCs dictate this. Socket options: > SO_REUSEADDR and IP_RECVERR. Hmm, I wonder if IP_RECVERR could be > the problem -- I have evidence that not many folks have used it. > > - I would not ask you to read the whole of Pluto to help me. But if > you wish to, it can be found through www.freeswan.org. 
Here is the > recvfrom that is hanging, and the preceding just-to-be-safe select: > > { > fd_set nreadfds; > int nndes; > struct timeval tm; > > tm.tv_sec = 0; /* don't wait at all */ > tm.tv_usec = 0; > > FD_ZERO(&nreadfds); > FD_SET(ifp->fd, &nreadfds); > do { > nndes = select(ifp->fd + 1, &nreadfds, NULL, NULL, &tm); > } while (nndes == -1 && errno == EINTR); > if (nndes < 0) > { > log_errno((e, "re-select() failed in comm_handle")); > return; > } > if (nndes == 0) > { > log("SURPRISE: re-select() in comm_handle finds %s no longer ready for input" > , ifp->rname); > return; > } > passert(nndes == 1 && FD_ISSET(ifp->fd, &nreadfds)); > } > > passert(select_found == ifp->fd); > zero(&from.sa); > packet_len = recvfrom(ifp->fd, bigbuffer, sizeof(bigbuffer), 0 > , &from.sa, &from_len); > passert(select_found == ifp->fd); /* true paranoia */ > select_found = NULL_FD; > > - the only signal handlers simply set a sigatomic_t variable and > return (SIGHUP, SIGTERM). They are not firing. > > - The file descriptor in question is not shared with another process. > Locking prevents two copies of Pluto from running at once. > > - the scenario that provokes the problem for the user goes as follows: > > + Pluto is running on a security gateway, with a Windows NT box > behind it > > + he connects a second windows box, running PGPnet (an IPSEC > implementation), through the internet, to the public interface > of the security gateway. This box negotiates a tunnel with > the security gateway. > > + he disconnects the second windows box, and reconnects the same way > but with a different IP address (the IP address is dynamically > assigned whenever he connects this box to the internet). > > + the second box starts and completes IKE negotiation. > > + Pluto is tricked into hanging on a recvfrom. > > Is there any way to tell from the system whether the select is wrong > (i.e. there is no message) or the recvfrom is wrong (i.e. there is a > message, but it still hangs reading it)? 

Make your socket O_NONBLOCK, and you don't have to worry about that
kind of thing (just be sure you handle all the error cases, i.e. a read
that returns no data, correctly).

I always just consider select() a hint, not the Truth :)

>
> Thanks,
>
> Hugh Redelmeier
> hugh@mimosa.com  voice: +1 416 482-8253

--
Ben Greear
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

From owner-netdev@oss.sgi.com Thu Jun 21 05:27:16 2001
Received: (from majordomo@localhost)
	by oss.sgi.com (8.11.2/8.11.3) id f5LCRGT05548
	for netdev-outgoing; Thu, 21 Jun 2001 05:27:16 -0700
Received: from pusa.informat.uv.es (pusa.informat.uv.es [147.156.24.61])
	by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5LCRCV05545
	for ; Thu, 21 Jun 2001 05:27:14 -0700
Received: from ulisses by pusa.informat.uv.es with local (Exim 3.12 #1 (Debian))
	id 15D3YA-0003S1-00; Thu, 21 Jun 2001 14:27:06 +0200
Date: Thu, 21 Jun 2001 14:27:06 +0200
To: L:, netdev@oss.sgi.com
Subject: user's wishlist on ipv6 module
Message-ID: <20010621142706.A13233@pusa.informat.uv.es>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.2.5i
From: uaca@alumni.uv.es
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 893
Lines: 31

Hi all,

First of all, thank you all for your contribution to GNU/Linux.

Few days ago I wanted to add a host to the 6bone without having to reboot,
there were no problem, just compile the module, load, and configure... the
usual most times...

I found that if you don't select ipv6 as a module when you compile the
kernel, the kernel will not export the needed symbols for ipv6...

IMHO, it would be cool to have always exported that symbols if that not
causes a trouble.
Thanks for reading this best wishes Ulisses PD: I'm not subscrived on this list Debian GNU/Linux: a dream come true ----------------------------------------------------------------------------- "Computers are useless. They can only give answers." Pablo Picasso ---> Visita http://www.valux.org/ para saber acerca de la <--- ---> Asociación Valenciana de Usuarios de Linux <--- From owner-netdev@oss.sgi.com Thu Jun 21 05:31:23 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5LCVNq05721 for netdev-outgoing; Thu, 21 Jun 2001 05:31:23 -0700 Received: from rainbow.cs.unipi.gr (IDENT:root@[195.251.230.99]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5LCVLV05718 for ; Thu, 21 Jun 2001 05:31:22 -0700 Received: from rainbow.cs.unipi.gr (ppp1140.ath.forthnet.gr [194.219.27.96]) by rainbow.cs.unipi.gr (8.9.3/8.9.3) with ESMTP id OAA19029 for ; Thu, 21 Jun 2001 14:35:02 +0300 X-Mozilla-Status: 0801 Message-ID: <3B3202A5.1090302@rainbow.cs.unipi.gr> Date: Thu, 21 Jun 2001 14:20:21 +0000 From: Harry Kalogirou User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.4 i686; en-US; rv:0.9.1) Gecko/20010607 X-Accept-Language: el, en-us MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: [Fwd: Re: TCP/IP] Strange double transmit of frames.. Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2585 Lines: 88 -------- Original Message -------- Subject: Re: TCP/IP Date: Thu, 21 Jun 2001 13:16:15 +0000 From: Harry Kalogirou To: Alan Cox CC: Linux 8086 References: Alan Cox wrote: > > No it isnt. 
>
> That would be odd, and something I think everyone else would have noticed
>

I do this :

/home/harkal# slattach -p slip -L -m -s 9600 /dev/ttyS1 &
[1] 386
/home/harkal# ifconfig sl0 192.168.1.1 pointopoint elks

and now ifconfig gives :

/home/harkal# ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

sl0       Link encap:Serial Line IP
          inet addr:192.168.1.1  P-t-P:192.168.1.100  mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:296  Metric:1
          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10

route gives:

/home/harkal# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
elks            *               255.255.255.255 UH    0      0        0 sl0

Then I boot elks and I ping it. Just "ping elks". tcpdump records :

/home/harkal# tcpdump -i sl0
tcpdump: listening on sl0
13:02:55.137218 192.168.1.1 > 192.168.1.100: icmp: echo request (DF)
13:02:55.252131 192.168.1.1 > 192.168.1.100: icmp: echo request (DF)
13:02:55.352130 192.168.1.100 > 192.168.1.1: icmp: echo reply (DF)
13:02:56.132513 192.168.1.1 > 192.168.1.100: icmp: echo request (DF)
13:02:56.252131 192.168.1.1 > 192.168.1.100: icmp: echo request (DF)
13:02:56.342129 192.168.1.100 > 192.168.1.1: icmp: echo reply (DF)
13:02:57.132471 192.168.1.1 > 192.168.1.100: icmp: echo request (DF)
13:02:57.252130 192.168.1.1 > 192.168.1.100: icmp: echo request (DF)
13:02:57.342130 192.168.1.100 > 192.168.1.1: icmp: echo reply (DF)
13:02:58.133342 192.168.1.1 > 192.168.1.100: icmp: echo request (DF)
13:02:58.252130 192.168.1.1 > 192.168.1.100: icmp: echo request (DF)
13:02:58.342130 192.168.1.100 > 192.168.1.1: icmp: echo reply (DF)

?!?!??!!?

kernel version is 2.4.4 ...

What can be wrong?
-------- End of Original Message -------- It was suggested that I post this here... I would like to note that the ping process only sent 4 ICMP echoes ... -- ___ ___ ________ (*| | | (*| | / (_HarKal_) |___|--|arry |___| \alogiroy \/\/ web : http://rainbow.cs.unipi.gr/~harkal From owner-netdev@oss.sgi.com Thu Jun 21 05:49:07 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5LCn7p06398 for netdev-outgoing; Thu, 21 Jun 2001 05:49:07 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5LCn5V06393 for ; Thu, 21 Jun 2001 05:49:06 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f5LCmpj11086; Thu, 21 Jun 2001 15:48:52 +0300 Date: Thu, 21 Jun 2001 15:48:51 +0300 (EEST) From: Pekka Savola To: cc: Subject: Re: user's wishlist on ipv6 module In-Reply-To: <20010621142706.A13233@pusa.informat.uv.es> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 873 Lines: 22 On Thu, 21 Jun 2001 uaca@alumni.uv.es wrote: > Few days ago I wanted to add a host to the 6bone without having to reboot, > there were no problem, just compile the module, load, and configure... the > usual most times... > > I found that if you don't select ipv6 as a module when you compile the > kernel, the kernel will not export the needed symbols for ipv6... > > IMHO, it would be cool to have always exported that symbols if that not > causes a trouble. No, that's not sane. What would happen if this was done "as a backup" on all modules? If you didn't prepare for ipv6 when building the kernel in the first place, you have to reboot. That can't be avoided. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Thu Jun 21 05:52:35 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5LCqZD06578 for netdev-outgoing; Thu, 21 Jun 2001 05:52:35 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5LCqYV06575 for ; Thu, 21 Jun 2001 05:52:34 -0700 Received: by colin.muc.de id <140587-3>; Thu, 21 Jun 2001 14:52:51 +0200 Message-ID: <20010621145248.33287@colin.muc.de> Date: Thu, 21 Jun 2001 14:52:48 +0200 From: Andi Kleen To: Pekka Savola Cc: uaca@alumni.uv.es, netdev@oss.sgi.com Subject: Re: user's wishlist on ipv6 module References: <20010621142706.A13233@pusa.informat.uv.es> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: ; from Pekka Savola on Thu, Jun 21, 2001 at 02:48:51PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 864 Lines: 21 On Thu, Jun 21, 2001 at 02:48:51PM +0200, Pekka Savola wrote: > On Thu, 21 Jun 2001 uaca@alumni.uv.es wrote: > > Few days ago I wanted to add a host to the 6bone without having to reboot, > > there were no problem, just compile the module, load, and configure... the > > usual most times... > > > > I found that if you don't select ipv6 as a module when you compile the > > kernel, the kernel will not export the needed symbols for ipv6... > > > > IMHO, it would be cool to have always exported that symbols if that not > > causes a trouble. > > No, that's not sane. What would happen if this was done "as a backup" on > all modules? Most modules use standardized APIs that are always available. I agree with the original poster that CONFIG_IPV6_MODULE should not be needed. The code it covers is not very big anyways, so it could be just made default. 
-Andi From owner-netdev@oss.sgi.com Thu Jun 21 06:02:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5LD2PN06839 for netdev-outgoing; Thu, 21 Jun 2001 06:02:25 -0700 Received: from metastasis.f00f.org (f00f.stub.clear.net.nz [203.167.224.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5LD2EV06836 for ; Thu, 21 Jun 2001 06:02:25 -0700 Received: by metastasis.f00f.org (Postfix, from userid 1000) id 323419DC3; Fri, 22 Jun 2001 01:02:03 +1200 (NZST) Date: Fri, 22 Jun 2001 01:02:03 +1200 From: Chris Wedgwood To: Matti Aarnio Cc: "Maciej 'Agaran' Pijanka" , NetDevel List Subject: Re: Linux without IP stack Message-ID: <20010622010203.A1231@metastasis.f00f.org> References: <20010620233741.S5947@mea-ext.zmailer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010620233741.S5947@mea-ext.zmailer.org>; from matti.aarnio@zmailer.org on Wed, Jun 20, 2001 at 11:37:41PM +0300 X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 442 Lines: 15 On Wed, Jun 20, 2001 at 11:37:41PM +0300, Matti Aarnio wrote: When 2.5 opens, perhaps I will redo the impossible, and split IPv4 into generic UDP/TCP, and IPv4 stuff. That definitely can be done, Solaris8 does it, for example. It's been done already: in 2.1.x time I made ipv4 a module, and since then someone else made udp/tcp modules too (icmp seems too hard to break away from ipv4 for obvious reasons). 
--cw From owner-netdev@oss.sgi.com Thu Jun 21 06:09:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5LD9D507018 for netdev-outgoing; Thu, 21 Jun 2001 06:09:13 -0700 Received: from metastasis.f00f.org (f00f.stub.clear.net.nz [203.167.224.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5LD96V07015 for ; Thu, 21 Jun 2001 06:09:11 -0700 Received: by metastasis.f00f.org (Postfix, from userid 1000) id 6C0519DC3; Fri, 22 Jun 2001 01:08:55 +1200 (NZST) Date: Fri, 22 Jun 2001 01:08:55 +1200 From: Chris Wedgwood To: Andi Kleen Cc: Pekka Savola , uaca@alumni.uv.es, netdev@oss.sgi.com Subject: Re: user's wishlist on ipv6 module Message-ID: <20010622010855.B1231@metastasis.f00f.org> References: <20010621142706.A13233@pusa.informat.uv.es> <20010621145248.33287@colin.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010621145248.33287@colin.muc.de>; from ak@muc.de on Thu, Jun 21, 2001 at 02:52:48PM +0200 X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1735 Lines: 69 On Thu, Jun 21, 2001 at 02:52:48PM +0200, Andi Kleen wrote: Most modules use standardized APIs that are always available. I agree with the original poster that CONFIG_IPV6_MODULE should not be needed. The code it covers is not very big anyways, so it could be just made default. Are changes other than netsyms.c required? I'm not sure I would suggest this right now; we may as well rationalize netsyms.c during 2.5.x (which Alan may be thinking of starting soon). 
--cw --- current/net/netsyms.c~ Sun Jun 17 23:18:59 2001 +++ current/net/netsyms.c Fri Jun 22 01:05:45 2001 @@ -55,7 +55,6 @@ extern struct net_proto_family inet_family_ops; -#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) || defined (CONFIG_KHTTPD) || defined (CONFIG_KHTTPD_MODULE) #include #include #include @@ -65,7 +64,6 @@ extern int sysctl_local_port_range[2]; extern int tcp_port_rover; extern int udp_port_rover; -#endif #endif @@ -277,7 +275,7 @@ EXPORT_SYMBOL(ipv6_addr_type); EXPORT_SYMBOL(icmpv6_send); #endif -#if defined (CONFIG_IPV6_MODULE) || defined (CONFIG_KHTTPD) || defined (CONFIG_KHTTPD_MODULE) + /* inet functions common to v4 and v6 */ EXPORT_SYMBOL(inet_stream_ops); EXPORT_SYMBOL(inet_release); @@ -385,12 +383,8 @@ EXPORT_SYMBOL(sysctl_max_syn_backlog); #endif -#if defined (CONFIG_IPV6_MODULE) EXPORT_SYMBOL(secure_tcpv6_sequence_number); EXPORT_SYMBOL(secure_ipv6_id); -#endif - -#endif #ifdef CONFIG_NETLINK EXPORT_SYMBOL(netlink_set_err); @@ -425,7 +419,7 @@ EXPORT_SYMBOL(rtnl_lock); EXPORT_SYMBOL(rtnl_unlock); - + /* Used by at least ipip.c. */ EXPORT_SYMBOL(ipv4_config); EXPORT_SYMBOL(dev_open); From owner-netdev@oss.sgi.com Thu Jun 21 08:29:09 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5LFT9310284 for netdev-outgoing; Thu, 21 Jun 2001 08:29:09 -0700 Received: from cr416993-a.ym1.on.wave.home.com (cr416993-a.ym1.on.wave.home.com [24.112.193.232]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5LFT8V10281 for ; Thu, 21 Jun 2001 08:29:09 -0700 Received: from redshift.mimosa.com (IDENT:root@redshift.mimosa.com [192.139.70.107]) by cr416993-a.ym1.on.wave.home.com (8.9.3/8.9.3) with ESMTP id LAA08571; Thu, 21 Jun 2001 11:31:39 -0400 Received: from localhost (hugh@localhost) by redshift.mimosa.com (8.11.0/8.11.0) with ESMTP id f5LFVng30622; Thu, 21 Jun 2001 11:31:50 -0400 X-Authentication-Warning: redshift.mimosa.com: hugh owned process doing -bs Date: Thu, 21 Jun 2001 11:31:49 -0400 (EDT) From: "D. 
Hugh Redelmeier" Reply-To: To: Ben Greear cc: Subject: Re: select says I can read, but recvfrom hangs In-Reply-To: <3B316655.75927BF8@candelatech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 541 Lines: 17 On Wed, 20 Jun 2001, Ben Greear wrote: | From: Ben Greear | Make your socket O_NONBLOCKing, and you don't have to worry about that | kind of thing (just be sure you handle all the error cases, ie read | no data) correctly. | | I always just consider select() a hint, not the Truth :) Of course that is a way to work arround a bug (if that is what I'm seeing), but it should not be necessary. The specs for select say nothing about it being just a hint. Hugh Redelmeier hugh@mimosa.com voice: +1 416 482-8253 From owner-netdev@oss.sgi.com Thu Jun 21 09:26:33 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5LGQXk12799 for netdev-outgoing; Thu, 21 Jun 2001 09:26:33 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5LGQVV12793 for ; Thu, 21 Jun 2001 09:26:31 -0700 Received: by colin.muc.de id <140578-3>; Thu, 21 Jun 2001 18:26:52 +0200 Message-ID: <20010621182650.50332@colin.muc.de> Date: Thu, 21 Jun 2001 18:26:50 +0200 From: Andi Kleen To: hugh@mimosa.com Cc: Ben Greear , netdev@oss.sgi.com Subject: Re: select says I can read, but recvfrom hangs References: <3B316655.75927BF8@candelatech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: ; from D. Hugh Redelmeier on Thu, Jun 21, 2001 at 05:31:49PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 814 Lines: 24 On Thu, Jun 21, 2001 at 05:31:49PM +0200, D. 
Hugh Redelmeier wrote: > On Wed, 20 Jun 2001, Ben Greear wrote: > > | From: Ben Greear > > | Make your socket O_NONBLOCKing, and you don't have to worry about that > | kind of thing (just be sure you handle all the error cases, ie read > | no data) correctly. > | > | I always just consider select() a hint, not the Truth :) > > Of course that is a way to work arround a bug (if that is what I'm > seeing), but it should not be necessary. The specs for select say > nothing about it being just a hint. It is a hint when multiple processes access the same socket. When that's not the case it would be a kernel bug. Because no such bugs are known (and such things tend to get noticed) I would suspect the freeswan kernel patches. -Andi From owner-netdev@oss.sgi.com Thu Jun 21 10:52:22 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5LHqMP15313 for netdev-outgoing; Thu, 21 Jun 2001 10:52:22 -0700 Received: from smtp2.libero.it (smtp2.libero.it [193.70.192.52]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5LHqJV15310 for ; Thu, 21 Jun 2001 10:52:20 -0700 Received: from trantor.ferrara.linux.it (151.26.142.218) by smtp2.libero.it (5.5.025) id 3AE981AF00C7B004 for netdev@oss.sgi.com; Thu, 21 Jun 2001 19:52:12 +0200 Received: by trantor.ferrara.linux.it (Postfix, from userid 500) id 4973E28A41; Thu, 21 Jun 2001 11:18:23 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by trantor.ferrara.linux.it (Postfix) with ESMTP id 4054E28A35 for ; Thu, 21 Jun 2001 11:18:23 +0200 (CEST) Date: Thu, 21 Jun 2001 11:18:23 +0200 (CEST) From: Mauro Tortonesi To: NetDevel List Subject: Re: Linux without IP stack In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1059 Lines: 28 On Wed, 20 Jun 2001, Pekka Savola wrote: > On Wed, 20 Jun 2001, Maciej 'Agaran' Pijanka wrote: > > is possible to disable ip stack and have only v6 stack working? 
> > i looked a bit in sources but first sit.o should be separed from ipv6.o > > (when ipv6 is in module) > > Impossible at this point. A lot of code is shared in .. less than obvious > ways (e.g. tcp, ip toggles in proc, tcp and udp behaviour in general, > etc.). i think that the hybrid ipv4-ipv6 stack architecture is the best choice at this moment. maybe when we'll have a 99% ipv6 internet it will make any sense to have a real dual stack architecture (so that you can disable ipv4), but in this moment i think that splitting common ipv4-ipv6 code in the linux kernel would be a mistake. > Now, being able to disable transitionary mechanisms would be a nice thing, yes, really. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi mauro@ferrara.linux.it Ferrara Linux User Group http://www.ferrara.linux.it Project6 - IPv6 for Linux http://project6.ferrara.linux.it From owner-netdev@oss.sgi.com Thu Jun 21 12:39:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5LJdl818589 for netdev-outgoing; Thu, 21 Jun 2001 12:39:47 -0700 Received: from cr416993-a.ym1.on.wave.home.com (cr416993-a.ym1.on.wave.home.com [24.112.193.232]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5LJdjV18586 for ; Thu, 21 Jun 2001 12:39:45 -0700 Received: from redshift.mimosa.com (IDENT:root@redshift.mimosa.com [192.139.70.107]) by cr416993-a.ym1.on.wave.home.com (8.9.3/8.9.3) with ESMTP id PAA09374; Thu, 21 Jun 2001 15:42:14 -0400 Received: from localhost (hugh@localhost) by redshift.mimosa.com (8.11.0/8.11.0) with ESMTP id f5LJgOV31486; Thu, 21 Jun 2001 15:42:24 -0400 X-Authentication-Warning: redshift.mimosa.com: hugh owned process doing -bs Date: Thu, 21 Jun 2001 15:42:24 -0400 (EDT) From: "D. 
Hugh Redelmeier" Reply-To: To: Andi Kleen cc: Subject: Re: select says I can read, but recvfrom hangs In-Reply-To: <20010621182650.50332@colin.muc.de> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 3434 Lines: 100 | From: Andi Kleen | It is a hint when multiple processes access the same socket. Yes. In this case there is only one process. | When that's | not the case it would be a kernel bug. Because no such bugs are known | (and such things tend to get noticed) I would suspect the freeswan kernel | patches. Fair enough. That must remain as a possibility, but I don't think that it is the case. Another experiment has shed some more light. I had the user eliminate the Pluto code to exploit IP_RECVERR / MSG_ERRQUEUE. No more hangs. This means that the Pluto support for IP_RECVERR / MSG_ERRQUEUE provokes the problem. Either the Pluto code is wrong or there is a bug in the feature. From past experience, it is most likely that I've misunderstood how this is to be done. - the select says that there is something to be read from the file descriptor (socket). Remember that the socket has the option IP_RECVERR set: setsockopt(fd, SOL_IP, IP_RECVERR, (const void *)&on, sizeof(on)) - the recvfrom that hangs looks like: packet_len = recvfrom(ifp->fd, bigbuffer, sizeof(bigbuffer), 0 , &from.sa, &from_len); Note the "0" for flags. - if the packet_len is set to -1, an attempt is made to read from the MSG_ERRQUEUE: /* we are going to be daring: we'll try to use information * passed on because of IP_RECVERR. * The API is sparsely documented, and may be LINUX-only. * * - ip(7) describes IP_RECVERR * - recvmsg(2) describes MSG_ERRQUEUE * - readv(2) describes iovec * - cmsg(3) describes how to process auxilliary messages * * ??? we should link this message with one we've sent * so that the diagnostic can refer to that negotiation. 
*/ #if defined(IP_RECVERR) && defined(MSG_ERRQUEUE) { struct msghdr emh; struct iovec eiov; union { /* force alignment (not documented as necessary) */ struct cmsghdr ecms; /* how much space is enough? */ unsigned char space[256]; } ecms_buf; struct cmsghdr *cm; char fromstr[INET6_ADDRSTRLEN + sizeof(" port 65536")]; zero(&from.sa); from_len = sizeof(from); emh.msg_name = &from.sa; /* ??? filled in? */ emh.msg_namelen = sizeof(from); emh.msg_iov = &eiov; emh.msg_iovlen = 1; emh.msg_control = &ecms_buf; emh.msg_controllen = sizeof(ecms_buf); emh.msg_flags = 0; eiov.iov_base = bigbuffer; /* see readv(2) */ eiov.iov_len = sizeof(bigbuffer); packet_len = recvmsg(ifp->fd, &emh, MSG_ERRQUEUE); ... This code assumes that for every error return from the recvfrom either there is a MSG_ERRQUEUE message, or at least it is safe to try to read one -- it won't block. As far as I can tell, this has never blocked. But the number of errors isn't high, so testing hasn't been intense. Is it the case that a queued MSG_ERRQUEUE message will cause select to say that there is something to read? I'd expect so. This code assumes that if there is a queued MSG_ERRQUEUE message, an attempt to recvfrom with flags = 0 will not hang, but instead produce an error return. Is this wrong? If it is wrong, it contradicts my understanding of an answer that Andi gave me last fall. This could explain the hang that we are observing. I don't see where this last question is clearly answered in the documentation. Can you see anything else suspicious in this code? 
Hugh Redelmeier hugh@mimosa.com voice: +1 416 482-8253 From owner-netdev@oss.sgi.com Thu Jun 21 15:29:28 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5LMTSW22802 for netdev-outgoing; Thu, 21 Jun 2001 15:29:28 -0700 Received: from lox.sandelman.ottawa.on.ca (IDENT:root@lox.sandelman.ottawa.on.ca [209.151.24.2]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5LMTPV22799 for ; Thu, 21 Jun 2001 15:29:25 -0700 Received: from nox.sandelman.ottawa.on.ca (nox.sandelman.ottawa.on.ca [209.151.24.6]) by lox.sandelman.ottawa.on.ca (8.8.7/8.8.8) with ESMTP id SAA08563; Thu, 21 Jun 2001 18:29:13 -0400 (EDT) Received: from marajade.sandelman.ottawa.on.ca ([2002:8e9a:24dc::20]) by nox.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f5LMXV006942 (using TLSv1/SSLv3 with cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified OK); Thu, 21 Jun 2001 18:33:42 -0400 (EDT) Received: from marajade.sandelman.ottawa.on.ca (localhost [[UNIX: localhost]]) by marajade.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f5LDHpd00621; Thu, 21 Jun 2001 09:17:52 -0400 (EDT) Message-Id: <200106211317.f5LDHpd00621@marajade.sandelman.ottawa.on.ca> To: hugh@mimosa.com, Marco Berizzi cc: netdev@oss.sgi.com Subject: Re: select says I can read, but recvfrom hangs In-reply-to: Your message of "Wed, 20 Jun 2001 22:57:59 EDT." Mime-Version: 1.0 (generated by tm-edit 7.108) Content-Type: text/plain; charset=US-ASCII Date: Thu, 21 Jun 2001 09:17:50 -0400 From: Michael Richardson Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2945 Lines: 71 -----BEGIN PGP SIGNED MESSAGE----- >>>>> "D" == D Hugh Redelmeier writes: D> - the scenario that provokes the problem for the user goes as follows: D> + Pluto is running on a security gateway, with a Windows NT box D> behind it This should be immaterial. D> + he connects a second windows box, running PGPnet (an IPSEC D> implementation), through the internet, to the public interface D> of the security gateway. 
This box negotiates a tunnel with D> the security gateway. okay. D> + he disconnects the second windows box, and reconnects the same way D> but with a different IP address (the IP address is dynamically D> assigned whenever he connects this box to the internet). D> + the second box starts and completes IKE negotiation. okay, and completes. Was there actually data transmitted from the second box? I.e. were all messages sent by the PGPnet actually received by Pluto? D> + Pluto is tricked into hanging on a recvfrom. D> Is there any way to tell from the system whether the select is wrong D> (i.e. there is no message) or the recvfrom is wrong (i.e. there is a D> message, but it still hangs reading it)? marajade-[~] mcr 1001 %netstat -f inet -n Active Internet connections Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 127.0.0.1.53 *.* LISTEN udp 0 0 127.0.0.1.500 *.* udp 0 0 172.16.212.1.500 *.* udp 0 0 127.0.0.1.65522 127.0.0.1.65523 udp 0 0 127.0.0.1.65523 127.0.0.1.65522 udp 0 0 127.0.0.1.53 *.* (-f is BSD-ism. I think that there is -u for UDP on Linux, but plain netstat would work as well.) That should tell you if the system thinks that there is in fact any data queued on the socket. Perhaps throw in a: system("netstat") in after the second select to help debug. If not, then this sounds like a race condition in the kernel with delivery of multiple data available messages. I admit to have not read those pieces of 2.2. ] Internet Security. 
Have encryption, will travel |1 Fish/2 Fish [ ] Michael Richardson, Sandelman Software Works, Ottawa, ON |Red F./Blow F [ ] mcr@sandelman.ottawa.on.ca http://www.sandelman.ottawa.on.ca/ |strong crypto [ ] for everyone [ -----BEGIN PGP SIGNATURE----- Version: 2.6.3ia Charset: latin1 Comment: Processed by Mailcrypt 3.5.5, an Emacs/PGP interface iQCVAwUBOzHz+oqHRg3pndX9AQFf9QP/RvIhI/CK6sC2Tt/txIH6p2QtYe9VZYs1 SBgmHoKPxAu1bPKGQiTzTD94KwdheT1lnDjFzvO3pgWKvSjqa7ypsYM/lZSeYU9f mmVj8QFQk6MTJWCeE8iihTx+rqlyS1vuEs0moIQlqtzt0N3EnSNBzF7INmb1S1zS htBr0ta3n20= =cvSw -----END PGP SIGNATURE----- From owner-netdev@oss.sgi.com Thu Jun 21 22:00:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5M50n027494 for netdev-outgoing; Thu, 21 Jun 2001 22:00:49 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5M50lV27491 for ; Thu, 21 Jun 2001 22:00:48 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id CE2CF1F6D for ; Fri, 22 Jun 2001 01:00:45 -0400 (EDT) Message-ID: <3B32D0FA.9451D3AA@mandrakesoft.com> Date: Fri, 22 Jun 2001 01:00:42 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre3 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: net driver directory moves for 2.5 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 745 Lines: 24 I've mentioned this before, but since its impending, I wanted to bring it up again: I'd like to shuffle some drivers into sub-directories, and shrink drivers/net a bit. This is NOT a wholesale reorganization; simply an opportunity to dump similar code into subdirectories. 
Here are my suggested moves: lance-like drivers into drivers/net/lance 8390 into drivers/net/8390 tulip-like (winbond, dmfe, xircom) into drivers/net/tulip ISA-only drivers into drivers/net/isa 3com drivers into drivers/net/3com I can definitely be convinced to reduce this list, but I would prefer not to diversify too much, and add to this list too much (if at all). Jeff -- Jeff Garzik | Andre the Giant has a posse. Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Thu Jun 21 22:10:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5M5Ade27649 for netdev-outgoing; Thu, 21 Jun 2001 22:10:39 -0700 Received: from metastasis.f00f.org (f00f.stub.clear.net.nz [203.167.224.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5M5AcV27646 for ; Thu, 21 Jun 2001 22:10:38 -0700 Received: by metastasis.f00f.org (Postfix, from userid 1000) id 179429DC3; Fri, 22 Jun 2001 17:10:37 +1200 (NZST) Date: Fri, 22 Jun 2001 17:10:37 +1200 From: Chris Wedgwood To: Jeff Garzik Cc: Linux Kernel Mailing List , netdev@oss.sgi.com, "David S. Miller" Subject: Re: PATCH: ethtool MII helpers Message-ID: <20010622171037.D2576@metastasis.f00f.org> References: <3B23AFC3.71CE2FD2@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3B23AFC3.71CE2FD2@mandrakesoft.com>; from jgarzik@mandrakesoft.com on Sun, Jun 10, 2001 at 01:34:59PM -0400 X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 635 Lines: 19 On Sun, Jun 10, 2001 at 01:34:59PM -0400, Jeff Garzik wrote: Initial draft of a helper which uses generic elements present in several net drivers to implement ethtool ioctl support in a minimum amount of code. I have included a sample implementation in the epic100 driver, to illustrate how these helpers may be used. This should make it easier to implement support across 10/100 hardware which uses primarily an MII phy. Comments appreciated. 
Can someone explain to me why we have ethtool and mii-tool? Can we not extend ethtool for the mii-tool stuff, even if only at userland? --cw From owner-netdev@oss.sgi.com Thu Jun 21 22:24:44 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5M5OiP27864 for netdev-outgoing; Thu, 21 Jun 2001 22:24:44 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5M5OfV27861 for ; Thu, 21 Jun 2001 22:24:41 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id E42CB1F6D; Fri, 22 Jun 2001 01:24:39 -0400 (EDT) Message-ID: <3B32D694.CACF46D0@mandrakesoft.com> Date: Fri, 22 Jun 2001 01:24:36 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre3 i686) X-Accept-Language: en MIME-Version: 1.0 To: Chris Wedgwood Cc: Linux Kernel Mailing List , netdev@oss.sgi.com, "David S. Miller" Subject: Re: PATCH: ethtool MII helpers References: <3B23AFC3.71CE2FD2@mandrakesoft.com> <20010622171037.D2576@metastasis.f00f.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 367 Lines: 13 Chris Wedgwood wrote: > Can we > not extend ethtool for the mii-tool stuff, even if only at userland? Sure, and that's planned. Wanna send me a patch for it? :) It will definitely fall back on the MII ioctls if ethtool media support for the desired command doesn't exist. -- Jeff Garzik | Andre the Giant has a posse. 
Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Thu Jun 21 22:35:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5M5Z1o28078 for netdev-outgoing; Thu, 21 Jun 2001 22:35:01 -0700 Received: from metastasis.f00f.org (f00f.stub.clear.net.nz [203.167.224.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5M5Z0V28075 for ; Thu, 21 Jun 2001 22:35:00 -0700 Received: by metastasis.f00f.org (Postfix, from userid 1000) id 9FB3F9DC3; Fri, 22 Jun 2001 17:34:59 +1200 (NZST) Date: Fri, 22 Jun 2001 17:34:59 +1200 From: Chris Wedgwood To: Jeff Garzik Cc: Linux Kernel Mailing List , netdev@oss.sgi.com, "David S. Miller" Subject: Re: PATCH: ethtool MII helpers Message-ID: <20010622173459.D2642@metastasis.f00f.org> References: <3B23AFC3.71CE2FD2@mandrakesoft.com> <20010622171037.D2576@metastasis.f00f.org> <3B32D694.CACF46D0@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3B32D694.CACF46D0@mandrakesoft.com>; from jgarzik@mandrakesoft.com on Fri, Jun 22, 2001 at 01:24:36AM -0400 X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 562 Lines: 17 On Fri, Jun 22, 2001 at 01:24:36AM -0400, Jeff Garzik wrote: Sure, and that's planned. Wanna send me a patch for it? :) Possibly, but I wonder if this is a kernel-space problem or not. Why not put all the smarts into userland for it? It will definitely fall back on the MII ioctls if ethtool media support for the desired command doesn't exist. Well, that is more or less as much as needs to be done. That, and some kind of super-set API to be defined for all new stuff, having two slightly different APIs for the same things sucks. 
--cw From owner-netdev@oss.sgi.com Thu Jun 21 22:58:45 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5M5wjG28349 for netdev-outgoing; Thu, 21 Jun 2001 22:58:45 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5M5whV28346 for ; Thu, 21 Jun 2001 22:58:43 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 28EB41F74; Fri, 22 Jun 2001 01:58:41 -0400 (EDT) Message-ID: <3B32DE8D.72C1CFF9@mandrakesoft.com> Date: Fri, 22 Jun 2001 01:58:37 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre3 i686) X-Accept-Language: en MIME-Version: 1.0 To: Chris Wedgwood Cc: Linux Kernel Mailing List , netdev@oss.sgi.com, "David S. Miller" Subject: Re: PATCH: ethtool MII helpers References: <3B23AFC3.71CE2FD2@mandrakesoft.com> <20010622171037.D2576@metastasis.f00f.org> <3B32D694.CACF46D0@mandrakesoft.com> <20010622173459.D2642@metastasis.f00f.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1793 Lines: 48 Chris Wedgwood wrote: > > On Fri, Jun 22, 2001 at 01:24:36AM -0400, Jeff Garzik wrote: > > Sure, and that's planned. Wanna send me a patch for it? :) > > Possibly, but I wonder if this is a kernel-space problem or not. Why > not put all the smarts into userland for it? I meant, send me a patch for userland ethtool, to do exactly what you described. > It will definitely fall back on the MII ioctls if ethtool media > support for the desired command doesn't exist. > > Well, that is more or less as much as needs to be done. That, and > some kind of super-set API to be defined for all new stuff, having > two slightly different APIs for the same things sucks. Both APIs do different things but have a common subset, yes. 
The MII ioctls only do their thing for MII-like hardware. ethtool can be applied to any hardware, including old ISA drivers that don't do MII, or do it in a really nonstandard way. For example I have ethtool code locally which allows ne2k-pci to do media selection via ioctl, for two popular ne2k cards, something it's never been able to do before. Emulating media selection support for things like 10base2<->10baseT<->AUI just isn't possible with the MII ioctls. MII is a standard and incredibly popular, thus mii-tool works with most popular PCI NICs, for the most popular media types. But it's still basically a hardware interface. I am not convinced it's a good idea to make the [G]MII ioctls the Linux software media interface for all network hardware. I see ethtool as the interface for tuning your NIC, that works across all hardware. I see mii-diag as the way to do advanced MII-specific hardware stuff, like next page or HA monitoring or whatever. Jeff -- Jeff Garzik | Andre the Giant has a posse. Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Thu Jun 21 23:01:11 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5M61BX28415 for netdev-outgoing; Thu, 21 Jun 2001 23:01:11 -0700 Received: from deliverator.sgi.com (deliverator.sgi.com [204.94.214.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5M61AV28412 for ; Thu, 21 Jun 2001 23:01:10 -0700 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id XAA17852 for ; Thu, 21 Jun 2001 23:01:04 -0700 (PDT) mail_from (kaos@ocs.com.au) Received: from kao2.melbourne.sgi.com (kao2.melbourne.sgi.com [134.14.55.180]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA08261; Fri, 22 Jun 2001 15:59:03 +1000 X-Mailer: exmh version 2.1.1 10/15/1999 From: Keith Owens To: Chris Wedgwood cc: netdev@oss.sgi.com Subject: Re: user's wishlist on ipv6 module In-reply-to: Your message of 
"Fri, 22 Jun 2001 01:08:55 +1200." <20010622010855.B1231@metastasis.f00f.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Fri, 22 Jun 2001 15:59:03 +1000 Message-ID: <22658.993189543@kao2.melbourne.sgi.com> Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 553 Lines: 12 On Fri, 22 Jun 2001 01:08:55 +1200, Chris Wedgwood wrote: >I'm not sure I would suggest this right now, we may aswell >rationalize netsyms.c during 2.5.x (which Alan may be thinking of >starting soon). In 2.5 you can expect that all the xxxsyms.c files will disappear. They are a hangover from the original 2.0 method of exporting symbols and, to a lesser extent, because of the extra makefile work required for exported symbols. Once the makefiles are cleaned up in 2.5, I want symbols to be exported in the sources that define them. From owner-netdev@oss.sgi.com Fri Jun 22 03:58:30 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5MAwUB00791 for netdev-outgoing; Fri, 22 Jun 2001 03:58:30 -0700 Received: from zero.aec.at (qmailr@zero.aec.at [195.3.98.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5MAwSV00788 for ; Fri, 22 Jun 2001 03:58:29 -0700 Received: (qmail 12024 invoked by uid 99); 22 Jun 2001 10:58:24 -0000 Received: from unknown (HELO fred.muc.de) (unknown) by unknown with SMTP; 22 Jun 2001 10:58:24 -0000 Received: by fred.muc.de (Postfix, from userid 500) id F3CBCE2D4F; Fri, 22 Jun 2001 13:09:29 +0200 (CEST) Date: Fri, 22 Jun 2001 13:09:29 +0200 From: Andi Kleen To: "D. 
Hugh Redelmeier" Cc: Andi Kleen , netdev@oss.sgi.com Subject: Re: select says I can read, but recvfrom hangs Message-ID: <20010622130929.A7135@fred.local> References: <20010621182650.50332@colin.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: ; from hugh@mimosa.com on Thu, Jun 21, 2001 at 09:42:24PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1355 Lines: 35 On Thu, Jun 21, 2001 at 09:42:24PM +0200, D. Hugh Redelmeier wrote: > This code assumes that for every error return from the recvfrom either > there is a MSG_ERRQUEUE message, or at least it is safe to try to read > one -- it won't block. As far as I can tell, this has never blocked. > But the number of errors isn't high, so testing hasn't been intense. It should not block. > Is it the case that a queued MSG_ERRQUEUE message will cause select to > say that there is something to read? I'd expect so. Yes. select meshes error into read/write, while poll will also give you more accurate events; separated for error and read. > > This code assumes that if there is a queued MSG_ERRQUEUE message, an > attempt to recvfrom with flags = 0 will not hang, but instead produce > an error return. Is this wrong? If it is wrong, it contradicts my > understanding of an answer that Andi gave me last fall. This could > explain the hang that we are observing. It is right. As long as there is a errqueue message the pending error of the socket is regenerated; and should be returned in recvmsg or reported by select. In this case it looks indeed like a subtle kernel bug; although I cannot see it in 2.2.19 on quick source review. Is the affected machine a SMP box? -Andi -- Life would be so much easier if we could just look at the source code. 
From owner-netdev@oss.sgi.com Fri Jun 22 06:29:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5MDTR102989 for netdev-outgoing; Fri, 22 Jun 2001 06:29:27 -0700 Received: from smtp1.cern.ch (smtp1.cern.ch [137.138.128.38]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5MDTPV02986 for ; Fri, 22 Jun 2001 06:29:26 -0700 Received: from lxplus015.cern.ch (IDENT:root@lxplus015.cern.ch [137.138.161.112]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id PAA24706; Fri, 22 Jun 2001 15:29:18 +0200 (MET DST) Received: (from jes@localhost) by lxplus015.cern.ch (8.9.3/8.9.3) id PAA31958; Fri, 22 Jun 2001 15:29:12 +0200 To: Jeff Garzik Cc: netdev@oss.sgi.com Subject: Re: net driver directory moves for 2.5 References: <3B32D0FA.9451D3AA@mandrakesoft.com> From: Jes Sorensen Date: 22 Jun 2001 15:29:12 +0200 In-Reply-To: Jeff Garzik's message of "Fri, 22 Jun 2001 01:00:42 -0400" Message-ID: User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 811 Lines: 20 >>>>> "Jeff" == Jeff Garzik writes: Jeff> I've mentioned this before, but since its impending, I wanted to Jeff> bring it up again: I'd like to shuffle some drivers into Jeff> sub-directories, and shrink drivers/net a bit. Jeff> This is NOT a wholesale reorganization; simply an opportunity to Jeff> dump similar code into subdirectories. Here are my suggested Jeff> moves: Jeff> lance-like drivers into drivers/net/lance 8390 into Jeff> drivers/net/8390 tulip-like (winbond, dmfe, xircom) into Jeff> drivers/net/tulip ISA-only drivers into drivers/net/isa 3com Jeff> drivers into drivers/net/3com Where do Lance ISA drivers go? It seems like a somewhat random choice of groups to me, ie. 8390, tulip & Lance directories kinda match but ISA and 3Com are orthogonal to this. 
Jes From owner-netdev@oss.sgi.com Fri Jun 22 08:39:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5MFd3S07074 for netdev-outgoing; Fri, 22 Jun 2001 08:39:03 -0700 Received: from holly.csn.ul.ie (holly.csn.ul.ie [136.201.105.4]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5MFd2V07071 for ; Fri, 22 Jun 2001 08:39:02 -0700 Received: from skynet.csn.ul.ie (skynet [136.201.105.2]) by holly.csn.ul.ie (Postfix) with ESMTP id CB79D2B331; Fri, 22 Jun 2001 16:38:55 +0100 (IST) Received: by skynet.csn.ul.ie (Postfix, from userid 2139) id D6782A8A5; Fri, 22 Jun 2001 16:38:49 +0100 (IST) Received: from localhost (localhost [127.0.0.1]) by skynet.csn.ul.ie (Postfix) with ESMTP id D495AA8A4; Fri, 22 Jun 2001 16:38:49 +0100 (IST) Date: Fri, 22 Jun 2001 16:38:49 +0100 (IST) From: Dave Airlie X-X-Sender: To: Jes Sorensen Cc: Jeff Garzik , Subject: Re: net driver directory moves for 2.5 In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1145 Lines: 38 what about the 3c503... that's a 3COM, 8390, ISA network card :-) he's the bext example I can think of ... Dave. On 22 Jun 2001, Jes Sorensen wrote: > >>>>> "Jeff" == Jeff Garzik writes: > > Jeff> I've mentioned this before, but since its impending, I wanted to > Jeff> bring it up again: I'd like to shuffle some drivers into > Jeff> sub-directories, and shrink drivers/net a bit. > > Jeff> This is NOT a wholesale reorganization; simply an opportunity to > Jeff> dump similar code into subdirectories. Here are my suggested > Jeff> moves: > > Jeff> lance-like drivers into drivers/net/lance 8390 into > Jeff> drivers/net/8390 tulip-like (winbond, dmfe, xircom) into > Jeff> drivers/net/tulip ISA-only drivers into drivers/net/isa 3com > Jeff> drivers into drivers/net/3com > > Where do Lance ISA drivers go? It seems like a somewhat random choice > of groups to me, ie. 
8390, tulip & Lance directories kinda match but > ISA and 3Com are orthogonal to this. > > Jes > -- David Airlie, Software Engineer http://www.skynet.ie/~airlied / airlied@skynet.ie pam_smb / Linux DecStation / Linux VAX / ILUG person From owner-netdev@oss.sgi.com Fri Jun 22 08:48:33 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5MFmXP07496 for netdev-outgoing; Fri, 22 Jun 2001 08:48:33 -0700 Received: from cr416993-a.ym1.on.wave.home.com (cr416993-a.ym1.on.wave.home.com [24.112.193.232]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5MFmVV07493 for ; Fri, 22 Jun 2001 08:48:31 -0700 Received: from redshift.mimosa.com (IDENT:root@redshift.mimosa.com [192.139.70.107]) by cr416993-a.ym1.on.wave.home.com (8.9.3/8.9.3) with ESMTP id LAA12220; Fri, 22 Jun 2001 11:50:57 -0400 Received: from localhost (hugh@localhost) by redshift.mimosa.com (8.11.0/8.11.0) with ESMTP id f5MFp6D02195; Fri, 22 Jun 2001 11:51:06 -0400 X-Authentication-Warning: redshift.mimosa.com: hugh owned process doing -bs Date: Fri, 22 Jun 2001 11:51:05 -0400 (EDT) From: "D. Hugh Redelmeier" Reply-To: To: Andi Kleen cc: , Marco Berizzi Subject: Re: select says I can read, but recvfrom hangs In-Reply-To: <20010622130929.A7135@fred.local> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 774 Lines: 24 | From: Andi Kleen | Is the affected machine a SMP box? Marco, who is the person who can reproduce this problem, is going on vacation. I don't know if he'll be able to answer for 10 days :-( Our error reporting machinery showed the following. + cat /proc/version Linux version 2.2.19 (root@darkstar) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 Wed Jun 20 11:20:37 CEST 2001 If this were a vendor's pre-compiled kernel, this would tell us, but I don't think that it helps with a kernel.org 2.2.19. 
Our exploration of this bug has been hampered by our being six timezones apart -- we only manage one mail exchange a day. He is in your timezone. Thanks for looking at this, Hugh Redelmeier hugh@mimosa.com voice: +1 416 482-8253 From owner-netdev@oss.sgi.com Fri Jun 22 08:55:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5MFtPd07834 for netdev-outgoing; Fri, 22 Jun 2001 08:55:25 -0700 Received: from metastasis.f00f.org (f00f.stub.clear.net.nz [203.167.224.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5MFtOV07831 for ; Fri, 22 Jun 2001 08:55:24 -0700 Received: by metastasis.f00f.org (Postfix, from userid 1000) id AF9A69F2F; Sat, 23 Jun 2001 03:55:22 +1200 (NZST) Date: Sat, 23 Jun 2001 03:55:22 +1200 From: Chris Wedgwood To: Dave Airlie Cc: Jes Sorensen , Jeff Garzik , netdev@oss.sgi.com Subject: Re: net driver directory moves for 2.5 Message-ID: <20010623035522.C3732@metastasis.f00f.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from airlied@csn.ul.ie on Fri, Jun 22, 2001 at 04:38:49PM +0100 X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 513 Lines: 19 On Fri, Jun 22, 2001 at 04:38:49PM +0100, Dave Airlie wrote: what about the 3c503... that's a 3COM, 8390, ISA network card :-) he's the bext example I can think of ... drivers/net//blah for those that apply drivers/net//blah for 3com chips that don't fit sort of thing seems reasonable. Maybe even for some architectures doing arch or sub specific directories where there will be no overlap outside of the driver (e.g. old sun sbus cards?) 
--cw From owner-netdev@oss.sgi.com Fri Jun 22 11:02:00 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5MI20U10997 for netdev-outgoing; Fri, 22 Jun 2001 11:02:00 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5MI1xV10993 for ; Fri, 22 Jun 2001 11:01:59 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id D2A781F67; Fri, 22 Jun 2001 14:01:57 -0400 (EDT) Message-ID: <3B338810.C39CA0CE@mandrakesoft.com> Date: Fri, 22 Jun 2001 14:01:52 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre3 i686) X-Accept-Language: en MIME-Version: 1.0 To: Jes Sorensen Cc: netdev@oss.sgi.com Subject: Re: net driver directory moves for 2.5 References: <3B32D0FA.9451D3AA@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1330 Lines: 36 Jes Sorensen wrote: > > >>>>> "Jeff" == Jeff Garzik writes: > > Jeff> I've mentioned this before, but since its impending, I wanted to > Jeff> bring it up again: I'd like to shuffle some drivers into > Jeff> sub-directories, and shrink drivers/net a bit. > > Jeff> This is NOT a wholesale reorganization; simply an opportunity to > Jeff> dump similar code into subdirectories. Here are my suggested > Jeff> moves: > > Jeff> lance-like drivers into drivers/net/lance 8390 into > Jeff> drivers/net/8390 tulip-like (winbond, dmfe, xircom) into > Jeff> drivers/net/tulip ISA-only drivers into drivers/net/isa 3com > Jeff> drivers into drivers/net/3com > > Where do Lance ISA drivers go? It seems like a somewhat random choice > of groups to me, ie. 8390, tulip & Lance directories kinda match but > ISA and 3Com are orthogonal to this. 
It's not a random choice of groups at all, it's where common code will get grouped together the most. Drivers that fit into 8390, tulip, or lance categories will go into those directories. Which would include a lance ISA driver. After that, you still have a ton of ISA and 3com drivers left over which are not often used. Thus, the isa and 3com subdirectories. Jeff -- Jeff Garzik | Andre the Giant has a posse. Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Fri Jun 22 19:14:05 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5N2E5Z05283 for netdev-outgoing; Fri, 22 Jun 2001 19:14:05 -0700 Received: from cr416993-a.ym1.on.wave.home.com (cr416993-a.ym1.on.wave.home.com [24.112.193.232]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5N2E4V05278 for ; Fri, 22 Jun 2001 19:14:04 -0700 Received: from redshift.mimosa.com (IDENT:root@redshift.mimosa.com [192.139.70.107]) by cr416993-a.ym1.on.wave.home.com (8.9.3/8.9.3) with ESMTP id WAA13739; Fri, 22 Jun 2001 22:16:28 -0400 Received: from localhost (hugh@localhost) by redshift.mimosa.com (8.11.0/8.11.0) with ESMTP id f5N2Gak03895; Fri, 22 Jun 2001 22:16:36 -0400 X-Authentication-Warning: redshift.mimosa.com: hugh owned process doing -bs Date: Fri, 22 Jun 2001 22:16:36 -0400 (EDT) From: "D. Hugh Redelmeier" Reply-To: To: Andi Kleen cc: , Marco Berizzi Subject: Re: select says I can read, but recvfrom hangs In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 6252 Lines: 140 | | From: Andi Kleen | | | Is the affected machine a SMP box? Marco says: No, only one processor. P100 with F0 0F bug. Marco also answered another question that I asked: | From: Marco Berizzi | > I have a question: does the hang occur every time you go through the | > procedure you describe (I infer that from what you've said), or just | > too often? 
| > | | every time I go through the procedure I also think it might be helpful if I showed some of the logging done by Pluto. This was captured by Marco (thanks!): Remember that this machine (darkstar) is talking to a "Road Warrior", a Windows box running PGPnet IPSEC. The windows machine had IP address 151.20.110.219 and had negotiated a tunnel. It was then disconnected and reconnected with IP address 151.25.5.30. Jun 20 17:21:43 darkstar Pluto[121]: | *received 196 bytes from 151.25.5.30:500 on eth0 Jun 20 17:21:43 darkstar Pluto[121]: | state object #3 found, in STATE_MAIN_R1 Jun 20 17:21:43 darkstar Pluto[121]: | sending: message from and response to 151.25.5.30:500. Included for context (timestamp, in particular) Jun 20 17:21:44 darkstar Pluto[121]: | *received 1732 bytes from 151.25.5.30:500 on eth0 Jun 20 17:21:44 darkstar Pluto[121]: | state object #3 found, in STATE_MAIN_R2 Jun 20 17:21:45 darkstar Pluto[121]: "Berizzi" #3: deleting connection "Berizzi" instance with peer 151.20.110.219 Jun 20 17:21:45 darkstar Pluto[121]: "Berizzi" #2: deleting state (STATE_QUICK_R2) Jun 20 17:21:45 darkstar Pluto[121]: | ***emit ISAKMP Delete Payload: Jun 20 17:21:45 darkstar Pluto[121]: | sending: This is not FreeS/WAN code. It must be part of the x.509 patch sends about 68 bytes, probably to 151.20.110.219:500 Jun 20 17:21:45 darkstar Pluto[121]: | ***emit ISAKMP Delete Payload: Jun 20 17:21:45 darkstar Pluto[121]: | sending: This is not FreeS/WAN code. It must be part of the x.509 patch sends about 68 bytes, probably to 151.20.110.219:500 Jun 20 17:21:46 darkstar Pluto[121]: | sending: sends about 1652 bytes to 151.25.5.30:500 Jun 20 17:21:46 darkstar Pluto[121]: ERROR: "Berizzi" #3: sendto() on eth0 to 151.25.5.30:500 failed in STF_REPLY. Errno 113: No route to host I have no idea why a sendto to 151.25.5.30:500 would fail. It is more likely that the sendtos to 151.20.110.219:500 (the former address) would fail. 
The error report is just based on what Pluto tried to do: it tried to sendto 151.25.5.30:500, and got an EHOSTUNREACH. Perhaps that error return was caused by an earlier sendto. There is no attempt to read MSG_ERRQUEUE for a write failure. Perhaps there should be. But I think that the next time through the master select call, if there is a message on the MSG_ERRQUEUE, a recvfrom will be triggered, get an error return, and a recvmsg(,, MSG_ERRQUEUE) will then read the error report. Jun 20 17:21:46 darkstar Pluto[121]: | *received 1732 bytes from 151.25.5.30:500 on eth0 Jun 20 17:21:46 darkstar Pluto[121]: | state object #3 found, in STATE_MAIN_R3 Jun 20 17:21:46 darkstar Pluto[121]: "Berizzi" #3: retransmitting in response to duplicate packet; already STATE_MAIN_R3 Interestingly, the next read actually got a retransmitted packet, not an error indication. It isn't clear why the other side retransmitted. It is *possible* that we were hung in a read. But the timestamps don't indicate an excessive delay. Could it be that the ERROR quoted above suppressed the message actually being sent? Makes some sense. Jun 20 17:21:46 darkstar Pluto[121]: | sending: sends about 1652 bytes to 151.25.5.30:500 This time we don't get an error, even though the destination is identical?!?! Jun 20 17:21:49 darkstar Pluto[121]: | *received 52 bytes from 151.25.5.30:500 on eth0 Jun 20 17:21:49 darkstar Pluto[121]: | state object #4 found, in STATE_QUICK_R1 Pluto receives the last message of an IKE Quick Mode exchange. This would be the end of negotiation with the Road Warrior (mobile IPSEC host, which in this case is running PGPnet IPSEC software, not ours). Pluto then proceeds to install a new tunnel. Jun 20 17:21:50 darkstar Pluto[121]: "Berizzi" #4: STATE_QUICK_R2: IPsec SA established Jun 20 17:21:50 darkstar Pluto[121]: | next event EVENT_SHUNT_SCAN in 47 seconds Pluto now goes into select, waiting for next event. select tells it that a message is ready. It hangs in the recvfrom.
Jun 20 17:24:42 darkstar Pluto[121]: ERROR: recvfrom() on eth0 failed in comm_handle (Pluto cannot decode source sockaddr in rejection: unknown source). Errno 4: Interrupted system call Marco does a SIGHUP to Pluto. Pluto catches the signal, sets a variable, and returns. Pluto is stupid enough (thankfully) to treat the EINTR as any other read error: it goes on to look at the MSG_ERRQUEUE. Notice the almost 3 minute delay between this message and the preceding one. Fascinating, since there was supposed to be a timer event sooner than that. Somewhere within the first 47 seconds of that select, the select said a message was ready (the timer is implemented as part of the select). Jun 20 17:24:42 darkstar Pluto[121]: | rejected packet: Jun 20 17:24:42 darkstar Pluto[121]: | Jun 20 17:24:42 darkstar Pluto[121]: | control: Jun 20 17:24:42 darkstar Pluto[121]: | 2c 00 00 00 00 00 00 00 0b 00 00 00 71 00 00 00 Jun 20 17:24:42 darkstar Pluto[121]: | 02 03 01 00 00 00 00 00 00 00 00 00 02 00 96 c3 Jun 20 17:24:42 darkstar Pluto[121]: | 97 05 b8 0a a0 07 ec c2 c0 00 00 00 Jun 20 17:24:42 darkstar Pluto[121]: | name: Jun 20 17:24:42 darkstar Pluto[121]: | Jun 20 17:24:42 darkstar Pluto[121]: extended network error info for message to unknown: compainant 151.5.184.10, errno 113 No route to host, origin ICMP (not authenticated) 2, type 3, code 1 This is what was found from the MSG_ERRQUEUE. There was no rejected packet, nor name. 97 05 b8 0a: 151.5.184.10 I haven't looked at what the other parts of the control portion mean. So there is an interpretation of the log (with lots deleted) around the time of the recvfrom hang. Other than KLIPS messages, no kernel messages were logged during this period. I hope that this additional information helps.
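The 44 control bytes in the log above can be read as one ancillary message. The sketch below decodes them by hand, assuming the i386 layout of the machine that produced the log: a 12-byte cmsghdr, then a 16-byte struct sock_extended_err, then a 16-byte sockaddr_in naming the offender. The struct decoded_err and the function decode_recverr are hypothetical names used only for this illustration. Under that assumption the bytes give cmsg type 11 (IP_RECVERR), errno 113, origin 2 (ICMP), ICMP type 3 code 1 (host unreachable), and offender 151.5.184.10, which agrees with the "extended network error info" line that Pluto printed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the fields of interest. */
struct decoded_err {
    uint32_t cmsg_len, cmsg_level, cmsg_type;
    uint32_t ee_errno;
    uint8_t  ee_origin, ee_type, ee_code;
    uint8_t  offender[4];   /* IPv4 address, network byte order */
};

/* Read a 32-bit little-endian value (i386 byte order assumed). */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one IP_RECVERR control message laid out as on 32-bit x86.
 * Returns 0 on success, -1 if the buffer is too short. */
static int decode_recverr(const uint8_t *buf, size_t len, struct decoded_err *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 44)
        return -1;
    out->cmsg_len   = le32(buf + 0);   /* cmsghdr.cmsg_len   */
    out->cmsg_level = le32(buf + 4);   /* cmsghdr.cmsg_level */
    out->cmsg_type  = le32(buf + 8);   /* cmsghdr.cmsg_type  */
    out->ee_errno   = le32(buf + 12);  /* sock_extended_err.ee_errno */
    out->ee_origin  = buf[16];
    out->ee_type    = buf[17];
    out->ee_code    = buf[18];
    /* buf[19] is padding; 20..27 hold ee_info and ee_data;
     * 28..31 hold sin_family and sin_port of the offender. */
    memcpy(out->offender, buf + 32, 4);
    return 0;
}

/* The control bytes exactly as they appear in the log. */
static const uint8_t logged_control[44] = {
    0x2c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x0b, 0x00, 0x00, 0x00, 0x71, 0x00, 0x00, 0x00,
    0x02, 0x03, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x96, 0xc3,
    0x97, 0x05, 0xb8, 0x0a, 0xa0, 0x07, 0xec, 0xc2,
    0xc0, 0x00, 0x00, 0x00
};
```

Decoding by explicit offsets rather than casting to struct cmsghdr keeps the sketch independent of the word size of the machine it is compiled on.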
Hugh Redelmeier hugh@mimosa.com voice: +1 416 482-8253 From owner-netdev@oss.sgi.com Sat Jun 23 13:00:31 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5NK0Vv03675 for netdev-outgoing; Sat, 23 Jun 2001 13:00:31 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5NK0TV03671 for ; Sat, 23 Jun 2001 13:00:29 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f5NK0M213759; Sat, 23 Jun 2001 23:00:22 +0300 Date: Sat, 23 Jun 2001 23:00:21 +0300 (EEST) From: Pekka Savola To: cc: , Subject: patch: ipv6 nexthop can be 6to4 address Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="1589707168-808551844-993326421=:13709" Sender: owner-netdev@oss.sgi.com Precedence: bulk This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. --1589707168-808551844-993326421=:13709 Content-Type: TEXT/PLAIN; charset=US-ASCII Hi, There's ongoing debate that ipv6 compatible addresses may get deprecated. Some 6to4 relay routers don't support them anymore. Thus it becomes necessary to allow nexthop for the default route to native ipv6 to be a 6to4 address. The current implementation basically assumes the only v6-over-v4 tunneling method is automatic tunneling with compatible addresses. Attached patch fixes this. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords --1589707168-808551844-993326421=:13709 Content-Type: TEXT/PLAIN; charset=US-ASCII; name="linux-2.4.3-6to4-nexthop.diff" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: attachment; filename="linux-2.4.3-6to4-nexthop.diff" VGhlcmUncyBvbmdvaW5nIGRlYmF0ZSB0aGF0IGNvbXBhdGlibGUgYWRkcmVz c2VzIG1heSBiZSBkZXByZWNhdGVkLiAgU29tZQ0KNnRvNCByZWxheSByb3V0 ZXJzIGRvbid0IHN1cHBvcnQgdGhlbSBhbnltb3JlLiAgVGh1cyBpdCBiZWNv bWVzIG5lY2Vzc2FyeSB0bw0KYWxsb3cgbmV4dGhvcCBmb3IgdGhlIG5hdGl2 ZSBpcHY2IGRlZmF1bHQgcm91dGUgdG8gYmUgYSA2dG80IGFkZHJlc3MuICAN Cg0KVGhlIGN1cnJlbnQgaW1wbGVtZW50YXRpb24gYmFzaWNhbGx5IGFzc3Vt ZXMgdGhlIG9ubHkgdjYtb3Zlci12NCB0dW5uZWxpbmcgDQptZXRob2QgaXMg YXV0b21hdGljIHR1bm5lbGluZyB3aXRoIGNvbXBhdGlibGUgYWRkcmVzc2Vz Lg0KDQotLS0gbGludXgtMi40LjMvbmV0L2lwdjYvc2l0LmMJU2F0IEp1biAy MyAxOTo0MDo1NiAyMDAxDQorKysgbGludXgtMi40LjMuZml4L25ldC9pcHY2 L3NpdC5jCVNhdCBKdW4gMjMgMTk6NTQ6MTIgMjAwMQ0KQEAgLTQ5NCwxMCAr NDk0LDE1IEBADQogCQkJYWRkcl90eXBlID0gaXB2Nl9hZGRyX3R5cGUoYWRk cjYpOw0KIAkJfQ0KIA0KLQkJaWYgKChhZGRyX3R5cGUgJiBJUFY2X0FERFJf Q09NUEFUdjQpID09IDApDQotCQkJZ290byB0eF9lcnJvcl9pY21wOw0KKwkJ LyogY2hlY2sgaWYgbmV4dGhvcCBmb3Igbm9uLTZ0bzQgYWRkcmVzcyBpcyA2 dG80IGFkZHJlc3MgKi8NCisJCWRzdCA9IHRyeV82dG80KGFkZHI2KTsNCisJ CQ0KKwkJaWYgKCFkc3QpICB7DQorCQkJaWYgKChhZGRyX3R5cGUgJiBJUFY2 X0FERFJfQ09NUEFUdjQpID09IDApDQorCQkJCWdvdG8gdHhfZXJyb3JfaWNt cDsNCiANCi0JCWRzdCA9IGFkZHI2LT5zNl9hZGRyMzJbM107DQorCQkJZHN0 ID0gYWRkcjYtPnM2X2FkZHIzMlszXTsNCisJCX0NCiAJfQ0KIA0KIAlpZiAo aXBfcm91dGVfb3V0cHV0KCZydCwgZHN0LCB0aXBoLT5zYWRkciwgUlRfVE9T KHRvcyksIHR1bm5lbC0+cGFybXMubGluaykpIHsNCg== --1589707168-808551844-993326421=:13709-- From owner-netdev@oss.sgi.com Sun Jun 24 05:11:46 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5OCBkw10498 for netdev-outgoing; Sun, 24 Jun 2001 05:11:46 -0700 Received: from crotus.sc.intel.com (scfdns02.sc.intel.com [143.183.152.26]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5OCBdV10478 for ; 
Sun, 24 Jun 2001 05:11:44 -0700 Received: from hasmsxvs01.iil.intel.com (hasmsxvs01.iil.intel.com [143.185.63.58]) by crotus.sc.intel.com (8.9.1a+p1/8.9.1/d: relay.m4,v 1.40 2001/06/06 21:14:49 root Exp $) with SMTP id MAA04447 for ; Sun, 24 Jun 2001 12:11:33 GMT Received: from hasmsx17.iil.intel.com ([143.185.63.203]) by hasmsxvs01.iil.intel.com (NAVIEG 2.1 bld 68) with SMTP id M2001062415112829094 ; Sun, 24 Jun 2001 15:11:28 +0300 Received: by hasmsx17.iil.intel.com with Internet Mail Service (5.5.2653.19) id ; Sun, 24 Jun 2001 15:11:28 +0300 Message-ID: <07E6E3B8C072D211AC4100A0C9C5758302B27266@hasmsx52.iil.intel.com> From: "Hen, Shmulik" To: "'LKML'" , "'LNML'", netdev@oss.sgi.com Subject: [OT] ethtool MII helpers (actually two OT's) Date: Sun, 24 Jun 2001 15:11:23 +0300 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk MII --- Is there any support in the MII standard for 1000Mbps (GbE Fiber/Copper) ? Perhaps an extension to the standard ? I could see that some of the Gigabit adapters supported by the kernel provide the MII IOCTLs interface, but couldn't figure out how to extract the correct speed information from the registers I can read. I know it's a bit of a hassle and I have to get the local capabilities and match them against the partner's capabilities and find the highest common speed etc. etc. but I'm sure that if the driver can do it I can reproduce it in userland too. EthTool ------- Is there a way that I can extract the link status information out of the ethtool struct ? I could see that at least one Gigabit adapter driver (bcm5700.c), provides the EthTool interface and reports the correct speed and duplex mode but not the link status. Is there a place that defines how a driver is supposed to implement the support for EthTool ? 
I figured that since there is no separate field for link status (at least in version 1.2), a driver is supposed to report speed=0 or something like that when the link is down. I know this driver detects link status changes for sure because it prints messages every time, but the speed and duplex are always reported the same. Thanks, Shmulik Hen Software Engineer Linux Advanced Networking Services Intel Network Communications Group Jerusalem, Israel
From owner-netdev@oss.sgi.com Sun Jun 24 07:16:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5OEGP231124 for netdev-outgoing; Sun, 24 Jun 2001 07:16:25 -0700 Received: from smtp1.cern.ch (smtp1.cern.ch [137.138.128.38]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5OEGOV31118 for ; Sun, 24 Jun 2001 07:16:24 -0700 Received: from lxplus015.cern.ch (IDENT:root@lxplus015.cern.ch [137.138.161.112]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id QAA19886; Sun, 24 Jun 2001 16:16:14 +0200 (MET DST) Received: (from jes@localhost) by lxplus015.cern.ch (8.9.3/8.9.3) id QAA02060; Sun, 24 Jun 2001 16:16:14 +0200 To: Chris Wedgwood Cc: Dave Airlie , Jeff Garzik , netdev@oss.sgi.com Subject: Re: net driver directory moves for 2.5 References: <20010623035522.C3732@metastasis.f00f.org> From: Jes Sorensen Date: 24 Jun 2001 16:16:13 +0200 In-Reply-To: Chris Wedgwood's message of "Sat, 23 Jun 2001 03:55:22 +1200" Message-ID: Lines: 13 User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk >>>>> "Chris" == Chris Wedgwood writes: Chris> Maybe even for some architectures doing arch or sub specific Chris> directories where there will be no overlap outside of the
Chris> driver (e.g. old sun sbus cards?) No no no no Those Sun SBUS cards are using the same chips found on other cards, we should be able to share wherever possible. Keep drivers out of arch specific directories. Jes From owner-netdev@oss.sgi.com Sun Jun 24 07:18:22 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5OEIM631367 for netdev-outgoing; Sun, 24 Jun 2001 07:18:22 -0700 Received: from smtp1.cern.ch (smtp1.cern.ch [137.138.128.38]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5OEILV31360 for ; Sun, 24 Jun 2001 07:18:21 -0700 Received: from lxplus015.cern.ch (IDENT:root@lxplus015.cern.ch [137.138.161.112]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id QAA01142; Sun, 24 Jun 2001 16:18:15 +0200 (MET DST) Received: (from jes@localhost) by lxplus015.cern.ch (8.9.3/8.9.3) id QAA02777; Sun, 24 Jun 2001 16:18:15 +0200 To: Jeff Garzik Cc: netdev@oss.sgi.com Subject: Re: net driver directory moves for 2.5 References: <3B32D0FA.9451D3AA@mandrakesoft.com> <3B338810.C39CA0CE@mandrakesoft.com> From: Jes Sorensen Date: 24 Jun 2001 16:18:14 +0200 In-Reply-To: Jeff Garzik's message of "Fri, 22 Jun 2001 14:01:52 -0400" Message-ID: Lines: 23 User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk >>>>> "Jeff" == Jeff Garzik writes: Jeff> Jes Sorensen wrote: >> Where do Lance ISA drivers go? It seems like a somewhat random >> choice of groups to me, ie. 8390, tulip & Lance directories kinda >> match but ISA and 3Com are orthogonal to this. Jeff> It's not a random choice of groups at all, it's where common Jeff> code will get grouped together the most. Jeff> Drivers that fit into 8390, tulip, or lance categories will go Jeff> into those directories. Which would include a lance ISA driver. Jeff> After that, you still have a ton of ISA and 3com drivers left Jeff> over which are not often used. Thus, the isa and 3com Jeff> subdirectories. 
I still think you should be consistent and stick to chip specific directories. Some of those ISA drivers support both ISA, EISA and PCI cards, some fall into multiple categories as mentioned earlier. Sticking to drivers/net// solves a lot of the problem and keeps it a lot cleaner IMHO. Jes From owner-netdev@oss.sgi.com Sun Jun 24 09:38:32 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5OGcW012485 for netdev-outgoing; Sun, 24 Jun 2001 09:38:32 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5OGcUV12482 for ; Sun, 24 Jun 2001 09:38:30 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA05831; Sun, 24 Jun 2001 20:38:05 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200106241638.UAA05831@ms2.inr.ac.ru> Subject: Re: patch: ipv6 nexthop can be 6to4 address To: pekkas@netcore.fi (Pekka Savola) Date: Sun, 24 Jun 2001 20:38:05 +0400 (MSK DST) Cc: netdev@oss.sgi.com, davem@redhat.com In-Reply-To: from "Pekka Savola" at Jun 23, 1 11:00:21 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > The current implementation basically assumes the only v6-over-v4 tunneling > method is automatic tunneling with compatible addresses. This is not true. Next hop address on tunnel routes is not more than a trick to tunnel to given IPv4 address, it is dummy and its IPv6 format is totally meaningless. No matter, what transition scheme is used today or tomorrow, compat addresses are enough and we do not need to change this place each time when someone invents a new format with inlined IPv4 address. 
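For context, the two address formats at issue in this exchange both carry an IPv4 address inline: an IPv4-compatible address keeps it in the low 32 bits (::a.b.c.d), while a 6to4 address keeps it right after the 2002::/16 prefix. The userland sketch below shows how either could be unpacked. It is not the patch's try_6to4() and not kernel code; the name embedded_v4 and the enum are hypothetical, and treating :: and ::1 as "no embedded address" is an assumption made here for the illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

enum v4_embed { V4_NONE = 0, V4_6TO4, V4_COMPAT };

/* Hypothetical helper: if a6 is a 6to4 or IPv4-compatible address,
 * copy the inlined IPv4 address into *v4 (network byte order). */
static enum v4_embed embedded_v4(const struct in6_addr *a6, struct in_addr *v4)
{
    static const unsigned char zeros[15];

    assert(a6 != NULL && v4 != NULL);

    /* 6to4: 2002:AABB:CCDD::/48 embeds IPv4 address AA.BB.CC.DD */
    if (a6->s6_addr[0] == 0x20 && a6->s6_addr[1] == 0x02) {
        memcpy(&v4->s_addr, &a6->s6_addr[2], 4);
        return V4_6TO4;
    }

    /* Unspecified (::) and loopback (::1) carry no IPv4 address. */
    if (memcmp(a6->s6_addr, zeros, 15) == 0 && a6->s6_addr[15] <= 1)
        return V4_NONE;

    /* IPv4-compatible: the first 96 bits are zero, the last 32 are IPv4. */
    if (memcmp(a6->s6_addr, zeros, 12) == 0) {
        memcpy(&v4->s_addr, &a6->s6_addr[12], 4);
        return V4_COMPAT;
    }

    return V4_NONE;
}
```

With this helper, 2002:c000:0201::1 and ::192.0.2.1 both yield 192.0.2.1, which is the sense in which either format can serve as a way to name an IPv4 tunnel endpoint.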
Alexey From owner-netdev@oss.sgi.com Sun Jun 24 10:34:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5OHY6813207 for netdev-outgoing; Sun, 24 Jun 2001 10:34:06 -0700 Received: from exch-connector.netcomsystems.com (mushroom.netcomsystems.com [12.9.24.195]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5OHY5V13204 for ; Sun, 24 Jun 2001 10:34:05 -0700 Received: by exch-connector.netcomsystems.com with Internet Mail Service (5.5.2653.19) id ; Sun, 24 Jun 2001 10:34:00 -0700 Message-ID: <9384475DFC05D2118F9C00805F6F2631038B2F2B@exchange1.netcomsystems.com> From: "Perches, Joe" To: "'Hen, Shmulik'" , netdev@oss.sgi.com Subject: RE: [OT] ethtool MII helpers (actually two OT's) Date: Sun, 24 Jun 2001 10:33:53 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk Yes. see: IEEE Standard 802.3y 1997 - subclause 32.5.4.2 Auto-Negotiation Link Partner Ability register, Table 32-1 Link Partner Next Page Ability register bit definitions. General Ethernet info: http://www.ots.utexas.edu/ethernet/ Ethernet Standards: http://www.ots.utexas.edu/ethernet/standard.html The IEEE has announced a pilot program that offers free online copies (PDF format) of IEEE standards, including the Ethernet 802.3 standard, at: http://standards.ieee.org/getieee802/. Once you have agreed to the IEEE's "Terms and Conditions" then select IEEE 802.3: CSMA/CD Access Method for access to an online copy of the latest version of the Ethernet standard. Regards, Joe Perches > -----Original Message----- > From: Hen, Shmulik [mailto:shmulik.hen@intel.com] > MII > --- > Is there any support in the MII standard for 1000Mbps (GbE > Fiber/Copper) ? Perhaps an extension to the standard ? 
From owner-netdev@oss.sgi.com Tue Jun 26 01:14:29 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5Q8ETN23680 for netdev-outgoing; Tue, 26 Jun 2001 01:14:29 -0700
Received: from gateserver.tesis.com.ru (IDENT:root@[213.147.56.248]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5Q8ERV23677 for ; Tue, 26 Jun 2001 01:14:27 -0700
Received: from feoktistov (feoktistov [192.168.1.28]) by gateserver.tesis.com.ru (8.10.1/8.10.1) with SMTP id f5Q8FTc20121 for ; Tue, 26 Jun 2001 12:15:30 +0400
From: "Yuri Feoktistov"
To:
Subject: Question.
Date: Tue, 26 Jun 2001 12:15:53 +0400
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="koi8-r"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Importance: Normal
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 145
Lines: 7

Hello,

Do you know where to get information about the Linux 2.4.4 zero-copy
technology and the sendfile() system call?

Yuri Feoktistov, Tesis/Russia.
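For readers with the same question, a minimal user-space sketch of calling sendfile(2) on Linux follows. The helper name send_whole_file is invented for this illustration, and the sketch says nothing about how the 2.4 zero-copy paths are implemented inside the kernel. Note that kernels before 2.6.33 required the output descriptor to be a socket; newer kernels accept any file.

```c
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): push the whole of in_fd
 * to out_fd with sendfile(2), without copying through a user buffer.
 * Returns the number of bytes sent, or -1 on error with errno set.
 */
static ssize_t send_whole_file(int out_fd, int in_fd)
{
	struct stat st;
	off_t offset = 0;
	ssize_t total = 0;

	if (fstat(in_fd, &st) < 0)
		return -1;

	while (offset < st.st_size) {
		/* The kernel advances 'offset' by the amount it sent. */
		ssize_t n = sendfile(out_fd, in_fd, &offset,
				     (size_t)(st.st_size - offset));
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (n == 0)
			break;	/* input ended early, e.g. file truncated */
		total += n;
	}
	return total;
}
```

The loop is there because sendfile() may send fewer bytes than requested, much like write().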
From owner-netdev@oss.sgi.com Wed Jun 27 01:11:26 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5R8BQT08975 for netdev-outgoing; Wed, 27 Jun 2001 01:11:26 -0700
Received: from MAIL.USINOR.FR (mail.usinor.fr [195.6.82.131]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5R8BOV08972 for ; Wed, 27 Jun 2001 01:11:25 -0700
Received: from mail.usinor.com (2.0.0.51 [2.0.0.51]) by MAIL.USINOR.FR with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id M0J8MDLL; Wed, 27 Jun 2001 10:04:13 +0200
MIME-Version: 1.0
Date: Wed, 27 Jun 2001 11:10 +0200
Message-id: <200106270910.CUQZ@mail.usinor.com>
From: DELHAYE OLIVIER
To: sbertin@mindspring.com, Bogdan.Costescu@IWR.Uni-Heidelberg.De
To: netdev@oss.sgi.com
Subject: ioctl in module
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f5R8BPV08973
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 594
Lines: 19

--- Received from EUTI.ZZDELHAY 03 28 29 78 61 27-06-01 10.10

Hello

I have a problem in my module: I must set RTS from inside the module
in order to read from my ttySX. As a consequence I need to use the ioctl
function, but I can't, because the code is a module.

Can you tell me how to set RTS from inside my module
(with syscall()? ...)
I use open_flip to open ttyS0.

Thanks for your response,

Olivier

---- 27-06-01 10.10 ---- Sent to ------------------------------------
-> sbertin(a)mindspring.com
-> Bogdan.Costescu(a)IWR.Uni-Heidelberg.De
-> netdev(a)oss.sgi.com

From owner-netdev@oss.sgi.com Wed Jun 27 05:03:33 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5RC3XX12464 for netdev-outgoing; Wed, 27 Jun 2001 05:03:33 -0700
Received: from l.himel.bg (IDENT:root@unamed.infotel.bg [212.39.68.18] (may be forged)) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5RC3MV12460 for ; Wed, 27 Jun 2001 05:03:25 -0700
Received: from linux.himel.bg (IDENT:ja@linux.himel.bg [127.0.0.1]) by l.himel.bg (8.9.3/8.9.3) with ESMTP id PAA17951; Wed, 27 Jun 2001 15:03:52 +0300
Date: Wed, 27 Jun 2001 15:03:52 +0300 (EEST)
From: Julian Anastasov
X-Sender: ja@l
To: netdev@oss.sgi.com
cc: kuznet@ms2.inr.ac.ru
Subject: static routes and dead gateway detection
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 4238
Lines: 120

Hello,

In the current kernels I see that all routes are deleted when a
device goes down (!IFF_UP -> NETDEV_DOWN). I can't find a place where
the proto static routes are used. So, I implemented a way to make the
proto static routes permanent. This way, devices that change their
status will not delete these static routes. They are marked dead and
are not used (they are ignored). The current kernels already have the
ability to detect these dead routes and to ignore them. So, I'm
wondering whether such a patch (appended) is useful for the mainstream
kernel(s). It is for 2.2 and can be ported to 2.4 too. How are these
RTPROT codes really used in the routing daemons, and do they use static
routes too?

The patch works as expected and I don't see problems for now.
The dead static routes are preserved until all nexthop devices are
deleted or until the pref source is deleted.
What I see as problem even in the plain 2.2.19 kernel is that when one device for one of the nexthops (when the prefsrc is not from this device) is removed and added again it can receive another dev index and the nexthop remains unused until the multipath route is recreated. May be due to missing ref counter. I see ref counting in 2.4 but I'm not sure whether the same problem exists there. But it seems such problem can be solved only with a nh_ifname field or something similar. This can be a 2.5 issue may be. I'm just not sure whether an 2.2 kernel oops can occur when the nexthop uses a removed device and nh_dev points to "somewhere". The patch contains a fix in fib_sync_up() about similar problem, i.e. not to touch nh_dev for DEAD routes. Comments? Regards -- Julian Anastasov --- v2.2.19/linux/include/net/ip_fib.h.orig Mon Jun 25 15:56:49 2001 +++ linux/include/net/ip_fib.h Wed Jun 27 12:02:23 2001 @@ -214,6 +214,7 @@ extern int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event, u8 tb_id, u8 type, u8 scope, void *dst, int dst_len, u8 tos, struct fib_info *fi); +extern int fib_num_down_nh_devs(struct fib_info *fi); extern int fib_sync_down(u32 local, struct device *dev, int force); extern int fib_sync_up(struct device *dev); extern int fib_convert_rtentry(int cmd, struct nlmsghdr *nl, struct rtmsg *rtm, --- v2.2.19/linux/net/ipv4/fib_hash.c.orig Mon Feb 19 13:44:54 2001 +++ linux/net/ipv4/fib_hash.c Mon Jun 25 16:36:43 2001 @@ -349,7 +349,8 @@ if ((f->fn_state&FN_S_ZOMBIE) || f->fn_scope != res->scope || - f->fn_type != RTN_UNICAST) + f->fn_type != RTN_UNICAST || + next_fi->fib_flags&RTNH_F_DEAD) continue; if (next_fi->fib_priority > res->fi->fib_priority) @@ -686,7 +687,10 @@ while ((f = *fp) != NULL) { struct fib_info *fi = FIB_INFO(f); - if (fi && ((f->fn_state&FN_S_ZOMBIE) || (fi->fib_flags&RTNH_F_DEAD))) { + if (fi && ((f->fn_state&FN_S_ZOMBIE) || + (fi->fib_flags&RTNH_F_DEAD && + (fi->fib_protocol != RTPROT_STATIC || + 
!fib_num_down_nh_devs(fi))))) { *fp = f->fn_next; synchronize_bh(); --- v2.2.19/linux/net/ipv4/fib_semantics.c.orig Mon Feb 19 13:44:44 2001 +++ linux/net/ipv4/fib_semantics.c Wed Jun 27 14:19:00 2001 @@ -170,6 +170,30 @@ return -1; } +/* + * Return 0 only when we are sure that the preferred source is deleted + * or when all nexthop devices are removed + */ + +int fib_num_down_nh_devs(struct fib_info *fi) +{ +struct in_device *in_dev; +struct device *dev; +int dead = 0; + + change_nexthops(fi) { + if (!(nh->nh_flags&RTNH_F_DEAD)) + return 0; + dev = dev_get_by_index(nh->nh_oif); + if (dev && !(dev->flags&IFF_UP) && + ((in_dev = dev->ip_ptr) != NULL) && + in_dev->ifa_list) + dead ++; + } endfor_nexthops(fi) + /* dead>0: All are marked DEAD but there is one in DOWN state */ + return dead; +} + #ifdef CONFIG_IP_ROUTE_MULTIPATH static u32 fib_get_attr32(struct rtattr *attr, int attrlen, int type) @@ -880,9 +904,9 @@ alive++; continue; } - if (nh->nh_dev == NULL || !(nh->nh_dev->flags&IFF_UP)) + if (nh->nh_oif != dev->ifindex || dev->ip_ptr == NULL) continue; - if (nh->nh_dev != dev || dev->ip_ptr == NULL) + if (nh->nh_dev == NULL || !(nh->nh_dev->flags&IFF_UP)) continue; alive++; nh->nh_power = 0; From owner-netdev@oss.sgi.com Thu Jun 28 17:26:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5T0QPs21317 for netdev-outgoing; Thu, 28 Jun 2001 17:26:25 -0700 Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5T0QKV21314 for ; Thu, 28 Jun 2001 17:26:21 -0700 Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id KAA21552 for ; Fri, 29 Jun 2001 10:26:16 +1000 X-Authentication-Warning: blackbird.intercode.com.au: jmorris owned process doing -bs Date: Fri, 29 Jun 2001 10:26:16 +1000 (EST) From: James Morris To: Subject: [PATCH] ip_queue postrouting oops fix Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; 
charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 3987 Lines: 149 This patch is a resend, it was originally sent to Linus by Rusty, but has not appeared in the kernel yet. Description: Fixes socket ownership oops and routing logic for mangled locally generated outgoing packets. - James -- James Morris diff -ur linux-2.4.4/net/ipv4/netfilter/ip_queue.c linux-2.4.4-a/net/ipv4/netfilter/ip_queue.c --- linux-2.4.4/net/ipv4/netfilter/ip_queue.c Tue Dec 12 07:37:04 2000 +++ linux-2.4.4-a/net/ipv4/netfilter/ip_queue.c Mon Apr 30 19:33:58 2001 @@ -24,19 +24,28 @@ #include #include #include +#include #include +#include #define IPQ_QMAX_DEFAULT 1024 #define IPQ_PROC_FS_NAME "ip_queue" #define NET_IPQ_QMAX 2088 #define NET_IPQ_QMAX_NAME "ip_queue_maxlen" +typedef struct ipq_rt_info { + __u8 tos; + __u32 daddr; + __u32 saddr; +} ipq_rt_info_t; + typedef struct ipq_queue_element { struct list_head list; /* Links element into queue */ int verdict; /* Current verdict */ struct nf_info *info; /* Extra info from netfilter */ struct sk_buff *skb; /* Packet inside */ + ipq_rt_info_t rt_info; /* May need post-mangle routing */ } ipq_queue_element_t; typedef int (*ipq_send_cb_t)(ipq_queue_element_t *e); @@ -64,7 +73,6 @@ * Packet queue * ****************************************************************************/ - /* Dequeue a packet if matched by cmp, or the next available if cmp is NULL */ static ipq_queue_element_t * ipq_dequeue(ipq_queue_t *q, @@ -150,9 +158,19 @@ printk(KERN_ERR "ip_queue: OOM in enqueue\n"); return -ENOMEM; } + e->verdict = NF_DROP; e->info = info; e->skb = skb; + + if (e->info->hook == NF_IP_LOCAL_OUT) { + struct iphdr *iph = skb->nh.iph; + + e->rt_info.tos = iph->tos; + e->rt_info.daddr = iph->daddr; + e->rt_info.saddr = iph->saddr; + } + spin_lock_bh(&q->lock); if (q->len >= *q->maxlen) { spin_unlock_bh(&q->lock); @@ -198,6 +216,32 @@ kfree(q); } +/* With a chainsaw... 
*/ +static int route_me_harder(struct sk_buff *skb) +{ + struct iphdr *iph = skb->nh.iph; + struct rtable *rt; + + struct rt_key key = { + dst:iph->daddr, src:iph->saddr, + oif:skb->sk ? skb->sk->bound_dev_if : 0, + tos:RT_TOS(iph->tos)|RTO_CONN, +#ifdef CONFIG_IP_ROUTE_FWMARK + fwmark:skb->nfmark +#endif + }; + + if (ip_route_output_key(&rt, &key) != 0) { + printk("route_me_harder: No more route.\n"); + return -EINVAL; + } + + /* Drop old route. */ + dst_release(skb->dst); + skb->dst = &rt->u.dst; + return 0; +} + static int ipq_mangle_ipv4(ipq_verdict_msg_t *v, ipq_queue_element_t *e) { int diff; @@ -223,6 +267,8 @@ "in mangle, dropping packet\n"); return -ENOMEM; } + if (e->skb->sk) + skb_set_owner_w(newskb, e->skb->sk); kfree_skb(e->skb); e->skb = newskb; } @@ -230,6 +276,19 @@ } memcpy(e->skb->data, v->payload, v->data_len); e->skb->nfcache |= NFC_ALTERED; + + /* + * Extra routing may needed on local out, as the QUEUE target never + * returns control to the table. + */ + if (e->info->hook == NF_IP_LOCAL_OUT) { + struct iphdr *iph = e->skb->nh.iph; + + if (!(iph->tos == e->rt_info.tos + && iph->daddr == e->rt_info.daddr + && iph->saddr == e->rt_info.saddr)) + return route_me_harder(e->skb); + } return 0; } diff -ur linux-2.4.4/net/ipv4/netfilter/iptable_mangle.c linux-2.4.4-a/net/ipv4/netfilter/iptable_mangle.c --- linux-2.4.4/net/ipv4/netfilter/iptable_mangle.c Tue Jan 30 03:07:30 2001 +++ linux-2.4.4-a/net/ipv4/netfilter/iptable_mangle.c Mon Apr 30 19:29:48 2001 @@ -148,7 +148,7 @@ ret = ipt_do_table(pskb, hook, in, out, &packet_mangler, NULL); /* Reroute for ANY change. 
*/ - if (ret != NF_DROP && ret != NF_STOLEN + if (ret != NF_DROP && ret != NF_STOLEN && ret != NF_QUEUE && ((*pskb)->nh.iph->saddr != saddr || (*pskb)->nh.iph->daddr != daddr || (*pskb)->nfmark != nfmark From owner-netdev@oss.sgi.com Thu Jun 28 17:28:26 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5T0SQu21395 for netdev-outgoing; Thu, 28 Jun 2001 17:28:26 -0700 Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5T0SOV21392 for ; Thu, 28 Jun 2001 17:28:24 -0700 Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id KAA21556 for ; Fri, 29 Jun 2001 10:28:21 +1000 X-Authentication-Warning: blackbird.intercode.com.au: jmorris owned process doing -bs Date: Fri, 29 Jun 2001 10:28:21 +1000 (EST) From: James Morris To: Subject: [PATCH] ip_queue malformed netlink message oops fix Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 892 Lines: 39 This patch is a resend. Description: Fixes oops caused by short/malformed length Netlink messages. 
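As a rough user-space illustration of the length checks this patch introduces (not the kernel code itself): the buffer must first be large enough to hold a struct nlmsghdr before any header field is read, and only then is nlmsg_len compared against the header size and the buffer length. The helper name nl_checked_header is made up for this sketch, and the caller is assumed to pass a suitably aligned buffer.

```c
#include <linux/netlink.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the order of checks described above.
 * Returns a header pointer only when the buffer can really hold what
 * the header claims; returns NULL when the message should be dropped.
 */
static const struct nlmsghdr *nl_checked_header(const void *buf, size_t len)
{
	const struct nlmsghdr *nlh;

	/* Step 1: never touch header fields of a too-short buffer. */
	if (len < sizeof(*nlh))
		return NULL;

	nlh = buf;

	/* Step 2: the claimed length must cover the header and fit the buffer. */
	if (nlh->nlmsg_len < sizeof(*nlh) || len < nlh->nlmsg_len)
		return NULL;

	return nlh;
}
```

Failing either step returns NULL without reading further, which corresponds to the silent drop in the patch; only messages that pass both steps go on to the pid and flag checks.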
- James -- James Morris diff -urN linux-2.4.5.orig/net/ipv4/netfilter/ip_queue.c linux/net/ipv4/netfilter/ip_queue.c --- linux-2.4.5.orig/net/ipv4/netfilter/ip_queue.c Tue Dec 12 07:37:04 2000 +++ linux/net/ipv4/netfilter/ip_queue.c Fri Jun 1 22:25:17 2001 @@ -431,10 +431,15 @@ int status, type; struct nlmsghdr *nlh; + if (skb->len < sizeof(struct nlmsghdr)) + return; + nlh = (struct nlmsghdr *)skb->data; - if (nlh->nlmsg_len < sizeof(*nlh) - || skb->len < nlh->nlmsg_len - || nlh->nlmsg_pid <= 0 + if (nlh->nlmsg_len < sizeof(struct nlmsghdr) + || skb->len < nlh->nlmsg_len) + return; + + if(nlh->nlmsg_pid <= 0 || !(nlh->nlmsg_flags & NLM_F_REQUEST) || nlh->nlmsg_flags & NLM_F_MULTI) RCV_SKB_FAIL(-EINVAL); From owner-netdev@oss.sgi.com Thu Jun 28 17:32:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5T0W1o21536 for netdev-outgoing; Thu, 28 Jun 2001 17:32:01 -0700 Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5T0VwV21527 for ; Thu, 28 Jun 2001 17:31:59 -0700 Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id KAA21584 for ; Fri, 29 Jun 2001 10:31:56 +1000 X-Authentication-Warning: blackbird.intercode.com.au: jmorris owned process doing -bs Date: Fri, 29 Jun 2001 10:31:56 +1000 (EST) From: James Morris To: Subject: [PATCH] ip_queue restore MAC copying Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 917 Lines: 28 Description: Restores ability for ip_queue to copy MAC data to userspace following changes made to skbuf structure. 
- James -- James Morris diff -ur linux-2.4.4/net/ipv4/netfilter/ip_queue.c linux-2.4.4-nf1/net/ipv4/netfilter/ip_queue.c --- linux-2.4.4/net/ipv4/netfilter/ip_queue.c Tue Dec 12 07:37:04 2000 +++ linux-2.4.4-nf1/net/ipv4/netfilter/ip_queue.c Fri May 4 20:47:45 2001 @@ -400,6 +400,13 @@ if (e->info->outdev) strcpy(pm->outdev_name, e->info->outdev->name); else pm->outdev_name[0] = '\0'; pm->hw_protocol = e->skb->protocol; + if (e->info->indev && e->skb->dev) { + pm->hw_type = e->skb->dev->type; + if (e->skb->dev->hard_header_parse) + pm->hw_addrlen = + e->skb->dev->hard_header_parse(e->skb, + pm->hw_addr); + } if (data_len) memcpy(pm->payload, e->skb->data, data_len); nlh->nlmsg_len = skb->tail - old_tail; From owner-netdev@oss.sgi.com Fri Jun 29 05:28:20 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5TCSKm31973 for netdev-outgoing; Fri, 29 Jun 2001 05:28:20 -0700 Received: from amsfire1.eur.nai.com (firewall-user@[161.69.153.125]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5TCSJV31970 for ; Fri, 29 Jun 2001 05:28:19 -0700 Received: by amsfire1.eur.nai.com; id MAA28008; Fri, 29 Jun 2001 12:29:58 GMT Received: from unknown(161.69.147.199) by amsfire1.eur.nai.com via smap (V5.5) id xma027998; Fri, 29 Jun 01 12:29:52 GMT Received: FROM ams-ex-bridge1.nai.com BY ams-webshield1.eur.nai.com ; Fri Jun 29 14:29:00 2001 +0200 Received: by AMS-147-229.nai.com with Internet Mail Service (5.5.2653.19) id ; Fri, 29 Jun 2001 14:31:52 +0200 Message-ID: From: "Crowe, Simon" To: "'netdev@oss.sgi.com'" Subject: Transparent Proxying Date: Fri, 29 Jun 2001 14:27:14 +0200 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 413 Lines: 12 Dear All, I'm trying to develop a transparent proxy where the user land process is trying to proxy for the client, but the request sent from that proxy needs to be as if it came from the client. 
That is, I want to pretend to be the client IP address and port number. On the 2.2 kernels I could do this with a bind call to the non local address. On 2.4 how do I pretend to be the client address ? Regards Simon From owner-netdev@oss.sgi.com Fri Jun 29 07:27:53 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5TERr204849 for netdev-outgoing; Fri, 29 Jun 2001 07:27:53 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5TERpV04846 for ; Fri, 29 Jun 2001 07:27:51 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 0D2621F86; Fri, 29 Jun 2001 10:27:49 -0400 (EDT) Message-ID: <3B3C9089.D85A26@mandrakesoft.com> Date: Fri, 29 Jun 2001 10:28:25 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre5 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andrew Morton , "David S. Miller" , netdev@oss.sgi.com Cc: Alan Cox , Linus Torvalds Subject: Re: alloc_etherdev breaks ether= References: <3B3C0137.1896A895@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1414 Lines: 38 Andrew Morton wrote: > > And it looks hard to fix. > > Problem is that ether= "knows" the device name, eth0. Whereas > alloc_etherdev defers it. And the probe() method needs to get > at dev->mem_start. Important. Here is my proposed -2.4- fix: ether=XXX is legacy mess that doesn't really apply in the present day. It is useful for managing a bunch of ISA drivers built into the kernel, and that's about it. This has been broken in various PCI drivers since alloc_etherdev was introduced in 2.4.2-preXX, and this is the first time it's come up. 
ether=XXX should still work with ISA drivers (they use dev->init() not alloc_etherdev), so my suggestion is to state that, for 2.4, ether=XXX is only supported for ISA drivers (and other drivers with similar manual probing/ordering needs). For all other drivers, do the normal thing and use a __setup function to set boot-time options, which is really all that 3c59x and the other Becker-derived drivers need. Comments? As Andrew said on IRC, really the only people who build their drivers into their kernels are propellerheads like us :) All vendors I know of build PCI net drivers as modules. So, yes this change would be a "flag day" type change, but it should affect very few people, and the people it does affect are smart enough to s/ether=/3c59x=/ Jeff -- Jeff Garzik | Andre the Giant has a posse. Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Fri Jun 29 08:19:15 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5TFJFI08181 for netdev-outgoing; Fri, 29 Jun 2001 08:19:15 -0700 Received: from the-village.bc.nu (router-100M.swansea.linux.org.uk [194.168.151.17]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5TFJCV08176 for ; Fri, 29 Jun 2001 08:19:12 -0700 Received: from alan by the-village.bc.nu with local (Exim 3.22 #1) id 15G028-0000TQ-00; Fri, 29 Jun 2001 16:18:12 +0100 Subject: Re: alloc_etherdev breaks ether= To: jgarzik@mandrakesoft.com (Jeff Garzik) Date: Fri, 29 Jun 2001 16:18:12 +0100 (BST) Cc: andrewm@uow.edu.au (Andrew Morton), davem@redhat.com (David S. 
Miller), netdev@oss.sgi.com, alan@lxorguk.ukuu.org.uk (Alan Cox), torvalds@transmeta.com (Linus Torvalds) In-Reply-To: <3B3C9089.D85A26@mandrakesoft.com> from "Jeff Garzik" at Jun 29, 2001 10:28:25 AM X-Mailer: ELM [version 2.5 PL3] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: From: Alan Cox Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1157 Lines: 33 > Here is my proposed -2.4- fix: Sounds like a 2.5 fix > ether=XXX is legacy mess that doesn't really apply in the present day. > It is useful for managing a bunch of ISA drivers built into the kernel, And several PCI ones > For all other drivers, do the normal thing and use a __setup function to > set boot-time options, which is really all that 3c59x and the other > Becker-derived drivers need. By all means add the extra setup options. It makes migration easier. However I would point out that the original implementation of eth= worked perfectly for 2.4 and someone broke it. When you create ethfoo it creates a net_device initialised with ethfoo's parameters, so its also trivial to add eth_get_params(struct something *) to handle the other cases without breaking compatibility > know of build PCI net drivers as modules. 
So, yes this change would be > a "flag day" type change, but it should affect very few people, and the > people it does affect are smart enough to s/ether=/3c59x=/ 2.5 IMHO, and I agree entirely that 3c59x= is a lot saner, but I just happen to think its about time 2.4 started pretending to be a production OS From owner-netdev@oss.sgi.com Fri Jun 29 08:26:33 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5TFQXt09976 for netdev-outgoing; Fri, 29 Jun 2001 08:26:33 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5TFQVV09966 for ; Fri, 29 Jun 2001 08:26:32 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 40F2C1F86; Fri, 29 Jun 2001 11:26:30 -0400 (EDT) Message-ID: <3B3C9E4A.124EC206@mandrakesoft.com> Date: Fri, 29 Jun 2001 11:27:06 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre5 i686) X-Accept-Language: en MIME-Version: 1.0 To: Alan Cox Cc: Andrew Morton , "David S. Miller" , netdev@oss.sgi.com, Linus Torvalds Subject: Re: alloc_etherdev breaks ether= References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1093 Lines: 39 Alan Cox wrote: > By all means add the extra setup options. It makes migration easier. However > I would point out that the original implementation of eth= worked perfectly > for 2.4 and someone broke it. Not really, - init_etherdev - failure, call unregister_netdev - ether= is no longer correct or - init_etherdev, calls /sbin/hotplug - device comes up before dev->mem_start is read/set from ether= It worked perfectly for the perfect case. Anything -not- explicitly initialized in Space.c has fundamental incompatibilities with ether= no matter which way you slice it, alloc_etherdev or no. 
> When you create ethfoo it creates a net_device initialised with ethfoo's > parameters, so its also trivial to add > > eth_get_params(struct something *) > > to handle the other cases without breaking compatibility hmmm... solving the race where register_netdev is called before struct net_device is filled in is in conflict with a useable ether= AFAICS. Better suggestions welcome... -- Jeff Garzik | Andre the Giant has a posse. Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Fri Jun 29 08:38:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5TFc3313641 for netdev-outgoing; Fri, 29 Jun 2001 08:38:03 -0700 Received: from the-village.bc.nu (router-100M.swansea.linux.org.uk [194.168.151.17]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5TFc1V13626 for ; Fri, 29 Jun 2001 08:38:01 -0700 Received: from alan by the-village.bc.nu with local (Exim 3.22 #1) id 15G0KM-0000Va-00; Fri, 29 Jun 2001 16:37:02 +0100 Subject: Re: alloc_etherdev breaks ether= To: jgarzik@mandrakesoft.com (Jeff Garzik) Date: Fri, 29 Jun 2001 16:37:02 +0100 (BST) Cc: alan@lxorguk.ukuu.org.uk (Alan Cox), andrewm@uow.edu.au (Andrew Morton), davem@redhat.com (David S. Miller), netdev@oss.sgi.com, torvalds@transmeta.com (Linus Torvalds) In-Reply-To: <3B3C9E4A.124EC206@mandrakesoft.com> from "Jeff Garzik" at Jun 29, 2001 11:27:06 AM X-Mailer: ELM [version 2.5 PL3] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: From: Alan Cox Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 791 Lines: 24 > - init_etherdev > - failure, call unregister_netdev > - ether= is no longer correct I tested this path in 2.4.0pre - it worked then. 
The device is unregistered, the next device registered is created with the same name and assigned the same options > - init_etherdev, calls /sbin/hotplug > - device comes up before dev->mem_start is read/set from ether= If your device is coming up before you register it then yes you need to re-order stuff and get the parameters seperately. But that isnt a big problem - its also already buggy as hell when this occurs and we have drivers reporting eth%s: blah blah and stuff > Anything -not- explicitly initialized in Space.c has fundamental > incompatibilities with ether= no matter which way you slice it, > alloc_etherdev or no. I disagree. From owner-netdev@oss.sgi.com Fri Jun 29 08:47:30 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5TFlU816661 for netdev-outgoing; Fri, 29 Jun 2001 08:47:30 -0700 Received: from horus.its.uow.edu.au (horus.its.uow.edu.au [130.130.68.25]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5TFlSV16647 for ; Fri, 29 Jun 2001 08:47:28 -0700 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id BAA04412; Sat, 30 Jun 2001 01:46:28 +1000 (EST) Message-ID: <3B3CA2D4.608135EF@uow.edu.au> Date: Sat, 30 Jun 2001 01:46:28 +1000 From: Andrew Morton X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.5 i686) X-Accept-Language: en MIME-Version: 1.0 To: Alan Cox CC: Jeff Garzik , "David S. Miller" , netdev@oss.sgi.com, Linus Torvalds Subject: Re: alloc_etherdev breaks ether= References: <3B3C9089.D85A26@mandrakesoft.com> from "Jeff Garzik" at Jun 29, 2001 10:28:25 AM Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1874 Lines: 54 Alan Cox wrote: > > ... 
>
> When you create ethfoo it creates a net_device initialised with ethfoo's
> parameters, so its also trivial to add
>
>	eth_get_params(struct something *)
>
> to handle the other cases without breaking compatibility

Unfortunately there's an ordering problem. The driver doesn't
know its value of `foo' until the final act, where it
calls register_netdevice(). But it needs to know its
setup parameters a long time before that, and it needs `foo'
for that.

And we can't call register_netdevice() earlier, because the
device isn't ready to be opened yet. And it can't be made
ready to be opened until it knows its setup parameters,
ad infinitum.

So to support `ether=' we need to know the interface's actual
name at the *start* of probing, not the end. Pseudo-code:

xxx_probe()
{
	dev = alloc_etherdev();
	eth_get_params(dev, dev->name);	/* dev->name is "eth%d". oops */
	register_netdevice(dev);	/* "eth0" gets determined here. oops */
}

Which basically takes us back to the thing I did in
December: allocate and reserve the device name at the
start of probe, and publish it (ie: make it eligible for
open) at the end of probe.

alloc_etherdev() can do this. Allocate the name, then
populate *dev with it, then just reserve the interface
number in some little array, then call netdev_boot_setup_check().

Drivers need to be changed so that if xxx_probe() decides
that it's not going to register the interface after all,
it will need to call a new API function to clear the slot
in the array.

This means that the availability of device names is
stored in two places: the current device list and the
array. That's rather unpleasant. Alternative is to
always use the device list, but add a "hidden" state.
Deja Vu.

I can put that stuff back together over the weekend - I think
it'll be quite straightforward.
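The reserve-then-publish idea described in this message can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: the slot array, its size, and the function names eth_reserve_name, eth_publish and eth_release are all invented here and do not correspond to any kernel API.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_ETH 8	/* arbitrary table size for this sketch */

enum slot_state { SLOT_FREE, SLOT_RESERVED, SLOT_PUBLISHED };

/* One entry per possible "ethN" name (hypothetical bookkeeping). */
static enum slot_state eth_slots[MAX_ETH];

/*
 * Start of probe: reserve the lowest free "ethN" so the driver knows
 * its final name early enough to look up boot parameters for it.
 * Returns N, or -1 when every slot is taken.
 */
static int eth_reserve_name(char *buf, size_t buflen)
{
	for (int i = 0; i < MAX_ETH; i++) {
		if (eth_slots[i] == SLOT_FREE) {
			eth_slots[i] = SLOT_RESERVED;
			snprintf(buf, buflen, "eth%d", i);
			return i;
		}
	}
	return -1;
}

/* End of probe: the device becomes visible and eligible for open. */
static void eth_publish(int idx)
{
	eth_slots[idx] = SLOT_PUBLISHED;
}

/* Probe gave up: clear the slot so the name can be handed out again. */
static void eth_release(int idx)
{
	eth_slots[idx] = SLOT_FREE;
}
```

A probe routine would call eth_reserve_name() first, fetch its parameters using the returned name, and finish with either eth_publish() or eth_release(). The "hidden state in the device list" alternative mentioned above would keep the same three states but store them in the device entries rather than in a separate array.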
From owner-netdev@oss.sgi.com Fri Jun 29 08:55:36 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5TFtaO19120 for netdev-outgoing; Fri, 29 Jun 2001 08:55:36 -0700 Received: from the-village.bc.nu (router-100M.swansea.linux.org.uk [194.168.151.17]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5TFtXV19099 for ; Fri, 29 Jun 2001 08:55:34 -0700 Received: from alan by the-village.bc.nu with local (Exim 3.22 #1) id 15G0ba-0000XJ-00; Fri, 29 Jun 2001 16:54:50 +0100 Subject: Re: alloc_etherdev breaks ether= To: andrewm@uow.edu.au (Andrew Morton) Date: Fri, 29 Jun 2001 16:54:50 +0100 (BST) Cc: alan@lxorguk.ukuu.org.uk (Alan Cox), jgarzik@mandrakesoft.com (Jeff Garzik), davem@redhat.com (David S. Miller), netdev@oss.sgi.com, torvalds@transmeta.com (Linus Torvalds) In-Reply-To: <3B3CA2D4.608135EF@uow.edu.au> from "Andrew Morton" at Jun 30, 2001 01:46:28 AM X-Mailer: ELM [version 2.5 PL3] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: From: Alan Cox Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 349 Lines: 12 > Which basically takes us back to the thing I did in > December: allocate and reserve the device name at the > start of probe, and publish it (ie: make it eligible for > open) at the end of probe. Which was the right thing to have done anyway > I can put that stuff back together over the weekend - I think > it'll be quite straightforward. 
Ok From owner-netdev@oss.sgi.com Fri Jun 29 09:01:24 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5TG1OM20903 for netdev-outgoing; Fri, 29 Jun 2001 09:01:24 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5TG1MV20890 for ; Fri, 29 Jun 2001 09:01:22 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id CF68B1F8A; Fri, 29 Jun 2001 12:01:20 -0400 (EDT) Message-ID: <3B3CA674.E0C2C0D0@mandrakesoft.com> Date: Fri, 29 Jun 2001 12:01:56 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre5 i686) X-Accept-Language: en MIME-Version: 1.0 To: Alan Cox Cc: Andrew Morton , "David S. Miller" , netdev@oss.sgi.com, Linus Torvalds Subject: Re: alloc_etherdev breaks ether= References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1393 Lines: 41 Alan Cox wrote: > > > - init_etherdev > > - failure, call unregister_netdev > > - ether= is no longer correct > > I tested this path in 2.4.0pre - it worked then. The device is unregistered, > the next device registered is created with the same name and assigned the > same options Yeah but I don't view that behavior as perfect :) For cases where the reason for device unregister is not ENODEV, you are now assigning device B options intended for device A. That sort of stuff has all sorts of unintended consequences. > > - init_etherdev, calls /sbin/hotplug > > - device comes up before dev->mem_start is read/set from ether= > > If your device is coming up before you register it then yes you need to > re-order stuff and get the parameters seperately. But that isnt a big problem > - its also already buggy as hell when this occurs and we have drivers reporting > eth%s: blah blah I won't repeat what Andrew said. 
WRT drivers reporting "eth%d" now, yes that is a cleanup. Printing dev->name was similar to the ether= case above... it was usually right, but not always. Consider, in 2.4, 2.2, or 2.0: dev = init_etherdev(...); /* assigned eth0 */ printk(... dev->name ...); failure, unregister_netdev(dev); dev = init_etherdev(...); /* assigned eth0 */ printk(... dev->name ...); -- Jeff Garzik | Andre the Giant has a posse. Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Fri Jun 29 09:08:02 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5TG82322050 for netdev-outgoing; Fri, 29 Jun 2001 09:08:02 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5TG81V22047 for ; Fri, 29 Jun 2001 09:08:01 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 303B81F88; Fri, 29 Jun 2001 12:08:00 -0400 (EDT) Message-ID: <3B3CA803.FF5BFD68@mandrakesoft.com> Date: Fri, 29 Jun 2001 12:08:35 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6-pre5 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andrew Morton Cc: Alan Cox , "David S. Miller" , netdev@oss.sgi.com, Linus Torvalds Subject: Re: alloc_etherdev breaks ether= References: <3B3C9089.D85A26@mandrakesoft.com> from "Jeff Garzik" at Jun 29, 2001 10:28:25 AM <3B3CA2D4.608135EF@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 412 Lines: 13 Andrew Morton wrote: > Which basically takes us back to the thing I did in > December: allocate and reserve the device name at the > start of probe, and publish it (ie: make it eligible for > open) at the end of probe. How does this solve the problem I just described, where device B gets options intended for device A? -- Jeff Garzik | Andre the Giant has a posse. 
Building 1024 | MandrakeSoft | From owner-netdev@oss.sgi.com Fri Jun 29 17:55:29 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5U0tT624094 for netdev-outgoing; Fri, 29 Jun 2001 17:55:29 -0700 Received: from horus.its.uow.edu.au (horus.its.uow.edu.au [130.130.68.25]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5U0tSV24088 for ; Fri, 29 Jun 2001 17:55:28 -0700 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id KAA12825; Sat, 30 Jun 2001 10:55:05 +1000 (EST) Message-ID: <3B3D2366.FDD316E3@uow.edu.au> Date: Sat, 30 Jun 2001 10:55:02 +1000 From: Andrew Morton X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.5 i686) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: Alan Cox , "David S. Miller" , netdev@oss.sgi.com, Linus Torvalds Subject: Re: alloc_etherdev breaks ether= References: <3B3C9089.D85A26@mandrakesoft.com> from "Jeff Garzik" at Jun 29, 2001 10:28:25 AM <3B3CA2D4.608135EF@uow.edu.au> <3B3CA803.FF5BFD68@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 662 Lines: 17 Jeff Garzik wrote: > > Andrew Morton wrote: > > Which basically takes us back to the thing I did in > > December: allocate and reserve the device name at the > > start of probe, and publish it (ie: make it eligible for > > open) at the end of probe. > > How does this solve the problem I just described, where device B gets > options intended for device A? Or where someone swaps your NICs around. Or where the kernel changes its bus scan direction. Or when you're using /sbin/hotplug to load the drivers and the wind is in the South East. It doesn't. We need to be able to address interfaces by MAC address, bus location, etcetera to solve these things. 
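[Illustration: a rough sketch of what keying setup options on MAC address rather than on interface name could look like. struct if_opts and opts_for_mac() are hypothetical and exist only for this example; no such kernel interface is implied by the thread. The fields mem_start and irq are assumed here as examples of per-interface options.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-interface options, keyed by hardware address. */
struct if_opts {
    unsigned char mac[6];
    unsigned long mem_start;
    int irq;
};

/*
 * Return the options entry whose MAC matches, or NULL if there is none.
 * Because the key is the hardware address, the result does not depend
 * on probe order or on which "ethN" name the device ends up with.
 */
const struct if_opts *opts_for_mac(const struct if_opts *table, size_t n,
                                   const unsigned char mac[6])
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (memcmp(table[i].mac, mac, 6) == 0)
            return &table[i];
    }
    return NULL;
}
```

With a lookup like this, a device that fails probe and is replaced by another one under the same name would simply find no entry, rather than inheriting options meant for the first device.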
From owner-netdev@oss.sgi.com Sat Jun 30 22:15:24 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f615FOP16686 for netdev-outgoing; Sat, 30 Jun 2001 22:15:24 -0700 Received: from vindaloo.ras.ucalgary.ca (vindaloo.ras.ucalgary.ca [136.159.55.21]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f615FNV16682 for ; Sat, 30 Jun 2001 22:15:23 -0700 Received: (from rgooch@localhost) by vindaloo.ras.ucalgary.ca (8.10.0/8.10.0) id f615EuA26546; Sat, 30 Jun 2001 23:14:56 -0600 Date: Sat, 30 Jun 2001 23:14:56 -0600 Message-Id: <200107010514.f615EuA26546@vindaloo.ras.ucalgary.ca> From: Richard Gooch To: Andrew Morton Cc: Jeff Garzik , Alan Cox , "David S. Miller" , netdev@oss.sgi.com, Linus Torvalds Subject: Re: alloc_etherdev breaks ether= In-Reply-To: <3B3D2366.FDD316E3@uow.edu.au> References: <3B3C9089.D85A26@mandrakesoft.com> <3B3CA2D4.608135EF@uow.edu.au> <3B3CA803.FF5BFD68@mandrakesoft.com> <3B3D2366.FDD316E3@uow.edu.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 892 Lines: 26 Andrew Morton writes: > Jeff Garzik wrote: > > > > Andrew Morton wrote: > > > Which basically takes us back to the thing I did in > > > December: allocate and reserve the device name at the > > > start of probe, and publish it (ie: make it eligible for > > > open) at the end of probe. > > > > How does this solve the problem I just described, where device B gets > > options intended for device A? > > Or where someone swaps your NICs around. Or where the kernel changes > its bus scan direction. Or when you're using /sbin/hotplug to load > the drivers and the wind is in the South East. > > It doesn't. We need to be able to address interfaces by MAC address, > bus location, etcetera to solve these things. ^^^^^^^^^^^^ /dev/netif/eth0 symlink to /dev/bus/pci0/slot1/function0/eth Regards, Richard.... Permanent: rgooch@atnf.csiro.au Current: rgooch@ras.ucalgary.ca