From davem@redhat.com Thu May 1 01:08:20 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 01:08:32 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h4188GFu012758
	for ; Thu, 1 May 2003 01:08:18 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA32338;
	Thu, 1 May 2003 00:00:58 -0700
Date: Thu, 01 May 2003 00:00:58 -0700 (PDT)
Message-Id: <20030501.000058.39187964.davem@redhat.com>
To: kuznet@ms2.inr.ac.ru
Cc: shemminger@osdl.org, netdev@oss.sgi.com, acme@conectiva.com.br,
	rusty@rustcorp.com.au
Subject: Re: dev->destructor
From: "David S. Miller"
In-Reply-To: <200305010110.FAA08689@sex.inr.ac.ru>
References: <20030429.232631.68131803.davem@redhat.com>
	<200305010110.FAA08689@sex.inr.ac.ru>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp)
X-archive-position: 2393
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: kuznet@ms2.inr.ac.ru
   Date: Thu, 1 May 2003 05:10:33 +0400 (MSD)

[ Rusty, just skip down to "Ok ok ok!", it's something we've
  discussed before.  Some of these problems are becoming so
  widespread that we need to implement a fix, I'll probably be
  the one to end up doing it... ]

   > 1) dev_get() gets module reference and dev_put() puts it.
   >    Ugly, as this means dev_get() can fail, but this does
   >    cover all the possible cases.

   Seems, you eventually _really_ understood why I hysterically moan
   about bogosity of modules and maybe ready to recognize that it is
   not just hysterical moaning in some part.
   :-)

Yes, I know, and we can talk about this until the cows come
home... :-)

   > 2) Make unregister_netdev() wait for refcount to reach 1
   >    regardless of whether dev->destructor is NULL or not.
   >
   > I don't like #1.  Do you see some holes in #2?

   It is deadlock.

What exactly is this deadlock?  Let me think...  None of destructors
kill reference to device object, and I mean none of them.  It is why
I thought the idea works.

Also, holding RTNL semaphore does not block potential holders of
device reference.  Or does it?

   > As Stephen brought up, this also means we should do something
   > about that NETDEV_UNREGISTER code in dst_dev_event() :-(

   It is just one and the simplest subcase of general situation.
   Module must not be unloaded while device is held, that's all.

Ok ok ok!!!  Let us depict how it might work in your idealized
module scheme ok?

Your idea, as I understand it, is to add callback to module that
freezes module then at some time in the future makes indication that
module is clean and may be unloaded safely.

The logic is that module knows about reference, internal attributes,
etc. and thus can check for unloadability better than any simple
refcounting system can.

The best argument for this are things like ipv6 which are enormously
complicated to load/unload safely.  If we were to use the module
refcounting system to make ipv6 unloading work cleanly, every other
line of the ipv6 code would be some module get/put. :-)

So the unload sequence is something like:

	/* Shutdown the module, so that no new references may
	 * be created.
	 */
	rc = module->shutdown(module);
	if (rc)
		goto out_err;

	wait_event(&module->unload_waitq, module->unloadable);

	module->exit();

Then, using netdevice as an example, register_netdevice() would do
something like:

	module_add_instance(dev->owner);

which is simply:

void module_add_instance(struct module *module)
{
	if (module)
		atomic_inc(&module->instances);
}

The net device drivers add:

module_shutdown(netdev_module_shutdown);

And then netdev_module_shutdown() would go:

int netdev_module_shutdown(struct module *module)
{
	struct net_device *d, *d_next;

	rtnl_lock();
	d_next = NULL;
	for (d = dev_base; d != NULL; d = d_next) {
		d_next = d->next;
		if (d->owner == module) {
			if (unregister_netdevice(d))
				BUG();

			/* Keep traversing, this module may drive
			 * multiple device instances.
			 */
		}
	}
	rtnl_unlock();

	return 0;
}

And, at final dev_put(), netdev_finish_unregister() is invoked,
and we change it to look something like this:

int netdev_finish_unregister(struct net_device *dev)
{
	BUG_TRAP(!dev->ip_ptr);
	BUG_TRAP(!dev->ip6_ptr);
	BUG_TRAP(!dev->dn_ptr);

	if (!dev->deadbeaf) {
		printk(KERN_ERR "Freeing alive device %p, %s\n",
		       dev, dev->name);
		return 0;
	}
#ifdef NET_REFCNT_DEBUG
	printk(KERN_DEBUG "netdev_finish_unregister: %s%s.\n", dev->name,
	       (dev->destructor != NULL)?"":", old style");
#endif
	if (dev->destructor)
		dev->destructor(dev);
	module_dec_instance(dev->owner);
	return 0;
}

We implement module_dec_instance() as:

void module_dec_instance(struct module *module)
{
	if (module && atomic_dec_and_test(&module->instances))
		module_is_unloadable(module);
}

Finally, we implement module_is_unloadable() which is simply:

void module_is_unloadable(struct module *module)
{
	if (module) {
		module->unloadable = 1;
		wake_up(&module->unload_waitq);
	}
}

Next, let us make socket example for "simple protocol".

static int __init simple_proto_init(void)
{
	int rc;

	rc = sock_register(&simple_proto_ops);
	if (rc)
		return rc;

	return 0;
}

static int __exit simple_proto_shutdown(struct module *module)
{
	int rc;

	rc = sock_unregister(AF_SIMPLE);
	if (rc)
		return rc;

	return 0;
}

module_init(simple_proto_init);
module_shutdown(simple_proto_shutdown);

Now, where to place the code to mark module as unloadable?
We could maintain state internally using:

static atomic_t nr_simple_sockets;

And as sockets are created/destroyed, we inc/dec this thing.
When it is decremented to zero, we can say

	module_is_unloadable(THIS_MODULE);

Question is, where to make this?  It cannot be in the module itself
of course, that would create race condition similar to one which
exists today making this exercise quite pointless :-)

Alexey and I had some private conversations about this with Rusty,
he agreed with our sentiments mostly, but he was concerned about
unload semantics and some unfortunate side effects of two-stage
unload.

Specifically, consider:

int example_module_shutdown(struct module *module)
{
	int rc;

	rc = unregister_foo(&foo);
	if (rc)
		return rc;

	rc = unregister_bar(&bar);
	if (rc) {
		register_foo(&foo);
		return rc;
	}

	return 0;
}

Note the problem.  If between the unregister_foo() and re-register of
foo in the failure path, someone asks to open a "foo" it will fail.

Those semantics absolutely stink.

Rusty went on to describe that this is one of the reasons for the
current "enable refcounts" scheme with try_module_get().  A single
binary state enables all of the interfaces to start to succeed.
ie. opening an object created by the module does not succeed until
module->init() finishes successfully and thus try_module_get() starts
to return success.

I think he was trying to hint to us that some analogue needs to
occur for shutdown as well.  Ie.
to rewrite my shutdown logic above:

	rc = stop_refcounts(mod);
	if (rc)
		goto out_err;

	if (mod->shutdown) {
		/* Shutdown the module, so that no new references may
		 * be created.
		 */
		rc = mod->shutdown(mod);
		if (rc) {
			restart_refcounts(mod);
			goto out_err;
		}

		wait_event(&mod->unload_waitq, mod->unloadable);

		if (module_refcount(mod) != 0)
			BUG();
	}
	restart_refcounts(mod);

	/* ... put here all the existing code in sys_delete_module()
	 * ... dealing with O_NONBLOCK etc. etc.
	 */

	mod->exit();
	module_free(mod);

I am not even sure this is right.  It's quite a tricky area and
requires real brains to solve.

From Robert.Olsson@data.slu.se Thu May 1 01:22:06 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 01:22:16 -0700 (PDT)
Received: from robur.slu.se (robur.slu.se [130.238.98.12])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h418M4Fu013102
	for ; Thu, 1 May 2003 01:22:06 -0700
Received: (from robert@localhost)
	by robur.slu.se (8.9.3p2/8.9.3) id KAA15796;
	Thu, 1 May 2003 10:21:59 +0200
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16048.55590.505315.91846@robur.slu.se>
Date: Thu, 1 May 2003 10:21:58 +0200
To: "Randy.Dunlap"
Cc: Stephen Hemminger, davem@redhat.com, netdev@oss.sgi.com
Subject: Re: Favorite network stress tests?
In-Reply-To: <20030430165244.4b247318.rddunlap@osdl.org>
References: <20030430163542.4e09076f.shemminger@osdl.org>
	<20030430165244.4b247318.rddunlap@osdl.org>
X-Mailer: VM 6.92 under Emacs 19.34.1
X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp)
X-archive-position: 2394
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: Robert.Olsson@data.slu.se
Precedence: bulk
X-list: netdev

Randy.Dunlap writes:

 > ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/
 > (is this the same pktgen that is now in 2.5 kernels?)

No I've asked for testers but haven't got any response at all so
it's not pushed towards 2.5 yet.

Cheers.
						--ro

From davem@redhat.com Thu May 1 03:27:22 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 03:27:31 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h41ARLFu016848
	for ; Thu, 1 May 2003 03:27:21 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id CAA32684;
	Thu, 1 May 2003 02:20:40 -0700
Date: Thu, 01 May 2003 02:20:40 -0700 (PDT)
Message-Id: <20030501.022040.15235711.davem@redhat.com>
To: maxk@qualcomm.com
Cc: acme@conectiva.com.br, netdev@oss.sgi.com
Subject: Re: [PATCH] af_pppox: create module infrastructure for protocol modules
From: "David S. Miller"
In-Reply-To: <1051726260.13512.59.camel@localhost.localdomain>
References: <5.1.0.14.2.20030429123317.10d71178@unixmail.qualcomm.com>
	<20030429.192931.104061911.davem@redhat.com>
	<1051726260.13512.59.camel@localhost.localdomain>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp)
X-archive-position: 2395
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Max Krasnyansky
   Date: 30 Apr 2003 11:11:37 -0700

   And like I said before if protocol wants for some reason to be
   around until sk is destroyed it will simply do sk_set_owner()
   right after alloc_sk().

Ok Maxim, thanks for your comments and insight.

I believe this whole discussion is going to take a different course
towards a solution, different than anything discussed in this thread
so far.
Please follow the discussion with topic "dev->destructor" on this
list (netdev) to see precisely what I am talking about.

From fw@deneb.enyo.de Thu May 1 03:38:57 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 03:39:02 -0700 (PDT)
Received: from mail.enyo.de (gw.enyo.de [212.9.189.178])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h41ActFu017517
	for ; Thu, 1 May 2003 03:38:57 -0700
Received: from [212.9.189.171] (helo=deneb.enyo.de)
	by mail.enyo.de with esmtp (Exim 3.34 #2)
	id 19BBSZ-0004yq-00; Thu, 01 May 2003 12:38:39 +0200
Received: from fw by deneb.enyo.de with local (Exim 3.34 #4)
	id 19BBSY-00019J-00; Thu, 01 May 2003 12:38:38 +0200
To: Robert Olsson
Cc: Christoph Hellwig, davem@redhat.com, netdev@oss.sgi.com
Subject: Re: purpose of the skb head pool
References: <20030429135506.A22411@lst.de>
	<16046.30879.738356.495523@robur.slu.se>
In-Reply-To: <16046.30879.738356.495523@robur.slu.se> (Robert Olsson's
	message of "Tue, 29 Apr 2003 15:05:35 +0200")
From: Florian Weimer
Date: Thu, 01 May 2003 12:38:38 +0200
Message-ID: <877k9bc5ox.fsf@deneb.enyo.de>
User-Agent: Gnus/5.09002 (Oort Gnus v0.20) Emacs/21.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp)
X-archive-position: 2396
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: fw@deneb.enyo.de
Precedence: bulk
X-list: netdev

Robert Olsson writes:

> 2.5.66 IP forwarding of two input simplex flows. eth0->eth1, eth2->eth3
> Fixed affinity CPU0: eth0, eth3. CPU1: eth1, eth2. Which is common for routing
> and should be "worst case" for other use. The test should give a very high
> load on the packet memory system. As seen at least we don't see any
> improvement from skb_head_pool code.
>
>
> Vanilla 2.5.66               381 kpps
> Magazine                     431 kpps
> Magazine + no skb_head_pool  435 kpps

Can you rerun this test with random source/destination addresses, to
get more realistic (for some configurations) numbers?

From davem@redhat.com Thu May 1 03:42:07 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 03:42:10 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h41Ag6Fu017831
	for ; Thu, 1 May 2003 03:42:07 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id CAA32743;
	Thu, 1 May 2003 02:35:28 -0700
Date: Thu, 01 May 2003 02:35:28 -0700 (PDT)
Message-Id: <20030501.023528.116383508.davem@redhat.com>
To: fw@deneb.enyo.de
Cc: Robert.Olsson@data.slu.se, hch@lst.de, netdev@oss.sgi.com
Subject: Re: purpose of the skb head pool
From: "David S. Miller"
In-Reply-To: <877k9bc5ox.fsf@deneb.enyo.de>
References: <20030429135506.A22411@lst.de>
	<16046.30879.738356.495523@robur.slu.se>
	<877k9bc5ox.fsf@deneb.enyo.de>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp)
X-archive-position: 2397
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Florian Weimer
   Date: Thu, 01 May 2003 12:38:38 +0200

   Robert Olsson writes:

   > Vanilla 2.5.66               381 kpps
   > Magazine                     431 kpps
   > Magazine + no skb_head_pool  435 kpps

   Can you rerun this test with random source/destination addresses, to
   get more realistic (for some configurations) numbers?

He can do this, but the issues we're trying to tackle first have
nothing to even do with what kind of routing cache accesses are done.
It's all networking buffer overhead we're worried about at this
stage.

From davem@redhat.com Thu May 1 05:16:43 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 05:16:46 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h41CGgFu019668
	for ; Thu, 1 May 2003 05:16:42 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
	by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id EAA00589;
	Thu, 1 May 2003 04:09:36 -0700
Date: Thu, 01 May 2003 04:09:35 -0700 (PDT)
Message-Id: <20030501.040935.68070653.davem@redhat.com>
To: rusty@rustcorp.com.au
Cc: kuznet@ms2.inr.ac.ru, shemminger@osdl.org, netdev@oss.sgi.com,
	acme@conectiva.com.br, wa@almesberger.net
Subject: Re: dev->destructor
From: "David S. Miller"
In-Reply-To: <20030501120815.25BE22C155@lists.samba.org>
References: <20030501.000058.39187964.davem@redhat.com>
	<20030501120815.25BE22C155@lists.samba.org>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp)
X-archive-position: 2398
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Rusty Russell
   Date: Thu, 01 May 2003 22:01:19 +1000

   There are 70 calls to dev_hold() in the kernel.  The vast majority
   of them already have a reference, they just want another one:
   dev_hold() can do __module_get().

Rusty, this is precisely what Alexey and myself want to avoid.

On the surface, it looks fine, only 70 dev_get's in the kernel right?
But think further...

So you propose to add this kind of thing for every ARP entry, every
route cache entry, every IPSEC policy, every socket, every struct
sock, every networking dynamic object ever created?
When we add SKB recycling, will we need to do a module get/put on
every SKB alloc/free/clone/copy?

I think this way lies insanity :)

You may make the decision to eat this kind of overhead inside of
netfilter, but Alexey and I do not accept this.

I disagreed with Alexey initially, but now I truly see his wisdom.
This networking device example is just the tip of the iceberg.  We
can continue to add bandaids across the kernel, instead of solving
the real problem that modules need to manage their own removal.

It is at the core of the reason the current module scheme has to be
extended to let the module manage unloading.

From rusty@samba.org Thu May 1 05:49:24 2003
Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 05:49:40 -0700 (PDT)
Received: from lists.samba.org (dp.samba.org [66.70.73.150])
	by oss.sgi.com (8.12.9/8.12.9) with SMTP id h41CnDFu020246
	for ; Thu, 1 May 2003 05:49:14 -0700
Received: by lists.samba.org (Postfix, from userid 590)
	id 25BE22C155; Thu, 1 May 2003 12:08:15 +0000 (GMT)
From: Rusty Russell
To: "David S. Miller"
Cc: kuznet@ms2.inr.ac.ru
Cc: shemminger@osdl.org, netdev@oss.sgi.com, acme@conectiva.com.br,
	Werner Almesberger
Subject: Re: dev->destructor
In-reply-to: Your message of "Thu, 01 May 2003 00:00:58 MST."
	<20030501.000058.39187964.davem@redhat.com>
Date: Thu, 01 May 2003 22:01:19 +1000
Message-Id: <20030501120815.25BE22C155@lists.samba.org>
X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp)
X-archive-position: 2399
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: rusty@rustcorp.com.au
Precedence: bulk
X-list: netdev

In message <20030501.000058.39187964.davem@redhat.com> you write:
>    From: kuznet@ms2.inr.ac.ru
>    Date: Thu, 1 May 2003 05:10:33 +0400 (MSD)
>
> [ Rusty, just skip down to "Ok ok ok!", it's something we've
>   discussed before.
>   Some of these problems are becoming so
>   widespread that we need to implement a fix, I'll probably be
>   the one to end up doing it... ]
>
>    > 1) dev_get() gets module reference and dev_put() puts it.
>    >    Ugly, as this means dev_get() can fail, but this does
>    >    cover all the possible cases.
>
>    Seems, you eventually _really_ understood why I hysterically moan
>    about bogosity of modules and maybe ready to recognize that it is
>    not just hysterical moaning in some part. :-)
>
> Yes, I know, and we can talk about this until the cows come
> home... :-)

I agree with Alexey.  Modules are poor-man's microkernel: allowing
them to be unloaded has always been a horror.  But I failed to
convince my colleagues of this at the 2002 kernel summit, so I did
the best with what we had.

If I had my way, we would *never* remove modules (even on failed
init: we might re-try init later, but never free the memory).

But before we redesign module architecture from scratch, let's look
at a solution with what we do have (assuming Linus takes my damn
__module_get() patch some day, see below).

There are 70 calls to dev_hold() in the kernel.  The vast majority
of them already have a reference, they just want another one:
dev_hold() can do __module_get().

There are a few *sources* of devices: dev_get, dev_get_by*.  These
should check and fail, using "try_dev_hold()" or something.

Unfortunately auditing all the __dev_get_by* is quite a task, since
it's used very widely (and I think, sometimes erroneously).

Completely untested patch below other patch.

I need more time to digest your proposal in detail, Dave.  Expect
reply w/in 24 hours.

Thanks.
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: __module_get
Author: Rusty Russell
Status: Tested on 2.5.68-bk10

D: Introduces __module_get for places where we know we already hold
D: a reference and ignoring the fact that the module is being "rmmod --wait"ed
D: is simpler.
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5001-linux-2.5.67-bk5/fs/filesystems.c .5001-linux-2.5.67-bk5.updated/fs/filesystems.c --- .5001-linux-2.5.67-bk5/fs/filesystems.c 2003-04-14 13:45:44.000000000 +1000 +++ .5001-linux-2.5.67-bk5.updated/fs/filesystems.c 2003-04-14 15:44:36.000000000 +1000 @@ -32,17 +32,7 @@ static rwlock_t file_systems_lock = RW_L /* WARNING: This can be used only if we _already_ own a reference */ void get_filesystem(struct file_system_type *fs) { - if (!try_module_get(fs->owner)) { -#ifdef CONFIG_MODULE_UNLOAD - unsigned int cpu = get_cpu(); - local_inc(&fs->owner->ref[cpu].count); - put_cpu(); -#else - /* Getting filesystem while it's starting up? We're - already supposed to have a reference. */ - BUG(); -#endif - } + __module_get(fs->owner); } void put_filesystem(struct file_system_type *fs) diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5001-linux-2.5.67-bk5/include/linux/module.h .5001-linux-2.5.67-bk5.updated/include/linux/module.h --- .5001-linux-2.5.67-bk5/include/linux/module.h 2003-04-08 11:15:01.000000000 +1000 +++ .5001-linux-2.5.67-bk5.updated/include/linux/module.h 2003-04-14 15:45:15.000000000 +1000 @@ -255,6 +255,7 @@ struct module *module_text_address(unsig #ifdef CONFIG_MODULE_UNLOAD +unsigned int module_refcount(struct module *mod); void __symbol_put(const char *symbol); #define symbol_put(x) __symbol_put(MODULE_SYMBOL_PREFIX #x) void symbol_put_addr(void *addr); @@ -265,6 +266,17 @@ void symbol_put_addr(void *addr); #define local_dec(x) atomic_dec(x) #endif +/* Sometimes we know we already have a refcount, and it's easier not + to handle the error case (which only happens with rmmod --wait). 
*/ +static inline void __module_get(struct module *module) +{ + if (module) { + BUG_ON(module_refcount(module) == 0); + local_inc(&module->ref[get_cpu()].count); + put_cpu(); + } +} + static inline int try_module_get(struct module *module) { int ret = 1; @@ -300,6 +317,9 @@ static inline int try_module_get(struct static inline void module_put(struct module *module) { } +static inline void __module_get(struct module *module) +{ +} #define symbol_put(x) do { } while(0) #define symbol_put_addr(p) do { } while(0) @@ -357,6 +377,10 @@ static inline struct module *module_text #define symbol_put(x) do { } while(0) #define symbol_put_addr(x) do { } while(0) +static inline void __module_get(struct module *module) +{ +} + static inline int try_module_get(struct module *module) { return 1; diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .5001-linux-2.5.67-bk5/kernel/module.c .5001-linux-2.5.67-bk5.updated/kernel/module.c --- .5001-linux-2.5.67-bk5/kernel/module.c 2003-04-14 13:45:46.000000000 +1000 +++ .5001-linux-2.5.67-bk5.updated/kernel/module.c 2003-04-14 15:44:36.000000000 +1000 @@ -431,7 +431,7 @@ static inline void restart_refcounts(voi } #endif -static unsigned int module_refcount(struct module *mod) +unsigned int module_refcount(struct module *mod) { unsigned int i, total = 0; @@ -439,6 +439,7 @@ static unsigned int module_refcount(stru total += atomic_read(&mod->ref[i].count); return total; } +EXPORT_SYMBOL(module_refcount); /* This exists whether we can unload or not */ static void free_module(struct module *mod); ================ Name: try_dev_hold Author: Rusty Russell Status: Experimental Depends: Module/module_dup.patch.gz D: Make dev_hold() actually duplicate the module reference count, and D: introduce try_dev_hold() for where refcount may be zero. Some D: places seemed to use dev_hold() to initialize the device reference D: count to 1. 
D: D: Lots of fixmes caused by quick audit of __dev_get_by* and D: dev_getbyhwaddr. diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/drivers/net/shaper.c working-2.5.68-bk10-netdevice/drivers/net/shaper.c --- linux-2.5.68-bk10/drivers/net/shaper.c 2003-04-08 11:14:26.000000000 +1000 +++ working-2.5.68-bk10-netdevice/drivers/net/shaper.c 2003-05-01 21:21:46.000000000 +1000 @@ -526,6 +526,7 @@ static int shaper_neigh_setup_dev(struct static int shaper_attach(struct net_device *shdev, struct shaper *sh, struct net_device *dev) { + /* FIXME: No reference to dev --RR */ sh->dev = dev; sh->hard_start_xmit=dev->hard_start_xmit; sh->get_stats=dev->get_stats; diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/drivers/net/wan/dlci.c working-2.5.68-bk10-netdevice/drivers/net/wan/dlci.c --- linux-2.5.68-bk10/drivers/net/wan/dlci.c 2003-03-18 05:01:46.000000000 +1100 +++ working-2.5.68-bk10-netdevice/drivers/net/wan/dlci.c 2003-05-01 21:20:31.000000000 +1000 @@ -457,6 +457,7 @@ int dlci_add(struct dlci_add *dlci) *(short *)(master->dev_addr) = dlci->dlci; dlp = (struct dlci_local *) master->priv; + /* FIXME: We have no reference to slave here. --RR */ dlp->slave = slave; flp = slave->priv; @@ -484,6 +485,7 @@ int dlci_del(struct dlci_add *dlci) /* validate slave device */ master = __dev_get_by_name(dlci->devname); + /* FIXME: No lock, no reference held to master. 
--RR */ if (!master) return(-ENODEV); diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/include/linux/netdevice.h working-2.5.68-bk10-netdevice/include/linux/netdevice.h --- linux-2.5.68-bk10/include/linux/netdevice.h 2003-04-08 11:15:01.000000000 +1000 +++ working-2.5.68-bk10-netdevice/include/linux/netdevice.h 2003-05-01 20:03:27.000000000 +1000 @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -634,7 +635,25 @@ static inline void dev_put(struct net_de } #define __dev_put(dev) atomic_dec(&(dev)->refcnt) -#define dev_hold(dev) atomic_inc(&(dev)->refcnt) + +/* If you already have a reference, and are duplicating it. */ +#define dev_hold(dev) \ +do { \ + atomic_inc(&(dev)->refcnt); \ + __module_get((dev)->owner); \ +} while(0) + +/* If you need a new reference, or will be holding it for a long time. + If this returns 0, pretend dev doesn't exist (it's being removed now). */ +#define try_dev_hold(dev) \ +({ \ + int __ret = 1; \ + if (try_module_get((dev)->owner)) \ + atomic_inc(&(dev)->refcnt); \ + else \ + __ret = 0; \ + __ret; \ +}) /* Carrier loss detection, dial on demand. The functions netif_carrier_on * and _off may be called from IRQ context, but it is caller diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/802/tr.c working-2.5.68-bk10-netdevice/net/802/tr.c --- linux-2.5.68-bk10/net/802/tr.c 2003-02-07 19:21:54.000000000 +1100 +++ working-2.5.68-bk10-netdevice/net/802/tr.c 2003-05-01 21:24:46.000000000 +1000 @@ -479,6 +479,7 @@ static int rif_get_info(char *buffer,cha for(i=0;i < RIF_TABLE_SIZE;i++) { for(entry=rif_table[i];entry;entry=entry->next) { + /* FIXME: No lock, and no reference to dev. 
--RR */ struct net_device *dev = __dev_get_by_index(entry->iface); size=sprintf(buffer+len,"%s %02X:%02X:%02X:%02X:%02X:%02X %7li ", diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/appletalk/ddp.c working-2.5.68-bk10-netdevice/net/appletalk/ddp.c --- linux-2.5.68-bk10/net/appletalk/ddp.c 2003-05-01 09:29:34.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/appletalk/ddp.c 2003-05-01 21:28:03.000000000 +1000 @@ -920,6 +920,7 @@ static int atrtr_ioctl(unsigned int cmd, * space, isn't it? */ if (rt.rt_dev) { + /* FIXME: No lock, and no reference to dev --RR */ dev = __dev_get_by_name(rt.rt_dev); if (!dev) return -ENODEV; @@ -1217,6 +1218,7 @@ static __inline__ int is_ip_over_ddp(str static int handle_ip_over_ddp(struct sk_buff *skb) { + /* FIXME: No lock, and no reference held to dev. --RR */ struct net_device *dev = __dev_get_by_name("ipddp0"); struct net_device_stats *stats; diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/core/dev.c working-2.5.68-bk10-netdevice/net/core/dev.c --- linux-2.5.68-bk10/net/core/dev.c 2003-05-01 09:29:35.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/core/dev.c 2003-05-01 20:19:24.000000000 +1000 @@ -440,8 +440,8 @@ struct net_device *dev_get_by_name(const read_lock(&dev_base_lock); dev = __dev_get_by_name(name); - if (dev) - dev_hold(dev); + if (dev && !try_dev_hold(dev)) + dev = NULL; read_unlock(&dev_base_lock); return dev; } @@ -513,8 +513,8 @@ struct net_device *dev_get_by_index(int read_lock(&dev_base_lock); dev = __dev_get_by_index(ifindex); - if (dev) - dev_hold(dev); + if (dev && !try_dev_hold(dev)) + dev = NULL; read_unlock(&dev_base_lock); return dev; } @@ -563,8 +563,8 @@ struct net_device * dev_get_by_flags(uns read_lock(&dev_base_lock); dev = __dev_get_by_flags(if_flags, mask); - if (dev) - dev_hold(dev); + if (dev && !try_dev_hold(dev)) + dev = NULL; 
read_unlock(&dev_base_lock); return dev; } @@ -2611,7 +2611,7 @@ int register_netdevice(struct net_device dev_init_scheduler(dev); write_lock_bh(&dev_base_lock); *dp = dev; - dev_hold(dev); + atomic_set(&dev->refcnt, 1); dev->deadbeaf = 0; write_unlock_bh(&dev_base_lock); @@ -2899,7 +2899,7 @@ static int __init net_dev_init(void) #endif dev->xmit_lock_owner = -1; dev->iflink = -1; - dev_hold(dev); + atomic_set(&dev->refcnt, 1); /* * Allocate name. If the init() fails diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/core/pktgen.c working-2.5.68-bk10-netdevice/net/core/pktgen.c --- linux-2.5.68-bk10/net/core/pktgen.c 2003-05-01 09:29:35.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/core/pktgen.c 2003-05-01 21:31:45.000000000 +1000 @@ -226,21 +226,20 @@ static struct net_device *setup_inject(s { struct net_device *odev; - rtnl_lock(); - odev = __dev_get_by_name(info->outdev); + odev = dev_get(info->outdev); if (!odev) { sprintf(info->result, "No such netdevice: \"%s\"", info->outdev); - goto out_unlock; + return NULL; } if (odev->type != ARPHRD_ETHER) { sprintf(info->result, "Not ethernet device: \"%s\"", info->outdev); - goto out_unlock; + goto out_put; } if (!netif_running(odev)) { sprintf(info->result, "Device is down: \"%s\"", info->outdev); - goto out_unlock; + goto out_put; } /* Default to the interface's mac if not explicitly set. 
*/ @@ -281,14 +280,11 @@ static struct net_device *setup_inject(s info->cur_daddr = info->daddr_min; info->cur_udp_dst = info->udp_dst_min; info->cur_udp_src = info->udp_src_min; - - atomic_inc(&odev->refcnt); - rtnl_unlock(); return odev; -out_unlock: - rtnl_unlock(); +out_put: + dev_put(odev); return NULL; } diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/decnet/dn_dev.c working-2.5.68-bk10-netdevice/net/decnet/dn_dev.c --- linux-2.5.68-bk10/net/decnet/dn_dev.c 2003-04-20 18:05:16.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/decnet/dn_dev.c 2003-05-01 21:35:29.000000000 +1000 @@ -312,9 +312,7 @@ struct net_device *dn_dev_get_default(vo read_lock(&dndev_lock); dev = decnet_default_device; if (dev) { - if (dev->dn_ptr) - dev_hold(dev); - else + if (!dev->dn_ptr || !try_dev_hold(dev)) dev = NULL; } read_unlock(&dndev_lock); @@ -584,6 +582,8 @@ int dn_dev_ioctl(unsigned int cmd, void goto done; } + /* FIXME: if cmd == SIOCGIFADDR, don't hold lock, and don't + have reference to dev. --RR */ if ((dn_db = dev->dn_ptr) != NULL) { for (ifap = &dn_db->ifa_list; (ifa=*ifap) != NULL; ifap = &ifa->ifa_next) if (strcmp(ifr->ifr_name, ifa->ifa_label) == 0) @@ -677,6 +677,7 @@ static int dn_dev_rtm_newaddr(struct sk_ if (rta[IFA_LOCAL-1] == NULL) return -EINVAL; + /* FIXME: Don't have lock, and don't hold reference to dev. --RR */ if ((dev = __dev_get_by_index(ifm->ifa_index)) == NULL) return -ENODEV; @@ -1189,9 +1190,10 @@ void dn_dev_up(struct net_device *dev) * configured ethernet card in the system. 
*/ if (maybe_default) { - dev_hold(dev); - if (dn_dev_set_default(dev, 0)) - dev_put(dev); + if (try_dev_hold(dev)) { + if (dn_dev_set_default(dev, 0)) + dev_put(dev); + } } } diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/decnet/dn_fib.c working-2.5.68-bk10-netdevice/net/decnet/dn_fib.c --- linux-2.5.68-bk10/net/decnet/dn_fib.c 2003-04-20 18:05:16.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/decnet/dn_fib.c 2003-05-01 21:43:34.000000000 +1000 @@ -218,7 +218,9 @@ static int dn_fib_check_nh(const struct if (!(dev->flags&IFF_UP)) return -ENETDOWN; nh->nh_dev = dev; - atomic_inc(&dev->refcnt); + /* FIXME: Must hold lock, or use dev_get_by_index. + --RR */ + dev_hold(dev); nh->nh_scope = RT_SCOPE_LINK; return 0; } @@ -262,7 +264,7 @@ out: if (!(dev->flags&IFF_UP)) return -ENETDOWN; nh->nh_dev = dev; - atomic_inc(&nh->nh_dev->refcnt); + dev_hold(nh->nh_dev); nh->nh_scope = RT_SCOPE_HOST; } diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/decnet/dn_route.c working-2.5.68-bk10-netdevice/net/decnet/dn_route.c --- linux-2.5.68-bk10/net/decnet/dn_route.c 2003-05-01 09:29:35.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/decnet/dn_route.c 2003-05-01 20:58:08.000000000 +1000 @@ -891,7 +891,9 @@ static int dn_route_output_slow(struct d read_unlock(&dev_base_lock); if (dev_out == NULL) goto out; - dev_hold(dev_out); + /* FIXME: Shouldn't this be inside the lock?
--RR */ + if (!try_dev_hold(dev_out)) + goto out; source_ok: ; } @@ -960,8 +962,10 @@ source_ok: } else { dev_out = neigh->dev; } - dev_hold(dev_out); - goto select_source; + if (try_dev_hold(dev_out)) + goto select_source; + else + dev_out = NULL; } } } @@ -1035,7 +1039,10 @@ select_source: if (dev_out) dev_put(dev_out); dev_out = DN_FIB_RES_DEV(res); - dev_hold(dev_out); + if (!try_dev_hold(dev_out)) { + dev_out = NULL; + goto e_addr; + } fl.oif = dev_out->ifindex; gateway = DN_FIB_RES_GW(res); @@ -1231,7 +1238,8 @@ static int dn_route_input_slow(struct sk "No output device\n"); goto e_inval; } - dev_hold(out_dev); + if (!try_dev_hold(out_dev)) + goto e_inval; if (res.r) src_map = dn_fib_rules_policy(fl.fld_src, &res, &flags); @@ -1349,8 +1357,12 @@ make_route: rt->u.dst.input = dn_blackhole; } rt->rt_flags = flags; - if (rt->u.dst.dev) - dev_hold(rt->u.dst.dev); + if (rt->u.dst.dev) { + if (!try_dev_hold(rt->u.dst.dev)) { + dst_free(&rt->u.dst); + goto e_inval; + } + } if (dn_rt_set_next_hop(rt, &res)) goto e_neighbour; diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/decnet/dn_rules.c working-2.5.68-bk10-netdevice/net/decnet/dn_rules.c --- linux-2.5.68-bk10/net/decnet/dn_rules.c 2003-04-20 18:05:16.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/decnet/dn_rules.c 2003-05-01 21:37:58.000000000 +1000 @@ -173,6 +173,7 @@ int dn_fib_rtm_newrule(struct sk_buff *s memcpy(new_r->r_ifname, RTA_DATA(rta[RTA_IIF-1]), IFNAMSIZ); new_r->r_ifname[IFNAMSIZ-1] = 0; new_r->r_ifindex = -1; + /* FIXME: Don't hold lock, and don't get reference. 
--RR */ dev = __dev_get_by_name(new_r->r_ifname); if (dev) new_r->r_ifindex = dev->ifindex; diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/ipv4/fib_frontend.c working-2.5.68-bk10-netdevice/net/ipv4/fib_frontend.c --- linux-2.5.68-bk10/net/ipv4/fib_frontend.c 2003-05-01 20:36:37.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/ipv4/fib_frontend.c 2003-01-02 12:30:47.000000000 +1100 @@ -115,8 +115,8 @@ struct net_device * ip_dev_find(u32 addr if (res.type != RTN_LOCAL) goto out; dev = FIB_RES_DEV(res); - if (dev && !try_dev_get(dev)) - dev = NULL; + if (dev) + atomic_inc(&dev->refcnt); out: fib_res_put(&res); diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/ipv4/fib_semantics.c working-2.5.68-bk10-netdevice/net/ipv4/fib_semantics.c --- linux-2.5.68-bk10/net/ipv4/fib_semantics.c 2003-05-01 09:29:35.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/ipv4/fib_semantics.c 2003-05-01 21:39:20.000000000 +1000 @@ -405,6 +405,8 @@ static int fib_check_nh(const struct rtm return -ENODEV; if (!(dev->flags&IFF_UP)) return -ENETDOWN; + /* FIXME: Must hold lock, or use dev_get_by_index. + --RR */ nh->nh_dev = dev; atomic_inc(&dev->refcnt); nh->nh_scope = RT_SCOPE_LINK; diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/ipv4/ip_gre.c working-2.5.68-bk10-netdevice/net/ipv4/ip_gre.c --- linux-2.5.68-bk10/net/ipv4/ip_gre.c 2003-05-01 09:29:35.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/ipv4/ip_gre.c 2003-05-01 21:50:07.000000000 +1000 @@ -289,7 +289,7 @@ static struct ip_tunnel * ipgre_tunnel_l if (register_netdevice(dev) < 0) goto failed; - dev_hold(dev); + atomic_set(&dev->refcnt, 1); ipgre_tunnel_link(nt); /* Do not decrement MOD_USE_COUNT here. 
*/ return nt; @@ -1205,6 +1205,7 @@ static int ipgre_tunnel_init(struct net_ } if (!tdev && tunnel->parms.link) + /* FIXME: Don't hold lock, don't grab reference. --RR */ tdev = __dev_get_by_index(tunnel->parms.link); if (tdev) { diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/ipv4/ipip.c working-2.5.68-bk10-netdevice/net/ipv4/ipip.c --- linux-2.5.68-bk10/net/ipv4/ipip.c 2003-04-20 18:05:16.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/ipv4/ipip.c 2003-05-01 21:51:47.000000000 +1000 @@ -259,7 +259,7 @@ static struct ip_tunnel * ipip_tunnel_lo if (register_netdevice(dev) < 0) goto failed; - dev_hold(dev); + atomic_set(&dev->refcnt, 1); ipip_tunnel_link(nt); /* Do not decrement MOD_USE_COUNT here. */ return nt; @@ -841,6 +841,7 @@ static int ipip_tunnel_init(struct net_d } if (!tdev && tunnel->parms.link) + /* FIXME: Don't hold lock, don't grab reference. --RR */ tdev = __dev_get_by_index(tunnel->parms.link); if (tdev) { diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/ipv4/ipmr.c working-2.5.68-bk10-netdevice/net/ipv4/ipmr.c --- linux-2.5.68-bk10/net/ipv4/ipmr.c 2003-03-25 12:17:32.000000000 +1100 +++ working-2.5.68-bk10-netdevice/net/ipv4/ipmr.c 2003-05-01 20:39:29.000000000 +1000 @@ -443,7 +443,6 @@ static int vif_add(struct vifctl *vifc, /* And finish update writing critical data */ write_lock_bh(&mrt_lock); - dev_hold(dev); v->dev=dev; #ifdef CONFIG_IP_PIMSM if (v->flags&VIFF_REGISTER) @@ -1441,8 +1440,8 @@ int pim_rcv_v1(struct sk_buff * skb) read_lock(&mrt_lock); if (reg_vif_num >= 0) reg_dev = vif_table[reg_vif_num].dev; - if (reg_dev) - dev_hold(reg_dev); + if (reg_dev && !try_dev_hold(reg_dev)) + reg_dev = NULL; read_unlock(&mrt_lock); if (reg_dev == NULL) { @@ -1508,8 +1507,8 @@ int pim_rcv(struct sk_buff * skb) read_lock(&mrt_lock); if (reg_vif_num >= 0) reg_dev = vif_table[reg_vif_num].dev; - if (reg_dev) - 
dev_hold(reg_dev); + if (reg_dev && !try_dev_hold(reg_dev)) + reg_dev = NULL; read_unlock(&mrt_lock); if (reg_dev == NULL) { diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/ipv4/route.c working-2.5.68-bk10-netdevice/net/ipv4/route.c --- linux-2.5.68-bk10/net/ipv4/route.c 2003-05-01 09:29:35.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/ipv4/route.c 2003-05-01 20:41:28.000000000 +1000 @@ -1992,7 +1992,8 @@ int ip_route_output_slow(struct rtable * if (dev_out) dev_put(dev_out); dev_out = FIB_RES_DEV(res); - dev_hold(dev_out); + if (!try_dev_hold(dev_out)) + goto e_inval; fl.oif = dev_out->ifindex; make_route: diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/ipv6/sit.c working-2.5.68-bk10-netdevice/net/ipv6/sit.c --- linux-2.5.68-bk10/net/ipv6/sit.c 2003-04-20 18:05:17.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/ipv6/sit.c 2003-05-01 21:55:21.000000000 +1000 @@ -197,7 +197,7 @@ static struct ip_tunnel * ipip6_tunnel_l if (register_netdevice(dev) < 0) goto failed; - dev_hold(dev); + atomic_set(&dev->refcnt, 1); ipip6_tunnel_link(nt); /* Do not decrement MOD_USE_COUNT here. */ return nt; @@ -778,6 +778,7 @@ static int ipip6_tunnel_init(struct net_ } if (!tdev && tunnel->parms.link) + /* FIXME: Don't hold lock, don't grab reference.
--RR */ tdev = __dev_get_by_index(tunnel->parms.link); if (tdev) { diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/llc/af_llc.c working-2.5.68-bk10-netdevice/net/llc/af_llc.c --- linux-2.5.68-bk10/net/llc/af_llc.c 2003-05-01 09:29:36.000000000 +1000 +++ working-2.5.68-bk10-netdevice/net/llc/af_llc.c 2003-05-01 21:13:49.000000000 +1000 @@ -256,6 +256,7 @@ static int llc_ui_autobind(struct socket rc = -ENETUNREACH; if (!dev) goto out; + /* FIXME: We don't hold a reference to dev --RR */ llc->dev = dev; } /* bind to a specific sap, optional. */ @@ -419,6 +420,7 @@ static int llc_ui_connect(struct socket rtnl_unlock(); if (!dev) goto out; + /* FIXME: We don't hold a reference to dev --RR */ llc->dev = dev; } else dev = llc->dev; @@ -764,6 +766,7 @@ static int llc_ui_sendmsg(struct kiocb * dev = dev_getbyhwaddr(addr->sllc_arphrd, addr->sllc_smac); rtnl_unlock(); rc = -ENETUNREACH; + /* FIXME: We don't hold a reference to dev --RR */ if (!dev) goto release; } else diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/netrom/nr_route.c working-2.5.68-bk10-netdevice/net/netrom/nr_route.c --- linux-2.5.68-bk10/net/netrom/nr_route.c 2003-01-02 12:27:51.000000000 +1100 +++ working-2.5.68-bk10-netdevice/net/netrom/nr_route.c 2003-05-01 20:51:51.000000000 +1000 @@ -567,8 +567,8 @@ struct net_device *nr_dev_get(ax25_addre read_lock(&dev_base_lock); for (dev = dev_base; dev != NULL; dev = dev->next) { if ((dev->flags & IFF_UP) && dev->type == ARPHRD_NETROM && ax25cmp(addr, (ax25_address *)dev->dev_addr) == 0) { - dev_hold(dev); - goto out; + if (try_dev_hold(dev)) + goto out; } } out: diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.68-bk10/net/rose/rose_route.c working-2.5.68-bk10-netdevice/net/rose/rose_route.c --- linux-2.5.68-bk10/net/rose/rose_route.c 2003-03-18 12:21:41.000000000 +1100 
+++ working-2.5.68-bk10-netdevice/net/rose/rose_route.c 2003-05-01 20:51:59.000000000 +1000 @@ -629,8 +629,8 @@ struct net_device *rose_dev_get(rose_add read_lock(&dev_base_lock); for (dev = dev_base; dev != NULL; dev = dev->next) { if ((dev->flags & IFF_UP) && dev->type == ARPHRD_ROSE && rosecmp(addr, (rose_address *)dev->dev_addr) == 0) { - dev_hold(dev); - goto out; + if (try_dev_hold(dev)) + goto out; } } out: From Eric.Lemoine@Sun.COM Thu May 1 10:29:22 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 10:29:25 -0700 (PDT) Received: from patan.sun.com (patan.Sun.COM [192.18.98.43]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h41HTLFu025897 for ; Thu, 1 May 2003 10:29:22 -0700 Received: from esunmail ([129.147.58.198]) by patan.sun.com (8.9.3p2+Sun/8.9.3) with ESMTP id LAA04945 for ; Thu, 1 May 2003 11:29:21 -0600 (MDT) Received: from xpa-fe2 (esunmail [129.147.58.120]) by edgemail1.Central.Sun.COM (iPlanet Messaging Server 5.2 HotFix 1.12 (built Feb 13 2003)) with ESMTP id <0HE7001HYXV0X8@edgemail1.Central.Sun.COM> for netdev@oss.sgi.com; Thu, 01 May 2003 11:28:13 -0600 (MDT) Received: from udine ([140.77.13.122]) by mail.sun.net (iPlanet Messaging Server 5.2 HotFix 1.12 (built Feb 13 2003)) with ESMTPSA id <0HE700AA2XUZRJ@mail.sun.net> for netdev@oss.sgi.com; Thu, 01 May 2003 11:28:12 -0600 (MDT) Received: by udine (sSMTP sendmail emulation); Thu, 01 May 2003 19:28:05 +0200 Date: Thu, 01 May 2003 19:28:05 +0200 From: Eric Lemoine Subject: sendfile() & unlink() To: netdev@oss.sgi.com Message-id: <20030501172805.GB4580@udine> MIME-version: 1.0 Content-type: text/plain; charset=us-ascii Content-transfer-encoding: 7BIT Content-disposition: inline User-Agent: Mutt/1.3.28i X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2401 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Eric.Lemoine@Sun.COM Precedence: bulk X-list: netdev Hi What 
happens when unlink()ing a file that has just been sendfile()d? Is there any guarantee that the file remains in the buffer cache while TCP completes the xmit? Thx. -- Eric From acme@conectiva.com.br Thu May 1 10:28:24 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 10:28:41 -0700 (PDT) Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h41HSMFu025815 for ; Thu, 1 May 2003 10:28:24 -0700 Received: from [200.181.169.6] (helo=brinquendo.conectiva.com.br) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 19BHxD-0003zH-00; Thu, 01 May 2003 14:34:43 -0300 Received: by brinquendo.conectiva.com.br (Postfix, from userid 500) id AE6B01966C; Thu, 1 May 2003 17:28:22 +0000 (UTC) Date: Thu, 1 May 2003 14:28:22 -0300 From: Arnaldo Carvalho de Melo To: Rusty Russell Cc: "David S. Miller" , kuznet@ms2.inr.ac.ru, shemminger@osdl.org, netdev@oss.sgi.com, Werner Almesberger Subject: Re: dev->destructor Message-ID: <20030501172822.GE3387@conectiva.com.br> References: <20030501.000058.39187964.davem@redhat.com> <20030501120815.25BE22C155@lists.samba.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030501120815.25BE22C155@lists.samba.org> X-Url: http://advogato.org/person/acme Organization: Conectiva S.A. User-Agent: Mutt/1.5.4i X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2400 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: acme@conectiva.com.br Precedence: bulk X-list: netdev On Thu, May 01, 2003 at 10:01:19PM +1000, Rusty Russell wrote: > But before we redesign module architecture from scratch, let's look at > a solution with what we do have (assuming Linus takes my damn > __module_get() patch some day, see below).
Linus took the __module_get patch, I even used it in redesigning the way struct sock and struct socket are handled in response to Max Krasnyansky alternative patches > There are 70 calls to dev_hold() in the kernel. The vast majority of > them already have a reference, they just want another one: dev_hold() > can do __module_get(). yes > There are a few *sources* of devices: dev_get, dev_get_by*. These > should check and fail, using "try_dev_hold()" or something. > Unfortunately auditing all the __dev_get_by* is quite a task, since > it's used very widely (and I think, sometime erroneously). > > Completely untested patch below other patch. > > I need more time to digest your proposal in detail, Dave. Expect > reply w/in 24 hours. I'm digesting it as well 8) - Arnaldo From modica@sgi.com Thu May 1 10:33:36 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 10:33:53 -0700 (PDT) Received: from zok.sgi.com (zok.SGI.COM [204.94.215.101]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h41HXaFu026460 for ; Thu, 1 May 2003 10:33:36 -0700 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by zok.sgi.com (8.12.9/8.12.2/linux-outbound_gateway-1.2) with ESMTP id h41HXVVV025119 for ; Thu, 1 May 2003 10:33:31 -0700 Received: from sgi.com (eagdhcp-232-154.americas.sgi.com [128.162.232.154]) by cthulhu.engr.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id h41HXT4O5978526; Thu, 1 May 2003 10:33:30 -0700 (PDT) Message-ID: <3EB15A69.1000706@sgi.com> Date: Thu, 01 May 2003 12:33:29 -0500 From: Steve Modica Organization: SGI User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030425 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Eric Lemoine CC: netdev@oss.sgi.com Subject: Re: sendfile() & unlink() References: <20030501172805.GB4580@udine> In-Reply-To: <20030501172805.GB4580@udine> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) 
X-archive-position: 2402 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: modica@sgi.com Precedence: bulk X-list: netdev Unlinking an open tmp file (so it doesn't show up in a directory listing ) is a pretty common thing to do on Irix. I sure hope one can do that on linux :) Eric Lemoine wrote: > Hi > > What happens when unlink()ing a file that has just been sendfile()d? Is > there any guarantee that the file remains in the buffer cache while TCP > completes the xmit? > > Thx. > -- Steve Modica work: 651-683-3224 mobile: 651-261-3201 Manager - Networking Drivers Group "Give a man a fish, and he will eat for a day, hit him with a fish and he leaves you alone" - me From acme@conectiva.com.br Thu May 1 10:51:10 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 10:51:28 -0700 (PDT) Received: from orion.netbank.com.br (orion.netbank.com.br [200.203.199.90]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h41Hp8Fu027122 for ; Thu, 1 May 2003 10:51:09 -0700 Received: from [200.181.169.6] (helo=brinquendo.conectiva.com.br) by orion.netbank.com.br with asmtp (Exim 3.33 #1) id 19BIJL-00040v-00; Thu, 01 May 2003 14:57:36 -0300 Received: by brinquendo.conectiva.com.br (Postfix, from userid 500) id 8B17E1966C; Thu, 1 May 2003 17:51:12 +0000 (UTC) Date: Thu, 1 May 2003 14:51:11 -0300 From: Arnaldo Carvalho de Melo To: "David S. Miller" Cc: rusty@rustcorp.com.au, kuznet@ms2.inr.ac.ru, shemminger@osdl.org, netdev@oss.sgi.com, wa@almesberger.net Subject: Re: dev->destructor Message-ID: <20030501175111.GF3387@conectiva.com.br> References: <20030501.000058.39187964.davem@redhat.com> <20030501120815.25BE22C155@lists.samba.org> <20030501.040935.68070653.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030501.040935.68070653.davem@redhat.com> X-Url: http://advogato.org/person/acme Organization: Conectiva S.A. 
User-Agent: Mutt/1.5.4i X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2403 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: acme@conectiva.com.br Precedence: bulk X-list: netdev On Thu, May 01, 2003 at 04:09:35AM -0700, David S. Miller wrote: > From: Rusty Russell > Date: Thu, 01 May 2003 22:01:19 +1000 > > There are 70 calls to dev_hold() in the kernel. The vast majority of > them already have a reference, they just want another one: dev_hold() > can do __module_get(). > > Rusty, this is precisely the what Alexey and myself want to avoid. On > the surface, it looks fine, only 70 dev_get's in the kernel right? > But think further... > > So you propose to add this kind of thing for every ARP entry, every > route cache entry, every IPSEC policy, every socket, every struct > sock, every networking dynamic object ever created? ALERT: brainstorming, and hoping for comments from the people who know this better. Well, I think that because there are a graph of relationships here we perhaps can be safe by protecting just some of the higher level objects (e.g. struct sock, struct socket, struct net_device) while leaving some other lower level objects managed by those higher level ones, e.g. struct sk_buff managed by struct sock. This came to me while discussing the struct socket and struct sock module infrastructure with Max: net family modules (e.g. AF_INET) don't require protection for each and every struct socket created, because the protocol modules (e.g.: udp, raw, tcp) have to register with the net family module somehow, and just using one exported function (register_protocol type functions: register_pppox_proto, bt_sock_register, register_8022_client, register_snap_client, llc_sap_open, etc) keeps the net family module/lower level protocol protected.
So we need to have a graph of these relationships to see what we have to protect at a higher level, reducing the overhead of otherwise having to call try_module_get/__module_get & module_put on _all_ objects creation/use. - Arnaldo From davem@redhat.com Thu May 1 11:02:35 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 11:02:42 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h41I2YFu027508 for ; Thu, 1 May 2003 11:02:35 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id JAA01734; Thu, 1 May 2003 09:55:21 -0700 Date: Thu, 01 May 2003 09:55:20 -0700 (PDT) Message-Id: <20030501.095520.63025177.davem@redhat.com> To: acme@conectiva.com.br Cc: rusty@rustcorp.com.au, kuznet@ms2.inr.ac.ru, shemminger@osdl.org, netdev@oss.sgi.com, wa@almesberger.net Subject: Re: dev->destructor From: "David S. Miller" In-Reply-To: <20030501175111.GF3387@conectiva.com.br> References: <20030501120815.25BE22C155@lists.samba.org> <20030501.040935.68070653.davem@redhat.com> <20030501175111.GF3387@conectiva.com.br> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2404 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Arnaldo Carvalho de Melo Date: Thu, 1 May 2003 14:51:11 -0300 Well, I think that because there are a graph of relationships here we perhaps can be safe by protecting just some of the higher level objects (e.g. struct sock, struct socket, struct net_device) while leaving some other lower level objects managed by those higher level ones, e.g. 
struct sk_buff managed by struct sock. The graphs are unfortunately not completely connected. For example, sk_buff's can be sent without being associated with any socket. Routing cache entries are not attached to any particular client, and the same goes for ARP/neighbour entries, and sk_buff's in turn hold references to these things. See, long ago we used to not do proper reference counting on struct sock's. We used to rely on graphs of relationships and certain sock states to control destruction of these objects. The networking was riddled with obscure bugs because of this. From kuznet@ms2.inr.ac.ru Thu May 1 21:07:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 21:07:35 -0700 (PDT) Received: from sex.inr.ac.ru (sex.inr.ac.ru [193.233.7.165]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h4247OFu010887 for ; Thu, 1 May 2003 21:07:27 -0700 Received: (from kuznet@localhost) by sex.inr.ac.ru (8.6.13/ANK) id IAA10719; Fri, 2 May 2003 08:06:51 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200305020406.IAA10719@sex.inr.ac.ru> Subject: Re: dev->destructor To: davem@redhat.com (David S. Miller) Date: Fri, 2 May 2003 08:06:51 +0400 (MSD) Cc: shemminger@osdl.org, netdev@oss.sgi.com, acme@conectiva.com.br, rusty@rustcorp.com.au In-Reply-To: <20030501.000058.39187964.davem@redhat.com> from "David S. Miller" at May 1, 3 00:00:58 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2405 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: kuznet@ms2.inr.ac.ru Precedence: bulk X-list: netdev Hello! > None of destructors kill reference to device object, and I mean none > of them. It is why I thought the idea works. When you call unregister_something() you hold a reference to this something, and you have no idea how many references you hold.
This is invisible when unregister_something() is called from a single place such as cleanup_module(). > Also, holding RTNL semaphore does not block potential holders > of device reference. Or does it? When that branch with waiting inside unregister() exists you can't hold any reference when grabbing this semaphore. That's why dev_ioctl() takes the semaphore first. It is damnly inconvenient, fragile, et cetera and such bugs do exist. That's why unregister_netdev() is logically wrong function: it takes dev as argument, so any sane programmer would assume caller holds a reference. But he can't. So, call of the function is allowed only from contexts where device is presumed to be held, i.e. from cleanup_module() and from no other places. > Your idea, as I understand it, is to add callback to module that > freezes module then at some time in the future makes indication > that module is clean and may be unloaded safely. Yes, the description is mostly right. > Question is, where to make this? It cannot be in the module > itself of course, that would create race condition similar to > one which exists today making this exercise quite pointless :-) Probably, you should look at the very first module implementation. :-) Despite being horrible (like almost everything in the kernel in those days) it was logically correct. rmmod deleted the module regardless of refcnt, and the module body was destroyed later, when refcnt reached zero. See? So, cleanup_module() is replaced with shutdown(), and a destructor is added to allow cleaning up something other than memory, if that is necessary. And to handle the situation when we do not want to use module refcnt, a predicate to ask the module whether it is ready to be killed is added. I think it can be combined with the destructor, so that for such modules the destructor can return -EAGAIN. Well, when refcnt is zero you can try to destroy the module and it might disagree. > Note the problem.
If between the unregister_foo() and re-register of > foo in the failure path, someone asks to open a "foo" it will fail. > Those semantics absolutely stink. It is not a problem at all compared to the real ones. :-) > Rusty went on to describe that this is one of the reasons for the > current "enable refcounts" scheme with try_module_get(). Rusty forgot that crippling xxx_get() is a million times more painful. :-) He also forgot that in 99% of cases there is a single registry and this registry must be self-consistent, so all the work is already done and module.c just invades an area outside its competence. Anyway, this approach is legal and the simplest one, I understand this. It can be used optionally. Only, frankly speaking, I do not see any applications for this, because when more than one registry exists, the module is surely so complicated that tracking references is painful enough to forget about this. Anyway, we always know the number of sockets et al., so why count them in more than one place? netdevices is the simplest example, but it shows the most didactically that all the occurrences of module_** there are illegal. We want to register/unregister them dynamically, we have to do all the job not depending on modules. We have to do our own refcounting. And incorrect design of modules only prevents to make final small step to make this right. Well, the key moment is that while device is registered, its module refcnt is not zero logically, but we can't unload the module in this case, so we have to do funny try_* each lookup.
Alexey From rusty@samba.org Thu May 1 23:57:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Thu, 01 May 2003 23:58:00 -0700 (PDT) Received: from lists.samba.org (dp.samba.org [66.70.73.150]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h426vqFu012592 for ; Thu, 1 May 2003 23:57:53 -0700 Received: by lists.samba.org (Postfix, from userid 590) id 516742C04C; Fri, 2 May 2003 06:57:52 +0000 (GMT) From: Rusty Russell To: kuznet@ms2.inr.ac.ru Cc: davem@redhat.com (David S. Miller), shemminger@osdl.org, netdev@oss.sgi.com, acme@conectiva.com.br Subject: Re: dev->destructor In-reply-to: Your message of "Fri, 02 May 2003 08:06:51 +0400." <200305020406.IAA10719@sex.inr.ac.ru> Date: Fri, 02 May 2003 15:25:15 +1000 Message-Id: <20030502065752.516742C04C@lists.samba.org> X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2406 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rusty@rustcorp.com.au Precedence: bulk X-list: netdev In message <200305020406.IAA10719@sex.inr.ac.ru> you write: > Hello! Hi Alexey! > It is damnly inconvenient, fragile, et cetera and such bugs do exist. > That's why unregister_netdev() is logically wrong function: it takes > dev as argument, so any sane programmer would assume caller holds > a reference. But he can't. So, call of the function is allowed > only from contexts where device is presumed to be held, i.e. from > cleanup_module() and from no other places. If this is true, I think you can use the module reference count only, and your code will be faster, too. I can prepare the patch for you later tonight, to see how it looks. > netdevices is the simplest example, but it shows the most didactically > that all the occurrences of module_** there are illegal. We want > to register/unregister them dynamically, we have to do all the job not > depending on modules. We have to do our own refcounting.
And incorrect > design of modules only prevents to make final small step to make this > right. Well, the key moment is that while device is registered, its > module refcnt is not zero logically, but we can't unload the module > in this case, so we have to do funny try_* each lookup. Alexey, you are using a module but don't want to reference count it. I made module reference counts very cheap so you don't have to worry, but you still are trying to cheat 8) You want to be very tricky and count all ways into the module, instead. Clearly this is mathematically possible, but in practice very tricky. And all solutions I have seen which do this are ugly, and leave us with "remove may not succeed, it may hang forever, and you won't know, and you can't replace the module and need to reboot if it happens". 8( Better, I think, to make CONFIG_MODULE_UNLOAD=n, and make CONFIG_MODULE_FORCE_UNLOAD work even if CONFIG_MODULE_UNLOAD=n. Hope that clarifies? Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. 
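For reference, the "lookup may fail" pattern at the try_dev_hold() call sites in the patch earlier in this thread can be sketched in user space. This is only a hedged illustration, not the kernel code and not necessarily how try_dev_hold() is implemented: struct fake_dev and the fake_dev_*() helpers are hypothetical names, C11 atomics stand in for the kernel's atomic_t, and the "increment unless zero" rule is an assumption chosen to show why a hold can be refused once the last reference is gone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted device; not struct net_device. */
struct fake_dev {
	atomic_int refcnt;	/* 1 while registered, 0 once the last ref is dropped */
};

/* Registration sets the count to 1, like the atomic_set(..., 1) hunks. */
static void fake_dev_register(struct fake_dev *dev)
{
	atomic_init(&dev->refcnt, 1);
}

/* Take a reference only if the object is still alive (count > 0). */
static bool fake_dev_try_hold(struct fake_dev *dev)
{
	int old = atomic_load(&dev->refcnt);

	while (old > 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&dev->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool fake_dev_put(struct fake_dev *dev)
{
	int old = atomic_fetch_sub(&dev->refcnt, 1);

	assert(old > 0);	/* dropping a reference nobody holds is a bug */
	return old == 1;
}
```

A lookup path would then read `if (!fake_dev_try_hold(dev)) dev = NULL;`, which has the same shape as the converted call sites in the patch: every source of a device pointer has to cope with the hold being refused.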
From wichert@wiggy.net Fri May 2 02:12:06 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 02 May 2003 02:12:14 -0700 (PDT) Received: from mx1.wiggy.net (IDENT:gMOMBDElYml4SDnNaNrQrQkvHDONC3w6@home.wiggy.net [213.84.101.140]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h429C4Fu014860 for ; Fri, 2 May 2003 02:12:06 -0700 Received: from wichert by mx1.wiggy.net with local (Exim 3.35 #1 (Debian)) id 19BWaJ-0003pQ-00 for ; Fri, 02 May 2003 11:12:03 +0200 Date: Fri, 2 May 2003 11:12:03 +0200 From: Wichert Akkerman To: netdev@oss.sgi.com Subject: Minor error in debugging message Message-ID: <20030502091203.GN22848@wiggy.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2407 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: wichert@wiggy.net Precedence: bulk X-list: netdev I just noticed this in the 2.5.68 kernel boot sequence: Freeing alive device d7e6a000, eth%d It seems to be printed from netdev_finish_unregister (at least grep doesn't find that message elsewhere) and dev->name contains eth%d instead of eth0. Wichert. -- Wichert Akkerman It is simple to make things. http://www.wiggy.net/ It is hard to make things simple. From andre@tomt.net Fri May 2 11:25:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 02 May 2003 11:25:19 -0700 (PDT) Received: from mail.skjellin.no (mail.skjellin.no [80.239.42.67]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h42IPCFu028113 for ; Fri, 2 May 2003 11:25:13 -0700 Received: (qmail 31509 invoked by uid 1006); 2 May 2003 18:33:46 -0000 Received: from andre@tomt.net by ns1 by uid 1003 with qmail-scanner-1.15 (sophie: 2.14/3.67. spamassassin: 2.50. Clear:. 
Processed in 0.16935 secs); 02 May 2003 18:33:46 -0000 Received: from slask.tomt.net (HELO slurv) (andre@tomt.net@217.8.136.222) by mail.skjellin.no with SMTP; 2 May 2003 18:33:45 -0000 From: "Andre Tomt" To: Subject: unable to add ipv6 default route in 2.4.21-rc1 + latest bkbits.net pieces Date: Fri, 2 May 2003 20:25:11 +0200 Message-ID: <002001c310d8$2bb57b00$0a01ff0a@slurv> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.4024 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Importance: Normal Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h42IPCFu028113 X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2410 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: andre@tomt.net Precedence: bulk X-list: netdev Hi, I noticed a problem on one of our semi-production systems today, it suddenly lacked a working ipv6 default route. After further investigation, even the raw iproute commands failed (I had it all set up in /etc/network/interfaces, a debian network configuration file). It seems to me something changed in 2.4.21? It certainly broke my iproute. 2.4.20 works fine (well, with many fixes not related to ipv6). iproute is debian version 20010824-9 (from unstable/sid). 
kvass:~# ip -6 route add default via 2003:730:f:3:: dev eth0
RTNETLINK answers: Invalid argument

and for good measure:

kvass:~# ip -6 route add 2000::/3 via 2003:730:f:3:: dev eth0
RTNETLINK answers: Invalid argument
2001:730:f:3::/64 dev eth0 proto kernel metric 256 mtu 1500 advmss 1440
fe80::/64 dev eth0 proto kernel metric 256 mtu 1500 advmss 1440
ff00::/8 dev eth0 proto kernel metric 256 mtu 1500 advmss 1440
unreachable default dev lo proto none metric -1 error -101

kvass:~# ip -6 route
2001:730:f:3::/64 dev eth0 proto kernel metric 256 mtu 1500 advmss 1440
fe80::/64 dev eth0 proto kernel metric 256 mtu 1500 advmss 1440
ff00::/8 dev eth0 proto kernel metric 256 mtu 1500 advmss 1440
default dev eth0 proto kernel metric 256 mtu 1500 advmss 1440
unreachable default dev lo proto none metric -1 error -101

There's a default route there pointing to eth0 with no gw set, that
seems to be some kind of, err, default. Removing it and (trying, in
this case) adding a proper default route has no effect.

I even tried to recompile iproute, but that gave the same symptoms.
Not sure if dpkg-buildpackage actually built it against
/usr/include/linux or /usr/src/linux/include/linux though.

Any ideas? I know there have been a good deal of ipv6 fixes and
additions going into 2.4.21, did this break something?

--
Cheers,
André Tomt
andre@tomt.net

From andre@tomt.net Fri May 2 14:38:49 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 02 May 2003 14:38:56 -0700 (PDT)
Received: from mail.skjellin.no (mail.skjellin.no [80.239.42.67])
    by oss.sgi.com (8.12.9/8.12.9) with SMTP id h42LcjFu030368
    for ; Fri, 2 May 2003 14:38:48 -0700
Received: (qmail 30426 invoked by uid 1006); 2 May 2003 21:47:20 -0000
Received: from andre@tomt.net by ns1 by uid 1003 with qmail-scanner-1.15 (sophie: 2.14/3.67. spamassassin: 2.50. Clear:.
Processed in 0.017965 secs); 02 May 2003 21:47:20 -0000 Received: from slask.tomt.net (HELO slurv) (andre@tomt.net@217.8.136.222) by mail.skjellin.no with SMTP; 2 May 2003 21:47:20 -0000 From: "Andre Tomt" To: Subject: RE: unable to add ipv6 default route in 2.4.21-rc1 + latest bkbits.net pieces Date: Fri, 2 May 2003 23:38:38 +0200 Message-ID: <009301c310f3$319bd760$0a01ff0a@slurv> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.4024 In-Reply-To: X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Importance: Normal Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h42LcjFu030368 X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2411 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: andre@tomt.net Precedence: bulk X-list: netdev Replying to myself here. > 2001:730:f:3::/64 dev eth0 proto kernel metric 256 mtu > 1500 advmss 1440 fe80::/64 dev eth0 proto kernel metric 256 > mtu 1500 advmss 1440 ff00::/8 dev eth0 proto kernel metric > 256 mtu 1500 advmss 1440 unreachable default dev lo proto > none metric -1 error -101 Oops, this was'nt supposed to be pasted here. > kvass:~# ip -6 route > 2001:730:f:3::/64 dev eth0 proto kernel metric 256 mtu > 1500 advmss 1440 fe80::/64 dev eth0 proto kernel metric 256 > mtu 1500 advmss 1440 ff00::/8 dev eth0 proto kernel metric > 256 mtu 1500 advmss 1440 default dev eth0 proto kernel > metric 256 mtu 1500 advmss 1440 unreachable default dev lo > proto none metric -1 error -101 This however, is still the right one. 
:-) -- Cheers, André Tomt andre@tomt.net From davem@redhat.com Fri May 2 14:55:31 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 02 May 2003 14:55:36 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h42LtTFu031255 for ; Fri, 2 May 2003 14:55:30 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA04181; Fri, 2 May 2003 13:48:04 -0700 Date: Fri, 02 May 2003 13:48:04 -0700 (PDT) Message-Id: <20030502.134804.78707298.davem@redhat.com> To: rusty@rustcorp.com.au Cc: kuznet@ms2.inr.ac.ru, shemminger@osdl.org, netdev@oss.sgi.com, acme@conectiva.com.br Subject: Re: dev->destructor From: "David S. Miller" In-Reply-To: <20030502065752.516742C04C@lists.samba.org> References: <200305020406.IAA10719@sex.inr.ac.ru> <20030502065752.516742C04C@lists.samba.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2412 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Rusty Russell Date: Fri, 02 May 2003 15:25:15 +1000 If this is true, I think you can use the module reference count only, and your code will be faster, too. I can prepare the patch for you later tonight, to see how it looks. And where do we get the counter from when dev->owner is NULL (ie. non-modular)? We need the reference counting regardless of whether the device is implemented statically in the kernel or modular. Do you propose to attach dummy struct module to non-modular case? I am curious... Alexey, you are using a module but don't want to reference count it. 
I made module reference counts very cheap so you don't have to worry, but you still are trying to cheat 8) Understood. I think even stronger part of Alexey's argument is that all of this "if (x->owner)" all over the place takes away some of the gains of compiling things statically into the kernel. Why extra branches all over the place? You want to be very tricky and count all ways into the module, instead. Clearly this is mathematically possible, but in practice very tricky. And all solutions I have seen which do this are ugly, and leave us with "remove may not succeed, it may hang forever, and you won't know, and you can't replace the module and need to reboot if it happens". 8( As long as I can Control-C rmmod when it waits like this, which would be the case, what is the problem? Also, not only is this mathematically possible it is DONE already. Hmmm, there seems to be massive disconnect here between what we understand here and what you appear to. Let me try to describe it in detail. All reasonable protocol code must do exactly this. Any module which does not properly keep track of the objects it is creating has problems bigger than proper module handling. It is not "very tricky", but rather "required". Look at it this way, when module kmalloc's something does it immediately forget about this? This seems to be what you suggest, and it is a dangerous way to think! No, rather, it remembers that it did this, either by setting '1' to refcount of this object, or attaching it to some hash table, list, tree, or other global data structure it maintains. Any time this object is attached somewhere else, reference count is incremented. Anytime it is detached or destroyed, refcount is decremented and final decrement to zero makes final killing of this object. It is ABCs of programming. :-) Apply this to every dynamic object created by a module, and the end result is that it makes the work of counting all internal references. Ergo, module refcounting is superfluous. 
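The object-lifetime rule described in the paragraph above (count starts at 1 on creation, goes up on every attach, down on every detach, and the final decrement frees the object) can be sketched in userspace C. This is only an illustration: `struct obj`, `obj_create()`, `obj_hold()`, `obj_put()` and the `live` counter are hypothetical names, not kernel interfaces, and C11 atomics stand in for the kernel's `atomic_t`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object; not a kernel structure. */
struct obj {
	atomic_int refcnt;
	atomic_int *live;	/* per-"module" count of objects still alive */
};

/* Creation sets the count to 1: the creator holds the first reference. */
static struct obj *obj_create(atomic_int *live)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->refcnt, 1);
	o->live = live;
	atomic_fetch_add(live, 1);
	return o;
}

/* Attaching the object somewhere else takes another reference. */
static void obj_hold(struct obj *o)
{
	atomic_fetch_add(&o->refcnt, 1);
}

/*
 * Detaching drops a reference. The final drop frees the object and
 * removes it from the module's live count. Returns 1 if it was freed.
 */
static int obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);
	if (old != 1)
		return 0;
	atomic_fetch_sub(o->live, 1);
	free(o);
	return 1;
}
```

Under these assumptions, the `live` counter reaching zero is the information a module would need to decide that nothing it created is still referenced.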
Look, once external view into module (ie. socket operations, superblock ops, netdev registry) is removed, all that remains to reference object is exactly these objects. It is the only different part about modules vs. non-modules. After threading the networking and adding true refcounting to sockets I will never forget these rules. :-) Better, I think, to make CONFIG_MODULE_UNLOAD=n, and make CONFIG_MODULE_FORCE_UNLOAD work even if CONFIG_MODULE_UNLOAD=n. As much as I'd like to be able to accept that behavior, it's too much breakage. So many people periodically make rmmod attempts to unload unused modules, distributions even make this by default (or at least used to). Let's look at this aspect of behavior: 1) Some people think that -EBUSY return is unexpected. I fall into this category. 2) It is argued that some other people think the "wait until unloadable" behavior is unexpected. But nobody would be surprised if rmmod told them: ==================== Trying to unload %s, waiting for all references to go away. Perhaps you have programs running which still reference this module? (Hit Ctrl-C to interrupt unload to go and remove those references) ==================== nobody would ask what does this mean. :-) In fact, what IF rmmod was able to know it was unloading a filesystem and therefore could walk the mount list to find mounted instances of this filesystem and print that to the user in the rmmod message? Or for network protocols to print the socket list of sockets/routes/devices open to that module and even making 'lsof' to print process name/pid holding open such sockets? I bet even Linus himself would exclaim "wow, that's nice." Compare this to "-EBUSY". :-))))))))) And I want to mention that in some cases you have to "wait". The best example are TCP_TIME_WAIT sockets. Even after users downs all the interfaces, and closes all the sockets, these remnants must remain for their full life of 60 seconds. 
I really am concerned at both sides, both user observed behavior and kernel side correctness. From niv@us.ibm.com Fri May 2 15:41:23 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 02 May 2003 15:41:29 -0700 (PDT) Received: from e35.co.us.ibm.com (e35.co.us.ibm.com [32.97.110.133]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h42MfEFu032384 for ; Fri, 2 May 2003 15:41:23 -0700 Received: from westrelay05.boulder.ibm.com (westrelay05.boulder.ibm.com [9.17.193.33]) by e35.co.us.ibm.com (8.12.9/8.12.2) with ESMTP id h42Mf8uT269628; Fri, 2 May 2003 18:41:08 -0400 Received: from w-nivedita.beaverton.ibm.com (d03av01.boulder.ibm.com [9.17.193.81]) by westrelay05.boulder.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h42Mf7nU062418; Fri, 2 May 2003 16:41:07 -0600 Date: Fri, 2 May 2003 15:41:06 -0700 (PDT) From: Nivedita Singhvi X-X-Sender: nivedita@w-nivedita.beaverton.ibm.com To: David Miller cc: netdev@oss.sgi.com Subject: [PATCH 2.5.68] sysctl max_dgram_qlen permissions Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2413 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niv@us.ibm.com Precedence: bulk X-list: netdev Currently the sysctl var net.unix.max_dgram_qlen has 0600 permissions..Need users to be able to read it.. Dont see any reason not to(?). 
thanks,
Nivedita

--- sysctl_net_unix.c	Sat Apr 19 19:49:10 2003
+++ sysctl_net_unix.c.new	Fri May 2 13:48:11 2003
@@ -20,7 +20,7 @@
 		.procname	= "max_dgram_qlen",
 		.data		= &sysctl_unix_max_dgram_qlen,
 		.maxlen		= sizeof(int),
-		.mode		= 0600,
+		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
 	},
 	{ .ctl_name = 0 }

From rusty@samba.org Fri May 2 21:09:51 2003
Received: with ECARTIS (v1.0.0; list netdev); Fri, 02 May 2003 21:09:58 -0700 (PDT)
Received: from lists.samba.org (dp.samba.org [66.70.73.150])
    by oss.sgi.com (8.12.9/8.12.9) with SMTP id h4349nFu003711
    for ; Fri, 2 May 2003 21:09:50 -0700
Received: by lists.samba.org (Postfix, from userid 590)
    id 804182C003; Sat, 3 May 2003 04:09:49 +0000 (GMT)
From: Rusty Russell
To: "David S. Miller"
Cc: kuznet@ms2.inr.ac.ru, shemminger@osdl.org, netdev@oss.sgi.com, acme@conectiva.com.br
Subject: Re: dev->destructor
In-reply-to: Your message of "Fri, 02 May 2003 13:48:04 MST." <20030502.134804.78707298.davem@redhat.com>
Date: Sat, 03 May 2003 14:07:41 +1000
Message-Id: <20030503040949.804182C003@lists.samba.org>
X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp)
X-archive-position: 2414
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: rusty@rustcorp.com.au
Precedence: bulk
X-list: netdev

In message <20030502.134804.78707298.davem@redhat.com> you write:
> From: Rusty Russell
> Date: Fri, 02 May 2003 15:25:15 +1000
>
> If this is true, I think you can use the module reference count only,
> and your code will be faster, too. I can prepare the patch for you
> later tonight, to see how it looks.
>
> And where do we get the counter from when dev->owner is NULL
> (ie. non-modular)? We need the reference counting regardless of
> whether the device is implemented statically in the kernel or modular.

But Alexey said you can only call unregister_netdev from module
unload, ie. if not a module, it can't be unloaded, hence no refcount
needed.
I wrote the above paragraph because I'm not sure if I understood Alexey correctly? > Alexey, you are using a module but don't want to reference count it. > I made module reference counts very cheap so you don't have to worry, > but you still are trying to cheat 8) > > Understood. > > I think even stronger part of Alexey's argument is that all of > this "if (x->owner)" all over the place takes away some of the > gains of compiling things statically into the kernel. Why extra > branches all over the place? Agreed. I have considered removing that, and making THIS_MODULE equal a dummy module struct for the core kernel. I think that would be a win (we've already established that the actual refcount is cheap, possibly cheaper than the branch in practice). BTW, we should look at making local_inc() etc. a first-class citizen: it has uses outside modules. > As long as I can Control-C rmmod when it waits like this, which would > be the case, what is the problem? If you can, and the device is still usable afterwards, it would be nirvana 8) [ Refcounting tutorial skipped: I went through the same pain with early conntrack code, and learned to refcount everything now 8) ] > Look, once external view into module (ie. socket operations, > superblock ops, netdev registry) is removed, all that remains to > reference object is exactly these objects. It is the only different > part about modules vs. non-modules. This argument applies to all objects. If you reference count everything which holds a reference to an object, you can infer the reference count of the object from the sum of reference counts of its referees. In practice, as you pointed out in an earlier mail (I think sockets were your example), doing this proves to be extremely painful. And we're feeling the pain now. The module functions, and all its data, are objects. For convenience, size, and speed, we don't reference count them separately. 
OK, so why doesn't, say, struct netdevice grab the module refcount on registration (as is logical), and drop it on unregistration? Because the module holds a reference to the struct device, so now you have a classic circular reference count problem: the module reference count will never fall to zero. That's why we grab a reference just before use. You can, of course, do two-stage module cleanup (ie. first stage with refcounts non-zero), but the user wants clean "remove or fail to remove" semantics, so half-way through the cleanup you realize it's still in use, and restore things: now you have the spurious failure problem where it was half-unregistered for a while, and AFAICT you have to rewrite 1400 modules's cleanup routines. I imagined schemes where the kernel would be basically stopped during module remove, so the half-remove and unremove would appear atomic. I shied away from implementing such a monster without deadlock, but it might be possible. Then we would truly have nirvana 8) > Trying to unload %s, waiting for all references to go away. > Perhaps you have programs running which still reference this > module? (Hit Ctrl-C to interrupt unload to go and remove > those references) > ==================== > > nobody would ask what does this mean. :-) Please, implement such a thing. I was unable to, without introducing spurious failure in the components, *and* rewriting every module_cleanup function 8( Hence rmmod does not wait by default, but says "module is busy". > And I want to mention that in some cases you have to "wait". The best > example are TCP_TIME_WAIT sockets. Even after users downs all the > interfaces, and closes all the sockets, these remnants must remain for > their full life of 60 seconds. Yes, certainly. The two-stage unload provided by rmmod --wait ensures that the reference count decreases to zero (it makes try_module_get fail). This was my desire if unloading security modules or some netfilter modules was going to be reasonable. 
> I really am concerned at both sides, both user observed behavior and
> kernel side correctness.

There are shades of correctness, too. Not jumping into a module which
has gone away is probably the most important. Having finite unload
time is also nice. Not having spurious failures because someone tried
to unload a module is nice too.

Roman Zippel suggested (among other things) that every unregister fail
when busy, and that modules then reinitialize. This gives the spurious
failure problem, and means rewriting every unregister interface in the
kernel, every module cleanup function, and dealing with the case where
you're cleaning up a failed initialization and your unregister failed.

Adam Richter suggested the module or interface register with a central
repository to say "this module uses this interface", then you can
atomically freeze all the module interfaces (say they all share a
single lock) and see if it's in use, but you can't really call the
module's cleanup routine, so you have to atomically deactivate all
these registrations before dropping the lock, and that's the same as
try_module_get().

As you know, I love radical change. But I want to make sure we're
going to end up somewhere we like at the end of it, which is why I
didn't do it. Anyway, putting the module loader in the kernel was
enough to sate my appetite for change 8)

Thanks!
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
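The two-stage unload mentioned in this message (once removal starts, try_module_get fails, so the reference count can only fall) can be illustrated with a small userspace model. It is a sketch under stated assumptions, not the kernel's implementation: `struct mod`, `mod_try_get()`, `mod_put()`, `mod_begin_unload()` and `mod_unload_done()` are hypothetical names, and C11 atomics replace the kernel's per-cpu counters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical module state; not the kernel's struct module. */
struct mod {
	atomic_int refcnt;
	atomic_bool going;	/* set once unload has started */
};

/*
 * Take a reference unless unload has begun. The count is raised first
 * and backed out if "going" is seen, so a reference that succeeds is
 * always visible to a later mod_unload_done() check.
 */
static bool mod_try_get(struct mod *m)
{
	atomic_fetch_add(&m->refcnt, 1);
	if (atomic_load(&m->going)) {
		atomic_fetch_sub(&m->refcnt, 1);
		return false;
	}
	return true;
}

static void mod_put(struct mod *m)
{
	atomic_fetch_sub(&m->refcnt, 1);
}

/* Stage one: refuse all new references from here on. */
static void mod_begin_unload(struct mod *m)
{
	atomic_store(&m->going, true);
}

/* Stage two may complete only when the existing references are gone. */
static bool mod_unload_done(struct mod *m)
{
	return atomic_load(&m->going) && atomic_load(&m->refcnt) == 0;
}
```

In this model a waiting rmmod would call `mod_begin_unload()` and then poll or sleep until `mod_unload_done()` returns true; whether the wait is bounded depends entirely on the existing holders calling `mod_put()`.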
From davem@redhat.com Fri May 2 21:53:53 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 02 May 2003 21:54:01 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h434rqFu004221 for ; Fri, 2 May 2003 21:53:53 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA04968; Fri, 2 May 2003 20:46:29 -0700 Date: Fri, 02 May 2003 20:46:28 -0700 (PDT) Message-Id: <20030502.204628.35664814.davem@redhat.com> To: rusty@rustcorp.com.au Cc: kuznet@ms2.inr.ac.ru, shemminger@osdl.org, netdev@oss.sgi.com, acme@conectiva.com.br Subject: Re: dev->destructor From: "David S. Miller" In-Reply-To: <20030503040949.804182C003@lists.samba.org> References: <20030502.134804.78707298.davem@redhat.com> <20030503040949.804182C003@lists.samba.org> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2415 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Rusty Russell Date: Sat, 03 May 2003 14:07:41 +1000 I imagined schemes where the kernel would be basically stopped during module remove, so the half-remove and unremove would appear atomic. I shied away from implementing such a monster without deadlock, but it might be possible. Then we would truly have nirvana 8) I think this is an interesting area for exploration. This isn't a unique requirement BTW Rusty. IP conntrack rehashing in the presence of RCU would want something just like this now wouldn't it? Consider other applications, such as hot plug memory. I'm sure tons of other interesting examples could be imagined. 
So indeed, the key to nirvana would indeed be here :-) I think it can work Rusty, in short you create 1 freeze thread per cpu. You wake up all the freeze threads on non-local cpus, and they indicate their presence via some bitmask. Once the master cpu sees all non-local-cpu bits set in the bitmask, it begins the unload sequence, after the unload the cpu mask is cleared and this signals the freeze threads to break from their spin loops and schedule(). This means the local master cpu executes the unload sequence. It may sleep in order to yield to, for example, semaphore holders, it may also sleep to yield to kswapd and friends for the sake of memory allocation. I mean... consider all the situations and please try to find some hole in this. We can make all try_to_*() sleep at this time too... this in particular needs more thought. To make these freeze threads globally useful, we allow them to run atomicity commands. The two defined commands are "local_irq_*()" and "local_bh_*()", two bitmasks control this and the freeze threads check the bits in their spin loops. Do you see? Maybe... it is nearly Nirvana! :-))))) Our ability to implement this changes the rest of the conversation, so let us resolve this first. From davem@redhat.com Fri May 2 22:07:25 2003 Received: with ECARTIS (v1.0.0; list netdev); Fri, 02 May 2003 22:07:29 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h4357PFu004642 for ; Fri, 2 May 2003 22:07:25 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id VAA04998; Fri, 2 May 2003 21:00:00 -0700 Date: Fri, 02 May 2003 21:00:00 -0700 (PDT) Message-Id: <20030502.210000.35018302.davem@redhat.com> To: rusty@rustcorp.com.au Cc: kuznet@ms2.inr.ac.ru, shemminger@osdl.org, netdev@oss.sgi.com, acme@conectiva.com.br Subject: Re: dev->destructor From: "David S. 
Miller"
In-Reply-To: <20030503040949.804182C003@lists.samba.org>
References: <20030502.134804.78707298.davem@redhat.com> <20030503040949.804182C003@lists.samba.org>
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp)
X-archive-position: 2416
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Rusty Russell
   Date: Sat, 03 May 2003 14:07:41 +1000

   This argument applies to all objects. If you reference count
   everything which holds a reference to an object, you can infer the
   reference count of the object from the sum of reference counts of
   its referees. In practice, as you pointed out in an earlier mail (I
   think sockets were your example), doing this proves to be extremely
   painful. And we're feeling the pain now.

Please ignore the example code I wrote in that email. Most of it is
inconsistent and frankly garbage. :-)

The "->can_unload()" check is actually simpler than we might initially
suspect. Something like ipv6 might check:

	if (atomic_read(&inet6_sock_nr) == 0 &&
	    atomic_read(&inet6_dev_nr) == 0 &&
	    rt6_cache_empty())
		return 1;
	return 0;

Now, here is the important part! When this thing returns "1" the
module.c code does this:

	call_rcu(&mod->rcu_head, mod->cleanup, NULL);

This makes sure the guy who killed the last object has indeed left the
module code.
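The shape of the "->can_unload()" check in this message can be modeled in userspace C. This is a sketch only: `struct proto_module` and both function names are hypothetical, C11 atomics stand in for `atomic_read()`, a plain counter stands in for `rt6_cache_empty()`, and the deferral through `call_rcu()` is not modeled at all (cleanup is simply marked as done). It also assumes the module has already been shut down so that no new objects can appear between the check and the cleanup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-protocol bookkeeping; not a kernel structure. */
struct proto_module {
	atomic_int sock_nr;	/* live sockets */
	atomic_int dev_nr;	/* live per-device state */
	atomic_int route_nr;	/* cached routes, stand-in for a cache-empty test */
	bool cleaned_up;
};

/* The module knows its own objects, so it answers the question itself. */
static bool proto_can_unload(struct proto_module *m)
{
	return atomic_load(&m->sock_nr) == 0 &&
	       atomic_load(&m->dev_nr) == 0 &&
	       atomic_load(&m->route_nr) == 0;
}

/*
 * Run cleanup only when every object count is zero. In the scheme from
 * the mail the cleanup would be deferred (via call_rcu) so that the
 * last user has left module code first; here it runs immediately.
 */
static bool proto_try_unload(struct proto_module *m)
{
	if (!proto_can_unload(m))
		return false;
	m->cleaned_up = true;
	return true;
}
```

The point of the design is that three cheap counter reads replace a module get/put around every entry into the protocol code.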
From davem@redhat.com Sat May 3 01:35:16 2003
Received: with ECARTIS (v1.0.0; list netdev); Sat, 03 May 2003 01:35:27 -0700 (PDT)
Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242])
    by oss.sgi.com (8.12.9/8.12.9) with SMTP id h438ZFFu006093
    for ; Sat, 3 May 2003 01:35:16 -0700
Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1])
    by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA05331;
    Sat, 3 May 2003 00:28:22 -0700
Date: Sat, 03 May 2003 00:28:22 -0700 (PDT)
Message-Id: <20030503.002822.34760030.davem@redhat.com>
To: niv@us.ibm.com
Cc: netdev@oss.sgi.com
Subject: Re: [PATCH 2.5.68] sysctl max_dgram_qlen permissions
From: "David S. Miller"
In-Reply-To:
References:
X-FalunGong: Information control.
X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp)
X-archive-position: 2417
X-ecartis-version: Ecartis v1.0.0
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
X-original-sender: davem@redhat.com
Precedence: bulk
X-list: netdev

   From: Nivedita Singhvi
   Date: Fri, 2 May 2003 15:41:06 -0700 (PDT)

   Currently the sysctl var net.unix.max_dgram_qlen has 0600
   permissions..Need users to be able to read it.. Dont see any reason
   not to(?).

Applied, thanks.

   --- sysctl_net_unix.c Sat Apr 19 19:49:10 2003
   +++ sysctl_net_unix.c.new Fri May 2 13:48:11 2003

Please use properly rooted patches in the future. Ie. something like:

--- a/linux/net/unix/sysctl_net_unix.c.orig Sat Apr 19 19:49:10 2003
+++ b/linux/net/unix/sysctl_net_unix.c Fri May 2 13:48:11 2003

Patches like yours don't work at all with the automated patch scripts
lots of us use to ease the burden of applying lots of patches.
In fact, even if I had changed into net/unix/ before applying your patch, it would fail because there patch tries to patch a file named sysctl_net_unix.c.new which of course does not exist. Please, this is very lazy and causes me to waste a lot of time fixing up your patch which I should not have to do. :( From niv@us.ibm.com Sat May 3 10:07:08 2003 Received: with ECARTIS (v1.0.0; list netdev); Sat, 03 May 2003 10:07:14 -0700 (PDT) Received: from e4.ny.us.ibm.com (e4.ny.us.ibm.com [32.97.182.104]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h43H70Fu012926 for ; Sat, 3 May 2003 10:07:07 -0700 Received: from northrelay04.pok.ibm.com (northrelay04.pok.ibm.com [9.56.224.206]) by e4.ny.us.ibm.com (8.12.9/8.12.2) with ESMTP id h43H6ssZ093400; Sat, 3 May 2003 13:06:54 -0400 Received: from us.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by northrelay04.pok.ibm.com (8.12.9/NCO/VER6.5) with ESMTP id h43H6q6h184178; Sat, 3 May 2003 13:06:52 -0400 Message-ID: <3EB3F5BD.5070202@us.ibm.com> Date: Sat, 03 May 2003 10:00:45 -0700 From: Nivedita Singhvi User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2.1) Gecko/20021130 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com Subject: Re: [PATCH 2.5.68] sysctl max_dgram_qlen permissions References: <20030503.002822.34760030.davem@redhat.com> In-Reply-To: <20030503.002822.34760030.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2418 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: niv@us.ibm.com Precedence: bulk X-list: netdev David S. Miller wrote: > Please, this is very lazy and causes me to waste a lot of time fixing > up your patch which I should not have to do. :( Eeep! A thousand pardons Dave! Didnt send you the right file. 
Much flamage deserved and redirected to my sloppiness and utter inability to correctly recall the name of a file generated moments earlier... Much thanks for the cleanup.. Nivedita From vinay-rc@naturesoft.net Sun May 4 02:20:44 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 04 May 2003 02:20:48 -0700 (PDT) Received: from naturesoft.net ([203.145.184.221]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h449KcFu020649 for ; Sun, 4 May 2003 02:20:42 -0700 Received: from [192.168.0.15] (helo=lima.royalchallenge.com) by naturesoft.net with esmtp (Exim 3.35 #1) id 19CFXy-00085y-00 for netdev@oss.sgi.com; Sun, 04 May 2003 14:42:38 +0530 Subject: [PATCH 2.{4,x5}.x] mod_timer cleanups for dst.c From: Vinay K Nallamothu To: netdev@oss.sgi.com Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10) Date: 04 May 2003 14:55:34 +0530 Message-Id: <1052040334.1119.71.camel@lima.royalchallenge.com> Mime-Version: 1.0 X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2420 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vinay-rc@naturesoft.net Precedence: bulk X-list: netdev Hi, Just reposting what I earlier posted to the linux-kernel mailing list. These are trivial patches converting {del,add}_timer to mod_timer. Also kindly let me know whether this is the right platform to post the PPP & FR related fixes or to linux-x25 mailing list. 
vinay

PS: I am not subscribed to this mailing list

--- linux-2.5.68/net/core/dst.c	2003-03-25 10:08:35.000000000 +0530
+++ linux-2.5.68-mod/net/core/dst.c	2003-05-03 13:42:12.000000000 +0530
@@ -154,11 +154,9 @@
 	dst->next = dst_garbage_list;
 	dst_garbage_list = dst;
 	if (dst_gc_timer_inc > DST_GC_INC) {
-		del_timer(&dst_gc_timer);
 		dst_gc_timer_inc = DST_GC_INC;
 		dst_gc_timer_expires = DST_GC_MIN;
-		dst_gc_timer.expires = jiffies + dst_gc_timer_expires;
-		add_timer(&dst_gc_timer);
+		mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires);
 	}
 	spin_unlock_bh(&dst_lock);
 }

From andre@tomt.net Sun May 4 02:19:16 2003
Received: with ECARTIS (v1.0.0; list netdev); Sun, 04 May 2003 02:19:24 -0700 (PDT)
Received: from mail.skjellin.no (mail.skjellin.no [80.239.42.67])
    by oss.sgi.com (8.12.9/8.12.9) with SMTP id h449J8Fu019256
    for ; Sun, 4 May 2003 02:19:15 -0700
Received: (qmail 5586 invoked by uid 1006); 4 May 2003 09:27:51 -0000
Received: from andre@tomt.net by ns1 by uid 1003 with qmail-scanner-1.15 (sophie: 2.14/3.67. spamassassin: 2.50. Clear:.
Processed in 0.017195 secs); 04 May 2003 09:27:51 -0000 Received: from slask.tomt.net (HELO slurv) (andre@tomt.net@217.8.136.222) by mail.skjellin.no with SMTP; 4 May 2003 09:27:51 -0000 From: "Andre Tomt" To: Subject: RE: unable to add ipv6 default route in 2.4.21-rc1 + latest bkbits.net pieces Date: Sun, 4 May 2003 11:21:04 +0200 Message-ID: <005701c3121e$7cc46370$0a01ff0a@slurv> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.4024 In-Reply-To: X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Importance: Normal Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id h449J8Fu019256 X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2419 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: andre@tomt.net Precedence: bulk X-list: netdev > Replying to myself here. And again. Have some more information now. It seems, I cannot use 2001:730:f:3:: as an ordinary unicast global address anymore? At least not as a default gateway. Adding a default route pointing to 2001:730:f:3::1 works just fine, but why did :0000 break? PS! I'm not subscribed to this list, so CC's would be nice. I forgot to mention this earlier too. 
-- Cheers, André Tomt andre@tomt.net From vinay-rc@naturesoft.net Sun May 4 02:26:13 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 04 May 2003 02:26:16 -0700 (PDT) Received: from naturesoft.net ([203.145.184.221]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h449Q8Fu021220 for ; Sun, 4 May 2003 02:26:11 -0700 Received: from [192.168.0.15] (helo=lima.royalchallenge.com) by naturesoft.net with esmtp (Exim 3.35 #1) id 19CFdJ-00088C-00 for netdev@oss.sgi.com; Sun, 04 May 2003 14:48:09 +0530 Subject: [PATCH 2.{4,5}.x] mod_timer cleanups for sch_cbq.c From: Vinay K Nallamothu To: netdev@oss.sgi.com Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10) Date: 04 May 2003 15:01:05 +0530 Message-Id: <1052040665.1119.76.camel@lima.royalchallenge.com> Mime-Version: 1.0 X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2421 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vinay-rc@naturesoft.net Precedence: bulk X-list: netdev sch_cbq.c: Trivial {del,add}_timer to mod_timer conversion. 
--- linux-2.5.68/net/sched/sch_cbq.c 2003-03-25 10:08:36.000000000 +0530 +++ linux-2.5.68-nvk/net/sched/sch_cbq.c 2003-05-03 19:29:08.000000000 +0530 @@ -1056,11 +1056,9 @@ sch->stats.overlimits++; if (q->wd_expires && !netif_queue_stopped(sch->dev)) { long delay = PSCHED_US2JIFFIE(q->wd_expires); - del_timer(&q->wd_timer); if (delay <= 0) delay = 1; - q->wd_timer.expires = jiffies + delay; - add_timer(&q->wd_timer); + mod_timer(&q->wd_timer, jiffies + delay); sch->flags |= TCQ_F_THROTTLED; } } From vinay-rc@naturesoft.net Sun May 4 02:34:14 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 04 May 2003 02:34:17 -0700 (PDT) Received: from naturesoft.net ([203.145.184.221]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h449YAFu021604 for ; Sun, 4 May 2003 02:34:13 -0700 Received: from [192.168.0.15] (helo=lima.royalchallenge.com) by naturesoft.net with esmtp (Exim 3.35 #1) id 19CFl3-0008Aq-00 for netdev@oss.sgi.com; Sun, 04 May 2003 14:56:09 +0530 Subject: [PATCH 2.{4,5}.x] mod_timer fix for sch_csz.c From: Vinay K Nallamothu To: netdev@oss.sgi.com Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10) Date: 04 May 2003 15:09:05 +0530 Message-Id: <1052041145.1119.83.camel@lima.royalchallenge.com> Mime-Version: 1.0 X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2422 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vinay-rc@naturesoft.net Precedence: bulk X-list: netdev --- linux-2.5.68/net/sched/sch_csz.c 2003-04-21 10:14:44.000000000 +0530 +++ linux-2.5.68-nvk/net/sched/sch_csz.c 2003-05-03 14:40:11.000000000 +0530 @@ -708,11 +708,9 @@ */ if (q->wd_expires) { unsigned long delay = PSCHED_US2JIFFIE(q->wd_expires); - del_timer(&q->wd_timer); if (delay == 0) delay = 1; - q->wd_timer.expires = jiffies + delay; - add_timer(&q->wd_timer); + mod_timer(&q->wd_timer, jiffies + delay); sch->stats.overlimits++; } 
#endif From vinay-rc@naturesoft.net Sun May 4 02:35:49 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 04 May 2003 02:35:52 -0700 (PDT) Received: from naturesoft.net ([203.145.184.221]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h449ZiFu021919 for ; Sun, 4 May 2003 02:35:47 -0700 Received: from [192.168.0.15] (helo=lima.royalchallenge.com) by naturesoft.net with esmtp (Exim 3.35 #1) id 19CFmb-0008C0-00 for netdev@oss.sgi.com; Sun, 04 May 2003 14:57:46 +0530 Subject: [PATCH 2.{4,5}.x] mod_timer fix for sch_htb.c From: Vinay K Nallamothu To: netdev@oss.sgi.com Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Ximian Evolution 1.0.8 (1.0.8-10) Date: 04 May 2003 15:10:42 +0530 Message-Id: <1052041242.1245.86.camel@lima.royalchallenge.com> Mime-Version: 1.0 X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2423 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: vinay-rc@naturesoft.net Precedence: bulk X-list: netdev sch_htb.c: Trivial {del,add}_timer to mod_timer conversion. 
--- linux-2.5.68/net/sched/sch_htb.c 2003-04-21 10:14:44.000000000 +0530 +++ linux-2.5.68-nvk/net/sched/sch_htb.c 2003-05-03 14:41:36.000000000 +0530 @@ -975,9 +975,7 @@ printk(KERN_INFO "HTB delay %ld > 5sec\n", delay); delay = 5*HZ; } - del_timer(&q->timer); - q->timer.expires = jiffies + delay; - add_timer(&q->timer); + mod_timer(&q->timer, jiffies + delay); sch->flags |= TCQ_F_THROTTLED; sch->stats.overlimits++; HTB_DBG(3,1,"htb_deq t_delay=%ld\n",delay); From davem@redhat.com Sun May 4 05:33:27 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 04 May 2003 05:33:36 -0700 (PDT) Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h44CXQFu029630 for ; Sun, 4 May 2003 05:33:27 -0700 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id EAA13572; Sun, 4 May 2003 04:26:13 -0700 Date: Sun, 04 May 2003 04:26:12 -0700 (PDT) Message-Id: <20030504.042612.104038287.davem@redhat.com> To: vinay-rc@naturesoft.net Cc: netdev@oss.sgi.com Subject: Re: [PATCH 2.{4,x5}.x] mod_timer cleanups for dst.c From: "David S. Miller" In-Reply-To: <1052040334.1119.71.camel@lima.royalchallenge.com> References: <1052040334.1119.71.camel@lima.royalchallenge.com> X-FalunGong: Information control. X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2424 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@redhat.com Precedence: bulk X-list: netdev From: Vinay K Nallamothu Date: 04 May 2003 14:55:34 +0530 Just reposting what I earlier posted to the linux-kernel mailing list. These are trivial patches converting {del,add}_timer to mod_timer. Thanks, this is actually a bug fix. 
The old code could add_timer() while the timer was active. From rusty@samba.org Sun May 4 22:18:56 2003 Received: with ECARTIS (v1.0.0; list netdev); Sun, 04 May 2003 22:19:04 -0700 (PDT) Received: from lists.samba.org (dp.samba.org [66.70.73.150]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h455IsFu005320 for ; Sun, 4 May 2003 22:18:56 -0700 Received: by lists.samba.org (Postfix, from userid 590) id 5C31F2C05E; Mon, 5 May 2003 05:18:54 +0000 (GMT) From: Rusty Russell To: "David S. Miller" Cc: kuznet@ms2.inr.ac.ru, shemminger@osdl.org, netdev@oss.sgi.com, acme@conectiva.com.br Subject: Re: dev->destructor In-reply-to: Your message of "Fri, 02 May 2003 20:46:28 MST." <20030502.204628.35664814.davem@redhat.com> Date: Mon, 05 May 2003 15:18:29 +1000 Message-Id: <20030505051854.5C31F2C05E@lists.samba.org> X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp) X-archive-position: 2425 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: rusty@rustcorp.com.au Precedence: bulk X-list: netdev In message <20030502.204628.35664814.davem@redhat.com> you write: > I think it can work Rusty, in short you create 1 freeze thread > per cpu. You wake up all the freeze threads on non-local cpus, > and they indicate their presence via some bitmask. This code is already in module.c. I'm glad you like it though 8). But we disable local irqs as well: this is what I call a "bogolock" (the read-side of a bogolock is preempt_disable()/preempt_enable(): you could temporarily disable preemption and force the scheduler to run every preempted thread, and remove this). > This means the local master cpu executes the unload sequence. It may > sleep in order to yield to, for example, semaphore holders, it may > also sleep to yield to kswapd and friends for the sake of memory > allocation. I mean... consider all the situations and please try to > find some hole in this. 
We can make all try_to_*() sleep at this > time too... this in particular needs more thought. Well, it's a big task. Holding interrupts disabled for unbounded time on CPUs needs to be thought about, but I think can be fixed. try_xxx can be called from interrupt context: you really want to get rid of interrupts, too... During previous discussions, I called this "return to primordial soup": back to like during init. Ideally, only userspace context (no interrupts, timers, bottom halves), and life is easy. > To make these freeze threads globally useful, we allow them to > run atomicity commands. The two defined commands are "local_irq_*()" > and "local_bh_*()", two bitmasks control this and the freeze threads > check the bits in their spin loops. Something like this? /* Tell all freeze threads to disable bottom halves. */ void global_bh_disable(void); void global_bh_enable(void); /* Tell all freeze threads to disable interrupts. */ void global_irq_disable(void); void global_irq_enable(void); > Do you see? Maybe... it is nearly Nirvana! :-))))) Yes, but I worry it might be an illusion 8) > Our ability to implement this changes the rest of the conversation, > so let us resolve this first. Yes, but it's a big IF. I think it might be easier to make all unregistrations runnable in interrupt context 8( Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. 
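The command-bitmask idea in the exchange above can be modelled in plain userspace C. What follows is only a rough sketch under stated assumptions, not the module.c code and not a proposed kernel implementation: the four global_*() prototypes come from the message, while NR_MODEL_CPUS, FREEZE_CMD_BH, FREEZE_CMD_IRQ, freeze_thread_poll() and freeze_cmds_acked() are hypothetical names invented for the illustration. The model records the requested bits per CPU; a real freeze thread would presumably call local_bh_disable()/local_irq_disable() at the marked point instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_MODEL_CPUS 4            /* hypothetical CPU count for this model */

/* Hypothetical command bits; the thread only says "two bitmasks control this". */
enum {
    FREEZE_CMD_BH  = 1u << 0,      /* "disable bottom halves" requested */
    FREEZE_CMD_IRQ = 1u << 1,      /* "disable interrupts" requested */
};

/* Written by the master CPU, read by every freeze thread. */
static atomic_uint freeze_cmds;

/* Per-CPU record of the commands each freeze thread has applied so far. */
static unsigned int freeze_applied[NR_MODEL_CPUS];

/* Tell all freeze threads to disable bottom halves. */
void global_bh_disable(void) { atomic_fetch_or(&freeze_cmds, FREEZE_CMD_BH); }
void global_bh_enable(void)  { atomic_fetch_and(&freeze_cmds, ~(unsigned int)FREEZE_CMD_BH); }

/* Tell all freeze threads to disable interrupts. */
void global_irq_disable(void) { atomic_fetch_or(&freeze_cmds, FREEZE_CMD_IRQ); }
void global_irq_enable(void)  { atomic_fetch_and(&freeze_cmds, ~(unsigned int)FREEZE_CMD_IRQ); }

/* One pass of a freeze thread's spin loop on model CPU `cpu`. */
void freeze_thread_poll(int cpu)
{
    assert(cpu >= 0 && cpu < NR_MODEL_CPUS);

    unsigned int want = atomic_load(&freeze_cmds);

    /*
     * Assumption: in the kernel, this is where the thread would compare
     * `want` with its current state and enable or disable bottom halves
     * and interrupts locally for each bit that changed. The model only
     * records the result.
     */
    freeze_applied[cpu] = want;
}

/* True once every model CPU has applied the currently requested commands. */
bool freeze_cmds_acked(void)
{
    unsigned int want = atomic_load(&freeze_cmds);

    for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
        if (freeze_applied[cpu] != want)
            return false;
    return true;
}
```

In this model the master would call global_bh_disable() and then wait until freeze_cmds_acked() returns true before starting the unload sequence. Whether that wait can be bounded in a real kernel is exactly the open question raised in the message.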
From sriram.chintalapati@wipro.com Mon May 5 06:04:57 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 05 May 2003 06:24:24 -0700 (PDT) Received: from localhost [127.0.0.1] by oss.sgi.com with SpamAssassin (2.50 1.173-2003-02-20-exp); Mon, 05 May 2003 06:05:01 -0700 From: "Chintalapati,Sriram" To: "'netdev@oss.sgi.com'" Cc: "'sriramc@wipro.tcpn.com'" Subject: Help needed: Find Net4 related differences between 2.4.20 and 2.5 .68 kernel versions Date: Mon, 5 May 2003 18:21:21 +0530 Message-Id: <2B4B4D30E18DD31180CA00508B0F3002A71D85@wipro-exmbx1.wipro.tcpn.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----------=_3EB6617D.2FD31CBE" X-archive-position: 2426 X-Approved-By: ralf@linux-mips.org X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: Sriram.Chintalapati@wipro.com Precedence: bulk X-list: netdev This is a multi-part message in MIME format. ------------=_3EB6617D.2FD31CBE Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 8bit This mail is probably spam. The original message has been attached along with this report, so you can recognize or block similar unwanted mail in future. See http://spamassassin.org/tag/ for more details. Content preview: This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible. ------_=_NextPart_001_01C31305.074BD3C0 Content-Type: text/plain [...] Content analysis details: (10.40 points, 5 required) HTML_MESSAGE (3.0 points) BODY: HTML included in message HTML_40_50 (0.4 points) BODY: Message is 40% to 50% HTML MSG_ID_ADDED_BY_MTA_3 (3.0 points) 'Message-Id' was added by a relay (3) MIME_BOUND_NEXTPART (4.0 points) Spam tool pattern in MIME boundary The original message did not contain plain text, and may be unsafe to open with some email clients; in particular, it may contain a virus, or confirm that your address can receive spam. 
If you wish to view it, it may be safer to save it to a file and open it with an editor. ------------=_3EB6617D.2FD31CBE Content-Type: message/rfc822 Content-Description: original message before SpamAssassin Content-Disposition: attachment Content-Transfer-Encoding: 8bit Received: from wiprom2mx1.wipro.com (wiprom2mx1.wipro.com [203.197.164.41]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h45D4mFu015591 for ; Mon, 5 May 2003 06:04:51 -0700 Received: from m2vwall5.wipro.com (m2vwall5.wipro.com [10.115.50.5]) by wiprom2mx1.wipro.com (8.11.3/8.11.3) with SMTP id h45D4dN21841 for ; Mon, 5 May 2003 18:34:39 +0530 (IST) Received: from blr-m2-msg.wipro.com ([10.116.50.99]) by blr-m1-bh2.wipro.com with Microsoft SMTPSVC(5.0.2195.5329); Mon, 5 May 2003 18:34:34 +0530 Received: from wipro.tcpn.com ([172.31.41.11]) by blr-m2-msg.wipro.com with Microsoft SMTPSVC(5.0.2195.5329); Mon, 5 May 2003 18:34:34 +0530 Received: from wipro-exconn1.wipro.tcpn.com (wipro-exconn1.wipro.tcpn.com [172.31.41.55]) by wipro.tcpn.com (8.9.3/8.9.3) with ESMTP id SAA04495; Mon, 5 May 2003 18:37:19 +0530 (IST) Received: by wipro-exconn1.wipro.tcpn.com with Internet Mail Service (5.5.2650.21) id ; Mon, 5 May 2003 18:15:34 +0530 Message-ID: <2B4B4D30E18DD31180CA00508B0F3002A71D85@wipro-exmbx1.wipro.tcpn.com> From: "Chintalapati,Sriram" To: "'netdev@oss.sgi.com'" Cc: "'sriramc@wipro.tcpn.com'" Subject: Help needed: Find Net4 related differences between 2.4.20 and 2.5 .68 kernel versions Date: Mon, 5 May 2003 18:21:21 +0530 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: multipart/mixed; boundary="----=_NextPartTM-000-36df13d1-7afa-11d7-ba7f-006067005148" X-OriginalArrivalTime: 05 May 2003 13:04:34.0379 (UTC) FILETIME=[DFE871B0:01C31306] This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible. 
------=_NextPartTM-000-36df13d1-7afa-11d7-ba7f-006067005148 Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C31305.074BD3C0" ------_=_NextPart_001_01C31305.074BD3C0 Content-Type: text/plain Hi, I need some guidance on finding the differences related to Net4 between the 2.4.20 and 2.5.68 kernel versions. The aim is to make a list of the code changes that might be needed to use the SOCK_STREAM option for the SCTP protocol, along with any dependent changes. Can you please let me know how these differences can be found? Do I need to look for a list of patches present in the 2.5 kernel that need to be migrated back? If so, where can I get the list of patches that are needed? Can you please let me know the best way to proceed for this task? Please let me know if this is not the right place for this question. Please let me know if I need to provide any additional information. Thanks, Sriram ------_=_NextPart_001_01C31305.074BD3C0 Content-Type: text/html Message
------_=_NextPart_001_01C31305.074BD3C0-- ------=_NextPartTM-000-36df13d1-7afa-11d7-ba7f-006067005148-- ------------=_3EB6617D.2FD31CBE-- From shemminger@osdl.org Mon May 5 09:08:28 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 05 May 2003 09:08:34 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h45G8RFu019453 for ; Mon, 5 May 2003 09:08:28 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h45G8KW12794; Mon, 5 May 2003 09:08:20 -0700 Date: Mon, 5 May 2003 09:08:20 -0700 From: Stephen Hemminger To: Rusty Russell Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, acme@conectiva.com.br Subject: Re: dev->destructor Message-Id: <20030505090820.50cd5a13.shemminger@osdl.org> In-Reply-To: <20030503040949.804182C003@lists.samba.org> References: <20030502.134804.78707298.davem@redhat.com> <20030503040949.804182C003@lists.samba.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 2427 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org Precedence: bulk X-list: netdev On Sat, 03 May 2003 14:07:41 +1000 Rusty Russell wrote: > In message <20030502.134804.78707298.davem@redhat.com> you write: > > From: Rusty Russell > > Date: Fri, 02 May 2003 15:25:15 +1000 > > > > If this is true, I think you can use the module reference count only, > > and your code will be faster, too. I can prepare the patch for you > > later tonight, to see how it looks. 
> > > > And where do we get the counter from when dev->owner is NULL > > (ie. non-modular)? We need the reference counting regardless of > > whether the device is implemented statically in the kernel or modular. > > But Alexey said you can only call unregister_netdev from module > unload, ie. if not a module, it can't be unloaded, hence no refcount > needed. I wrote the above paragraph because I'm not sure if I > understood Alexey correctly? There are several flavors of pseudo-network devices like bridging and VLAN that dynamically create/destroy netdev's even when they are not modules. From shemminger@osdl.org Mon May 5 13:01:00 2003 Received: with ECARTIS (v1.0.0; list netdev); Mon, 05 May 2003 13:01:06 -0700 (PDT) Received: from mail.osdl.org (air-2.osdl.org [65.172.181.6]) by oss.sgi.com (8.12.9/8.12.9) with SMTP id h45K0xFu023266 for ; Mon, 5 May 2003 13:00:59 -0700 Received: from dell_ss3.pdx.osdl.net (dell_ss3.pdx.osdl.net [172.20.1.60]) by mail.osdl.org (8.11.6/8.11.6) with SMTP id h45K0oW25590; Mon, 5 May 2003 13:00:50 -0700 Date: Mon, 5 May 2003 13:00:50 -0700 From: Stephen Hemminger To: Rusty Russell Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, acme@conectiva.com.br Subject: Re: dev->destructor Message-Id: <20030505130050.4b9868bb.shemminger@osdl.org> In-Reply-To: <20030503040949.804182C003@lists.samba.org> References: <20030502.134804.78707298.davem@redhat.com> <20030503040949.804182C003@lists.samba.org> Organization: Open Source Development Lab X-Mailer: Sylpheed version 0.8.11 (GTK+ 1.2.10; i686-pc-linux-gnu) X-Face: &@E+xe?c%:&e4D{>f1O<&U>2qwRREG5!}7R4;D<"NO^UI2mJ[eEOA2*3>(`Th.yP,VDPo9$ /`~cw![cmj~~jWe?AHY7D1S+\}5brN0k*NE?pPh_'_d>6;XGG[\KDRViCfumZT3@[ Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="Multipart_Mon__5_May_2003_13:00:50_-0700_086866c8" X-archive-position: 2428 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: shemminger@osdl.org 
Precedence: bulk X-list: netdev This is a multi-part message in MIME format. --Multipart_Mon__5_May_2003_13:00:50_-0700_086866c8 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit As an experiment, tried acquiring module ref count every time network device is ref counted. The result is discovering that there are cases in the Ethernet module init path where there is a call to dev_hold() without a previous explicit ref count. kernel BUG at include/linux/module.h:284! invalid operand: 0000 [#1] CPU: 0 EIP: 0060:[] Not tainted EFLAGS: 00010246 EIP is at linkwatch_fire_event+0x170/0x1a3 eax: 00000000 ebx: c047fad0 ecx: 00000020 edx: f88c8100 esi: f88c7100 edi: f6fa7000 ebp: f6f15de4 esp: f6f15dc8 ds: 007b es: 007b ss: 0068 Process modprobe (pid: 408, threadinfo=f6f14000 task=f78f46a0) Stack: f88c7100 c03f7008 00000246 f6f14000 f6fa7000 00000000 fffc829b f6f15df8 f88c0ab9 f6fa7000 f6fa71e0 033002a8 f6f15e24 f88c00e0 f6fa71e0 00007148 c03e2e80 f6fa7320 c011eb46 ffffffef f6f15e3e f6fa71e0 fffc829b f6f15e50 Call Trace: [] +0x0/0x1180 [e100] [] e100_update_link_state+0x97/0xa2 [e100] [] e100_find_speed_duplex+0x20/0x26a [e100] [] sys_sched_yield+0xc0/0xfe [] e100_auto_neg+0x114/0x11c [e100] [] __delay+0x14/0x18 [] e100_phy_set_speed_duplex+0x37/0xa4 [e100] [] e100_phy_init+0x69/0x78 [e100] [] e100_hw_init+0x14/0x11e [e100] [] e100_rd_pwa_no+0x32/0x40 [e100] [] e100_init+0xf6/0x126 [e100] [] e100_found1+0x1a9/0x42e [e100] [] e100_driver_version+0x0/0xb [e100] [] e100_driver+0x0/0xa0 [e100] [] pci_device_probe+0x5a/0x68 [] e100_id_table+0x0/0x2e0 [e100] [] e100_driver+0x28/0xa0 [e100] [] bus_match+0x43/0x6e [] e100_driver+0x28/0xa0 [e100] [] e100_driver+0x28/0xa0 [e100] [] driver_attach+0x5c/0x60 [] e100_driver+0x28/0xa0 [e100] [] e100_driver+0x28/0xa0 [e100] [] bus_add_driver+0xb2/0xc8 [] e100_driver+0x28/0xa0 [e100] [] +0x0/0x1180 [e100] [] pci_register_driver+0x46/0x56 [] e100_driver+0x28/0xa0 [e100] [] +0x15/0x3e [e100] [] e100_driver+0x0/0xa0 
[e100] [] sys_init_module+0x1b0/0x292 all_call+0x7/0xb Code: 0f 0b 1c 01 97 5a 32 c0 e9 d9 fe ff ff c7 04 24 0c 00 00 00 ./ifup: line 91: 408 Segmentation fault modprobe $1 >/dev/null 2>&1 --Multipart_Mon__5_May_2003_13:00:50_-0700_086866c8 Content-Type: application/octet-stream; name="netdev-module.diff" Content-Disposition: attachment; filename="netdev-module.diff" Content-Transfer-Encoding: base64 ZGlmZiAtdXJOcCAtWCBkb250ZGlmZiBsaW51eC0yLjUvaW5jbHVkZS9hc20taTM4Ni9hc21fb2Zm c2V0cy5oIGxpbnV4LTIuNS1kZXYvaW5jbHVkZS9hc20taTM4Ni9hc21fb2Zmc2V0cy5oCi0tLSBs aW51eC0yLjUvaW5jbHVkZS9hc20taTM4Ni9hc21fb2Zmc2V0cy5oCTE5NjktMTItMzEgMTY6MDA6 MDAuMDAwMDAwMDAwIC0wODAwCisrKyBsaW51eC0yLjUtZGV2L2luY2x1ZGUvYXNtLWkzODYvYXNt X29mZnNldHMuaAkyMDAzLTA1LTA1IDEwOjExOjUwLjAwMDAwMDAwMCAtMDcwMApAQCAtMCwwICsx LDIyIEBACisjaWZuZGVmIF9fQVNNX09GRlNFVFNfSF9fCisjZGVmaW5lIF9fQVNNX09GRlNFVFNf SF9fCisvKgorICogRE8gTk9UIE1PRElGWS4KKyAqCisgKiBUaGlzIGZpbGUgd2FzIGdlbmVyYXRl ZCBieSBhcmNoL2kzODYvTWFrZWZpbGUKKyAqCisgKi8KKworI2RlZmluZSBTSUdDT05URVhUX2Vh eCA0NCAvKiBvZmZzZXRvZiAoc3RydWN0IHNpZ2NvbnRleHQsIGVheCkgKi8KKyNkZWZpbmUgU0lH Q09OVEVYVF9lYnggMzIgLyogb2Zmc2V0b2YgKHN0cnVjdCBzaWdjb250ZXh0LCBlYngpICovCisj ZGVmaW5lIFNJR0NPTlRFWFRfZWN4IDQwIC8qIG9mZnNldG9mIChzdHJ1Y3Qgc2lnY29udGV4dCwg ZWN4KSAqLworI2RlZmluZSBTSUdDT05URVhUX2VkeCAzNiAvKiBvZmZzZXRvZiAoc3RydWN0IHNp Z2NvbnRleHQsIGVkeCkgKi8KKyNkZWZpbmUgU0lHQ09OVEVYVF9lc2kgMjAgLyogb2Zmc2V0b2Yg KHN0cnVjdCBzaWdjb250ZXh0LCBlc2kpICovCisjZGVmaW5lIFNJR0NPTlRFWFRfZWRpIDE2IC8q IG9mZnNldG9mIChzdHJ1Y3Qgc2lnY29udGV4dCwgZWRpKSAqLworI2RlZmluZSBTSUdDT05URVhU X2VicCAyNCAvKiBvZmZzZXRvZiAoc3RydWN0IHNpZ2NvbnRleHQsIGVicCkgKi8KKyNkZWZpbmUg U0lHQ09OVEVYVF9lc3AgMjggLyogb2Zmc2V0b2YgKHN0cnVjdCBzaWdjb250ZXh0LCBlc3ApICov CisjZGVmaW5lIFNJR0NPTlRFWFRfZWlwIDU2IC8qIG9mZnNldG9mIChzdHJ1Y3Qgc2lnY29udGV4 dCwgZWlwKSAqLworCisjZGVmaW5lIFJUX1NJR0ZSQU1FX3NpZ2NvbnRleHQgMTY0IC8qIG9mZnNl dG9mIChzdHJ1Y3QgcnRfc2lnZnJhbWUsIHVjLnVjX21jb250ZXh0KSAqLworCisjZW5kaWYKZGlm 
ZiAtdXJOcCAtWCBkb250ZGlmZiBsaW51eC0yLjUvaW5jbHVkZS9saW51eC9uZXRkZXZpY2UuaCBs aW51eC0yLjUtZGV2L2luY2x1ZGUvbGludXgvbmV0ZGV2aWNlLmgKLS0tIGxpbnV4LTIuNS9pbmNs dWRlL2xpbnV4L25ldGRldmljZS5oCTIwMDMtMDQtMTQgMTM6MzI6MjEuMDAwMDAwMDAwIC0wNzAw CisrKyBsaW51eC0yLjUtZGV2L2luY2x1ZGUvbGludXgvbmV0ZGV2aWNlLmgJMjAwMy0wNS0wNSAx MDowNjoxOS4wMDAwMDAwMDAgLTA3MDAKQEAgLTI5LDYgKzI5LDcgQEAKICNpbmNsdWRlIDxsaW51 eC9pZl9ldGhlci5oPgogI2luY2x1ZGUgPGxpbnV4L2lmX3BhY2tldC5oPgogI2luY2x1ZGUgPGxp bnV4L2tvYmplY3QuaD4KKyNpbmNsdWRlIDxsaW51eC9tb2R1bGUuaD4KIAogI2luY2x1ZGUgPGFz bS9hdG9taWMuaD4KICNpbmNsdWRlIDxhc20vY2FjaGUuaD4KQEAgLTYyOSwxMiArNjMwLDMyIEBA IGV4dGVybiBpbnQgbmV0ZGV2X2ZpbmlzaF91bnJlZ2lzdGVyKHN0cnUKIAogc3RhdGljIGlubGlu ZSB2b2lkIGRldl9wdXQoc3RydWN0IG5ldF9kZXZpY2UgKmRldikKIHsKKwltb2R1bGVfcHV0KGRl di0+b3duZXIpOwogCWlmIChhdG9taWNfZGVjX2FuZF90ZXN0KCZkZXYtPnJlZmNudCkpCiAJCW5l dGRldl9maW5pc2hfdW5yZWdpc3RlcihkZXYpOwogfQogCi0jZGVmaW5lIF9fZGV2X3B1dChkZXYp IGF0b21pY19kZWMoJihkZXYpLT5yZWZjbnQpCi0jZGVmaW5lIGRldl9ob2xkKGRldikgYXRvbWlj X2luYygmKGRldiktPnJlZmNudCkKK3N0YXRpYyBpbmxpbmUgdm9pZCBfX2Rldl9wdXQoc3RydWN0 IG5ldF9kZXZpY2UgKmRldikKK3sKKwltb2R1bGVfcHV0KGRldi0+b3duZXIpOworCWF0b21pY19k ZWMoJmRldi0+cmVmY250KTsKK30KKworc3RhdGljIGlubGluZSB2b2lkIGRldl9ob2xkKHN0cnVj dCBuZXRfZGV2aWNlICpkZXYpCit7CisJX19tb2R1bGVfZ2V0KGRldi0+b3duZXIpOworCWF0b21p Y19pbmMoJmRldi0+cmVmY250KTsKK30KKworc3RhdGljIGlubGluZSBpbnQgZGV2X3RyeV9ob2xk KHN0cnVjdCBuZXRfZGV2aWNlICpkZXYpCit7CisJaW50IHJldCA9IDA7CisJaWYgKHRyeV9tb2R1 bGVfZ2V0KGRldi0+b3duZXIpKXsKKwkJYXRvbWljX2luYygmZGV2LT5yZWZjbnQpOworCQlyZXQg PSAxOworCX0KKwlyZXR1cm4gcmV0OworfQogCiAvKiBDYXJyaWVyIGxvc3MgZGV0ZWN0aW9uLCBk aWFsIG9uIGRlbWFuZC4gVGhlIGZ1bmN0aW9ucyBuZXRpZl9jYXJyaWVyX29uCiAgKiBhbmQgX29m ZiBtYXkgYmUgY2FsbGVkIGZyb20gSVJRIGNvbnRleHQsIGJ1dCBpdCBpcyBjYWxsZXIKZGlmZiAt dXJOcCAtWCBkb250ZGlmZiBsaW51eC0yLjUvbmV0L2NvcmUvZGV2LmMgbGludXgtMi41LWRldi9u ZXQvY29yZS9kZXYuYwotLS0gbGludXgtMi41L25ldC9jb3JlL2Rldi5jCTIwMDMtMDUtMDUgMDk6 
NDE6MDMuMDAwMDAwMDAwIC0wNzAwCisrKyBsaW51eC0yLjUtZGV2L25ldC9jb3JlL2Rldi5jCTIw MDMtMDUtMDUgMTA6MDY6MjAuMDAwMDAwMDAwIC0wNzAwCkBAIC0xLDMgKzEsNCBAQAorI2RlZmlu ZSBORVRfUkVGQ05UX0RFQlVHCTEKIC8qCiAgKiAJTkVUMwlQcm90b2NvbCBpbmRlcGVuZGVudCBk ZXZpY2Ugc3VwcG9ydCByb3V0aW5lcy4KICAqCkBAIC00NDAsOCArNDQxLDggQEAgc3RydWN0IG5l dF9kZXZpY2UgKmRldl9nZXRfYnlfbmFtZShjb25zdAogCiAJcmVhZF9