Message-Id: <200006261814.MAA03950@pzero.sandelman.ottawa.on.ca>
To: "netdev@oss.sgi.com"
Subject: Re: modular net drivers
In-reply-to: Your message of "Wed, 21 Jun 2000 07:49:40 +1000."
             <4450.961537780@ocs3.ocs-net>
Date: Mon, 26 Jun 2000 12:14:03 -0600
From: Michael Richardson

>>>>> "Keith" == Keith Owens writes:

Keith> On Tue, 20 Jun 2000 17:01:56 +1000,
Keith> Rusty Russell wrote:
>> Keith Owens wrote:
>>> It is also an important bug fix. The module code has suffered from
>>> unload races ever since the kernel locking became fine grained, users
>>> can crash the kernel.
>>
>> Races which can be largely solved at the moment by having the module
>> page removal code sync all bh's and softirqs after calling cleanup().
>> Hell, we could even poll all CPUs and check they're not executing in
>> the about-to-be-freed pages. Speed is completely unimportant here.

Keith> This race is not obvious but IMHO it exists. The original theory was

Keith> Kernel load and unload code runs under the big kernel lock.
Keith> open() and similar code runs under the big kernel lock.
Keith> If the code does MOD_INC_USE_COUNT before sleeping then we are safe.

Keith> But consider this race, even on UP.

Keith> Module has been used, nothing is currently using it, use_count == 0.
Keith> rmmod runs, either manual or autoclean.
Keith> The module is marked as being deleted.
Keith> module_cleanup() is entered, does I/O, sleeps, loses big kernel

The module_cleanup() is broken in that case. It should get all
resources (i.e. locks) that it needs before doing *anything* and should
release all resources as soon as it fails to get any others. To do
anything else is to cause a possible deadlock. This is textbook
multiprocessing.

Keith> AFAICT the only safe mechanism is one that checks the module state
Keith> *before* entering the module. Once you enter the module and sleep all
Keith> bets are off. And that means exporting the module information to the
Keith> open() layer, which is what Al Viro has been doing.

Is the "module is marked as being deleted" the info that is passed
to open()?

"Module is deleted" is an atomic operation. It either occurs because
use_count==0, or it fails, and all further calls to the module don't
find it.

]     Out and about in Ottawa.    hmmm... beer.                |  firewalls  [
]   Michael Richardson, Sandelman Software Works, Ottawa, ON    |net architect[
] mcr@sandelman.ottawa.on.ca http://www.sandelman.ottawa.on.ca/ |device driver[
] panic("Just another NetBSD/notebook using, kernel hacking, security guy");  [
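
To illustrate the "get every lock first, back off on the first failure"
rule described above, here is a minimal userspace sketch using POSIX
threads. It is not kernel code and not taken from any real module: the
names lock_a, lock_b, acquire_all(), release_all() and try_cleanup() are
all hypothetical, and a real module_cleanup() would use the kernel's own
locking primitives instead.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical resources that a cleanup routine might need. */
pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

/*
 * Try to take every lock in the array without blocking.  On the first
 * failure, release the locks already taken and report failure, so the
 * caller never sleeps or works while holding only part of the set.
 * Returns 0 when all n locks are held, -1 when none are held.
 */
int acquire_all(pthread_mutex_t **locks, int n)
{
    for (int i = 0; i < n; i++) {
        if (pthread_mutex_trylock(locks[i]) != 0) {
            while (--i >= 0)
                pthread_mutex_unlock(locks[i]);
            return -1;
        }
    }
    return 0;
}

/* Release the whole set in reverse order of acquisition. */
void release_all(pthread_mutex_t **locks, int n)
{
    while (--n >= 0) {
        int rc = pthread_mutex_unlock(locks[n]);
        assert(rc == 0);
        (void)rc;
    }
}

/*
 * Stand-in for a cleanup routine: it does nothing at all unless it
 * holds every lock it needs.  Returns 1 if cleanup ran, 0 if it
 * backed off and should be retried later.
 */
int try_cleanup(void)
{
    pthread_mutex_t *locks[] = { &lock_a, &lock_b };

    if (acquire_all(locks, 2) != 0)
        return 0;
    /* ... tear down state here, with every lock held ... */
    release_all(locks, 2);
    return 1;
}
```

If some other path holds lock_b, try_cleanup() returns 0 and leaves
lock_a free again, so nothing is ever blocked behind a half-acquired
set. Whether a retry loop or a plain failure return is the right
response to a back-off depends on the caller.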