Received: by oss.sgi.com id ; Tue, 20 Jun 2000 22:15:13 -0700
Received: from deliverator.sgi.com ([204.94.214.10]:39802 "EHLO deliverator.sgi.com")
        by oss.sgi.com with ESMTP id ; Tue, 20 Jun 2000 22:14:53 -0700
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130])
        by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam)
        via SMTP id WAA05667 for ; Tue, 20 Jun 2000 22:09:55 -0700 (PDT)
        mail_from (kaos@kao2.melbourne.sgi.com)
Received: from kao2.melbourne.sgi.com (kao2.melbourne.sgi.com [134.14.55.180])
        by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF)
        via ESMTP id PAA03066; Wed, 21 Jun 2000 15:12:15 +1000
X-Mailer: exmh version 2.1.1 10/15/1999
From: Keith Owens
To: Andrew Morton
cc: "netdev@oss.sgi.com"
Subject: Re: modular net drivers, take 2
In-reply-to: Your message of "Wed, 21 Jun 2000 04:24:01 GMT."
             <39504361.81F03943@uow.edu.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 21 Jun 2000 15:12:15 +1000
Message-ID: <13370.961564335@kao2.melbourne.sgi.com>
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Wed, 21 Jun 2000 04:24:01 +0000,
Andrew Morton wrote:
>Keith Owens wrote:
>> Anything sleeping loses the lock.  Any sleep in module open code primes
>> the race, if the module_exit code also sleeps the race is triggered.
>
>You're a hard man, Mr Owens.

I try ...

>So sys_delete_module() isn't allowed to sleep.  It's hard to make this
>rule future-safe.

Impossible because sys_delete_module() calls module_exit() which is
allowed to do anything.

>Do you think that the concept of grabbing the entire machine during
>module unload is an acceptable one?  I think it is, because the act of
>actually unloading kernel text is so unique and traumatic.

It is the best solution, if it can be done.  But I have not found any
method of doing this.

>sys_delete_module()
>{
>    ...
>    spin_lock(&module_deletion_lock);
>    blocked_cpus = 1 << smp_processor_id();
>    while (blocked_cpus != ((1 << smp_num_cpus) - 1))
>        ;
>    {
>        I think the only code which needs to go in
>        here is the call to vfree(module).

sys_delete_module() -> free_module() -> mod->cleanup() -> module_exit()
which is entered with module_deletion_lock held.  You just constrained
all module cleanup code to never sleep - no chance.

For sys_delete_module() to "grab" the entire machine it has to exclude
all processors from entering the module being unloaded (not too
difficult), to verify that no processor is currently executing the code
pages (a bit harder) and that no suspended process or timer queue will
ever pop its stack and return into those code pages (the really hard
bit).
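
The three conditions in the last paragraph can be modelled in user space.
What follows is only a rough sketch, not kernel code: struct toy_module,
its fields and every toy_*() helper are hypothetical names invented for
illustration, and the counters stand in for state that would somehow have
to be tracked for real.

```c
#include <stdbool.h>

/* Toy user-space model; every name here is hypothetical, not a kernel API. */
struct toy_module {
    bool unloading;        /* condition 1: new entries are refused        */
    int  cpus_inside;      /* condition 2: CPUs executing the module text */
    int  pending_returns;  /* condition 3: sleepers or timers whose stack
                              will return into the module text            */
};

/* A CPU tries to call into the module; refused once unload has begun. */
bool toy_enter(struct toy_module *m)
{
    if (m->unloading)
        return false;
    m->cpus_inside++;
    return true;
}

/* The caller sleeps inside module code: it gives up the CPU, but its
 * saved stack still holds a return address inside the module. */
void toy_sleep(struct toy_module *m)
{
    m->cpus_inside--;
    m->pending_returns++;
}

/* The sleeper wakes up and is running module code again. */
void toy_wake(struct toy_module *m)
{
    m->pending_returns--;
    m->cpus_inside++;
}

/* The caller returns out of the module for good. */
void toy_leave(struct toy_module *m)
{
    m->cpus_inside--;
}

/* Unload starts: from here on toy_enter() fails. */
void toy_begin_unload(struct toy_module *m)
{
    m->unloading = true;
}

/* Freeing the text is safe only when all three conditions hold. */
bool toy_safe_to_free(const struct toy_module *m)
{
    return m->unloading && m->cpus_inside == 0 && m->pending_returns == 0;
}
```

In this model a rendezvous of all CPUs can only observe cpus_inside == 0.
A task that called toy_sleep() before the unload began is invisible to
that check, yet toy_safe_to_free() must stay false until it has woken and
left, which is the really hard bit.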