Received: by oss.sgi.com id ; Fri, 23 Jun 2000 20:09:44 -0700
Received: from smtprch2.nortelnetworks.com ([192.135.215.15]:54994 "EHLO smtprch2.nortel.com") by oss.sgi.com with ESMTP id ; Fri, 23 Jun 2000 20:09:20 -0700
Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch2.nortel.com; Fri, 23 Jun 2000 22:02:58 -0500
Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id NQCKK3T5; Fri, 23 Jun 2000 22:06:04 -0500
Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id NCLF8NNL; Sat, 24 Jun 2000 13:06:04 +1000
Received: from uow.edu.au (IDENT:akpm@[47.181.194.183]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id NAA28772; Sat, 24 Jun 2000 13:03:57 +1000
Message-ID: <3954262D.60BDEF41@uow.edu.au>
Date: Sat, 24 Jun 2000 13:08:29 +1000
X-Sybari-Space: 00000000 00000000 00000000
From: Andrew Morton
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586)
X-Accept-Language: en
MIME-Version: 1.0
To: Rusty Russell
CC: Keith Owens , Philipp Rumpf , Alan Cox , "netdev@oss.sgi.com"
Subject: Re: modular net drivers
References: <20000623164805.AA5BB8154@halfway>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Orig:
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Rusty Russell wrote:
>
> In message <20000622174858.304CE8154@halfway> I wrote:
> > OK. Here is how it would work:
>
> Alternate solution to avoid module problems: Phil Rumpf and I came up
> with basically identical answers.  It assumes that MOD_INC_USE_COUNT
> is always called in user context, and involves no changes to module
> code.
>
> 1) static volatile int freeze[NR_CPUS];

Yup.  I think this can be generalised and pushed out to userland more.

A new system call:

    int sys_stop_cpu(int yep)

sys_stop_cpu(1)

    Causes the current CPU to enter a busy loop, with local interrupts
    disabled.  The return value is the number of CPUs which are _not_
    captured by sys_stop_cpu.  If the current CPU is the last CPU,
    sys_stop_cpu() will return 1 immediately.

sys_stop_cpu(0)

    will unfreeze _all_ CPUs (I think this is a little racy...)

So the idea is that a privileged app can loop doing
clone()/sys_stop_cpu(1) until all CPUs have stopped.  Then the
privileged app can unload modules (or do anything else which requires
total serialisation).

The weakness in this (and in your proposal, Rusty) is the case where
some module code does a schedule() when the module reference count is
zero.  I'm not aware of any which can do this, but all it would take is
a kmalloc() within a netdriver's
set_multicast_list/do_ioctl/get_stats/etc method.

Two more things:

You had:

    if (atomic_read(&mod->uc.use) == 0)
        mod->cleanup();

This could be changed to

    if (atomic_read(&mod->uc.use) == 0) {
        atomic_inc(&mod->uc.use);
        mod->cleanup();
        atomic_dec(&mod->uc.use);
    }

to avoid bizarre reentrancy happenings if the module destructor somehow
calls schedule().

Finally, the net drivers seem to be the biggest problem at this time,
and I think all their methods are called via ioctl on a socket.  For 2.4
why the hell don't we just take the unlock_kernel() out of sock_ioctl()?
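
To make the reentrancy point about the use-count guard concrete, here is a small userspace model, not kernel code. Everything in it is an assumption made for illustration: struct model_module stands in for the real module structure, sleepy_cleanup() is a hypothetical destructor, and a destructor which "somehow calls schedule()" is modelled as one which re-enters the unload path once. C11 atomics stand in for atomic_read/atomic_inc/atomic_dec.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's module structure; only the
 * fields needed for this illustration are modelled. */
struct model_module {
    atomic_int use;                          /* models mod->uc.use */
    int cleanup_calls;                       /* times the destructor ran */
    void (*cleanup)(struct model_module *);  /* models mod->cleanup */
};

/* Unload path which a "sleeping" destructor ends up re-entering. */
static void (*reenter)(struct model_module *);

/* Original form: call the destructor whenever the count is zero. */
static void try_unload_unguarded(struct model_module *mod)
{
    assert(mod->cleanup != NULL);
    if (atomic_load(&mod->use) == 0)
        mod->cleanup(mod);
}

/* Guarded form: pin the count while the destructor runs, so a second
 * pass through this path sees a non-zero count and backs off. */
static void try_unload_guarded(struct model_module *mod)
{
    assert(mod->cleanup != NULL);
    if (atomic_load(&mod->use) == 0) {
        atomic_fetch_add(&mod->use, 1);
        mod->cleanup(mod);
        atomic_fetch_sub(&mod->use, 1);
    }
}

/* Hypothetical destructor which re-enters the unload path, modelling a
 * schedule() that lets another unload attempt run.  It re-enters only
 * on its first call so that the unguarded case terminates. */
static void sleepy_cleanup(struct model_module *mod)
{
    mod->cleanup_calls++;
    if (mod->cleanup_calls == 1)
        reenter(mod);
}

/* Run one unload attempt through the given path and report how many
 * times the destructor was entered. */
int count_cleanups(void (*unload)(struct model_module *))
{
    struct model_module mod = { .cleanup_calls = 0, .cleanup = sleepy_cleanup };

    atomic_init(&mod.use, 0);
    reenter = unload;
    unload(&mod);
    return mod.cleanup_calls;
}
```

In this model count_cleanups(try_unload_unguarded) gives 2 (the destructor is entered a second time while the first call is still in progress) and count_cleanups(try_unload_guarded) gives 1.  Note that the check and the increment are still two separate steps, so the guard only helps against reentry from the destructor itself; it relies on the other CPUs being frozen to keep anyone else out.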