From kingsley@aurema.com Wed Sep 14 02:09:23 2005
Date: Wed, 14 Sep 2005 19:06:26 +1000
From: kingsley@aurema.com
To: Erik Jacobson
Cc: pagg@oss.sgi.com, tonyt@aurema.com
Subject: Re: [patch] Minor PAGG attach/detach semantic change for 2.6.11
Message-ID: <20050914090626.GK13682@aurema.com>
References: <20050617014512.GA10285@aurema.com> <20050623143301.GB32764@sgi.com>
In-Reply-To: <20050623143301.GB32764@sgi.com>

On Thu, Jun 23, 2005 at 09:33:01AM -0500, Erik Jacobson wrote:
> Kingsley - my first attempt skipped the list, sorry.
>
> > While testing the propagation of pagg_attach errors to fork() I
> > noticed that the detach callback is called again for the client
>
> I'm sorry it's taking a while for me to get back to you.
> I had kicked your patch around to a couple people internally and
> I think we want to investigate the error path more before we
> take it as part of the PAGG patch.
>
> Does anybody else on the list have thoughts on this change?
>
> Thanks for the submission.  I'd like to do a bit more research.

Hi Erik,

Has there been any progress on this?
Thanks,
--
Kingsley

From erikj@sgi.com Fri Sep 16 08:30:12 2005
Date: Fri, 16 Sep 2005 10:26:02 -0500
From: Erik Jacobson
To: pagg@oss.sgi.com
Subject: Future of PAGG?
Message-ID: <20050916152602.GB4739@sgi.com>

To be honest, I've been a bit frustrated at how to proceed with PAGG.

Yesterday, we kicked around some ideas to perhaps propose to the
community that implement things other ways.  As far as we can tell,
these things are not as efficient as PAGG.

For example, we explored the use of notifier lists that are already
available in the kernel.  This implements the callback portion of PAGG,
but not the portion that associates data per-task.
Another co-worker also observed that locking of notifier lists isn't
really provided by the notifier list infrastructure itself - so unless
you're really careful, a module could remove itself from the list while
it's being walked.

Another problem with notifier lists is that they would probably reduce
performance.  Instead of knowing you can "do nothing" if the pagg list
is null per task (and that task is probably already cached on the
machine), you have to walk a notifier list each time.  That could
reduce fork performance.

The only reason a pagg-like system using notifier lists would get
accepted is because it uses tools already in the kernel instead of our
own setup.  So this part might be attractive to some in the community.

In the past, we had pushed PAGG more for its grouping abilities rather
than calling it something like "task notifier list with data" (or some
slick name that means that).  We described it like this somewhat because
the community seems to frown on generic callbacks.  Perhaps the world
has changed now.  After all, notifier lists are generic callouts and are
in the kernel now...  Of course, they aren't in the fork or exit paths.

All of this is compounded by a lack of support from PAGG's users.  We
know various people outside of SGI use PAGG, but they have never stepped
up to say they are users when it counts.  If we could have some users,
our community position would be improved.

I think callouts in fork, exec, and exit are really needed.  If you look
at what we do during a fork in the kernel, you can spot a few things in
the generic kernel itself that could use generic callouts or PAGG
instead.  It has the potential to reduce, at least a little bit, the
number of calls made in a fork.

At a meeting yesterday, I was asked to look into implementing Linux Job,
SGI inescapable Job Containers, without using PAGG.  Instead, I was
asked to try these notifier lists.
Because we don't feel we can get a Job ID in the task struct, I'll need
to implement table lookups to associate a task with data about the task.
After some discussion today, I'm not sure notifier lists are the answer
due to reduced performance and locking issues.

So what do you think?

Should I try to implement a reduced version of PAGG that uses notifier
lists for the callout piece and pagg lists like we have today for the
task associated data?  (This has the performance and locking issues of
notifier lists.)  Again, this is only attractive because it uses tools
in the kernel itself.

Another idea I had is to give PAGG, mostly as-is, one more shot.  I can
reduce it to its bare essentials, perhaps removing some functionality.
I can re-name it to something that better describes what it does, and
try once again to get it accepted by the community of LKML.  I thought
I'd start on LSE-tech before LKML to get some ideas.

Does this sound like a good approach?

I'd like to work with this mailing list to try to organize support for
this.  If there are PAGG users, and you don't want to see us stop
maintaining PAGG, maybe you could join the discussion so people know
the patch is used.

SGI really needs something that is PAGG-like for its open sourced
projects such as Job, CSA, and two open-source but non-pushed projects
in-house.  But the community is interested in more than "a patch SGI
needs for itself."

If we can't get PAGG in, we'll have to work out other ways to get our
open source projects that use PAGG accepted.

In the end, I think lack of users is the biggest problem with getting
something PAGG-like accepted.

Please let me know if there are other ideas.
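
The performance argument above can be made concrete with a small
user-space model.  This is only a sketch under stated assumptions:
struct task, struct subscriber, global_chain, count_cb and both
fork_callout_* functions are hypothetical names invented for
illustration, not PAGG code and not the kernel's notifier API.  It
contrasts a per-task list, where an empty list means no callbacks at
all, with a system-wide chain, where every registered callback runs for
every task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task subscriber entry (illustration only). */
struct subscriber {
    struct subscriber *next;
    void (*fork_cb)(int pid);
};

/* Hypothetical task: a NULL list means no module cares about it. */
struct task {
    int pid;
    struct subscriber *subscribers;
};

/* Hypothetical system-wide chain: each entry is called for every task. */
struct chain_entry {
    struct chain_entry *next;
    void (*fork_cb)(int pid);
};

static struct chain_entry *global_chain;
static int callbacks_run;

/* Sample callback that only counts how often it was invoked. */
static void count_cb(int pid)
{
    (void)pid;
    callbacks_run++;
}

/* Per-task model: only this task's subscribers are visited. */
static int fork_callout_per_task(const struct task *t)
{
    int calls = 0;

    for (const struct subscriber *s = t->subscribers; s; s = s->next) {
        s->fork_cb(t->pid);
        calls++;
    }
    return calls;
}

/* Global-chain model: the whole chain is walked for any task. */
static int fork_callout_global(const struct task *t)
{
    int calls = 0;

    for (const struct chain_entry *e = global_chain; e; e = e->next) {
        e->fork_cb(t->pid);
        calls++;
    }
    return calls;
}
```

In this model, forking a task that nobody tracks costs one pointer test
in the per-task case, while the global chain still invokes every
registered callback, each of which would then have to work out for
itself whether it cares about that task.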

From erikj@sgi.com Sat Sep 17 08:36:53 2005
Date: Sat, 17 Sep 2005 10:34:10 -0500
From: Erik Jacobson
To: pagg@oss.sgi.com
Subject: PAGG ideas for next attempt: new docs, new name?
Message-ID: <20050917153409.GA17708@sgi.com>

I'm looking for feedback on these ideas.  I'm sending this to the PAGG
list.  After I gather feedback from you and some co-workers, I'll be
posting this to lse-tech and some other folks as well.  I'll then work
on the code side of the changes.

Please see the justification section for sure.  I'm not sure I should
say that stuff - so let me know if it is silly to put there.
(I'll wait until Monday or Tuesday to send this off to a broader
audience to be sure it isn't lost in the weekend).

===

I am re-working what used to be PAGG to have a new name, better
documentation, and better variable names.  My hope is that I can present
this to the community for inclusion in the kernel and I'm hoping to have
a couple of the users of this help by explaining how they use it.

I feel one reason PAGG didn't get attention was because its true
function was obscured by its name and the names of functions and
variables within.

The first step of this for me was to write some new documentation using
the new names for the pieces.  Before I propose this to the broader
community, I'd like to get feedback.  After that, I plan to re-write the
code to match and post it.

If a variable seems too long (some are), perhaps provide a suggested
shorter name.  The name of PN itself is fair game.  It turns out it was
hard to pick a name for this thing.

Process Notification (PN)
-------------------------
PN provides a method (service) for kernel modules to be notified when
certain events happen in the life of a process.  Events we support
include fork, exit, and exec.  A special init event is also supported
(see events below).  More events could be added.  PN also provides a
generic data pointer for the modules to work with so that data can be
associated per process.

A kernel module will register (pn_register) a service request
(pn_service_request) with PN.  The request tells PN which notifications
the kernel module wants.  The kernel module passes along function
pointers to be called for these events (exit, fork, exec) in the service
request.

From the process point of view, each process has a kernel module
subscriber list (pn_module_subscriber_list).  These kernel modules are
the ones that want notification about the life of the process.  As
described above, each kernel module subscriber on the list has a generic
data pointer to point to data associated with the process.
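
To make the relationship between these pieces easier to picture, here is
a rough user-space model.  It is a sketch only: the structure names echo
pn_service_request and pn_subscriber from the text, but the layouts, the
model_ prefix, struct model_task and the model_pn_alloc helper are all
assumptions made for illustration, not the definitions from the patch.

```c
#include <assert.h>
#include <stdlib.h>

struct model_task;
struct model_pn_subscriber;

/* What a kernel module hands to PN: a name and one callback per event. */
struct model_pn_service_request {
    const char *name;
    int  (*fork)(struct model_task *child, struct model_pn_subscriber *s);
    void (*exit)(struct model_task *task, struct model_pn_subscriber *s);
    void (*exec)(struct model_task *task, struct model_pn_subscriber *s);
};

/* One entry on a process's subscriber list. */
struct model_pn_subscriber {
    struct model_pn_subscriber *next;
    const struct model_pn_service_request *request;
    void *data;     /* module-private data for this one process */
};

/* Stand-in for the task struct: only the subscriber list matters here. */
struct model_task {
    int pid;
    struct model_pn_subscriber *subscribers;
};

/* Model of associating a module with a process: push a new subscriber. */
static struct model_pn_subscriber *
model_pn_alloc(struct model_task *task,
               const struct model_pn_service_request *req, void *data)
{
    struct model_pn_subscriber *s = malloc(sizeof(*s));

    if (!s)
        return NULL;
    s->request = req;
    s->data = data;
    s->next = task->subscribers;
    task->subscribers = s;
    return s;
}
```

The point of the model is that the service request is shared by every
process a module tracks, while each subscriber entry, and so each data
pointer, belongs to exactly one process.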

In the case of fork, PN will allocate the same kernel module subscriber
list for the new child that existed for the parent.  The kernel module's
function pointer for fork is also called so the kernel module can do
whatever it needs to do when a parent forks.

For exit, similar things happen but the exit function pointer for each
kernel module subscriber is called and the kernel module subscriber list
for that task is deleted.

Events
------
Events are stages of a process's life that kernel modules care about.
The fork event is a spot in copy_process when a parent forks.  The exit
event happens when a process is going away.  We also support an exec
event, which happens when a process execs.  Finally, there is an init
event.  This special event makes it so this kernel module will be
associated with all current processes in the system.  This is used when
a kernel module wants to keep track of all current processes as opposed
to just those it associates by itself (and children that follow).  The
events a kernel module cares about are set up in the pn_service_request
structure - see usage below.

When setting up a pn_service_request structure, you designate which
events you care about by either associating NULL (meaning you don't care
about that event) or a pointer to the function to run when the event is
triggered.  fork and exit are currently required.

How do processes become associated with kernel modules?
-------------------------------------------------------
Your kernel module itself can use the pn_alloc function to associate a
given process with a given pn_service_request structure.  This adds your
kernel module to the subscriber list of the process.  In the case of
inescapable job containers making use of PAM, when PAM allows a person
to log in, PAM contacts job (via a PAM job module which uses the job
userland library) and the kernel Job code will call pn_alloc to
associate the process with PN.
From that point on, the kernel module will be notified about events in
the process's life that the module cares about.

Likewise, your kernel module can remove an association between it and a
given process by using pn_subscriber_free.

Example Usage
-------------

=== filling out the pn_service_request structure ===

A kernel module wishing to use PN needs to set up a pn_service_request
structure.  This structure tells PN which events you care about and what
functions to call when those events are triggered.  In addition, you
supply a name (usually the kernel module name).  .entry is always filled
out as shown below.  .module is usually set to THIS_MODULE.  .data can
optionally be used to store a pointer with the service request
structure.

Example of a filled out pn_service_request:

    static struct pn_service_request pn_service_request = {
        .module = THIS_MODULE,
        .name   = "test_module",
        .data   = NULL,
        .entry  = LIST_HEAD_INIT(pn_service_request.entry),
        .init   = test_init,
        .fork   = test_attach,
        .exit   = test_detach,
        .exec   = test_exec,
    };

The above pn_service_request says the kernel module "test_module" cares
about events fork, exit, exec, and init.  In fork, call the kernel
module's test_attach function.  In exec, call test_exec.  In exit, call
test_detach.  The init event is specified, so all processes on the
system will be associated with this kernel module and the test_init
function will be run for each.

=== Registering with PN ===

You will likely register with PN in your kernel module's module_init
function.  Here is an example:

    static int __init test_module_init(void)
    {
        int rc = pn_register(&pn_service_request);
        if (rc < 0) {
            return -1;
        }

        return 0;
    }

=== Example init event function ===

Since the init event is defined, it means this kernel module is added to
the subscriber list of all processes -- it will receive notification
about events it cares about for all processes and all children that
follow.
Of course, if a kernel module doesn't need to know about all current
processes, that module shouldn't implement this and '.init' in the
pn_service_request structure would be NULL.  This is as opposed to the
normal method where the kernel module adds itself to the subscriber list
of a process using pn_alloc.

    static int test_init(struct task_struct *tsk, struct pn_subscriber *subscriber)
    {
        if (pn_get_subscriber(tsk, "test_module") == NULL)
            dprintk("ERROR PN expected \"%s\" PID = %d\n", "test_module", tsk->pid);

        dprintk("FYI PN init hook fired for PID = %d\n", tsk->pid);
        atomic_inc(&init_count);
        return 0;
    }

=== Example fork (test_attach) function ===

This function is executed when a process forks - this is associated with
the pn_callout callout in copy_process.  There would be a very similar
test_detach function (not shown).

PN will add the kernel module to the notification list for the child
process automatically and then execute this fork function pointer
(test_attach in this example).  However, the kernel module can control,
through the return value, whether it stays on the process's subscriber
list and keeps getting notifications.  A negative value results in the
fork failing.  Zero is success.  >0 means success, but the kernel module
doesn't want to be associated with that specific process (doesn't want
notification).  In other words, if >0 is returned, your kernel module is
saying that it doesn't want to be on the subscriber list for this
process.

    static int test_attach(struct task_struct *tsk, struct pagg *pagg, void *vp)
    {
        dprintk("PN attach hook fired for PID = %d\n", tsk->pid);
        atomic_inc(&attach_count);

        return 0;
    }

=== Example exec event function ===

And here is an example function to run when a task gets to exec.  So any
time a "tracked" process gets to exec, this would execute.  More
hooks/callouts similar to this one could be implemented as there is
demand for them.

    static void test_exec(struct task_struct *tsk, struct pn_subscriber *subscriber)
    {
        dprintk("PN exec hook fired for PID %d\n", tsk->pid);
        atomic_inc(&exec_count);
    }

=== Unregistering with PN ===

You will likely wish to unregister with PN in the kernel module's
module_exit function.  Here is an example:

    static void __exit test_module_cleanup(void)
    {
        pn_unregister(&pn_service_request);
        printk("detach called %d times...\n", atomic_read(&detach_count));
        printk("attach called %d times...\n", atomic_read(&attach_count));
        printk("init called %d times...\n", atomic_read(&init_count));
        printk("exec called %d times ...\n", atomic_read(&exec_count));
        if (atomic_read(&attach_count) + atomic_read(&init_count) !=
            atomic_read(&detach_count))
            printk("PN PROBLEM: attach count + init count SHOULD equal detach count and doesn't\n");
        else
            printk("Good - attach count + init count equals detach count.\n");
    }

=== Actually using data associated with the process in your module ===

The above examples show you how to create an example kernel module using
PN, but they don't show what you might do with the data pointer
associated with a given process.  Linux Inescapable Jobs is a good
example of making use of PN.  Some versions of it use PAGG, which is
what PN is based on.  A new Job patch should be available soon if not
already.  See oss.sgi.com/projects/pagg.

A Job is a group of processes from which a process cannot escape.  A
batch scheduling system such as LSF may use Job to put possibly
otherwise unrelated processes together to be tracked and signaled as ia
set including any children that follow.  If the Job PAM module is used,
each login process gets a job ID and the children become part of the job
by default.

In Job, we want to know whenever a parent forks a new process or
whenever a process exits.  So Job gets notified for these events, and
adds the process to the list of processes in the job (or removes them in
the case of exit).
To efficiently add a job, we need to know which Job the parent was in.
This information, in our case, is what is stored in the data pointer
within the pn_subscriber structure associated with a given process.

pn_get_subscriber is used to retrieve the PN subscriber for a given
process and kernel module.  Like this:

    subscriber = pn_get_subscriber(task, name);

Where name is your kernel module's name (as provided in the
pn_service_request structure) and task is the process you're interested
in.

Please be careful about locking.  The task structure has a
pn_subscriber_list_sem to be used for locking.  An example code snippet
follows:

    /* We have a valid task now */
    get_task_struct(task);                      /* Ensure the task doesn't vanish on us */
    read_unlock(&tasklist_lock);                /* Unlock the tasklist */
    down_write(&task->pn_subscriber_list_sem);  /* write lock subscriber list */

    subscriber = pn_get_subscriber(task, pagg_hook.name);
    if (subscriber) {
        detachpid.r_jid = ((struct job_attach *)subscriber->data)->job->jid;
        subscriber->pn_subscriber_request->detach(task, subscriber);
        pn_subscriber_free(subscriber);
    } else {
        errcode = -ENODATA;
    }
    /* Unlock before dropping our reference: the semaphore lives in the task */
    up_write(&task->pn_subscriber_list_sem);    /* write unlock subscriber list */
    put_task_struct(task);                      /* Done accessing the task */

In the above snippet, we make sure we have a task that won't disappear
on us.  Then we write lock the pn_subscriber_list_sem to be sure the
list doesn't change on us.  We write lock (rather than read) because
we're going to be removing an entry from it.  If there is a subscriber
for this kernel module matching the given process, we store the jid (job
identifier in Job), we call our own detach function directly (in Job,
this is associated with the exit event), and we remove the subscriber
from the subscriber list.  This means this kernel module will no longer
get notifications of events for this task.

The detachpid.r_jid line above is an example of retrieving data from the
data pointer for the given subscriber.
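
The fork return-value convention and the lookup by module name described
in this section can be modelled in user space as follows.  This is a
sketch under stated assumptions, not the PN implementation: every m_
name is hypothetical, the module name is stored directly in the
subscriber for brevity, and copying the parent's data pointer to the
child before the fork callback runs is an assumption of this model.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct m_task;

/* Hypothetical subscriber entry: module name, fork callback, data. */
struct m_subscriber {
    struct m_subscriber *next;
    const char *name;
    int (*fork)(struct m_task *child, struct m_subscriber *s);
    void *data;
};

struct m_task {
    int pid;
    struct m_subscriber *subscribers;
};

/* Sample fork callbacks covering the three kinds of return value. */
static int m_keep_cb(struct m_task *c, struct m_subscriber *s) { (void)c; (void)s; return 0; }
static int m_drop_cb(struct m_task *c, struct m_subscriber *s) { (void)c; (void)s; return 1; }
static int m_veto_cb(struct m_task *c, struct m_subscriber *s) { (void)c; (void)s; return -1; }

/* Model of pn_get_subscriber: find a module's entry by name. */
static struct m_subscriber *m_get_subscriber(struct m_task *t, const char *name)
{
    for (struct m_subscriber *s = t->subscribers; s; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Free every subscriber entry of a task (used on fork failure). */
static void m_free_all(struct m_task *t)
{
    while (t->subscribers) {
        struct m_subscriber *s = t->subscribers;
        t->subscribers = s->next;
        free(s);
    }
}

/* Model of the fork callout: 0 on success, negative if a module vetoes. */
static int m_fork_callout(struct m_task *parent, struct m_task *child)
{
    for (struct m_subscriber *p = parent->subscribers; p; p = p->next) {
        struct m_subscriber *c = malloc(sizeof(*c));
        int rc;

        if (!c) {
            m_free_all(child);
            return -1;
        }
        *c = *p;                        /* assumption: child inherits data */
        c->next = child->subscribers;
        child->subscribers = c;

        rc = p->fork ? p->fork(child, c) : 0;
        if (rc < 0) {                   /* negative: the fork fails */
            m_free_all(child);
            return rc;
        }
        if (rc > 0) {                   /* positive: ok, but unsubscribe */
            child->subscribers = c->next;
            free(c);
        }
    }
    return 0;
}
```

A module returning zero stays on the child's list and can later be found
by name, a module returning a positive value is silently dropped from
the child, and a negative value unwinds the child's list and fails the
modelled fork.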

History
-------
Process Notification used to be known as PAGG (Process Aggregates).  It
was re-written to be called Process Notification because we believe this
better describes its purpose.  Structures and functions were re-named to
be more clear and to reflect the new name.

Why Not Notifier Lists?
-----------------------
We investigated the use of notifier lists, available in newer kernels.
There were two reasons we didn't use them to implement PAGG.

1) There seem to be some tricky locking issues with notifier lists.  For
   example, if a kernel module exits while the notifier list is walked,
   we could have trouble.  There may be means to work around this.

2) Notifier lists would not be as efficient as PN for kernel modules
   wishing to associate data with processes.  With PN, if the
   pn_subscriber_list of a given task is NULL, we can instantly know
   there are no kernel modules that care about the process.  Further,
   the callbacks happen in places where the task struct is likely to be
   cached.  So this is a quick operation.  With notifier lists, the
   scope is system wide rather than per process.  As long as one kernel
   module wants to be notified, we have to walk the notifier list and
   potentially waste cycles.

Some Justification
------------------
Some have argued in the past that PAGG shouldn't be used because it will
allow interesting things to be implemented outside of the kernel.  While
this might be a small risk, having these in place allows customers and
users to implement kernel components that you don't want to see in the
kernel anyway.

SGI may have HPC needs that very few other people are interested in.  We
in fact have 4 open source projects that make use of PAGG (and will
convert to PN).  At least one of these projects is urgent for our
customers but is simply not interesting to enough people to maintain in
the kernel itself.

In a world where all customers need to run on standard distributions to
be supported by the distributor, we're left in a situation where:

 a) The distributor doesn't want to take patches not accepted in the
    kernel
 b) The community wants everything important in the kernel
 c) The community wants only things having multiple users in the kernel
 d) SGI has things that are only interesting to SGI systems and its
    customers (not multiple users)
 e) There is no option to re-build kernels while staying in a supported
    environment.

We find it hard to support customers in this catch-22 situation.  PN
allows us to implement our open source projects outside of the mainline
kernel.  We do offer things like Job for inclusion, but so far haven't
met with success in getting it accepted.

We feel PN is very useful for kernel components already in the kernel
too.  There is a potential to reduce the number of calls in the
copy_process path, for example.  One could also envision that things in
the task struct that are used slightly less frequently could be
implemented using PN.

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

From pj@sgi.com Sat Sep 17 10:47:23 2005
Date: Sat, 17 Sep 2005 10:42:39 -0700
From: Paul Jackson
To: Erik Jacobson
Cc: pagg@oss.sgi.com
Subject: Re: PAGG ideas for next attempt: new docs, new name?
Message-Id: <20050917104239.26cb7e49.pj@sgi.com>
In-Reply-To: <20050917153409.GA17708@sgi.com>
References: <20050917153409.GA17708@sgi.com>
Organization: SGI

Erik wrote:
> I'm looking for feedback on these ideas.

Oohhh - lots of nice words.

A couple of random thoughts on first glance now, then I will try to
give this a closer read later today.

Let me place in evidence the other notifier thingies currently in the
kernel:

    dnotify  - directory   (Stephen Rothwell)
    fsnotify - filesystem  (Robert Love - a 'FAM' like thing)
    inotify  - inode based (John McCutchan - basis of fsnotify)
    notify   - generic     (Alan Cox needed for network devices)

First, observe that these other notifiers don't use two-letter
acronyms, but rather pseudo-words, to name themselves and to prefix
their kernel global symbols.

The "PN" name is too cryptic.  You need a pseudo-word that you use
consistently and methodically, every place possible.

When someone sees a line of kernel code mentioning
"inotify_inode_queue_event", they have a pretty good idea what sort of
subsystem is involved.  When someone sees a mention of
"pn_get_subscriber", they will likely not realize this as quickly.

Perhaps a few long standing types in the kernel, such as tasks and
inodes, get to use the very short, familiar names of just a letter or
two, but the less well known types require longer, more explicit names.

Besides the base name 'notify', two other possibilities that come to my
mind for the base part of the name are 'callout' and 'hook'.

I'm partial to 'callout'.  For one thing, this distinguishes rather
nicely between two different mechanisms:

 1) Some thread is asking to have notice sent to it of particular kinds
    of events, and
 2) You want threads to call out to an extra piece of code when they
    undergo particular kinds of events.

The rule of thumb I'd suggest is to use 'notify' when the receiver is
some other thread, and 'callout' when the receiver is a code snippet
executing in the context of the thread originally experiencing the
event of interest.  Beware that the above "notify" mechanisms may or
may not follow this rule of thumb; I don't know without thinking harder
than I want to right now.

There are 268 instances of the 7-char string "callout" in all the
kernel source, 5273 instances of the 5-char string "notif", and 2456
instances of the 4-char string "hook".
In the 28534 symbols that list in a "nm vmlinux" of a kernel I have at
hand, there are 10 instances of the 4-char string "hook", 161 instances
of the 5-char string "notif", and zero (0) of "callout".

So besides having a suitable meaning, "callout" doesn't collide with
existing kernel names.

Other words that come to mind that might be worth playing with here:
trigger, event, handler and exit (as in IBM's MVS "user exit", "file
exit" and "installation exit".)

You might want to peruse the literature for IBM-style exits, and look
for an opportunity to get someone from IBM with experience in such
mechanisms to contemplate what it would take to provide them for Linux
in a community acceptable form.

However, "callout" will convey the intended meaning to far more Linux
hackers than "exit", which will only convey the intended sense to those
with IBM background (or at least a beer drinking friend who is expert
in such ;).  If this were MVS, I'd be recommending "task exit".

In any event, you might want to list the other notifier like mechanisms
(listed above) in your post, and compare and contrast (whatever
happened to Carl Rigg ?) them with your proposed mechanism.

Anyhow ... a couple more thoughts, besides the naming issue.

If the current notifier lists have technical limitations with locking
and efficiency, then what would it take to fix them up, rather than
introduce a new mechanism?  Are these limitations inherent and
unavoidable in any mechanism that has the API of the current notifier
lists, or are they an internal accident of the implementation?  If the
latter, can the implementation be fixed?  If the former, can you
clearly explain why notifier list, or anything so conceived and so
dedicated with such an API, must necessarily suffer from such technical
limitations?

A key concern, which you face head on (good!) is that such mechanisms
as this "allow interesting things to be implemented outside of the
kernel."
You explain nicely why we need such, but you don't explain how we keep
some proprietary competitor of Linux from abusing your mechanism.  I'd
prefer that this mechanism only allow GPL loadable modules to hook into
it, and I wish there were some way to ensure that the portion "outside
the kernel" was also GPL.  There are legal and competitive business
issues here that need to be addressed.

--
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson 1.925.600.0401

From pj@sgi.com Mon Sep 19 02:41:31 2005
Date: Mon, 19 Sep 2005 02:36:20 -0700
From: Paul Jackson
To: Erik Jacobson
Cc: pagg@oss.sgi.com
Subject: Re: PAGG ideas for next attempt: new docs, new name?
Message-Id: <20050919023620.38ec1820.pj@sgi.com>
In-Reply-To: <20050917153409.GA17708@sgi.com>
References: <20050917153409.GA17708@sgi.com>
Organization: SGI

Erik wrote:
> I feel one reason PAGG didn't get attention was because its true function
> was obscured by its name and the names of functions and variables within.

I tend to agree.

> Finally, there is an init event.

When does this init event occur - at the beginning of something, I
presume.  I'm just not clear on what.

> fork event is a spot in copy_process when a parent forks.

So this event is in the parent, not the child?  That seems slightly
odd.

> The init event is specified, so all processes on the system will
> be associated with this kernel module and the test_init function
> will be run for each.

Ah .. run when?  Perhaps the last line above would be clearer as:

> will be run for each process in the system when the module is loaded.

>         int rc = pn_register(&pn_service_request);
>         if (rc < 0) {
>                 return -1;
>         }

Should that "return -1" be a "return -ERRNO" for some error number?

> unrelated processes together to be tracked and signaled as ia set

Is that "ia" a typo?

>         /* We have a valid task now */
>         get_task_struct(task); /* Ensure the task doesn't vanish on us */
>         read_unlock(&tasklist_lock); /* Unlock the tasklist */
>         ...

What is the piece of code, beginning with these three lines, doing?

> In a world where all customers need to run on standard distributions to
> be supported by the distributor, we're left in a situation where:

Can this section be turned around into something more positive, and
less SGI specific?
And can the problems that seem to be associated with this be directly addressed: 1) It could be abused by competitors of Open Source, to leverage Linux kernel work while avoiding GPL constraints on their key code (much as happens with device drivers now, e.g. Nvidia). 2) It opens up a Pandoras box of opportunities for poor quality (or at least inadequately tested) code compromising the stability of the system, with attendant support nightmares. For example, "user exit" code for IBM operating systems was one of the areas that had the greatest difficulty adapting to Y2K. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson 1.925.600.0401 From erikj@sgi.com Mon Sep 19 06:23:55 2005 Received: with ECARTIS (v1.0.0; list pagg); Mon, 19 Sep 2005 06:24:02 -0700 (PDT) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8JDNtiL002736 for ; Mon, 19 Sep 2005 06:23:55 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j8JFNkF5018782 for ; Mon, 19 Sep 2005 08:23:46 -0700 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j8JDJnDN15950134; Mon, 19 Sep 2005 08:19:49 -0500 (CDT) Received: from snoot.americas.sgi.com (hoot.americas.sgi.com [128.162.233.104]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j8JDJnS93642837; Mon, 19 Sep 2005 08:19:49 -0500 (CDT) Received: by snoot.americas.sgi.com (Postfix, from userid 31161) id 0ED146022F4A; Mon, 19 Sep 2005 08:19:49 -0500 (CDT) Date: Mon, 19 Sep 2005 08:19:49 -0500 From: Erik Jacobson To: Paul Jackson Cc: Erik Jacobson , pagg@oss.sgi.com Subject: Re: PAGG ideas for next attempt: new docs, new name? 
Message-ID: <20050919131948.GA4488@sgi.com> References: <20050917153409.GA17708@sgi.com> <20050917104239.26cb7e49.pj@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050917104239.26cb7e49.pj@sgi.com> User-Agent: Mutt/1.5.6i X-archive-position: 109 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@sgi.com Precedence: bulk X-list: pagg > A couple of random thoughts on first glance now, then I will try to > give this a closer read later today. Hi Paul. I changed over to pnotify, which makes variable names really long. But I understand what you're saying here. > If the current notifier lists have technical limitations with locking > and efficiency, then what would it take to fix them up, rather than > introduce a new mechanism? Are these limitations inherent and > unavoidable in any mechanism that has the API of the current notifier > lists, or are they an internal accident of the implementation? If the > latter, can the implementation be fixed? If the former, can you > clearly explain why notifier list, or anything so conceived and so > dedicated with such an API, must necessarily suffer from such technical > limitations? I'm still munching on this part. > A key concern, which you face head on (good!) is that such mechanisms > as this "allow interesting things to be implemented outside of the > kernel." You explain nicely why we need such, but you don't explain I added a blurb about exporting the symbols with EXPORT_SYMBOL_GPL. I also changed the Justification quite a bit per your suggestions in a separate email. I'll post a new version a little later. 
From erikj@sgi.com Mon Sep 19 07:09:11 2005 Received: with ECARTIS (v1.0.0; list pagg); Mon, 19 Sep 2005 07:09:26 -0700 (PDT) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8JE9BiL004595 for ; Mon, 19 Sep 2005 07:09:11 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j8JG95ve024831 for ; Mon, 19 Sep 2005 09:09:05 -0700 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j8JE59DN15955434; Mon, 19 Sep 2005 09:05:09 -0500 (CDT) Received: from snoot.americas.sgi.com (hoot.americas.sgi.com [128.162.233.104]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j8JE59S93652656; Mon, 19 Sep 2005 09:05:09 -0500 (CDT) Received: by snoot.americas.sgi.com (Postfix, from userid 31161) id 0E89E6022F4A; Mon, 19 Sep 2005 09:05:09 -0500 (CDT) Date: Mon, 19 Sep 2005 09:05:09 -0500 From: Erik Jacobson To: Paul Jackson Cc: Erik Jacobson , pagg@oss.sgi.com Subject: Re: PAGG ideas for next attempt: new docs, new name? Message-ID: <20050919140508.GA8488@sgi.com> References: <20050917153409.GA17708@sgi.com> <20050917104239.26cb7e49.pj@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050917104239.26cb7e49.pj@sgi.com> User-Agent: Mutt/1.5.6i X-archive-position: 110 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@sgi.com Precedence: bulk X-list: pagg > If the current notifier lists have technical limitations with locking > and efficiency, then what would it take to fix them up, rather than > introduce a new mechanism? 
Are these limitations inherent and > unavoidable in any mechanism that has the API of the current notifier > lists, or are they an internal accident of the implementation? If the > latter, can the implementation be fixed? If the former, can you > clearly explain why notifier list, or anything so conceived and so > dedicated with such an API, must necessarily suffer from such technical > limitations?

I removed the stuff I said about locking issues. They probably exist, but I am not quite sure how they would be solved. So I instead focused on the efficiency aspects.

Notifier lists are less efficient because, as long as there is one subscriber to the notifier list somewhere on the system, you always have a list to walk. With process notification, you only walk the list if a kernel module is interested in a given task. That way, if a kernel module is only associated with a few tasks on the system, we don't end up walking lists all the time.

The other piece that is missing is a data pointer associated with a task. Without that, you'd have to add entries to the task struct or implement table lookups to find data associated with processes.

One solution that Jack Steiner actually wrote up a prototype for over the weekend is notifier lists in the task struct itself. So if that is interesting to folks, we have at least some data on it. I haven't tried to implement Job on top of it yet but if people think that direction is interesting, I can implement Job on this sooner. Otherwise, I'm more comfortable with something closer to PAGG that has received a lot of exposure already and has all the features the community here has requested so far (except for one outstanding request from Kingsley).
Erik From erikj@sgi.com Mon Sep 19 08:08:17 2005 Received: with ECARTIS (v1.0.0; list pagg); Mon, 19 Sep 2005 08:08:31 -0700 (PDT) Received: from omx1.americas.sgi.com (omx1-ext.sgi.com [192.48.179.11]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8JF8GiL007310 for ; Mon, 19 Sep 2005 08:08:17 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j8JF5XxT003086 for ; Mon, 19 Sep 2005 10:05:33 -0500 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j8JF5VDN15957018; Mon, 19 Sep 2005 10:05:32 -0500 (CDT) Received: from snoot.americas.sgi.com (hoot.americas.sgi.com [128.162.233.104]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j8JF5US93659905; Mon, 19 Sep 2005 10:05:30 -0500 (CDT) Received: by snoot.americas.sgi.com (Postfix, from userid 31161) id CC4826022F4A; Mon, 19 Sep 2005 10:05:30 -0500 (CDT) Date: Mon, 19 Sep 2005 10:05:30 -0500 From: Erik Jacobson To: Paul Jackson Cc: Erik Jacobson , pagg@oss.sgi.com Subject: Re: PAGG ideas for next attempt: new docs, new name? Message-ID: <20050919150530.GD8488@sgi.com> References: <20050917153409.GA17708@sgi.com> <20050919023620.38ec1820.pj@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050919023620.38ec1820.pj@sgi.com> User-Agent: Mutt/1.5.6i X-archive-position: 111 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@sgi.com Precedence: bulk X-list: pagg > When does this init event occur - at the beginning of something, > I presume. I'm just not clear of what. Ok, I added "at the time of registration": ... Finally, there is an init event. 
This special event makes it so this kernel module will be associated with all current processes in the system at the time of registration. This is used when a kernel module wants to keep track of all current processes as opposed to just those it associates by itself (and children that follow). > > fork event is a spot in copy_process when a parent forks. > > So this event is in the parent, not the child? That seems > slightly odd. The module gets notified when the parent forks and the child is being created. The child receives the same allocation list that the parent had but the kernel module has some control here based on return value to decide if the new process should really be associated with the kernel module or not. > > int rc = pn_register(&pn_service_request); > > if (rc < 0) { > > return -1; > > } > > Should that "return -1" be a "return -ERRNO" for some error number? I'm not sure; we certainly haven't been doing that so far. > > unrelated processes together to be tracked and signaled as ia set > Is that "ia" a typo? Yes :) > > /* We have a valid task now */ > > get_task_struct(task); /* Ensure the task doesn't vanish on us */ > > read_unlock(&tasklist_lock); /* Unlock the tasklist */ > > ... > What is the piece of code, beginning with these three lines, doing? I'll try to come up with a better generic example. It's supposed to show how to use the data pointer because the over-simple examples for using pnotify earlier in the doc aren't sophisticated enough to show that. 
Erik From erikj@sgi.com Mon Sep 19 08:38:31 2005 Received: with ECARTIS (v1.0.0; list pagg); Mon, 19 Sep 2005 08:38:38 -0700 (PDT) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8JFcViL012864 for ; Mon, 19 Sep 2005 08:38:31 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j8JHc6BZ004036 for ; Mon, 19 Sep 2005 10:38:06 -0700 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j8JFY9DN15961102 for ; Mon, 19 Sep 2005 10:34:09 -0500 (CDT) Received: from snoot.americas.sgi.com (hoot.americas.sgi.com [128.162.233.104]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j8JFY9S93624561 for ; Mon, 19 Sep 2005 10:34:09 -0500 (CDT) Received: by snoot.americas.sgi.com (Postfix, from userid 31161) id 1534A6022F4A; Mon, 19 Sep 2005 10:34:09 -0500 (CDT) Date: Mon, 19 Sep 2005 10:34:09 -0500 From: Erik Jacobson To: pagg@oss.sgi.com Subject: Revised Process Notification proposed docs Message-ID: <20050919153408.GA13872@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.6i X-archive-position: 112 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@sgi.com Precedence: bulk X-list: pagg Ok, here is a second pass at this... I am re-working what used to be PAGG to have a new name, better documentation, and better variable names. My hope is that I can present this to the community for inclusion in the kernel and I'm hoping to have a couple of the users of this help by explaining how they use it. 
I feel one reason PAGG didn't get attention was because its true function was obscured by its name and the names of functions and variables within.

The first step of this for me was to write some new documentation using the new names for the pieces. Before I propose this to the broader community, I'd like to get feedback. After that, I plan to re-write the code to match and post it.

If a variable seems too long (some are), perhaps provide a suggested shorter name. The name of pnotify itself is fair game. It turns out it was hard to pick a name for this thing.

Process Notification (pnotify)
--------------------

pnotify provides a method (service) for kernel modules to be notified when certain events happen in the life of a process. Events we support include fork, exit, and exec. A special init event is also supported (see events below). More events could be added.

pnotify also provides a generic data pointer for the modules to work with so that data can be associated per process.

A kernel module will register (pnotify_register) a service request (pnotify_service_request) with pnotify. The request tells pnotify which notifications the kernel module wants. The kernel module passes along function pointers to be called for these events (exit, fork, exec) in the service request.

From the process point of view, each process has a kernel module subscriber list (pnotify_module_subscriber_list). These kernel modules are the ones who want notification about the life of the process. As described above, each kernel module subscriber on the list has a generic data pointer to point to data associated with the process.

In the case of fork, pnotify will allocate the same kernel module subscriber list for the new child that existed for the parent. The kernel module's function pointer for fork is also called so the kernel module can do whatever it needs to do when a parent forks.
For exit, similar things happen but the exit function pointer for each kernel module subscriber is called and the kernel module subscriber list for that task is deleted.

Events
------

Events are stages of a process's life that kernel modules care about. The fork event is a spot in copy_process when a parent forks. The exit event happens when a process is going away. We also support an exec event, which happens when a process execs. Finally, there is an init event. This special event makes it so this kernel module will be associated with all current processes in the system at the time of registration. This is used when a kernel module wants to keep track of all current processes as opposed to just those it associates by itself (and children that follow).

The events a kernel module cares about are set up in the pnotify_service_request structure - see usage below.

When setting up a pnotify_service_request structure, you designate which events you care about by either associating NULL (meaning you don't care about that event) or a pointer to the function to run when the event is triggered. fork and exit are currently required.

How do processes become associated with kernel modules?
-------------------------------------------------------

Your kernel module itself can use the pnotify_alloc function to associate a given process with a given pnotify_service_request structure. This adds your kernel module to the subscriber list of the process.

In the case of inescapable job containers making use of PAM, when PAM allows a person to log in, PAM contacts job (via a PAM job module which uses the job userland library) and the kernel Job code will call pnotify_alloc to associate the process with pnotify. From that point on, the kernel module will be notified about events in the process's life that the module cares about.

Likewise, your kernel module can remove an association between it and a given process by using pnotify_subscriber_free.
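As a rough illustration of the subscriber list described above, here is a small userspace C model. It is not kernel code and not the pnotify implementation: every model_* name is made up for this sketch, and model_subscribe and model_unsubscribe only stand in for the roles that pnotify_alloc and pnotify_subscriber_free play in the text. It shows a per-task list whose entries carry a data pointer, and an event dispatch that skips modules whose handler is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only. Every model_* name is hypothetical, not pnotify API. */
struct model_task;

/* Handlers a module supplies; NULL means "not interested in this event". */
struct model_events {
	const char *name;
	void (*exec)(struct model_task *task, void *data);
};

/* One entry on a task's subscriber list, with its per-task data pointer. */
struct model_subscriber {
	const struct model_events *events;
	void *data;
	struct model_subscriber *next;
};

struct model_task {
	int pid;
	struct model_subscriber *subscribers;	/* NULL: no module is watching */
};

/* Put a module on the task's list (the role pnotify_alloc plays above). */
static void model_subscribe(struct model_task *task,
			    struct model_subscriber *sub,
			    const struct model_events *events, void *data)
{
	sub->events = events;
	sub->data = data;
	sub->next = task->subscribers;
	task->subscribers = sub;
}

/* Take a module off the list again; returns 1 if it was found, else 0. */
static int model_unsubscribe(struct model_task *task,
			     const struct model_events *events)
{
	struct model_subscriber **pp;

	for (pp = &task->subscribers; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->events == events) {
			*pp = (*pp)->next;
			return 1;
		}
	}
	return 0;
}

/* Deliver an exec event; returns how many handlers actually ran. */
static int model_notify_exec(struct model_task *task)
{
	struct model_subscriber *sub;
	int called = 0;

	for (sub = task->subscribers; sub != NULL; sub = sub->next) {
		if (sub->events->exec == NULL)
			continue;	/* module did not ask for exec events */
		sub->events->exec(task, sub->data);
		called++;
	}
	return called;
}

/* Example handler: the data pointer refers to a per-task counter. */
static void model_count_exec(struct model_task *task, void *data)
{
	(void)task;
	++*(int *)data;
}

/* Walk-through: returns the counter value after the sequence below. */
static int model_demo(void)
{
	static const struct model_events watcher = { "watcher", model_count_exec };
	static const struct model_events quiet = { "quiet", NULL };
	struct model_task task = { 42, NULL };
	struct model_subscriber a, b;
	int execs_seen = 0;

	model_subscribe(&task, &a, &watcher, &execs_seen);
	model_subscribe(&task, &b, &quiet, NULL);
	assert(model_notify_exec(&task) == 1);	/* only "watcher" has a handler */
	assert(model_unsubscribe(&task, &watcher) == 1);
	assert(model_notify_exec(&task) == 0);	/* "quiet" is left, handler NULL */
	return execs_seen;
}
```

Calling model_demo() from a test program exercises subscribe, dispatch and unsubscribe in turn. The model leaves out locking and the fork/exit paths entirely; it is only meant to make the list-plus-data-pointer shape concrete.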
Example Usage
-------------

=== filling out the pnotify_service_request structure ===

A kernel module wishing to use pnotify needs to set up a pnotify_service_request structure. This structure tells pnotify which events you care about and what functions to call when those events are triggered. In addition, you supply a name (usually the kernel module name). The entry is always filled out as shown below. .module is usually set to THIS_MODULE. data can be optionally used to store a pointer with the service request structure.

Example of a filled out pnotify_service_request:

static struct pnotify_service_request pnotify_service_request = {
	.module = THIS_MODULE,
	.name = "test_module",
	.data = NULL,
	.entry = LIST_HEAD_INIT(pnotify_service_request.entry),
	.init = test_init,
	.fork = test_attach,
	.exit = test_detach,
	.exec = test_exec,
};

The above pnotify_service_request says the kernel module "test_module" cares about events fork, exit, exec, and init. In fork, call the kernel module's test_attach function. In exec, call test_exec. In exit, call test_detach. The init event is specified, so all processes on the system will be associated with this kernel module and the test_init function will be run for each.

=== Registering with pnotify ===

You will likely register with pnotify in your kernel module's module_init function. Here is an example:

static int __init test_module_init(void)
{
	int rc = pnotify_register(&pnotify_service_request);
	if (rc < 0) {
		return -1;
	}
	return 0;
}

=== Example init event function ===

Since the init event is defined, it means this kernel module is added to the subscriber list of all processes -- it will receive notification about events it cares about for all processes and all children that follow.

Of course, if a kernel module doesn't need to know about all current processes, that module shouldn't implement this and '.init' in the pnotify_service_request structure would be NULL.
This is as opposed to the normal method where the kernel module adds itself to the subscriber list of a process using pnotify_alloc.

static int test_init(struct task_struct *tsk, struct pnotify_subscriber *subscriber)
{
	if (pnotify_get_subscriber(tsk, "test_module") == NULL)
		dprintk("ERROR pnotify expected \"%s\" PID = %d\n", "test_module", tsk->pid);

	dprintk("FYI pnotify init hook fired for PID = %d\n", tsk->pid);
	atomic_inc(&init_count);
	return 0;
}

=== Example fork (test_attach) function ===

This function is executed when a process forks - this is associated with the pnotify_callout callout in copy_process. There would be a very similar test_detach function (not shown).

pnotify will add the kernel module to the notification list for the child process automatically and then execute this fork function pointer (test_attach in this example). However, the kernel module can control, through the return value, whether it stays on the process's subscriber list and keeps receiving notification.

A negative value results in the fork failing. Zero is success. >0 means success, but the kernel module doesn't want to be associated with that specific process (doesn't want notification). In other words, if >0 is returned, your kernel module is saying that it doesn't want to be on the subscriber list for this process.

static int test_attach(struct task_struct *tsk, struct pnotify_subscriber *subscriber, void *vp)
{
	dprintk("pnotify attach hook fired for PID = %d\n", tsk->pid);
	atomic_inc(&attach_count);
	return 0;
}

=== Example exec event function ===

And here is an example function to run when a task gets to exec. So any time a "tracked" process gets to exec, this would execute. More hooks/callouts similar to this one could be implemented as there is demand for them.
static void test_exec(struct task_struct *tsk, struct pnotify_subscriber *subscriber)
{
	dprintk("pnotify exec hook fired for PID %d\n", tsk->pid);
	atomic_inc(&exec_count);
}

=== Unregistering with pnotify ===

You will likely wish to unregister with pnotify in the kernel module's module_exit function. Here is an example:

static void __exit test_module_cleanup(void)
{
	pnotify_unregister(&pnotify_service_request);
	printk("detach called %d times...\n", atomic_read(&detach_count));
	printk("attach called %d times...\n", atomic_read(&attach_count));
	printk("init called %d times...\n", atomic_read(&init_count));
	printk("exec called %d times ...\n", atomic_read(&exec_count));
	if (atomic_read(&attach_count) + atomic_read(&init_count) != atomic_read(&detach_count))
		printk("pnotify PROBLEM: attach count + init count SHOULD equal detach count and doesn't\n");
	else
		printk("Good - attach count + init count equals detach count.\n");
}

=== Actually using data associated with the process in your module ===

The above examples show you how to create an example kernel module using pnotify, but they didn't show what you might do with the data pointer associated with a given process. Below, find an example of accessing the data pointer for a given task from within a kernel module making use of pnotify.

pnotify_get_subscriber is used to retrieve the pnotify subscriber for a given process and kernel module. Like this:

subscriber = pnotify_get_subscriber(task, name);

Where name is your kernel module's name (as provided in the pnotify_service_request structure) and task is the process you're interested in.

Please be careful about locking. The task structure has a pnotify_subscriber_list_sem to be used for locking. This example retrieves a given task in a way that ensures it doesn't disappear while we try to access it (that's why we take the tasklist_lock and a reference on the task). The pnotify subscriber list is locked to ensure the list doesn't change as we search it with pnotify_get_subscriber.
read_lock(&tasklist_lock);
get_task_struct(task); /* Ensure the task doesn't vanish on us */
read_unlock(&tasklist_lock); /* Unlock the tasklist */
down_read(&task->pnotify_subscriber_list_sem); /* readlock subscriber list */

subscriber = pnotify_get_subscriber(task, name);
if (subscriber) {
	/* Get the widgitId associated with this task */
	widgitId = ((widgitId_t *)subscriber->data);
}
up_read(&task->pnotify_subscriber_list_sem); /* unlock subscriber list */
put_task_struct(task); /* Done accessing the task; drop the reference last */

History
-------

Process Notification used to be known as PAGG (Process Aggregates). It was re-written to be called Process Notification because we believe this better describes its purpose. Structures and functions were re-named to be more clear and to reflect the new name.

Why Not Notifier Lists?
-----------------------

We investigated the use of notifier lists, available in newer kernels.

Notifier lists would not be as efficient as pnotify for kernel modules wishing to associate data with processes. With pnotify, if the pnotify_subscriber_list of a given task is NULL, we can instantly know there are no kernel modules that care about the process. Further, the callbacks happen in places where the task struct is likely to be cached. So this is a quick operation. With notifier lists, the scope is system wide rather than per process. As long as one kernel module wants to be notified, we have to walk the notifier list and potentially waste cycles. In the case of pnotify, we only walk a list if some kernel module is interested in that specific task. On a system where pnotify is used to track only a few processes, the overhead of walking the notifier list is high compared to the overhead of walking the kernel module subscriber list only when a kernel module is interested in a given process.

Overlooking performance issues, notifier lists in and of themselves wouldn't solve the problem pnotify solves anyway.
Although you could argue notifier lists can implement the callback portion of pnotify, there is no association of data with a given process. This is needed for kernel modules to efficiently associate a task with a data pointer without cluttering up the task struct.

Some Justification
------------------

We feel that pnotify could be used to reduce the size of the task struct or the number of functions in copy_process. For example, if another part of the kernel needs to know when a process is forking or exiting, they could use pnotify instead of adding additional code to task struct, copy_process, or exit.

Some have argued that PAGG in the past shouldn't be used because it will allow interesting things to be implemented outside of the kernel. While this might be a small risk, having these in place allows customers and users to implement kernel components that you don't want to see in the kernel anyway.

For example, a certain vendor may have an urgent need to implement kernel functionality or special types of accounting that nobody else is interested in. That doesn't mean the code isn't open-source, it just means it isn't applicable to all of Linux because it satisfies a niche.

All of pnotify's functionality that needs to be exported is exported with EXPORT_SYMBOL_GPL to discourage abuse.

The risk already exists in the kernel for people to implement modules outside the kernel that suffer from less peer review and possibly bad programming practice. pnotify could add more opportunities for out-of-tree kernel module authors to make new modules. I believe this is somewhat mitigated by the already-existing 'tainted' warnings in the kernel.
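To put rough numbers on the per-task versus system-wide argument made under "Why Not Notifier Lists?", here is a toy userspace cost model in C. It is only a sketch under simplifying assumptions: each task takes exactly one event, a "visit" means one callback entry examined, and all cost_* names are invented for the illustration. It measures nothing about real kernel behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace cost model only; nothing here is kernel or pnotify code. */
#define COST_NTASKS 1000

struct cost_task {
	int nsubscribers;	/* entries on this task's own subscriber list */
};

/*
 * System-wide list: each task event walks every registered callback,
 * whether or not that callback cares about the task.
 */
static long cost_global_list(size_t ntasks, int nregistered)
{
	long visits = 0;
	size_t i;

	for (i = 0; i < ntasks; i++)
		visits += nregistered;
	return visits;
}

/*
 * Per-task lists: each task event walks only that task's subscribers.
 * An empty list costs a single NULL check and no callback visits.
 */
static long cost_per_task_lists(const struct cost_task *tasks, size_t ntasks)
{
	long visits = 0;
	size_t i;

	for (i = 0; i < ntasks; i++)
		visits += tasks[i].nsubscribers;
	return visits;
}

/* One module is loaded but tracks only `ntracked` of COST_NTASKS tasks. */
static long cost_demo_per_task(size_t ntracked)
{
	static struct cost_task tasks[COST_NTASKS];
	size_t i;

	for (i = 0; i < COST_NTASKS; i++)
		tasks[i].nsubscribers = (i < ntracked) ? 1 : 0;
	return cost_per_task_lists(tasks, COST_NTASKS);
}
```

With one registered module tracking three tasks out of a thousand, cost_global_list(COST_NTASKS, 1) counts one visit per task event, while cost_demo_per_task(3) counts visits only for the three tracked tasks. The gap grows with the number of untracked tasks, which is the case the text argues pnotify handles well.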
From erikj@sgi.com Mon Sep 19 09:57:05 2005 Received: with ECARTIS (v1.0.0; list pagg); Mon, 19 Sep 2005 09:57:17 -0700 (PDT) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8JGv2iL019163 for ; Mon, 19 Sep 2005 09:57:04 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j8JIvCG1014897 for ; Mon, 19 Sep 2005 11:57:12 -0700 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j8JGsFDN15962524; Mon, 19 Sep 2005 11:54:15 -0500 (CDT) Received: from snoot.americas.sgi.com (hoot.americas.sgi.com [128.162.233.104]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j8JGsFS93674713; Mon, 19 Sep 2005 11:54:15 -0500 (CDT) Received: by snoot.americas.sgi.com (Postfix, from userid 31161) id 03AFF6022F4A; Mon, 19 Sep 2005 11:54:14 -0500 (CDT) Date: Mon, 19 Sep 2005 11:54:14 -0500 From: Erik Jacobson To: pagg@oss.sgi.com, Christoph Lameter Subject: another new rev of the docs... Message-ID: <20050919165414.GA18134@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.6i X-archive-position: 113 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@sgi.com Precedence: bulk X-list: pagg Here is another revision taking in many suggestions from Dean Nelson. The return values for the fork function pointer are defined, and some function names changed. === I am re-working what used to be PAGG to have a new name, better documentation, and better variable names. 
My hope is that I can present this to the community for inclusion in the kernel and I'm hoping to have a couple of the users of this help by explaining how they use it.

I feel one reason PAGG didn't get attention was because its true function was obscured by its name and the names of functions and variables within.

The first step of this for me was to write some new documentation using the new names for the pieces. Before I propose this to the broader community, I'd like to get feedback. After that, I plan to re-write the code to match and post it.

If a variable seems too long (some are), perhaps provide a suggested shorter name. The name of pnotify itself is fair game. It turns out it was hard to pick a name for this thing.

Process Notification (pnotify)
--------------------

pnotify provides a method (service) for kernel modules to be notified when certain events happen in the life of a process. Events we support include fork, exit, and exec. A special init event is also supported (see events below). More events could be added.

pnotify also provides a generic data pointer for the modules to work with so that data can be associated per process.

A kernel module will register a service request describing the events it cares about (pnotify_events) with pnotify_register. The request tells pnotify which notifications the kernel module wants. The kernel module passes along function pointers to be called for these events (exit, fork, exec) in the pnotify_events service request.

From the process point of view, each process has a kernel module subscriber list (pnotify_module_subscriber_list). These kernel modules are the ones who want notification about the life of the process. As described above, each kernel module subscriber on the list has a generic data pointer to point to data associated with the process.

In the case of fork, pnotify will allocate the same kernel module subscriber list for the new child that existed for the parent.
The kernel module's function pointer for fork is also called for the child being constructed so the kernel module can do whatever it needs to do when a parent forks this child. Special return values apply to the fork event that don't apply to the others. They are described in the fork example below.

For exit, similar things happen but the exit function pointer for each kernel module subscriber is called and the kernel module subscriber entry for that process is deleted.

Events
------

Events are stages of a process's life that kernel modules care about. The fork event is triggered in a certain location in copy_process when a parent forks. The exit event happens when a process is going away. We also support an exec event, which happens when a process execs. Finally, there is an init event. This special event makes it so this kernel module will be associated with all current processes in the system at the time of registration. This is used when a kernel module wants to keep track of all current processes as opposed to just those it associates by itself (and children that follow).

The events a kernel module cares about are set up in the pnotify_events structure - see usage below.

When setting up a pnotify_events structure, you designate which events you care about by either associating NULL (meaning you don't care about that event) or a pointer to the function to run when the event is triggered. The fork event is currently required.

How do processes become associated with kernel modules?
-------------------------------------------------------

Your kernel module itself can use the pnotify_subscribe function to associate a given process with a given pnotify_events structure. This adds your kernel module to the subscriber list of the process.
In the case of inescapable job containers making use of PAM, when PAM allows a person to log in, PAM contacts job (via a PAM job module which uses the job userland library) and the kernel Job code will call pnotify_subscribe to associate the process with pnotify. From that point on, the kernel module will be notified about events in the process's life that the module cares about (as well as events in the lives of any children that process may later have).

Likewise, your kernel module can remove an association between it and a given process by using pnotify_unsubscribe.

Example Usage
-------------

=== filling out the pnotify_events structure ===

A kernel module wishing to use pnotify needs to set up a pnotify_events structure. This structure tells pnotify which events you care about and what functions to call when those events are triggered. In addition, you supply a name (usually the kernel module name). The entry is always filled out as shown below. .module is usually set to THIS_MODULE. data can be optionally used to store a pointer with the pnotify_events structure.

Example of a filled out pnotify_events:

static struct pnotify_events pnotify_events = {
	.module = THIS_MODULE,
	.name = "test_module",
	.data = NULL,
	.entry = LIST_HEAD_INIT(pnotify_events.entry),
	.init = test_init,
	.fork = test_attach,
	.exit = test_detach,
	.exec = test_exec,
};

The above pnotify_events structure says the kernel module "test_module" cares about events fork, exit, exec, and init. In fork, call the kernel module's test_attach function. In exec, call test_exec. In exit, call test_detach. The init event is specified, so all processes on the system will be associated with this kernel module during registration and the test_init function will be run for each.

=== Registering with pnotify ===

You will likely register with pnotify in your kernel module's module_init function.
Here is an example:

static int __init test_module_init(void)
{
	int rc = pnotify_register(&pnotify_events);
	if (rc < 0) {
		return -1;
	}

	return 0;
}

=== Example init event function ===

Since the init event is defined, it means this kernel module is added
to the subscriber list of all processes -- it will receive notification
about events it cares about for all processes and all children that
follow. This is as opposed to the normal method where the kernel module
adds itself to the subscriber list of a process using pnotify_subscribe.

Of course, if a kernel module doesn't need to know about all current
processes, that module shouldn't implement this and '.init' in the
pnotify_events structure would be NULL.

static int test_init(struct task_struct *tsk, struct pnotify_subscriber *subscriber)
{
	if (pnotify_get_subscriber(tsk, "test_module") == NULL)
		dprintk("ERROR pnotify expected \"%s\" PID = %d\n", "test_module", tsk->pid);

	dprintk("FYI pnotify init hook fired for PID = %d\n", tsk->pid);
	atomic_inc(&init_count);
	return 0;
}

=== Example fork (test_attach) function ===

This function is executed when a process forks - this is associated
with the pnotify_callout callout in copy_process. There would be a very
similar test_detach function (not shown).

pnotify will add the kernel module to the notification list for the child
process automatically and then execute this fork function pointer (test_attach
in this example). However, the return value lets the kernel module control
whether it stays on the process's subscriber list and keeps receiving
notifications:
PNOTIFY_ERROR - prevent the process from continuing - failing the fork
PNOTIFY_OK    - good, adds the kernel module to the subscriber list for process
PNOTIFY_NOSUB - good, but don't add kernel module to subscriber list for process

static int test_attach(struct task_struct *tsk, struct pnotify_subscriber *subscriber, void *vp)
{
	dprintk("pnotify attach hook fired for PID = %d\n", tsk->pid);
	atomic_inc(&attach_count);

	return PNOTIFY_OK;
}

=== Example exec event function ===

And here is an example function to run when a task gets to exec. So any
time a "tracked" process gets to exec, this would execute.

static void test_exec(struct task_struct *tsk, struct pnotify_subscriber *subscriber)
{
	dprintk("pnotify exec hook fired for PID %d\n", tsk->pid);
	atomic_inc(&exec_count);
}

=== Unregistering with pnotify ===

You will likely wish to unregister with pnotify in the kernel module's
module_exit function. Here is an example:

static void __exit test_module_cleanup(void)
{
	pnotify_unregister(&pnotify_events);
	printk("detach called %d times...\n", atomic_read(&detach_count));
	printk("attach called %d times...\n", atomic_read(&attach_count));
	printk("init called %d times...\n", atomic_read(&init_count));
	printk("exec called %d times ...\n", atomic_read(&exec_count));
	if (atomic_read(&attach_count) + atomic_read(&init_count) !=
	  atomic_read(&detach_count))
		printk("pnotify PROBLEM: attach count + init count SHOULD equal detach count and doesn't\n");
	else
		printk("Good - attach count + init count equals detach count.\n");
}

=== Actually using data associated with the process in your module ===

The above examples show you how to create an example kernel module using
pnotify, but they didn't show what you might do with the data pointer
associated with a given process. Below, find an example of accessing
the data pointer for a given process from within a kernel module making use
of pnotify.
pnotify_get_subscriber is used to retrieve the pnotify subscriber for a given
process and kernel module. Like this:

subscriber = pnotify_get_subscriber(task, name);

Where name is your kernel module's name (as provided in the pnotify_events
structure) and task is the process you're interested in.

Please be careful about locking. The task structure has a
pnotify_subscriber_list_sem to be used for locking. This example retrieves
a given task in a way that ensures it doesn't disappear while we try to
access it (that's why we do locking for the tasklist_lock and task). The
pnotify subscriber list is locked to ensure the list doesn't change as we
search it with pnotify_get_subscriber.

read_lock(&tasklist_lock);
get_task_struct(task); /* Ensure the task doesn't vanish on us */
read_unlock(&tasklist_lock); /* Unlock the tasklist */
down_read(&task->pnotify_subscriber_list_sem); /* readlock subscriber list */

subscriber = pnotify_get_subscriber(task, name);
if (subscriber) {
	/* Get the widgitId associated with this task */
	widgitId = ((widgitId_t *)subscriber->data);
}
/* Unlock the subscriber list while we still hold our task reference */
up_read(&task->pnotify_subscriber_list_sem);
put_task_struct(task); /* Done accessing the task */

History
-------
Process Notification used to be known as PAGG (Process Aggregates).
It was re-written to be called Process Notification because we believe this
better describes its purpose. Structures and functions were re-named to
be more clear and to reflect the new name.

Why Not Notifier Lists?
-----------------------
We investigated the use of notifier lists, available in newer kernels.

Notifier lists would not be as efficient as pnotify for kernel modules
wishing to associate data with processes. With pnotify, if the
pnotify_subscriber_list of a given task is NULL, we can instantly know
there are no kernel modules that care about the process. Further, the
callbacks happen in places where the task struct is likely to be cached.
So this is a quick operation.
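The cost argument above can be made concrete with a small userspace model.
This is only a sketch under stated assumptions: the sketch_* names are
invented for the example, a singly linked list with a NULL head stands in
for the kernel's list handling, and "cost" is simply the number of list
nodes visited per event.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subscriber entry; data is the per-task pointer. */
struct sketch_sub {
	struct sketch_sub *next;
	void *data;
};

/* Hypothetical task; subs == NULL means no module cares about it. */
struct sketch_task {
	int pid;
	struct sketch_sub *subs;
};

/* Per-task model: nodes visited when this one task raises an event. */
int sketch_pertask_cost(const struct sketch_task *t)
{
	int visited = 0;
	const struct sketch_sub *s;

	assert(t != NULL);
	if (t->subs == NULL)	/* fast path: nothing to walk */
		return 0;
	for (s = t->subs; s != NULL; s = s->next)
		visited++;
	return visited;
}

/* System-wide model: every registered entry is visited for every event. */
int sketch_global_cost(const struct sketch_sub *global_list)
{
	int visited = 0;
	const struct sketch_sub *s;

	for (s = global_list; s != NULL; s = s->next)
		visited++;
	return visited;
}

/* Total nodes visited when each of ntasks tasks raises one event. */
int sketch_total_pertask(const struct sketch_task *tasks, int ntasks)
{
	int i, total = 0;

	for (i = 0; i < ntasks; i++)
		total += sketch_pertask_cost(&tasks[i]);
	return total;
}

int sketch_total_global(const struct sketch_sub *global_list, int ntasks)
{
	int i, total = 0;

	for (i = 0; i < ntasks; i++)
		total += sketch_global_cost(global_list);
	return total;
}
```

With one module tracking one task out of three, the per-task model visits a
single node in total, while the system-wide model visits the module's entry
once per task event. The model ignores caching and callback cost, so it only
illustrates the scaling, not real overhead.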
With notifier lists, the scope is system wide rather than per process.
As long as one kernel module wants to be notified, we have to walk the
notifier list and potentially waste cycles. In the case of pnotify,
we only walk lists if we're interested in a specific task. On a system
where pnotify is used to track only a few processes, the overhead of
walking the notifier list is high compared to the overhead of walking
the kernel module subscriber list only when a kernel module is interested
in a given process.

Overlooking performance issues, notifier lists in and of themselves wouldn't
solve the problem pnotify solves anyway. Although you could argue notifier
lists can implement the callback portion of pnotify, there is no association
of data with a given process. This is needed for kernel modules to
efficiently associate a task with a data pointer without cluttering up the
task struct.

Some Justification
------------------
We feel that pnotify could be used to reduce the size of the task struct or
the number of functions in copy_process. For example, if another part of the
kernel needs to know when a process is forking or exiting, it could use
pnotify instead of adding additional code to task struct, copy_process, or
exit.

Some have argued in the past that PAGG shouldn't be used because it will
allow interesting things to be implemented outside of the kernel. While this
might be a small risk, having these in place allows customers and users to
implement kernel components that you don't want to see in the kernel anyway.

For example, a certain vendor may have an urgent need to implement kernel
functionality or special types of accounting that nobody else is interested
in. That doesn't mean the code isn't open-source, it just means it isn't
applicable to all of Linux because it satisfies a niche.

All of pnotify's functionality that needs to be exported is exported with
EXPORT_SYMBOL_GPL to discourage abuse.
The risk already exists in the kernel for people to implement modules outside
the kernel that suffer from less peer review and possibly bad programming
practice. pnotify could add more opportunities for out-of-tree kernel module
authors to make new modules. I believe this is somewhat mitigated by the
already-existing 'tainted' warnings in the kernel.

Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

From erikj@sgi.com Mon Sep 19 17:22:52 2005 Received: with ECARTIS (v1.0.0; list pagg); Mon, 19 Sep 2005 17:22:54 -0700 (PDT) Received: from omx1.americas.sgi.com (omx1-ext.sgi.com [192.48.179.11]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8K0MqiL028154 for ; Mon, 19 Sep 2005 17:22:52 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j8K0K8xT002815 for ; Mon, 19 Sep 2005 19:20:08 -0500 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j8K0K8DN15986933; Mon, 19 Sep 2005 19:20:08 -0500 (CDT) Received: from snoot.americas.sgi.com (hoot.americas.sgi.com [128.162.233.104]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j8K0K7S93710952; Mon, 19 Sep 2005 19:20:07 -0500 (CDT) Received: by snoot.americas.sgi.com (Postfix, from userid 31161) id 91AED6022F4A; Mon, 19 Sep 2005 19:20:07 -0500 (CDT) Date: Mon, 19 Sep 2005 19:20:07 -0500 From: Erik Jacobson To: pagg@oss.sgi.com, steiner@sgi.com, clameter@sgi.com Subject: New pagg patch progress Message-ID: <20050920002007.GA9813@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.6i X-archive-position: 114 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@sgi.com Precedence:
bulk X-list: pagg

It took me longer than expected to get PAGG turned into pnotify and change
Job to use the new pnotify stuff. I just finished but haven't done any
testing.

So to those of you I told I would post the patch today - I'm sorry; it will
be done tomorrow, though.

At the very least, I'm hoping the new patch will spur some discussion and at
least some accepted solution will come about. I think the pnotify stuff has
good exposure because it's been around on SGI machines for quite some time,
and it serves its purpose well. However, now I'd just settle for anything
that has the basic functionality needed to implement Job.

Let's see where tomorrow takes us. I'd appreciate list member involvement
in the discussion.

Erik

From erikj@sgi.com Tue Sep 20 08:17:04 2005 Received: with ECARTIS (v1.0.0; list pagg); Tue, 20 Sep 2005 08:17:17 -0700 (PDT) Received: from omx1.americas.sgi.com (omx1-ext.sgi.com [192.48.179.11]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8KFH2iL031486 for ; Tue, 20 Sep 2005 08:17:04 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j8KFEIxT003974 for ; Tue, 20 Sep 2005 10:14:18 -0500 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j8KFEHDN16030276 for ; Tue, 20 Sep 2005 10:14:17 -0500 (CDT) Received: from snoot.americas.sgi.com (hoot.americas.sgi.com [128.162.233.104]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j8KFEHS93747731 for ; Tue, 20 Sep 2005 10:14:17 -0500 (CDT) Received: by snoot.americas.sgi.com (Postfix, from userid 31161) id 889C86028D21; Tue, 20 Sep 2005 10:14:17 -0500 (CDT) Date: Tue, 20 Sep 2005 10:14:17 -0500 From: Erik Jacobson To: pagg@oss.sgi.com Subject: job version to be posted, recent job fixes Message-ID:
<20050920151417.GA24846@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.6i X-archive-position: 115 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@sgi.com Precedence: bulk X-list: pagg I just wanted people to know that the version of Job I plan to post using the new pnotify version of pagg is not the jobfs variant. The last time Job got a bunch of community feedback, they suggested using a jobfs implementation instead of the /proc/job ioctl interface. We implemented that. It does work, but for certain customer situations, the overhead of the inode operations to control job are quite costly. Although most customers wouldn't hit this, at least one big customer would have. In one of the test suite tests, we fork like 40,000 processes maybe more to see if job suffers from a duplicate JID issue that a customer reported. In that test case, where job controls are issued for each process at least once, the run time of the test takes 10 minutes or more compared to less than 20 seconds with the old version. The hold-up was due to inode operations in jobfs. We were trying to decide which way to go -- to try to figure out if there is a way to speed up the inode operations or just go with the tried-and-true kernel implementation. During this time, we found a couple other bugs that I didn't fix because I didn't know which way we were going - jobfs or the old way. 
Some bugs that will be fixed in the version of job I'm planning to post
today include:

- Duplicate JIDs possible when process table wraps - we changed JID
  computation to be based on a counter instead of a PID
- Some code that never executes was purged from job_sys_create
- A hang (locking logic error) was possible in rare situations in
  job_sys_create
- send_sig_info doesn't check for signal zero (status check) any more, so
  we changed to use group_send_sig_info which requires the tasklist to be
  locked during the call. The bug here was that an invalid signal ended up
  being passed that could wake up things that didn't expect to be woken up.

I just wanted folks to know what was going on with the job patch.

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

From erikj@sgi.com Tue Sep 20 09:32:35 2005 Received: with ECARTIS (v1.0.0; list pagg); Tue, 20 Sep 2005 09:32:43 -0700 (PDT) Received: from omx1.americas.sgi.com (omx1-ext.sgi.com [192.48.179.11]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8KGWZiL009103 for ; Tue, 20 Sep 2005 09:32:35 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j8KGToxT017959 for ; Tue, 20 Sep 2005 11:29:50 -0500 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j8KGTnDN16033758 for ; Tue, 20 Sep 2005 11:29:50 -0500 (CDT) Received: from snoot.americas.sgi.com (hoot.americas.sgi.com [128.162.233.104]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j8KGTnS93550093 for ; Tue, 20 Sep 2005 11:29:49 -0500 (CDT) Received: by snoot.americas.sgi.com (Postfix, from userid 31161) id 693B76028D21; Tue, 20 Sep 2005 11:29:49 -0500 (CDT) Date: Tue, 20 Sep 2005 11:29:49 -0500 From: Erik Jacobson To: pagg@oss.sgi.com Subject: Re: job version
to be posted, recent job fixes Message-ID: <20050920162949.GA29495@sgi.com> References: <20050920151417.GA24846@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050920151417.GA24846@sgi.com> User-Agent: Mutt/1.5.6i X-archive-position: 116 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@sgi.com Precedence: bulk X-list: pagg

Hi.

I got a strong suggestion that I should make both the jobfs version of job
available and the all-kernel proc ioctl version for the community to look
at and compare.

That means I need to port the fixes to the jobfs version and re-test in
addition to changing it to use pnotify.

I have some personal stuff to take care of this afternoon so once again
I'm pushing back the posting until either late tonight or tomorrow to
give me time to complete and test the changes in the jobfs version of job
to both convert it to pnotify and fix outstanding bugs.

I'm sorry my estimated time keeps slipping but this time it's because I've
been asked to do more than I planned :)

PS: The kernel proc/ioctl version of job and the new pnotify did pass my
regression tests this morning, so that's good.
Erik From erikj@sgi.com Wed Sep 21 12:57:47 2005 Received: with ECARTIS (v1.0.0; list pagg); Wed, 21 Sep 2005 12:57:51 -0700 (PDT) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8LJvkiL002981 for ; Wed, 21 Sep 2005 12:57:47 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j8LKvIN0006589 for ; Wed, 21 Sep 2005 13:57:19 -0700 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j8LJt1DN16113553 for ; Wed, 21 Sep 2005 14:55:01 -0500 (CDT) Received: from snoot.americas.sgi.com (hoot.americas.sgi.com [128.162.233.104]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j8LJt1S93765451 for ; Wed, 21 Sep 2005 14:55:01 -0500 (CDT) Received: by snoot.americas.sgi.com (Postfix, from userid 31161) id DDB336022F49; Wed, 21 Sep 2005 14:55:00 -0500 (CDT) Date: Wed, 21 Sep 2005 14:55:00 -0500 From: Erik Jacobson To: pagg@oss.sgi.com Subject: jobfs implementation of Linux Job - testing only Message-ID: <20050921195500.GA21918@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.6i X-archive-position: 117 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@sgi.com Precedence: bulk X-list: pagg I have decided to move the jobfs version of Job to a "test only" status. It will appear in a test directory. I have decided in my tests that it just isn't stable and has some performance issues and other problems that need to be fixed if we choose to continue down that path. It isn't clear that the jobfs implementation is the way to go - I'm not sure some of the performance issues are solvable with jobfs. 
The non-jobfs implementation (the tried and true one) is the one that is
stable and supported.

Both versions require the job package (libraries and commands) to function.
However, the jobfs version moved much of the processing out of the kernel and
into the library.

Of course, both require the pnotify patch (formerly PAGG) as well.

The ftp site is being re-organized to make it clear what is test, what is
stable, etc. I'll post on that shortly.

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

From erikj@sgi.com Wed Sep 21 14:03:55 2005 Received: with ECARTIS (v1.0.0; list pagg); Wed, 21 Sep 2005 14:04:02 -0700 (PDT) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8LL3siL009822 for ; Wed, 21 Sep 2005 14:03:54 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j8LM3R2e016157 for ; Wed, 21 Sep 2005 15:03:27 -0700 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j8LL09DN16117331 for ; Wed, 21 Sep 2005 16:00:09 -0500 (CDT) Received: from snoot.americas.sgi.com (hoot.americas.sgi.com [128.162.233.104]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j8LL08S93802445 for ; Wed, 21 Sep 2005 16:00:08 -0500 (CDT) Received: by snoot.americas.sgi.com (Postfix, from userid 31161) id C19706022F49; Wed, 21 Sep 2005 16:00:08 -0500 (CDT) Date: Wed, 21 Sep 2005 16:00:08 -0500 From: Erik Jacobson To: pagg@oss.sgi.com Subject: ftp site re-organized Message-ID: <20050921210008.GA25218@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.6i X-archive-position: 118 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to:
pagg-bounce@oss.sgi.com X-original-sender: erikj@sgi.com Precedence: bulk X-list: pagg

Hi. As promised, here is some information on the re-organized directories
on the pagg ftp site.

Later, depending on what happens in the community, I'll probably need to
look into changing the web site itself to take into account PAGG's new
name.

If you click on the 'download' link on the upper left of this web page:
http://oss.sgi.com/projects/pagg/

Or if you ftp to oss.sgi.com and change to /projects/pagg/download

This is the same location as before.

Here is the directory layout:

old.......old pagg and job files
pnotify...New Process Notification implementation (formerly pagg)
job.......job patch and userland pieces - stable version
job-test..job patch and userland pieces - jobfs version, unstable, test

Both the stable job and the jobfs implementation in job-test have been
updated to use pnotify. The pnotify patch is in the pnotify directory.

The job patches and pnotify were all tested against 2.6.13.2 but should
apply to any recent kernel.

I provided the source RPMs and pre-built rpms for ia64 and x86 for the Job
userland library.

Documentation for pnotify can be found in the Documentation/pnotify.txt
file after applying the pnotify patch. It includes variable name changes
from PAGG to pnotify at the end of the document.

The README file in job-test describes some of the current problems with the
jobfs implementation.
-- Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota From erikj@sgi.com Tue Sep 27 13:14:11 2005 Received: with ECARTIS (v1.0.0; list pagg); Tue, 27 Sep 2005 13:14:22 -0700 (PDT) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8RKEBiL013673 for ; Tue, 27 Sep 2005 13:14:11 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j8RLEVij031143 for ; Tue, 27 Sep 2005 14:14:31 -0700 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j8RKAKDN16521944; Tue, 27 Sep 2005 15:10:20 -0500 (CDT) Received: from snoot.americas.sgi.com (hoot.americas.sgi.com [128.162.233.104]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j8RKAKS94178506; Tue, 27 Sep 2005 15:10:20 -0500 (CDT) Received: by snoot.americas.sgi.com (Postfix, from userid 31161) id 536FE6022F49; Tue, 27 Sep 2005 15:10:20 -0500 (CDT) Date: Tue, 27 Sep 2005 15:10:20 -0500 From: Erik Jacobson To: Kingsley Cheung Cc: Erik Jacobson , pagg@oss.sgi.com, tonyt@aurema.com Subject: Re: [patch] Minor PAGG attach/detach semantic change for 2.6.11 Message-ID: <20050927201020.GA30433@sgi.com> References: <20050617014512.GA10285@aurema.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050617014512.GA10285@aurema.com> User-Agent: Mutt/1.5.6i X-archive-position: 119 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@sgi.com Precedence: bulk X-list: pagg I fixed this in the RCU version of pnotify I'm working per the lse-tech community discussion - thanks for the reminder the other day (in a non-list email). 
If the RCU version crashes and burns for some reason and we go back to
the non-RCU one, I'll need to make the fix there too. The function now
looks like this. I hope this is what you had in mind (untested as of
this moment).

/**
 * __pnotify_fork - Add kernel module subscriber to same subscribers as parent
 * @to_task: The child task that will inherit the parent's subscribers
 * @from_task: The parent task
 *
 * Used to attach a new task to the same subscribers the parent has in its
 * subscriber list.
 *
 * The "from" argument is the parent task. The "to" argument is the child
 * task.
 *
 * See Documentation/pnotify.txt for details on
 * how to handle return codes from the attach function pointer.
 *
 * Locking: The to_task is currently in-construction, so we don't
 * need to worry about write-locks. We do need to be sure the parent's
 * subscriber list, which we copy here, doesn't go away on us. This is
 * done via RCU.
 *
 */
int
__pnotify_fork(struct task_struct *to_task, struct task_struct *from_task)
{
	struct pnotify_subscriber *from_subscriber;
	int ret;

	/* We need to be sure the parent's list we copy from doesn't disappear */
	rcu_read_lock();

	list_for_each_entry_rcu(from_subscriber, &from_task->pnotify_subscriber_list, entry) {
		struct pnotify_subscriber *to_subscriber = NULL;

		to_subscriber = pnotify_subscribe(to_task, from_subscriber->events);
		if (!to_subscriber) {
			ret=-ENOMEM;
			__pnotify_exit(to_task);
			rcu_read_unlock();
			return ret;
		}
		ret = to_subscriber->events->fork(to_task, to_subscriber,
		  from_subscriber->data);

		rcu_read_unlock(); /* no more to do with the parent's data */

		if (ret < 0) {
			/* Propagates to copy_process as a fork failure */
			/* No __pnotify_exit because there is one in the failure path
			 * for copy_process in fork.c */
			return ret; /* Fork failure */
		}
		else if (ret > 0) {
			/* Success, but fork function pointer in the pnotify_events structure
			 * doesn't want the kernel module subscribed */
			/* Again, this is the in-construction-child so no write lock */
pnotify_unsubscribe(to_subscriber); } } return 0; /* success */ } From kaigai@ak.jp.nec.com Wed Sep 28 04:38:19 2005 Received: with ECARTIS (v1.0.0; list pagg); Wed, 28 Sep 2005 04:38:33 -0700 (PDT) Received: from tyo202.gate.nec.co.jp (TYO206.gate.nec.co.jp [202.32.8.206]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8SBcJiL006649 for ; Wed, 28 Sep 2005 04:38:19 -0700 Received: from mailgate3.nec.co.jp (mailgate53.nec.co.jp [10.7.69.160] (may be forged)) by tyo202.gate.nec.co.jp (8.11.7/3.7W01080315) with ESMTP id j8SBZAb04220; Wed, 28 Sep 2005 20:35:10 +0900 (JST) Received: (from root@localhost) by mailgate3.nec.co.jp (8.11.7/3.7W-MAILGATE-NEC) id j8SBZAg21600; Wed, 28 Sep 2005 20:35:10 +0900 (JST) Received: from mailsv.linux.bs1.fc.nec.co.jp (namesv2.linux.bs1.fc.nec.co.jp [10.34.125.2]) by mailsv4.nec.co.jp (8.11.7/3.7W-MAILSV4-NEC) with ESMTP id j8SBZ9b26517; Wed, 28 Sep 2005 20:35:09 +0900 (JST) Received: from [10.34.125.249] (sanma.linux.bs1.fc.nec.co.jp [10.34.125.249]) by mailsv.linux.bs1.fc.nec.co.jp (Postfix) with ESMTP id EE7F52FE04; Wed, 28 Sep 2005 20:34:50 +0900 (JST) Message-ID: <433A7FE4.5040109@ak.jp.nec.com> Date: Wed, 28 Sep 2005 20:35:00 +0900 From: Kaigai Kohei User-Agent: Mozilla Thunderbird 1.0.2 (Windows/20050317) X-Accept-Language: ja, en-us, en MIME-Version: 1.0 To: Erik Jacobson Cc: Kingsley Cheung , pagg@oss.sgi.com, tonyt@aurema.com, paulmck@us.ibm.com Subject: Re: [patch] Minor PAGG attach/detach semantic change for 2.6.11 References: <20050617014512.GA10285@aurema.com> <20050927201020.GA30433@sgi.com> In-Reply-To: <20050927201020.GA30433@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 120 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: kaigai@ak.jp.nec.com Precedence: bulk X-list: pagg Hi, Erik Jacobson wrote: > I fixed this in the RCU version of pnotify I'm working per the 
lse-tech
> community discussion - thanks for the reminder the other day (in a non-list
> email).
>
> If the RCU version crashes and burns for some reason and we go back to
> the non-RCU one, I'll need to make the fix there too. The function now
> looks like this. I hope this is what you had in mind (untested as of
> this moment).

In my understanding, any _write_ operations cannot be implemented
without locking, even if we can use RCU.
(In addition, an RCU-conscious writing/update style is required.)

For example, pnotify permits attaching a new pnotify_subscriber object
to another task. If someone calls pnotify_subscribe() for another task
which is doing fork(), there is a possibility of breaking the
pnotify_subscriber_list of the victim task.

Therefore, procedures with updates such as __pnotify_fork() should be
serialized by some kind of locking.

RCU is very effective for seldom-write/frequently-read paths, such as
SELinux's Access Vector Cache (AVC).
But it is not omnipotent, and it restricts how writes can be done.

In the past, I proposed applying RCU to PAGG.
But it might be inappropriate for pnotify/PAGG as a general framework.

I'd also like to draw attention to another point. The current pnotify
implementation requires holding pnotify_event_list_sem before calling
pnotify_get_events().
Therefore, we must repeat read_lock/unlock(&tasklist_lock) in the
do_each_thread()/while_each_thread() loop as follows:

----------------------------
read_lock(&tasklist_lock);
do_each_thread(g, p) {
    get_task_struct(p);
    read_unlock(&tasklist_lock);

    down_read(&p->pnotify_subscriber_list_sem);
    subscriber = pnotify_get_subscriber(p, events->name);
        :
    up_read(&p->pnotify_subscriber_list_sem);
    read_lock(&tasklist_lock);
    << checking, p is dead or not ? >>
} while_each_thread(g, p);
read_unlock(&tasklist_lock);
----------------------------

I would be happy if pnotify_subscriber_list were protected by an rwlock.
If an rwlock is used, we cannot implement pnotify_subscribe() with the
current spec.
But is it impossible to prepare pnotify_subscribe_atomic() or pnotify_subscribe_bind() which associates task_struct with pre-allocated pnotify_events object ? ---- in rwlock world :-) --- read_lock(&tasklist_lock); do_each_thread(g, p) { read_lock(&p->pnotify_subscriber_list_rwlock); subscriber = pnotify_get_subscriber(p, events->name); : read_unlock(&p->pnotify_subscriber_list_rwlock); } while_each_thread(g, p); read_unlock(&tasklist_lock); ---------------------------- Thanks, > /** > * __pnotify_fork - Add kernel module subscriber to same subscribers as parent > * @to_task: The child task that will inherit the parent's subscribers > * @from_task: The parent task > * > * Used to attach a new task to the same subscribers the parent has in its > * subscriber list. > * > * The "from" argument is the parent task. The "to" argument is the child > * task. > * > * See Documentation/pnotify.txt for details on > * how to handle return codes from the attach function pointer. > * > * Locking: The to_task is currently in-construction, so we don't > * need to worry about write-locks. We do need to be sure the parent's > * subscriber list, which we copy here, doesn't go away on us. This is > * done via RCU. 
> * > */ > int > __pnotify_fork(struct task_struct *to_task, struct task_struct *from_task) > { > struct pnotify_subscriber *from_subscriber; > int ret; > > /* We need to be sure the parent's list we copy from doesn't disappear */ > rcu_read_lock(); > > list_for_each_entry_rcu(from_subscriber, &from_task->pnotify_subscriber_list, entry) { > struct pnotify_subscriber *to_subscriber = NULL; > > to_subscriber = pnotify_subscribe(to_task, from_subscriber->events); > if (!to_subscriber) { > ret=-ENOMEM; > __pnotify_exit(to_task); > rcu_read_unlock(); > return ret; > } > ret = to_subscriber->events->fork(to_task, to_subscriber, > from_subscriber->data); > > rcu_read_unlock(); /* no more to do with the parent's data */ rcu_read_unlovk(); should be deployed outside of the list_for_each_entry_rcu(){...}. > > if (ret < 0) { > /* Propagates to copy_process as a fork failure */ > /* No __pnotify_exit because there is one in the failure path > * for copy_process in fork.c */ > return ret; /* Fork failure */ > } > else if (ret > 0) { > /* Success, but fork function pointer in the pnotify_events structure > * doesn't want the kenrel module subscribed */ > /* Again, this is the in-construction-child so no write lock */ > pnotify_unsubscribe(to_subscriber); > } > } > > return 0; /* success */ > } -- Linux Promotion Center, NEC KaiGai Kohei From erikj@sgi.com Wed Sep 28 07:21:25 2005 Received: with ECARTIS (v1.0.0; list pagg); Wed, 28 Sep 2005 07:21:40 -0700 (PDT) Received: from omx1.americas.sgi.com (omx1-ext.sgi.com [192.48.179.11]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8SELNiL021852 for ; Wed, 28 Sep 2005 07:21:24 -0700 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j8SEIXxT001947 for ; Wed, 28 Sep 2005 09:18:33 -0500 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by 
flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j8SEIWDN16572380; Wed, 28 Sep 2005 09:18:33 -0500 (CDT) Received: from snoot.americas.sgi.com (hoot.americas.sgi.com [128.162.233.104]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j8SEIVS94206965; Wed, 28 Sep 2005 09:18:32 -0500 (CDT) Received: by snoot.americas.sgi.com (Postfix, from userid 31161) id 55A0E6022F4A; Wed, 28 Sep 2005 09:18:31 -0500 (CDT) Date: Wed, 28 Sep 2005 09:18:31 -0500 From: Erik Jacobson To: Kaigai Kohei Cc: Erik Jacobson , Kingsley Cheung , pagg@oss.sgi.com, tonyt@aurema.com, paulmck@us.ibm.com Subject: Re: [patch] Minor PAGG attach/detach semantic change for 2.6.11 Message-ID: <20050928141831.GA24110@sgi.com> References: <20050617014512.GA10285@aurema.com> <20050927201020.GA30433@sgi.com> <433A7FE4.5040109@ak.jp.nec.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <433A7FE4.5040109@ak.jp.nec.com> User-Agent: Mutt/1.5.6i X-archive-position: 121 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@sgi.com Precedence: bulk X-list: pagg > In my understanding, any _write_ operations can not be implemented > without locking, even if we can use RCU. > (In addition, RCU conscious writing/update style is required.) I'm surrounding write operations with a writelock rwsem (not a spinlock, at least not now... since it is common for pnotify users to use semaphores for their own locking). I think this is similar to your example submission, but you used a spinlock in those places. At this moment, I'm close to done with something to look at. I'm just tracking down a bug in the new implementation that showed up in the Job patch. I'll post what I have when I'm ready and maybe we can tear it apart then. 
This is my first rcu experience so I'd welcome the feedback including
"this just won't work with rcu" if that's what it comes down to.

I also want to be sure the 'stale data' problem isn't actually a problem
for us.

> frequently-read paths, such as SELinux's Access Vector Cache(AVC).
> But it's not omnipotent, and it restricts write methodology.

The feeling I had is that most users of pnotify won't be writing super-often.
This is a generalization that may be incorrect.  Taking Job as an example,
once the process is made part of a job, not much usually happens in terms of
adjusting the data pointer associated with the task struct until Job is done.

I could imagine there may be things this isn't the case for, then the writes
will be a penalty possibly.

> I have attention to another respect. The current pnotify implementation
> requires to hold pnotify_event_list_sem before calling pnotify_get_events().

As I recall, that code only would happen at most twice in the life of a kernel
module, right?  The only time the init function pointer would fire, if it's
present, is at pnotify_register time.  A similar piece of code happens at
unregister time I think.  I guess I'm wondering if this happens enough to
worry about?  Please let me know if I missed your entire point.

> Therefore, we must repeat read_lock/unlock(&tasklist_lock) on
> do_each_thread()/while_each_thread() loop as follows:
>
> ----------------------------
> read_lock(&tasklist_lock);
> do_each_thread(g, p) {
>         get_task_struct(p);
>         read_unlock(&tasklist_lock);
>
>         down_read(&p->pnotify_subscriber_list_sem);
>         subscriber = pnotify_get_subscriber(p, events->name);
>                 :
>         up_read(&p->pnotify_subscriber_list_sem);
>         read_lock(&tasklist_lock);
>         << checking, p is dead or not ? >>
> } while_each_thread(g, p);
> read_unlock(&tasklist_lock);
> ----------------------------
>
> I'm happy, if pnotify_subscriber_list would be protected by rwlock.
> > If rwlock is used, we can not implement pnotify_subscribe() with current > spec. But is it impossible to prepare pnotify_subscribe_atomic() or > pnotify_subscribe_bind() which associates task_struct with pre-allocated > pnotify_events object ? > > ---- in rwlock world :-) --- > read_lock(&tasklist_lock); > do_each_thread(g, p) { > read_lock(&p->pnotify_subscriber_list_rwlock); > subscriber = pnotify_get_subscriber(p, events->name); > : > read_unlock(&p->pnotify_subscriber_list_rwlock); > } while_each_thread(g, p); > read_unlock(&tasklist_lock); > ---------------------------- > > Thanks, > > > >/** > > * __pnotify_fork - Add kernel module subscriber to same subscribers as > > parent > > * @to_task: The child task that will inherit the parent's subscribers > > * @from_task: The parent task > > * > > * Used to attach a new task to the same subscribers the parent has in its > > * subscriber list. > > * > > * The "from" argument is the parent task. The "to" argument is the child > > * task. > > * > > * See Documentation/pnotify.txt for details on > > * how to handle return codes from the attach function pointer. > > * > > * Locking: The to_task is currently in-construction, so we don't > > * need to worry about write-locks. We do need to be sure the parent's > > * subscriber list, which we copy here, doesn't go away on us. This is > > * done via RCU. 
> > * > > */ > >int > >__pnotify_fork(struct task_struct *to_task, struct task_struct *from_task) > >{ > > struct pnotify_subscriber *from_subscriber; > > int ret; > > > > /* We need to be sure the parent's list we copy from doesn't > > disappear */ > > rcu_read_lock(); > > > > list_for_each_entry_rcu(from_subscriber, > > &from_task->pnotify_subscriber_list, entry) { > > struct pnotify_subscriber *to_subscriber = NULL; > > > > to_subscriber = pnotify_subscribe(to_task, > > from_subscriber->events); > > if (!to_subscriber) { > > ret=-ENOMEM; > > __pnotify_exit(to_task); > > rcu_read_unlock(); > > return ret; > > } > > ret = to_subscriber->events->fork(to_task, to_subscriber, > > from_subscriber->data); > > > > rcu_read_unlock(); /* no more to do with the parent's data */ > > rcu_read_unlovk(); should be deployed outside of the > list_for_each_entry_rcu(){...}. > > > > > if (ret < 0) { > > /* Propagates to copy_process as a fork failure */ > > /* No __pnotify_exit because there is one in the > > failure path > > * for copy_process in fork.c */ > > return ret; /* Fork failure */ > > } > > else if (ret > 0) { > > /* Success, but fork function pointer in the > > pnotify_events structure > > * doesn't want the kenrel module subscribed */ > > /* Again, this is the in-construction-child so no > > write lock */ > > pnotify_unsubscribe(to_subscriber); > > } > > } > > > > return 0; /* success */ > >} > > -- > Linux Promotion Center, NEC > KaiGai Kohei -- Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota From paulmck@us.ibm.com Wed Sep 28 08:05:06 2005 Received: with ECARTIS (v1.0.0; list pagg); Wed, 28 Sep 2005 08:05:14 -0700 (PDT) Received: from e4.ny.us.ibm.com (e4.ny.us.ibm.com [32.97.182.144]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8SF4xiL025662 for ; Wed, 28 Sep 2005 08:05:06 -0700 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e4.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id 
j8SF2966016020 for ; Wed, 28 Sep 2005 11:02:09 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay04.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j8SF29bd102902 for ; Wed, 28 Sep 2005 11:02:09 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j8SF28J1006348 for ; Wed, 28 Sep 2005 11:02:09 -0400 Received: from linux.local ([9.47.22.63]) by d01av04.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j8SF24JK005925; Wed, 28 Sep 2005 11:02:08 -0400 Received: by linux.local (Postfix on SuSE Linux 7.3 (i386), from userid 500) id 665E5148809; Wed, 28 Sep 2005 08:02:50 -0700 (PDT) Date: Wed, 28 Sep 2005 08:02:50 -0700 From: "Paul E. McKenney" To: Kaigai Kohei Cc: Erik Jacobson , Kingsley Cheung , pagg@oss.sgi.com, tonyt@aurema.com Subject: Re: [patch] Minor PAGG attach/detach semantic change for 2.6.11 Message-ID: <20050928150250.GB4925@us.ibm.com> Reply-To: paulmck@us.ibm.com References: <20050617014512.GA10285@aurema.com> <20050927201020.GA30433@sgi.com> <433A7FE4.5040109@ak.jp.nec.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <433A7FE4.5040109@ak.jp.nec.com> User-Agent: Mutt/1.4.1i X-archive-position: 122 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: paulmck@us.ibm.com Precedence: bulk X-list: pagg On Wed, Sep 28, 2005 at 08:35:00PM +0900, Kaigai Kohei wrote: > Hi, > > Erik Jacobson wrote: > >I fixed this in the RCU version of pnotify I'm working per the lse-tech > >community discussion - thanks for the reminder the other day (in a > >non-list email). > > > >If the RCU version crashes and burns for some reason and we go back to > >the non-CUu one, I'll need to make the fix there too. The function now > >looks like this. I hope this is what you had in mind (untested as of > >this moment). 
>
> In my understanding, any _write_ operations can not be implemented
> without locking, even if we can use RCU.
> (In addition, RCU conscious writing/update style is required.)

Your understanding is quite correct.  RCU protects readers from writers.
Something else must be used to coordinate writers, for example:

1.  locking

2.  only single designated thread allowed to update

3.  carefully crafted sequences of atomic instructions
    (but only do this if -really- needed!)

                                                Thanx, Paul

> For example, pnotify permits to attach a new pnotify_subscriber object
> to another task. If someone calls pnotify_subscribe() for other task
> which is doing fork(), there is a possibility to break the
> pnotify_subscriber_list of victim task.
>
> Therefore, procedures with updates such as __pnotify_fork() should be
> serialized by some kind of locking. RCU is so effective for seldom-write/
> frequently-read paths, such as SELinux's Access Vector Cache(AVC).
> But it's not omnipotent, and it restricts write methodology.
>
> In the past, I made a proposition of applying RCU for PAGG. But it might
> be inappropriate for pnotify/PAGG as a general framework.
>
>
> I have attention to another respect. The current pnotify implementation
> requires to hold pnotify_event_list_sem before calling pnotify_get_events().
> Therefore, we must repeat read_lock/unlock(&tasklist_lock) on
> do_each_thread()/while_each_thread() loop as follows:
>
> ----------------------------
> read_lock(&tasklist_lock);
> do_each_thread(g, p) {
>         get_task_struct(p);
>         read_unlock(&tasklist_lock);
>
>         down_read(&p->pnotify_subscriber_list_sem);
>         subscriber = pnotify_get_subscriber(p, events->name);
>                 :
>         up_read(&p->pnotify_subscriber_list_sem);
>         read_lock(&tasklist_lock);
>         << checking, p is dead or not ? >>
> } while_each_thread(g, p);
> read_unlock(&tasklist_lock);
> ----------------------------
>
> I'm happy, if pnotify_subscriber_list would be protected by rwlock.
> > If rwlock is used, we can not implement pnotify_subscribe() with current > spec. But is it impossible to prepare pnotify_subscribe_atomic() or > pnotify_subscribe_bind() which associates task_struct with pre-allocated > pnotify_events object ? > > ---- in rwlock world :-) --- > read_lock(&tasklist_lock); > do_each_thread(g, p) { > read_lock(&p->pnotify_subscriber_list_rwlock); > subscriber = pnotify_get_subscriber(p, events->name); > : > read_unlock(&p->pnotify_subscriber_list_rwlock); > } while_each_thread(g, p); > read_unlock(&tasklist_lock); > ---------------------------- > > Thanks, > > > >/** > > * __pnotify_fork - Add kernel module subscriber to same subscribers as > > parent > > * @to_task: The child task that will inherit the parent's subscribers > > * @from_task: The parent task > > * > > * Used to attach a new task to the same subscribers the parent has in its > > * subscriber list. > > * > > * The "from" argument is the parent task. The "to" argument is the child > > * task. > > * > > * See Documentation/pnotify.txt for details on > > * how to handle return codes from the attach function pointer. > > * > > * Locking: The to_task is currently in-construction, so we don't > > * need to worry about write-locks. We do need to be sure the parent's > > * subscriber list, which we copy here, doesn't go away on us. This is > > * done via RCU. 
> > * > > */ > >int > >__pnotify_fork(struct task_struct *to_task, struct task_struct *from_task) > >{ > > struct pnotify_subscriber *from_subscriber; > > int ret; > > > > /* We need to be sure the parent's list we copy from doesn't > > disappear */ > > rcu_read_lock(); > > > > list_for_each_entry_rcu(from_subscriber, > > &from_task->pnotify_subscriber_list, entry) { > > struct pnotify_subscriber *to_subscriber = NULL; > > > > to_subscriber = pnotify_subscribe(to_task, > > from_subscriber->events); > > if (!to_subscriber) { > > ret=-ENOMEM; > > __pnotify_exit(to_task); > > rcu_read_unlock(); > > return ret; > > } > > ret = to_subscriber->events->fork(to_task, to_subscriber, > > from_subscriber->data); > > > > rcu_read_unlock(); /* no more to do with the parent's data */ > > rcu_read_unlovk(); should be deployed outside of the > list_for_each_entry_rcu(){...}. > > > > > if (ret < 0) { > > /* Propagates to copy_process as a fork failure */ > > /* No __pnotify_exit because there is one in the > > failure path > > * for copy_process in fork.c */ > > return ret; /* Fork failure */ > > } > > else if (ret > 0) { > > /* Success, but fork function pointer in the > > pnotify_events structure > > * doesn't want the kenrel module subscribed */ > > /* Again, this is the in-construction-child so no > > write lock */ > > pnotify_unsubscribe(to_subscriber); > > } > > } > > > > return 0; /* success */ > >} > > -- > Linux Promotion Center, NEC > KaiGai Kohei > From paulmck@us.ibm.com Wed Sep 28 08:07:07 2005 Received: with ECARTIS (v1.0.0; list pagg); Wed, 28 Sep 2005 08:07:17 -0700 (PDT) Received: from e3.ny.us.ibm.com (e3.ny.us.ibm.com [32.97.182.143]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8SF76iL025761 for ; Wed, 28 Sep 2005 08:07:06 -0700 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e3.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j8SF4FWO008488 for ; Wed, 28 Sep 2005 11:04:16 -0400 Received: from d01av03.pok.ibm.com 
(d01av03.pok.ibm.com [9.56.224.217]) by d01relay04.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j8SF4Fbd080022 for ; Wed, 28 Sep 2005 11:04:15 -0400 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.12.11/8.13.3) with ESMTP id j8SF4Fe1022673 for ; Wed, 28 Sep 2005 11:04:15 -0400 Received: from linux.local ([9.47.22.63]) by d01av03.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j8SF4EWQ022589; Wed, 28 Sep 2005 11:04:15 -0400 Received: by linux.local (Postfix on SuSE Linux 7.3 (i386), from userid 500) id C9467148809; Wed, 28 Sep 2005 08:04:55 -0700 (PDT) Date: Wed, 28 Sep 2005 08:04:55 -0700 From: "Paul E. McKenney" To: Erik Jacobson Cc: Kaigai Kohei , Kingsley Cheung , pagg@oss.sgi.com, tonyt@aurema.com Subject: Re: [patch] Minor PAGG attach/detach semantic change for 2.6.11 Message-ID: <20050928150455.GD4925@us.ibm.com> Reply-To: paulmck@us.ibm.com References: <20050617014512.GA10285@aurema.com> <20050927201020.GA30433@sgi.com> <433A7FE4.5040109@ak.jp.nec.com> <20050928141831.GA24110@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050928141831.GA24110@sgi.com> User-Agent: Mutt/1.4.1i X-archive-position: 123 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: paulmck@us.ibm.com Precedence: bulk X-list: pagg On Wed, Sep 28, 2005 at 09:18:31AM -0500, Erik Jacobson wrote: > > In my understanding, any _write_ operations can not be implemented > > without locking, even if we can use RCU. > > (In addition, RCU conscious writing/update style is required.) > > I'm surrounding write operations with a writelock rwsem (not a spinlock, at > least not now... since it is common for pnotify users to use semaphores for > their own locking). I think this is similar to your example submission, but > you used a spinlock in those places. Yes, a semaphore works as well. 
Whatever it is, there must be something to coordinate the updaters.

                                                Thanx, Paul

> At this moment, I'm close to done with something to look at.  I'm just
> tracking down a bug in the new implementation that showed up in the Job
> patch.  I'll post what I have when I'm ready and maybe we can tear it apart
> then.  This is my first rcu experience so I'd welcome the feedback including
> "this just won't work with rcu" if that's what it comes down to.
>
> I also want to be sure the 'stale data' problem isn't actually a problem
> for us.
>
> > frequently-read paths, such as SELinux's Access Vector Cache(AVC).
> > But it's not omnipotent, and it restricts write methodology.
>
> The feeling I had is that most users of pnotify won't be writing super-often.
> This is a generalization that may be incorrect.  Taking Job as an example,
> once the process is made part of a job, not much usually happens in terms of
> adjusting the data pointer associated with the task struct until Job is done.
>
> I could imagine there may be things this isn't the case for, then the writes
> will be a penalty possibly.
>
> > I have attention to another respect. The current pnotify implementation
> > requires to hold pnotify_event_list_sem before calling pnotify_get_events().
>
> As I recall, that code only would happen at most twice in the life of a kernel
> module, right?  The only time the init function pointer would fire, if it's
> present, is at pnotify_register time.  A similar piece of code happens at
> unregister time I think.  I guess I'm wondering if this happens enough to
> worry about?  Please let me know if I missed your entire point.
> > > Threfore, we must repeat read_lock/unlock(&tasklist_lock) on > > do_each_thread()/while_each_thread() loop as follows: > > > > ---------------------------- > > read_lock(&tasklist_lock); > > do_each_thread(g, p) { > > get_task_struct(p); > > read_unlock(&tasklist_lock); > > > > down_read(&p->pnotify_subscriber_list_sem); > > subscriber = pnotify_get_subscriber(p, events->name); > > : > > up_read(&p->pnotify_subscriber_list_sem); > > read_lock(&tasklist_lock); > > << checking, p is dead or not ? >> > > } while_each_thread(g, p); > > read_unlock(&tasklist_lock); > > ---------------------------- > > > > I'm happy, if pnotify_subscriber_list would be protected by rwlock. > > > > If rwlock is used, we can not implement pnotify_subscribe() with current > > spec. But is it impossible to prepare pnotify_subscribe_atomic() or > > pnotify_subscribe_bind() which associates task_struct with pre-allocated > > pnotify_events object ? > > > > ---- in rwlock world :-) --- > > read_lock(&tasklist_lock); > > do_each_thread(g, p) { > > read_lock(&p->pnotify_subscriber_list_rwlock); > > subscriber = pnotify_get_subscriber(p, events->name); > > : > > read_unlock(&p->pnotify_subscriber_list_rwlock); > > } while_each_thread(g, p); > > read_unlock(&tasklist_lock); > > ---------------------------- > > > > Thanks, > > > > > > >/** > > > * __pnotify_fork - Add kernel module subscriber to same subscribers as > > > parent > > > * @to_task: The child task that will inherit the parent's subscribers > > > * @from_task: The parent task > > > * > > > * Used to attach a new task to the same subscribers the parent has in its > > > * subscriber list. > > > * > > > * The "from" argument is the parent task. The "to" argument is the child > > > * task. > > > * > > > * See Documentation/pnotify.txt for details on > > > * how to handle return codes from the attach function pointer. > > > * > > > * Locking: The to_task is currently in-construction, so we don't > > > * need to worry about write-locks. 
We do need to be sure the parent's > > > * subscriber list, which we copy here, doesn't go away on us. This is > > > * done via RCU. > > > * > > > */ > > >int > > >__pnotify_fork(struct task_struct *to_task, struct task_struct *from_task) > > >{ > > > struct pnotify_subscriber *from_subscriber; > > > int ret; > > > > > > /* We need to be sure the parent's list we copy from doesn't > > > disappear */ > > > rcu_read_lock(); > > > > > > list_for_each_entry_rcu(from_subscriber, > > > &from_task->pnotify_subscriber_list, entry) { > > > struct pnotify_subscriber *to_subscriber = NULL; > > > > > > to_subscriber = pnotify_subscribe(to_task, > > > from_subscriber->events); > > > if (!to_subscriber) { > > > ret=-ENOMEM; > > > __pnotify_exit(to_task); > > > rcu_read_unlock(); > > > return ret; > > > } > > > ret = to_subscriber->events->fork(to_task, to_subscriber, > > > from_subscriber->data); > > > > > > rcu_read_unlock(); /* no more to do with the parent's data */ > > > > rcu_read_unlovk(); should be deployed outside of the > > list_for_each_entry_rcu(){...}. 
> > > > > > > > if (ret < 0) { > > > /* Propagates to copy_process as a fork failure */ > > > /* No __pnotify_exit because there is one in the > > > failure path > > > * for copy_process in fork.c */ > > > return ret; /* Fork failure */ > > > } > > > else if (ret > 0) { > > > /* Success, but fork function pointer in the > > > pnotify_events structure > > > * doesn't want the kenrel module subscribed */ > > > /* Again, this is the in-construction-child so no > > > write lock */ > > > pnotify_unsubscribe(to_subscriber); > > > } > > > } > > > > > > return 0; /* success */ > > >} > > > > -- > > Linux Promotion Center, NEC > > KaiGai Kohei > -- > Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota > From kingsley@sw.oz.au Wed Sep 28 22:19:32 2005 Received: with ECARTIS (v1.0.0; list pagg); Wed, 28 Sep 2005 22:19:35 -0700 (PDT) Received: from smtp.sw.oz.au (alt.aurema.com [203.217.18.57]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8T5JUO0008759 for ; Wed, 28 Sep 2005 22:19:31 -0700 Received: from kingsley.sw.oz.au (kingsley.sw.oz.au [192.41.203.97]) by smtp.sw.oz.au with ESMTP id j8T5GSc2014500; Thu, 29 Sep 2005 15:16:28 +1000 (EST) Received: from kingsley.sw.oz.au (localhost.localdomain [127.0.0.1]) by kingsley.sw.oz.au (8.13.1/8.12.10) with ESMTP id j8T5GSN0020577; Thu, 29 Sep 2005 15:16:28 +1000 Received: (from kingsley@localhost) by kingsley.sw.oz.au (8.13.1/8.13.1/Submit) id j8T5GRvl020576; Thu, 29 Sep 2005 15:16:27 +1000 Date: Thu, 29 Sep 2005 15:16:27 +1000 From: kingsley@aurema.com To: Erik Jacobson Cc: pagg@oss.sgi.com, tonyt@aurema.com Subject: Re: [patch] Minor PAGG attach/detach semantic change for 2.6.11 Message-ID: <20050929051627.GC3404@aurema.com> References: <20050617014512.GA10285@aurema.com> <20050927201020.GA30433@sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050927201020.GA30433@sgi.com> User-Agent: Mutt/1.4.1i X-Scanned-By: MIMEDefang 2.52 
on 192.41.203.35
X-archive-position: 124
X-ecartis-version: Ecartis v1.0.0
Sender: pagg-bounce@oss.sgi.com
Errors-to: pagg-bounce@oss.sgi.com
X-original-sender: kingsley@aurema.com
Precedence: bulk
X-list: pagg

On Tue, Sep 27, 2005 at 03:10:20PM -0500, Erik Jacobson wrote:
> I fixed this in the RCU version of pnotify I'm working per the lse-tech
> community discussion - thanks for the reminder the other day (in a non-list
> email).
>
> If the RCU version crashes and burns for some reason and we go back to
> the non-RCU one, I'll need to make the fix there too.  The function now
> looks like this.  I hope this is what you had in mind (untested as of
> this moment).

Erik,

I'm not sure that it does at this moment, not having seen the code for
copy_process() or __pnotify_exit().  __pnotify_exit() would need to
call the exit callback for all clients except for the client failing
the fork call.  To do this, wouldn't the following be needed in
__pnotify_fork()?

> int
> __pnotify_fork(struct task_struct *to_task, struct task_struct *from_task)
> {
>         struct pnotify_subscriber *from_subscriber;
>         int ret;
>
>         /* We need to be sure the parent's list we copy from doesn't disappear */
>         rcu_read_lock();
>
>         list_for_each_entry_rcu(from_subscriber, &from_task->pnotify_subscriber_list, entry) {
>                 struct pnotify_subscriber *to_subscriber = NULL;
>
>                 to_subscriber = pnotify_subscribe(to_task, from_subscriber->events);
>                 if (!to_subscriber) {
>                         ret=-ENOMEM;
>                         __pnotify_exit(to_task);
>                         rcu_read_unlock();
>                         return ret;
>                 }
>                 ret = to_subscriber->events->fork(to_task, to_subscriber,
>                                                   from_subscriber->data);
>
>                 rcu_read_unlock(); /* no more to do with the parent's data */
>

Then, to make sure the current client does not have its exit callback
invoked:

                ...
                if (ret != 0) {
                        pnotify_unsubscribe(to_subscriber);
                        if (ret < 0)
                                return ret;
                }
        }

        return 0;
}

What do you think?
-- Kingsley From kaigai@ak.jp.nec.com Wed Sep 28 22:53:42 2005 Received: with ECARTIS (v1.0.0; list pagg); Wed, 28 Sep 2005 22:53:48 -0700 (PDT) Received: from tyo201.gate.nec.co.jp (TYO201.gate.nec.co.jp [210.143.35.51]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id j8T5rfO0011100 for ; Wed, 28 Sep 2005 22:53:42 -0700 Received: from mailgate4.nec.co.jp (mailgate53.nec.co.jp [10.7.69.184]) by tyo201.gate.nec.co.jp (8.11.7/3.7W01080315) with ESMTP id j8T5obE01649; Thu, 29 Sep 2005 14:50:37 +0900 (JST) Received: (from root@localhost) by mailgate4.nec.co.jp (8.11.7/3.7W-MAILGATE-NEC) id j8T5obB05054; Thu, 29 Sep 2005 14:50:37 +0900 (JST) Received: from mailsv.linux.bs1.fc.nec.co.jp (namesv2.linux.bs1.fc.nec.co.jp [10.34.125.2]) by mailsv5.nec.co.jp (8.11.7/3.7W-MAILSV4-NEC) with ESMTP id j8T5oan00895; Thu, 29 Sep 2005 14:50:36 +0900 (JST) Received: from [10.34.125.249] (sanma.linux.bs1.fc.nec.co.jp [10.34.125.249]) by mailsv.linux.bs1.fc.nec.co.jp (Postfix) with ESMTP id 6AD9F2FADD; Thu, 29 Sep 2005 14:50:36 +0900 (JST) Message-ID: <433B80B6.2010604@ak.jp.nec.com> Date: Thu, 29 Sep 2005 14:50:46 +0900 From: Kaigai Kohei User-Agent: Mozilla Thunderbird 1.0.2 (Windows/20050317) X-Accept-Language: ja, en-us, en MIME-Version: 1.0 To: Erik Jacobson Cc: Kingsley Cheung , pagg@oss.sgi.com, tonyt@aurema.com, paulmck@us.ibm.com Subject: Re: [patch] Minor PAGG attach/detach semantic change for 2.6.11 References: <20050617014512.GA10285@aurema.com> <20050927201020.GA30433@sgi.com> <433A7FE4.5040109@ak.jp.nec.com> <20050928141831.GA24110@sgi.com> In-Reply-To: <20050928141831.GA24110@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 125 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: kaigai@ak.jp.nec.com Precedence: bulk X-list: pagg Hi, Erik Jacobson wrote: >>frequentrly-read pass, such as SELinux's Access Vector 
Cache(AVC). >>But it's not omnipotence, and it restricts write methodology. > > The feeling I had is that most users of pnotify won't be writing super-often. > This is a generalization that may be incorrect. Taking Job as an example, > once the process is made part of a job, not much usually happens in terms of > adjusting the data pointer associated with the task struct until Job is done. > > I could imag