
PAGG ideas for next attempt: new docs, new name?

To: pagg@xxxxxxxxxxx
Subject: PAGG ideas for next attempt: new docs, new name?
From: Erik Jacobson <erikj@xxxxxxx>
Date: Sat, 17 Sep 2005 10:34:10 -0500
Sender: pagg-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.6i
I'm looking for feedback on these ideas.  I'm sending this to the PAGG
list.  After I gather feedback from you and some co-workers, I'll be posting
this to lse-tech and some other folks as well.  I'll then work on the code 
side of the changes.

Please be sure to read the justification section.  I'm not sure I should say 
that stuff, so let me know if it is silly to put there. 

(I'll wait until Monday or Tuesday to send this off to a broader audience
to be sure it isn't lost in the weekend).

===

I am re-working what used to be PAGG to have a new name, better documentation, 
and better variable names.

My hope is that I can present this to the community for inclusion in the
kernel and I'm hoping to have a couple of the users of this help by 
explaining how they use it.

I feel one reason PAGG didn't get attention was because its true function
was obscured by its name and the names of functions and variables within.

The first step of this for me was to write some new documentation using
the new names for the pieces.  Before I propose this to the broader
community, I'd like to get feedback.  After that, I plan to re-write the code 
to match and post it.

If a variable name seems too long (some are), please suggest a shorter 
name.  The name of PN itself is fair game.  It turns out it was hard to pick a 
name for this thing.




Process Notification (PN)
-------------------------
PN provides a method (service) for kernel modules to be notified when certain 
events happen in the life of a process.  Events we support include fork, 
exit, and exec.  A special init event is also supported (see events below).
More events could be added.  PN also provides a generic data pointer for the 
modules to work with so that data can be associated per process.

A kernel module will register (pn_register) a service request 
(pn_service_request) with PN.  The request tells PN which notifications the 
kernel module wants.  The kernel module passes along function pointers to be 
called for these events (exit, fork, exec) in the service request.

From the process point of view, each process has a kernel module subscriber 
list (pn_module_subscriber_list).  These kernel modules are the ones who want 
notification about the life of the process.  As described above, each kernel 
module subscriber on the list has a generic data pointer to point to data 
associated with the process.

In the case of fork, PN will allocate the same kernel module subscriber list
for the new child that existed for the parent.  The kernel module's function 
pointer for fork is also called so the kernel module can do whatever it needs 
to do when a parent forks.

For exit, similar things happen but the exit function pointer for each
kernel module subscriber is called and the kernel module subscriber list
for that task is deleted.
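
For reference, the per-task bookkeeping might look roughly like the sketch
below.  The names pn_subscriber_list, pn_subscriber_list_sem, data, and
pn_subscriber_request are the ones used elsewhere in this document; the exact
layouts shown here are assumptions, not final code.

/* Rough sketch only: layouts are assumptions; names match this document. */
struct pn_subscriber {
        struct list_head entry;                  /* links into the task's subscriber list */
        struct pn_service_request *pn_subscriber_request; /* the module's registered request */
        void *data;                              /* generic per-process data pointer */
};

/* Assumed additions to struct task_struct: */
        struct list_head *pn_subscriber_list;          /* NULL when no modules subscribed */
        struct rw_semaphore pn_subscriber_list_sem;    /* protects the list above */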


Events
------
Events are stages of a process's life that kernel modules care about.  The 
fork event is a callout in copy_process when a parent forks.  The exit event
happens when a process is going away.  We also support an exec event, which 
happens when a process execs.  Finally, there is an init event.  This special 
event causes the kernel module to be associated with all processes currently 
running in the system.  It is used when a kernel module wants to keep track 
of all current processes as opposed to just those it associates by itself 
(and the children that follow).  The events a kernel module cares about are 
set up in the pn_service_request structure - see usage below.

When setting up a pn_service_request structure, you designate which
events you care about by setting each event field either to NULL (meaning you
don't care about that event) or to a pointer to the function to run when the
event is triggered.  The fork and exit events are currently required.
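
For example, a module that only needs the two required events might fill the
structure out like this (a minimal sketch; minimal_request, "minimal_module",
test_fork, and test_exit are hypothetical names):

static struct pn_service_request minimal_request = {
        .module = THIS_MODULE,
        .name   = "minimal_module",
        .data   = NULL,
        .entry  = LIST_HEAD_INIT(minimal_request.entry),
        .init   = NULL,          /* don't attach to already-running processes */
        .fork   = test_fork,     /* required */
        .exit   = test_exit,     /* required */
        .exec   = NULL,          /* don't care about exec */
};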


How do processes become associated with kernel modules?
-------------------------------------------------------
Your kernel module can use the pn_alloc function to associate a given 
process with a given pn_service_request structure.  This adds your
kernel module to the subscriber list of the process.  In the case of 
inescapable job containers making use of PAM, for example, when PAM allows a 
person to log in, PAM contacts Job (via a PAM job module which uses the Job 
userland library), and the kernel Job code calls pn_alloc to associate the 
process with PN.  From that point on, the kernel module will be notified 
about the events in the process's life that it cares about.

Likewise, your kernel module can remove the association between itself and
a given process by using pn_subscriber_free.
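
As a rough sketch of both calls (the exact pn_alloc signature is an
assumption here; pn_service_request is the structure shown in the usage
section below, and locking is omitted - see the locking discussion later):

/* Sketch only: the pn_alloc signature is an assumption. */
static int track_this_process(struct task_struct *tsk)
{
        /* Add this kernel module to tsk's subscriber list. */
        return pn_alloc(tsk, &pn_service_request);
}

static void stop_tracking_process(struct task_struct *tsk)
{
        struct pn_subscriber *subscriber;

        subscriber = pn_get_subscriber(tsk, pn_service_request.name);
        if (subscriber)
                pn_subscriber_free(subscriber);  /* remove the association */
}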


Example Usage
-------------

=== filling out the pn_service_request structure ===

A kernel module wishing to use PN needs to set up a pn_service_request
structure.  This structure tells PN which events you care about and what
functions to call when those events are triggered.  In addition, you
supply a name (usually the kernel module name).  The .entry field is always
filled out as shown below.  .module is usually set to THIS_MODULE, and
.data can optionally be used to store a pointer with the service request
structure.

Example of a filled out pn_service_request:

static struct pn_service_request pn_service_request = {
        .module = THIS_MODULE,
        .name   = "test_module",
        .data   = NULL,
        .entry  = LIST_HEAD_INIT(pn_service_request.entry),
        .init   = test_init,
        .fork   = test_attach,
        .exit   = test_detach,
        .exec   = test_exec,
};

The above pn_service_request says the kernel module "test_module" cares about
events fork, exit, exec, and init.  In fork, call the kernel module's 
test_attach function.  In exec, call test_exec.  In exit, call test_detach.  
The init event is specified, so all processes on the system will be associated 
with this kernel module and the test_init function will be run for each.


=== Registering with PN ===

You will likely register with PN in your kernel module's module_init
function.  Here is an example:

static int __init test_module_init(void)
{
        int rc = pn_register(&pn_service_request);
        if (rc < 0)
                return rc;      /* propagate the error from pn_register */

        return 0;
}


=== Example init event function ====

Since the init event is defined, it means this kernel module is added
to the subscriber list of all processes -- it will receive notification
about events it cares about for all processes and all children that
follow.

Of course, if a kernel module doesn't need to know about all current 
processes, that module shouldn't implement this and '.init' in the 
pn_service_request structure would be NULL.

This is as opposed to the normal method where the kernel module adds itself 
to the subscriber list of a process using pn_alloc.

static int test_init(struct task_struct *tsk, struct pn_subscriber *subscriber)
{
        if (pn_get_subscriber(tsk, "test_module") == NULL)
                dprintk("ERROR PN expected \"%s\" PID = %d\n",
                        "test_module", tsk->pid);

        dprintk("FYI PN init hook fired for PID = %d\n", tsk->pid);
        atomic_inc(&init_count);
        return 0;
}


=== Example fork (test_attach) function ===

This function is executed when a process forks - this is associated
with the pn_callout callout in copy_process.  There would be a very
similar test_detach function (not shown).  

PN will add the kernel module to the notification list for the child process
automatically and then execute this fork function pointer (test_attach in this 
example).  However, the kernel module can control, via the return value, 
whether it stays on the child's subscriber list and continues to receive 
notification.

A negative return value results in the fork failing.  Zero means success.  
A value greater than zero means success, but the kernel module doesn't want 
to be associated with that specific process (doesn't want notification).  In 
other words, if a value greater than zero is returned, your kernel module is 
saying that it doesn't want to be on the subscriber list for this process.


static int test_attach(struct task_struct *tsk, struct pn_subscriber *subscriber,
                       void *vp)
{
        dprintk("PN attach hook fired for PID = %d\n", tsk->pid);
        atomic_inc(&attach_count);

        return 0;
}
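
As described above, the return value controls whether the module stays on the
child's subscriber list.  A variant of the attach function might use that, as
in the sketch below (want_children is a hypothetical flag, used here only to
illustrate the return-value convention):

static int test_attach_selective(struct task_struct *tsk,
                                 struct pn_subscriber *subscriber, void *vp)
{
        dprintk("PN attach hook fired for PID = %d\n", tsk->pid);
        atomic_inc(&attach_count);

        if (!want_children)
                return 1;   /* success, but drop this module from the child's list */

        return 0;           /* success, stay subscribed to the child */
}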


=== Example exec event function ===

And here is an example function to run when a task gets to exec.  So any
time a "tracked" process gets to exec, this would execute.  More
hooks/callouts similar to this one could be implemented as there is demand
for them.

static void test_exec(struct task_struct *tsk, struct pn_subscriber *subscriber)
{
        dprintk("PN exec hook fired for PID %d\n", tsk->pid);
        atomic_inc(&exec_count);
}


=== Unregistering with PN ===

You will likely wish to unregister with PN in the kernel module's
module_exit function.  Here is an example:

static void __exit test_module_cleanup(void)
{
        pn_unregister(&pn_service_request);
        printk("detach called %d times...\n", atomic_read(&detach_count));
        printk("attach called %d times...\n", atomic_read(&attach_count));
        printk("init called %d times...\n", atomic_read(&init_count));
        printk("exec called %d times...\n", atomic_read(&exec_count));
        if (atomic_read(&attach_count) + atomic_read(&init_count) !=
            atomic_read(&detach_count))
                printk("PN PROBLEM: attach count + init count SHOULD equal "
                       "detach count and doesn't\n");
        else
                printk("Good - attach count + init count equals detach count.\n");
}
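
For completeness, the two example functions above would be wired up with the
usual kernel module macros:

module_init(test_module_init);
module_exit(test_module_cleanup);
MODULE_LICENSE("GPL");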



=== Actually using data associated with the process in your module ===

The above examples show how to create an example kernel module using
PN, but they don't show what you might do with the data pointer associated
with a given process.

Linux Inescapable Jobs is a good example of making use of PN.  Some versions
of it use PAGG, which is what PN is based on.  A new Job patch should be
available soon if not already.  See oss.sgi.com/projects/pagg.

A Job is a group of processes from which a process cannot escape.  
A batch scheduling system such as LSF may use Job to group possibly otherwise
unrelated processes together so they can be tracked and signaled as a set,
including any children that follow.  If the Job PAM module is used, each 
login process gets a job ID and its children become part of the job by default.

In Job, we want to know whenever a parent forks a new process or whenever
a process exits.  So Job gets notified for these events and adds the new
process to the list of processes in the job (or removes it in the case
of exit).  To efficiently add a process to a job, we need to know which job 
the parent was in.  This information, in our case, is what is stored in the
data pointer within the pn_subscriber structure associated with a given
process.
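
As a rough illustration (the allocation details and the third argument are
assumptions; struct job_attach and its job->jid field appear in the snippet
further below), a Job fork handler might record the child's job in
subscriber->data roughly like this:

/* Sketch only: parent_data is assumed to carry the parent's job_attach. */
static int job_fork_sketch(struct task_struct *tsk,
                           struct pn_subscriber *subscriber, void *parent_data)
{
        struct job_attach *attach;

        attach = kmalloc(sizeof(*attach), GFP_KERNEL);
        if (!attach)
                return -ENOMEM;     /* a negative return makes the fork fail */

        /* The child belongs to the same job as its parent. */
        attach->job = ((struct job_attach *)parent_data)->job;
        subscriber->data = attach;

        return 0;
}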

pn_get_subscriber is used to retrieve the PN subscriber for a given
process and kernel module.  Like this:

subscriber = pn_get_subscriber(task, name);

Where name is your kernel module's name (as provided in the
pn_service_request structure) and task is the process you're interested
in.

Please be careful about locking.  The task structure has a 
pn_subscriber_list_sem to be used for locking.  An example code snip 
follows:

        /* We have a valid task now */
        get_task_struct(task);          /* Ensure the task doesn't vanish on us */
        read_unlock(&tasklist_lock);    /* Unlock the tasklist */
        down_write(&task->pn_subscriber_list_sem);  /* write lock subscriber list */

        subscriber = pn_get_subscriber(task, pn_service_request.name);
        if (subscriber) {
                detachpid.r_jid =
                        ((struct job_attach *)subscriber->data)->job->jid;
                subscriber->pn_subscriber_request->exit(task, subscriber);
                pn_subscriber_free(subscriber);
        } else {
                errcode = -ENODATA;
        }
        up_write(&task->pn_subscriber_list_sem);    /* write unlock subscriber list */
        put_task_struct(task);          /* Done accessing the task; drop our reference */


In the above snip, we first make sure we have a task that won't disappear on
us.  Then we write lock the pn_subscriber_list_sem to be sure the list doesn't
change under us.  We write lock (rather than read lock) because we're going to
be removing an entry from the list.

If there is a subscriber for this kernel module matching the given
process, we store the jid (the job identifier in Job), we call our own
detach function directly (in Job, it is wired to the exit event),
and we remove the subscriber from the subscriber list.  This means
the kernel module will no longer get notifications of events for this 
task.

The detachpid.r_jid line above is an example of retrieving data from
the data pointer of the given subscriber.


History
-------
Process Notification used to be known as PAGG (Process Aggregates).  
It was re-written to be called Process Notification because we believe this
better describes its purpose.  Structures and functions were re-named to
be more clear and to reflect the new name.


Why Not Notifier Lists?
-----------------------
We investigated the use of notifier lists, available in newer kernels.
There were two reasons we didn't use them to implement PAGG.

1) There seem to be some tricky locking issues with notifier lists.
   For example, if a kernel module exits while the notifier list is 
   being walked, we could have trouble.  There may be ways to work
   around this.

2) Notifier lists would not be as efficient as PN for kernel modules wishing
   to associate data with processes.  With PN, if the pn_subscriber_list 
   of a given task is NULL, we instantly know there are no kernel modules
   that care about the process (see the sketch below).  Further, the
   callbacks happen in places where the task struct is likely to be cached,
   so this is a quick operation.  With notifier lists, the scope is system 
   wide rather than per process.  As long as one kernel module wants to be 
   notified, we have to walk the notifier list and potentially waste cycles. 
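
The per-task fast path amounts to a check like the sketch below
(pn_callout_fork and pn_notify_fork are hypothetical names; whether the real
code tests a NULL pointer or an empty list is an implementation detail):

/* Sketch: the fork callout can bail out immediately when the parent
 * has no subscribers, so untracked processes pay almost nothing. */
static inline int pn_callout_fork(struct task_struct *parent,
                                  struct task_struct *child)
{
        if (!parent->pn_subscriber_list)
                return 0;       /* no subscribers -- nothing to do */

        return pn_notify_fork(parent, child);   /* the real notification work */
}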
   

Some Justification
------------------
Some have argued in the past that PAGG shouldn't be used because it would
allow interesting things to be implemented outside of the kernel.  While this
might be a small risk, having these hooks in place allows customers and users
to implement kernel components that you don't want to see in the kernel anyway.

SGI may have HPC needs that very few other people are interested in.  We in
fact have 4 open source projects that make use of PAGG (and will convert to
PN).  At least one of these projects is urgent for our customers but is
simply not interesting to enough people to maintain in the kernel itself.
In a world where all customers need to run on standard distributions to
be supported by the distributor, we're left in a situation where:

  a) The distributor doesn't want to take patches not accepted in the kernel
  b) The community wants everything important in the kernel
  c) The community wants only things having multiple users in the kernel
  d) SGI has things that are only interesting to SGI systems and its 
     customers (not multiple users)
  e) There is no option to re-build kernels while staying in a supported
     environment.

We find it hard to support customers in this catch-22 situation.

PN allows us to implement our open source projects outside of the mainline
kernel.  We do offer things like Job for inclusion, but so far haven't
met with success in getting it accepted.

We feel PN would also be very useful for kernel components already in the
kernel.  There is potential to reduce the number of calls in the copy_process
path, for example.  One could also envision that items in the task struct
which are used slightly less frequently could be implemented using PN.

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota
