another new rev of the docs...

To: pagg@xxxxxxxxxxx, Christoph Lameter <clameter@xxxxxxx>
Subject: another new rev of the docs...
From: Erik Jacobson <erikj@xxxxxxx>
Date: Mon, 19 Sep 2005 11:54:14 -0500
Sender: pagg-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.6i
Here is another revision taking in many suggestions from Dean Nelson.
The return values for the fork function pointer are defined, and some
function names changed.


I am re-working what used to be PAGG to have a new name, better documentation, 
and better variable names.

My hope is that I can present this to the community for inclusion in the
kernel and I'm hoping to have a couple of the users of this help by 
explaining how they use it.

I feel one reason PAGG didn't get attention was because its true function
was obscured by its name and the names of functions and variables within.

The first step of this for me was to write some new documentation using
the new names for the pieces.  Before I propose this to the broader
community, I'd like to get feedback.  After that, I plan to re-write the code 
to match and post it.

If a variable seems too long (some are), perhaps provide a suggested shorter 
name.  The name of pnotify itself is fair game.  It turns out it was hard to 
pick a name for this thing.

Process Notification (pnotify)
pnotify provides a method (service) for kernel modules to be notified when 
certain events happen in the life of a process.  Events we support include 
fork, exit, and exec.  A special init event is also supported (see events 
below).  More events could be added.  pnotify also provides a generic data 
pointer for the modules to work with so that data can be associated per 
process.

A kernel module registers a service request (pnotify_events) describing the
events it cares about by calling pnotify_register.  The request tells pnotify 
which notifications the kernel module wants.  The kernel module passes along 
function pointers to be called for these events (exit, fork, exec) in the 
pnotify_events service request.

From the process point of view, each process has a kernel module subscriber 
list (pnotify_module_subscriber_list).  These kernel modules are the ones who 
want notification about the life of the process.  As described above, each 
kernel module subscriber on the list has a generic data pointer to point to 
data associated with the process.

In the case of fork, pnotify will allocate the same kernel module subscriber 
list for the new child that existed for the parent.  The kernel module's 
function pointer for fork is also called for the child being constructed so 
the kernel module can do whatever it needs to do when a parent forks this 
child.  Special return values apply to the fork event that don't apply to 
other events.  They are described in the fork example below.

For exit, similar things happen but the exit function pointer for each
kernel module subscriber is called and the kernel module subscriber entry for 
that process is deleted.

Events are stages of a process's life that kernel modules care about.  The 
fork event is triggered in a certain location in copy_process when a parent 
forks.  The exit event happens when a process is going away.  We also support 
an exec event, which happens when a process execs.  Finally, there is an init 
event.  This special event makes it so this kernel module will be associated 
with all current processes in the system at the time of registration.  This is 
used when a kernel module wants to keep track of all current processes as 
opposed to just those it associates by itself (and children that follow).  The 
events a kernel module cares about are set up in the pnotify_events
structure - see usage below.

When setting up a pnotify_events, you designate which events you care about 
by either associating NULL (meaning you don't care about that event) or a 
pointer to the function to run when the event is triggered.  The fork event
is currently required.
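
As a rough illustration of the NULL-or-function-pointer convention, here is
a small userspace model.  It is not pnotify code: struct model_events,
struct model_task, and the model_deliver_* helpers are hypothetical names
made up for this sketch, and the handler signatures are simplified.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a task; not the kernel's task_struct. */
struct model_task { int pid; };

/* Hypothetical model of an events request: NULL means "don't care". */
struct model_events {
        const char *name;
        int  (*fork)(struct model_task *t);   /* required, per the text */
        void (*exit)(struct model_task *t);
        void (*exec)(struct model_task *t);
};

int model_exit_calls;

int model_on_fork(struct model_task *t) { (void)t; return 0; }
void model_on_exit(struct model_task *t) { (void)t; model_exit_calls++; }

/* Deliver an exit event; returns 1 if a handler ran, 0 if it was NULL. */
int model_deliver_exit(const struct model_events *ev, struct model_task *t)
{
        assert(ev != NULL && t != NULL);
        if (ev->exit == NULL)
                return 0;
        ev->exit(t);
        return 1;
}

/* Deliver an exec event with the same NULL check. */
int model_deliver_exec(const struct model_events *ev, struct model_task *t)
{
        assert(ev != NULL && t != NULL);
        if (ev->exec == NULL)
                return 0;
        ev->exec(t);
        return 1;
}
```

A request filled in with a fork and an exit handler but a NULL exec handler
would see its exit handler run and the exec event skipped.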

How do processes become associated with kernel modules?
Your kernel module itself can use the pnotify_subscribe function to associate 
a given process with a given pnotify_events structure.  This adds 
your kernel module to the subscriber list of the process.  In the case
of inescapable job containers making use of PAM, when PAM allows a person to 
log in, PAM contacts job (via a PAM job module which uses the job userland 
library) and the kernel Job code will call pnotify_subscribe to associate the 
process with pnotify.  From that point on, the kernel module will be notified 
about events in the process's life that the module cares about (as well
as events in any children that process may later have).

Likewise, your kernel module can remove an association between it and
a given process by using pnotify_unsubscribe.
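
To make the subscribe/unsubscribe idea concrete, here is a minimal userspace
sketch of a per-process subscriber list keyed by module name.  Everything in
it (struct msub, struct mtask, the model_* functions, the fixed-size array)
is a hypothetical stand-in chosen for brevity; the real pnotify_subscribe and
pnotify_unsubscribe signatures are not shown in this text and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX 4

/* One subscriber entry: module name plus its per-process data pointer. */
struct msub { const char *name; void *data; };

/* Userspace stand-in for a task with its own subscriber list. */
struct mtask { int pid; struct msub subs[MODEL_MAX]; int nsubs; };

/* Look a subscriber up by module name; NULL if not on the list. */
struct msub *model_get_subscriber(struct mtask *t, const char *name)
{
        for (int i = 0; i < t->nsubs; i++)
                if (strcmp(t->subs[i].name, name) == 0)
                        return &t->subs[i];
        return NULL;
}

/* Add a subscriber; NULL if the list is full or the name is present. */
struct msub *model_subscribe(struct mtask *t, const char *name, void *data)
{
        if (t->nsubs == MODEL_MAX || model_get_subscriber(t, name) != NULL)
                return NULL;
        t->subs[t->nsubs].name = name;
        t->subs[t->nsubs].data = data;
        return &t->subs[t->nsubs++];
}

/* Remove a subscriber; 0 on success, -1 if it was not subscribed. */
int model_unsubscribe(struct mtask *t, const char *name)
{
        struct msub *s = model_get_subscriber(t, name);

        if (s == NULL)
                return -1;
        *s = t->subs[--t->nsubs];       /* fill the hole with the last entry */
        return 0;
}
```

The point of the model is only that the association lives with the process:
looking up "job" on one task says nothing about any other task.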

Example Usage

=== filling out the pnotify_events structure ===

A kernel module wishing to use pnotify needs to set up a pnotify_events 
structure.  This structure tells pnotify which events you care about and what 
functions to call when those events are triggered.  In addition, you supply a 
name (usually the kernel module name).  The entry is always filled out as 
shown below.  .module is usually set to THIS_MODULE.  data can be optionally 
used to store a pointer with the pnotify_events structure.

Example of a filled out pnotify_events:

static struct pnotify_events pnotify_events = {
        .module = THIS_MODULE,
        .name   = "test_module",
        .data   = NULL,
        .entry  = LIST_HEAD_INIT(pnotify_events.entry),
        .init   = test_init,
        .fork   = test_attach,
        .exit   = test_detach,
        .exec   = test_exec,
};

The above pnotify_events structure says the kernel module "test_module" cares 
about events fork, exit, exec, and init.  In fork, call the kernel module's 
test_attach function.  In exec, call test_exec.  In exit, call test_detach.  
The init event is specified, so all processes on the system will be associated 
with this kernel module during registration and the test_init function will 
be run for each.

=== Registering with pnotify ===

You will likely register with pnotify in your kernel module's module_init
function.  Here is an example:

static int __init test_module_init(void)
{
        /* Register our events with pnotify; a negative value means failure */
        int rc = pnotify_register(&pnotify_events);

        if (rc < 0)
                return -1;

        return 0;
}

=== Example init event function ===

Since the init event is defined, it means this kernel module is added
to the subscriber list of all processes -- it will receive notification
about events it cares about for all processes and all children that
follow.

Of course, if a kernel module doesn't need to know about all current 
processes, that module shouldn't implement this and '.init' in the 
pnotify_events structure would be NULL.

This is as opposed to the normal method where the kernel module adds itself 
to the subscriber list of a process using pnotify_subscribe.

/* Assumption: the parameter list ends with "*subscriber", as in the fork
 * example. */
static int test_init(struct task_struct *tsk, struct pnotify_subscriber 
*subscriber)
{
        if (pnotify_get_subscriber(tsk, "test_module") == NULL)
                dprintk("ERROR pnotify expected \"%s\" PID = %d\n",
                        "test_module", tsk->pid);

        dprintk("FYI pnotify init hook fired for PID = %d\n", tsk->pid);
        return 0;
}
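
For readers who want to see the effect of the init event in isolation, here
is a small userspace model of registration.  The names (struct init_task,
struct init_events, model_register) are invented for this sketch and the
task table is a plain array; the real registration path is not shown in this
text.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a task. */
struct init_task { int pid; int subscribed; };

/* Hypothetical events request carrying only the init handler. */
struct init_events {
        const char *name;
        int (*init)(struct init_task *t);  /* NULL: skip existing tasks */
};

int init_seen;

int count_init(struct init_task *t) { (void)t; init_seen++; return 0; }

/*
 * Model of registration: when an init handler is present, attach the
 * module to every existing task and run init once per task.
 * Returns the number of tasks attached.
 */
int model_register(const struct init_events *ev,
                   struct init_task *tasks, int ntasks)
{
        int attached = 0;

        assert(ev != NULL);
        if (ev->init == NULL)
                return 0;
        for (int i = 0; i < ntasks; i++) {
                tasks[i].subscribed = 1;
                ev->init(&tasks[i]);
                attached++;
        }
        return attached;
}
```

With '.init' left NULL the model attaches to nothing, which corresponds to
the normal method where the module subscribes to processes one at a time.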

=== Example fork (test_attach) function ===

This function is executed when a process forks - this is associated
with the pnotify_callout callout in copy_process.  There would be a very
similar test_detach function (not shown).  

pnotify will add the kernel module to the subscriber list for the child 
process automatically and then execute this fork function pointer (test_attach 
in this example).  However, the kernel module can use the return value to 
control whether it stays on the child's subscriber list and keeps receiving 
notifications, or whether the fork fails:

PNOTIFY_ERROR - prevent the process from continuing - failing the fork
PNOTIFY_OK - good, adds the kernel module to the subscriber list for process
PNOTIFY_NOSUB - good, but don't add kernel module to subscriber list for process

static int test_attach(struct task_struct *tsk, struct pnotify_subscriber 
*subscriber, void *vp)
{
        dprintk("pnotify attach hook fired for PID = %d\n", tsk->pid);

        /* Keep this kernel module on the child's subscriber list */
        return PNOTIFY_OK;
}
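
The three return values can be sketched with a userspace model of the fork
path.  This is only an illustration under stated assumptions: the numeric
values of the PNOTIFY_* codes, the struct names, and model_fork are all
invented here.  The model also appends to the child's list only on
PNOTIFY_OK instead of adding first and removing later, which gives the same
end state.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes named in the text; the numeric values are assumptions. */
enum { PNOTIFY_ERROR = -1, PNOTIFY_OK = 0, PNOTIFY_NOSUB = 1 };

#define MAX_SUBS 8

/* Hypothetical subscriber with a simplified fork handler. */
struct sub_model {
        const char *name;
        int (*fork)(int child_pid);
};

/* Userspace stand-in for a task and its subscriber list. */
struct task_model {
        int pid;
        const struct sub_model *subs[MAX_SUBS];
        int nsubs;
};

/*
 * Walk the parent's subscribers, call each fork handler for the child,
 * and act on the return value.  Returns 0 on success, -1 if a handler
 * failed the fork.
 */
int model_fork(const struct task_model *parent, struct task_model *child)
{
        child->nsubs = 0;
        for (int i = 0; i < parent->nsubs; i++) {
                const struct sub_model *s = parent->subs[i];
                int rc = s->fork(child->pid);

                if (rc == PNOTIFY_ERROR) {
                        child->nsubs = 0;  /* fork fails, nothing is kept */
                        return -1;
                }
                if (rc == PNOTIFY_OK)
                        child->subs[child->nsubs++] = s;
                /* PNOTIFY_NOSUB: handler ran, child is not subscribed */
        }
        return 0;
}

int always_ok(int pid) { (void)pid; return PNOTIFY_OK; }
int no_sub(int pid)    { (void)pid; return PNOTIFY_NOSUB; }
int veto(int pid)      { (void)pid; return PNOTIFY_ERROR; }
```

A parent with one always_ok subscriber and one no_sub subscriber yields a
child with a single subscriber; adding a veto subscriber fails the fork.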

=== Example exec event function ===

And here is an example function to run when a task gets to exec.  So any
time a "tracked" process gets to exec, this would execute. 

static void test_exec(struct task_struct *tsk, struct pnotify_subscriber 
*subscriber)  /* "*subscriber" assumed, as in the fork example */
{
        dprintk("pnotify exec hook fired for PID %d\n", tsk->pid);
}

=== Unregistering with pnotify ===

You will likely wish to unregister with pnotify in the kernel module's
module_exit function.  Here is an example:

static void __exit test_module_cleanup(void)
{
        printk("detach called %d times...\n", atomic_read(&detach_count));
        printk("attach called %d times...\n", atomic_read(&attach_count));
        printk("init called %d times...\n", atomic_read(&init_count));
        printk("exec called %d times ...\n", atomic_read(&exec_count));

        /* Every attach or init should have been matched by a detach */
        if (atomic_read(&attach_count) + atomic_read(&init_count) !=
            atomic_read(&detach_count))
                printk("pnotify PROBLEM: attach count + init count SHOULD "
                       "equal detach count and doesn't\n");
        else
                printk("Good - attach count + init count equals detach count.\n");
}

=== Actually using data associated with the process in your module ===

The above examples show you how to create an example kernel module using
pnotify, but they didn't show what you might do with the data pointer 
associated with a given process.  Below, find an example of accessing
the data pointer for a given process from within a kernel module making use
of pnotify.

pnotify_get_subscriber is used to retrieve the pnotify subscriber for a given
process and kernel module.  Like this:

subscriber = pnotify_get_subscriber(task, name);

Where name is your kernel module's name (as provided in the pnotify_events 
structure) and task is the process you're interested
in.

Please be careful about locking.  The task structure has a 
pnotify_subscriber_list_sem to be used for locking.  This example retrieves
a given task in a way that ensures it doesn't disappear while we try to 
access it (that's why we do locking for the tasklist_lock and task).  The
pnotify subscriber list is locked to ensure the list doesn't change as we 
search it with pnotify_get_subscriber.

        read_lock(&tasklist_lock); /* Lock the tasklist */
        /* ... look up the task you are interested in here ... */
        get_task_struct(task); /* Ensure the task doesn't vanish on us */
        read_unlock(&tasklist_lock); /* Unlock the tasklist */
        down_read(&task->pnotify_subscriber_list_sem); /* readlock subscriber list */
        subscriber = pnotify_get_subscriber(task, name);
        if (subscriber) {
                /* Get the widgitId associated with this task */
                widgitId = ((widgitId_t *)subscriber->data);
        }
        /* Release the list lock before dropping our reference to the task */
        up_read(&task->pnotify_subscriber_list_sem); /* unlock subscriber list */
        put_task_struct(task);  /* Done accessing the task */

Process Notification used to be known as PAGG (Process Aggregates).  
It was re-written to be called Process Notification because we believe this
better describes its purpose.  Structures and functions were re-named to
be more clear and to reflect the new name.

Why Not Notifier Lists?
We investigated the use of notifier lists, available in newer kernels.

Notifier lists would not be as efficient as pnotify for kernel modules 
wishing to associate data with processes.  With pnotify, if the 
pnotify_subscriber_list of a given task is NULL, we can instantly know 
there are no kernel modules that care about the process.  Further, the 
callbacks happen in places where the task struct is likely to be cached.  
So this is a quick operation.  With notifier lists, the scope is system 
wide rather than per process.  As long as one kernel module wants to be 
notified, we have to walk the notifier list and potentially waste cycles. 
In the case of pnotify, we only walk lists if we're interested in
a specific task.  

On a system where pnotify is used to track only a few processes, the 
overhead of walking the notifier list is high compared to the overhead
of walking the kernel module subscriber list only when a kernel module
is interested in a given process.
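
The cost argument can be made concrete with a toy userspace comparison.
This is only a sketch: struct cb_node, struct cost_task, and the
event_cost_* functions are made-up names, and "cost" is simply the number of
list entries visited per event, not a measurement of any real kernel.

```c
#include <assert.h>
#include <stddef.h>

/* A generic singly linked list entry standing in for one callback. */
struct cb_node { struct cb_node *next; };

/* Userspace stand-in for a task with its own subscriber list. */
struct cost_task { int pid; struct cb_node *subs; };

/* Count how many entries a walk of the list visits. */
int walk_cost(const struct cb_node *head)
{
        int visits = 0;

        for (; head != NULL; head = head->next)
                visits++;
        return visits;
}

/* System-wide list: every event on any task walks every entry. */
int event_cost_global(const struct cb_node *global_list)
{
        return walk_cost(global_list);
}

/* Per-task list: a NULL head means there is nothing to do at all. */
int event_cost_per_task(const struct cost_task *t)
{
        assert(t != NULL);
        if (t->subs == NULL)
                return 0;
        return walk_cost(t->subs);
}
```

With three callbacks registered system-wide, an event on an untracked task
costs three visits in the global model and none in the per-task model.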

Overlooking performance issues, notifier lists in and of themselves wouldn't 
solve the problem pnotify solves anyway.  Although you could argue notifier 
lists can implement the callback portion of pnotify, there is no association 
of data with a given process.  This is needed for kernel modules to 
efficiently associate a task with a data pointer without cluttering up
the task struct.

Some Justification
We feel that pnotify could be used to reduce the size of the task struct or
the number of functions in copy_process.  For example, if another part of the 
kernel needs to know when a process is forking or exiting, they could use 
pnotify instead of adding additional code to task struct, copy_process, or

Some have argued that PAGG in the past shouldn't be used because it will
allow interesting things to be implemented outside of the kernel.  While this
might be a small risk, having these in place allows customers and users to
implement kernel components that you don't want to see in the kernel anyway.

For example, a certain vendor may have an urgent need to implement kernel
functionality or special types of accounting that nobody else is interested 
in.  That doesn't mean the code isn't open-source, it just means it isn't
applicable to all of Linux because it satisfies a niche.

All of pnotify's functionality that needs to be exported is exported with
EXPORT_SYMBOL_GPL to discourage abuse.

The risk already exists in the kernel for people to implement modules outside
the kernel that suffer from less peer review and possibly bad programming
practice.  pnotify could add more opportunities for out-of-tree kernel module 
authors to make new modules.  I believe this is somewhat mitigated by the 
already-existing 'tainted' warnings in the kernel.  
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota
