pagg
[Top] [All Lists]

[steiner@sgi.com: Re: [Lse-tech] [PATCH] Process Notification (task_noti

To: pagg@xxxxxxxxxxx
Subject: [steiner@sgi.com: Re: [Lse-tech] [PATCH] Process Notification (task_notifiers)]
From: Erik Jacobson <erikj@xxxxxxx>
Date: Sat, 8 Oct 2005 08:21:41 -0500
Cc: Jack Steiner <steiner@xxxxxxx>
Sender: pagg-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.6i
I noticed this wasn't CCd to the pagg mailing list because Jack responded
to one of my early posts to lse-tech that didn't include the pagg list.

There is one big issue with this implementation -- that is, there is no
lock protection around the notifier lists.  

This means that a given kernel module can only be added to the notifier list
of "current" or a child-in-construction.

If we went with this implementation as a foundation for Job, it would mean a 
reduction in the functionality of Job.  You couldn't, for example, subscribe 
a kernel module to an arbitrary process.  In Job, this would mean you wouldn't
be able to put an arbitrary process in to a job container.  This wouldn't
affect PAM library creation of jobs or even job_create.  Things like
job_attachpid, job_detachpid, and similar wouldn't be available though.

I'm not sure what we'd do about job_detachjid - that one seems necessary
but I'm not sure how it would work in this restricted patch.

People are very concerned about adding locks in the kernel it seems.
I went to great lengths to show that the locks in pnotify in and of 
themselves don't cost much at all.  I even backed it up with numbers, etc.
So my thought is, if people like Jack's proposal here, we can probably 
implement locks with it and show they don't cost much.  But it seems
proving they don't cost much isn't enough :(

We need comments on this - please respond back and it's better if you
respond to lse-tech too.  

----- Forwarded message from Jack Steiner <steiner@xxxxxxx> -----

From: Jack Steiner <steiner@xxxxxxx>
To: Erik Jacobson <erikj@xxxxxxx>
Cc: lse-tech@xxxxxxxxxxxxxxxxxxxxx, akpm@xxxxxxxx, kingsley@xxxxxxxxxx,
        canon@xxxxxxxxx
Subject: Re: [Lse-tech] [PATCH] Process Notification (task_notifiers)
Date: Fri, 7 Oct 2005 09:26:07 -0500


Here is an alternate proposal for a PAGG / pnotifier mechanism. This
mechanism uses a new "task notifier" that is task specific. Unlike some
of the other callout mechanisms, callouts are attached to specific tasks.
There is no global callout list.  Task notifiers are optional callouts 
that can be registered on a per-task basic.

The task_notifier mechanism does not provide all of the capabilities of PAGG. 
Specifically, a task cannot register a task_notifier to an arbitrary task.
Registration is supported only against 1) the current task, or 2) a child
task during a clone. There is no support for adding notifiers to an
arbitrary task.

Interesting points:
        - no locks
        - no global data maintained by the notifier mechanism
        - 1 void* pointer added to task struct
        - arbitrary number of callouts
        - support for callout priority
        - each callout can have private data (ex., embed task_notifier in larger
          structure) but general callout mechanism is unaware of the data

Loadable modules need to reference-count being in a notifier list & prevent
unloading if the reference-count is non-zero.  Modules that use task_notifiers
are not visible to anything in the kernel if no task_notifiers are currently
registered.

We have a couple of modules that use the current PAGG callout
mechanism. I converted one of these users (dplace) to use the new mechanism.
The conversion effort was trivial. We have additional modules that use PAGG. 
These have not yet been analyzed to see if task_notifiers are sufficient
but it looks promising.

I'm proposing this patch to see if other users of pagg / pnotify could
use this mechanism instead. The mechanism is lightweight, small & unintrusive.
Does this look like something that would be of general use.


Here is a trial patch against a SUSE kernel that also has the PAGG patch 
applied.
I added callouts in fork(), exec(), & exit() at the same point that PAGG
added callouts. Task_notifier callouts can be added in other places as needed.



 fs/exec.c                     |    2 
 include/linux/init_task.h     |    1 
 include/linux/sched.h         |    2 
 include/linux/task_notifier.h |   56 ++++++++++++++++++++
 kernel/Makefile               |    2 
 kernel/exit.c                 |    2 
 kernel/fork.c                 |    4 +
 kernel/task_notifier.c        |  113 ++++++++++++++++++++++++++++++++++++++++++

Index: linux/include/linux/task_notifier.h
===================================================================
--- /dev/null   1970-01-01 00:00:00.000000000 +0000
+++ linux/include/linux/task_notifier.h 2005-10-05 13:06:33.994660706 -0500
@@ -0,0 +1,55 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved.
+ *
+ *
+ *     Routines to manage task notifier chains for passing task state changes 
to any
+ *     interested routines.
+ *
+ */
+
+#ifndef _LINUX_TASK_NOTIFIER_H
+#define _LINUX_TASK_NOTIFIER_H
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <linux/sched.h>
+#include <asm/current.h>
+
+struct task_notifier_block;
+
+typedef void (task_notifier_func)(unsigned int, void *, struct 
task_notifier_block *nb);
+
+struct task_notifier_block
+{
+       task_notifier_func *task_notifier_func;
+       struct task_notifier_block *next;
+       int priority;
+       unsigned int id_select;
+};
+
+
+extern void task_notifier_chain_register(struct task_notifier_block *nb);
+extern void task_notifier_chain_register_child(struct task_notifier_block *nb, 
struct task_struct *p);
+extern void task_notifier_chain_unregister(struct task_notifier_block *nb);
+extern struct task_notifier_block 
*find_task_notifier_block_proc(task_notifier_func *func);
+extern void _task_notifier_call_chain(unsigned int id, void *arg);
+
+static inline void task_notifier_call_chain(unsigned int id, void *arg)
+{
+       if (unlikely(current->task_notifier))
+               _task_notifier_call_chain(id, arg);
+}
+
+
+/*
+ *     Notifier identifiers - used as bitmask for selections
+ */
+
+#define TN_FORK                        0x00000001
+#define TN_EXEC                        0x00000002
+#define TN_EXIT                        0x00000004
+
+#endif /* _LINUX_TASK_NOTIFIER_H */
Index: linux/kernel/task_notifier.c
===================================================================
--- /dev/null   1970-01-01 00:00:00.000000000 +0000
+++ linux/kernel/task_notifier.c        2005-10-05 13:06:52.217418263 -0500
@@ -0,0 +1,111 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved.
+ */
+
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/task_notifier.h>
+#include <asm/current.h>
+
+/**
+ *  task_notifier_chain_register_task - Add a task_notifier to a task.
+ *     @n: Entry to be added to the task notifier chain
+ *     @p: Task (must be current or a child being created during fork/clone
+ */
+static inline void task_notifier_chain_register_task(struct task_notifier_block
+                                                    *nb, struct task_struct *p)
+{
+       struct task_notifier_block **list =
+           (struct task_notifier_block **)&p->task_notifier;
+
+       while (*list && nb->priority > (*list)->priority)
+               list = &((*list)->next);
+       nb->next = *list;
+       *list = nb;
+}
+
+/**
+ *  task_notifier_chain_register_child - Add a task_notifier to a task during 
fork/clone.
+ *     @n: Entry to be added to the task notifier chain
+ *     @p: Task (must be current or a child being created during fork/clone
+ */
+void task_notifier_chain_register_child(struct task_notifier_block *nb,
+                                       struct task_struct *p)
+{
+       task_notifier_chain_register_task(nb, p);
+}
+
+/**
+ *  task_notifier_chain_register - Add a task_notifier to the current task
+ *     @n: Entry to be added to the task notifier chain
+ */
+void task_notifier_chain_register(struct task_notifier_block *nb)
+{
+       task_notifier_chain_register_task(nb, current);
+}
+
+/**
+ *     task_notifier_chain_unregister - Remove notifier from the current task
+ *     @nb: Entry in notifier chain
+ *
+ */
+
+void task_notifier_chain_unregister(struct task_notifier_block *nb)
+{
+       struct task_notifier_block **list =
+           (struct task_notifier_block **)&current->task_notifier;
+
+       while (*list && *list != nb)
+               list = &((*list)->next);
+
+       if (*list == nb)
+               *list = nb->next;
+}
+
+/**
+ *     find_task_notifier_block_proc - Scan the current task's notifiers for a
+ *     task_notifier_block with a pointer to @func
+ *     @func: Function
+ *
+ */
+
+struct task_notifier_block *find_task_notifier_block_proc(task_notifier_func *
+                                                         func)
+{
+       struct task_notifier_block *nb = current->task_notifier;
+
+       while (nb && nb->task_notifier_func != func)
+               nb = nb->next;
+
+       return nb;
+}
+
+/**
+ *     task_notifier_call_chain - Call functions in a notifier chain
+ *     @id: Callout identifier
+ *     @arg: Argument to pass to called fumctions
+ *
+ *     Calls each function in a notifier chain in turn.
+ *
+ */
+
+void _task_notifier_call_chain(unsigned int id, void *arg)
+{
+       struct task_notifier_block *nb, *next_nb = current->task_notifier;
+
+       while ((nb = next_nb)) {
+               next_nb = nb->next;
+               if (nb->id_select & id)
+                       nb->task_notifier_func(id, arg, nb);
+       }
+}
+
+EXPORT_SYMBOL_GPL(task_notifier_chain_register);
+EXPORT_SYMBOL_GPL(task_notifier_chain_register_child);
+EXPORT_SYMBOL_GPL(task_notifier_chain_unregister);
+EXPORT_SYMBOL_GPL(find_task_notifier_block_proc);
+EXPORT_SYMBOL_GPL(task_notifier_call_chain);
Index: linux/fs/exec.c
===================================================================
--- linux.orig/fs/exec.c        2005-09-17 09:25:56.990242412 -0500
+++ linux/fs/exec.c     2005-10-05 13:11:29.621874069 -0500
@@ -51,6 +51,7 @@
 #include <linux/audit.h>
 #include <linux/trigevent_hooks.h>
 #include <linux/pagg.h>
+#include <linux/task_notifier.h>
 #include <linux/acct.h>
 
 #include <asm/uaccess.h>
@@ -1209,6 +1210,7 @@ int do_execve(char * filename,
 
                free_arg_pages(&bprm);
                pagg_exec(current);
+               task_notifier_call_chain(TN_EXEC, NULL); /* should file or bprm 
be passed??? */
 
                /* execve success */
                security_bprm_free(&bprm);
Index: linux/include/linux/init_task.h
===================================================================
--- linux.orig/include/linux/init_task.h        2005-09-17 09:25:22.922404438 
-0500
+++ linux/include/linux/init_task.h     2005-09-17 09:45:28.637217785 -0500
@@ -115,6 +115,7 @@ extern struct group_info init_groups;
        .journal_info   = NULL,                                         \
        .map_base       = __TASK_UNMAPPED_BASE,                         \
        .io_wait        = NULL,                                         \
+       .task_notifier  = NULL,                                         \
        INIT_TASK_PAGG(tsk)                                             \
 }
 
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h    2005-09-17 09:25:56.999030585 -0500
+++ linux/include/linux/sched.h 2005-09-17 10:02:23.314663901 -0500
@@ -601,7 +601,7 @@ struct task_struct {
        struct list_head pagg_list;
        struct rw_semaphore pagg_sem;
 #endif
-
+       void *task_notifier;
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c    2005-09-17 09:25:57.001959975 -0500
+++ linux/kernel/exit.c 2005-09-17 09:59:17.263210664 -0500
@@ -31,6 +31,7 @@
 #include <linux/trigevent_hooks.h>
 #include <linux/ltt.h>
 #include <linux/pagg.h>
+#include <linux/task_notifier.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -863,6 +864,7 @@ asmlinkage NORET_TYPE void do_exit(long 
 
        tsk->exit_code = code;
        pagg_detach(tsk);
+       task_notifier_call_chain(TN_EXIT, NULL);
        exit_notify(tsk);
        BUG_ON(!(current->flags & PF_DEAD));
        schedule();
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c    2005-09-17 09:25:57.884683056 -0500
+++ linux/kernel/fork.c 2005-10-07 09:15:25.285049535 -0500
@@ -37,6 +37,7 @@
 #include <linux/trigevent_hooks.h>
 #include <linux/cpuset.h>
 #include <linux/pagg.h>
+#include <linux/task_notifier.h>
 #include <linux/acct.h>
 
 #include <linux/ckrm.h>
@@ -1065,6 +1066,8 @@ struct task_struct *copy_process(unsigne
         * process aggregate containers as the parent process.
         */
        pagg_attach(p, current);
+       p->task_notifier = NULL;
+       task_notifier_call_chain(TN_FORK, p);
 
        /*
         * Ok, make it visible to the rest of the system.
@@ -1150,6 +1153,7 @@ fork_out:
 
 bad_fork_cleanup_namespace:
        pagg_detach(p);
+       task_notifier_call_chain(TN_EXIT, NULL);        /* should this be a 
unique TN_xxx id?? */
        exit_namespace(p);
 bad_fork_cleanup_mm:
        exit_mm(p);
Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile  2005-09-17 09:25:27.373125449 -0500
+++ linux/kernel/Makefile       2005-09-17 10:23:09.740024850 -0500
@@ -7,7 +7,7 @@ obj-y     = sched.o fork.o exec_domain.o
            sysctl.o capability.o ptrace.o timer.o user.o \
            signal.o sys.o kmod.o workqueue.o pid.o \
            rcupdate.o intermodule.o extable.o params.o posix-timers.o \
-           kthread.o ckrm/
+           kthread.o ckrm/ task_notifier.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o

-- 
Thanks

Jack Steiner (steiner@xxxxxxx)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.




-------------------------------------------------------
This SF.Net email is sponsored by:
Power Architecture Resource Center: Free content, downloads, discussions,
and more. http://solutions.newsforge.com/ibmarch.tmpl
_______________________________________________
Lse-tech mailing list
Lse-tech@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/lse-tech

----- End forwarded message -----
--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

<Prev in Thread] Current Thread [Next in Thread>