Re: New 2.6.6 pagg and job patches available

To: Erik Jacobson <erikj@xxxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: New 2.6.6 pagg and job patches available
From: Peter Williams <pwil3058@xxxxxxxxxxxxxx>
Date: Fri, 21 May 2004 12:25:32 +1000
Cc: pagg@xxxxxxxxxxx
In-reply-to: <Pine.SGI.4.53.0405201609330.212943@xxxxxxxxxxxxxxxxxxxxxxx>
References: <Pine.SGI.4.53.0405201609330.212943@xxxxxxxxxxxxxxxxxxxxxxx>
Sender: pagg-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 Netscape/7.1
Erik Jacobson wrote:
> New pagg and job patches are available from the pagg web site for the
> 2.6.6 kernel.
>
> oss.sgi.com/projects/pagg
>
> Click 'download' on the left.
>
> The patches I made available are:
> linux-2.6.6-pagg.patch
> and
> linux-2.6.6-job.patch
>
> Check out the README for some installation tips.
>
> Note: we haven't finished processing all the community feedback yet.  So
> there will likely be at least one more 2.6.6 pagg patch to come.
> I also haven't implemented the patches from Peter Williams yet but hope to
> soon.

Attached is a patch (against your latest 2.6.6 patch) to do the initialisation of tasks at registration and add real uid/gid and CPU affinity hooks.
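
For anyone skimming the patch, the optional-callout pattern it adds can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: the cb_* names are invented for illustration and are not part of PAGG, the pagg_list semaphore is omitted, and a plain singly linked list stands in for list_head.

```c
#include <stddef.h>

struct cb_task;
struct cb_pagg;

/* Hypothetical stand-in for struct pagg_hook; optional callouts may be NULL. */
struct cb_hook {
    const char *name;
    void (*setruid)(struct cb_task *, struct cb_pagg *);
    void (*setrgid)(struct cb_task *, struct cb_pagg *);
    void (*setcpuaffinity)(struct cb_task *, struct cb_pagg *);
};

/* Hypothetical stand-in for struct pagg. */
struct cb_pagg {
    struct cb_hook *hook;
    void *data;              /* module specific data */
    struct cb_pagg *next;    /* next pagg on the same task */
};

/* Hypothetical stand-in for the parts of task_struct used here. */
struct cb_task {
    int uid;
    struct cb_pagg *paggs;   /* NULL when the task is in no container */
};

/* Same shape as __pagg_setruid: walk the task's paggs and run the
 * callout only where the client supplied one.  Returns the number of
 * callouts run (the real function returns 0). */
int cb_pagg_setruid(struct cb_task *task)
{
    int called = 0;

    for (struct cb_pagg *p = task->paggs; p != NULL; p = p->next) {
        if (p->hook->setruid) {  /* conditional because it's optional */
            p->hook->setruid(task, p);
            called++;
        }
    }
    return called;
}

/* Model of the set_user path: change the real uid, then notify. */
int cb_set_ruid(struct cb_task *task, int new_uid)
{
    task->uid = new_uid;
    return cb_pagg_setruid(task);
}

/* Example client callout: remember the new uid in the pagg's data. */
void cb_record_uid(struct cb_task *task, struct cb_pagg *pagg)
{
    *(int *)pagg->data = task->uid;
}
```

A client that does not care about uid changes simply leaves setruid as NULL and is skipped during the walk.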

The "initialisation of tasks" code still needs some work so that it cleans up properly when faced with memory allocation failures. Probably best left to someone more familiar with PAGG than me.
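
One possible shape for that cleanup is an all-or-nothing pass that rolls back whatever it allocated before reporting failure. The sketch below is a user-space model only, not a proposal for the actual kernel code: the init_* names are hypothetical, every task is assumed to start with no pagg for the client, and a simple allocation budget stands in for kmalloc failing.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a task; pagg is NULL until initialised. */
struct init_task {
    void *pagg;
};

/* Allocation budget used to simulate failure: -1 means unlimited. */
int init_allocs_left = -1;

/* Hypothetical allocator that fails once the budget is exhausted. */
void *init_limited_alloc(size_t size)
{
    if (init_allocs_left == 0)
        return NULL;
    if (init_allocs_left > 0)
        init_allocs_left--;
    return malloc(size);
}

/* Initialise every task or none: on the first failed allocation, free
 * everything allocated so far in this call and return -ENOMEM. */
int init_all_tasks(struct init_task *tasks, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        tasks[i].pagg = init_limited_alloc(sizeof(int));
        if (tasks[i].pagg == NULL)
            goto rollback;
    }
    return 0;

rollback:
    /* Undo in reverse order; tasks[i] itself was never allocated. */
    while (i-- > 0) {
        free(tasks[i].pagg);
        tasks[i].pagg = NULL;
    }
    return -ENOMEM;
}
```

With a rollback like this the registration function could return the error to the module loader, since no allocated memory would be left behind.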

On a related note, shouldn't the deregistration code do something similar? That is, it could call detach() on all of the deregistering client's paggs and remove them from any task lists that they are on, rather than just refusing to unload if there are any tasks with paggs belonging to the deregistering client. Otherwise every client will have to reinvent the wheel in order to do this when it is unloaded from the kernel.
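
To make the suggestion concrete, here is a rough user-space model of a deregistration pass that detaches and unlinks a client's paggs from every task. The dereg_* names are hypothetical, locking is left out entirely, and an array of tasks stands in for walking the kernel task list, so treat it as an illustration of the idea rather than code for kernel/pagg.c.

```c
#include <stddef.h>
#include <stdlib.h>

struct dereg_task;
struct dereg_pagg;

/* Hypothetical stand-in for struct pagg_hook. */
struct dereg_hook {
    const char *name;
    int (*detach)(struct dereg_task *, struct dereg_pagg *);
    int detach_calls;        /* bookkeeping for this illustration only */
};

struct dereg_pagg {
    struct dereg_hook *hook;
    struct dereg_pagg *next;
};

struct dereg_task {
    struct dereg_pagg *paggs;
};

/* Example detach callout: just count how often it was called. */
int dereg_count_detach(struct dereg_task *task, struct dereg_pagg *pagg)
{
    (void)task;
    pagg->hook->detach_calls++;
    return 0;
}

/* Attach a new pagg for the hook at the head of the task's list.
 * Returns 0 on success, -1 if the allocation fails. */
int dereg_attach(struct dereg_task *task, struct dereg_hook *hook)
{
    struct dereg_pagg *p = malloc(sizeof(*p));

    if (p == NULL)
        return -1;
    p->hook = hook;
    p->next = task->paggs;
    task->paggs = p;
    return 0;
}

/* For every task, call detach on each pagg owned by the hook, unlink it
 * and free it.  Returns the number of paggs removed. */
int dereg_detach_all(struct dereg_task *tasks, size_t n,
                     struct dereg_hook *hook)
{
    int removed = 0;

    for (size_t i = 0; i < n; i++) {
        struct dereg_pagg **link = &tasks[i].paggs;

        while (*link != NULL) {
            struct dereg_pagg *p = *link;

            if (p->hook == hook) {
                if (hook->detach)
                    hook->detach(&tasks[i], p);
                *link = p->next;     /* unlink before freeing */
                free(p);
                removed++;
            } else {
                link = &p->next;
            }
        }
    }
    return removed;
}
```

Paggs belonging to other clients are left in place, so unloading one client does not disturb the rest of a task's list.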

Peter
--
Dr Peter Williams                                pwil3058@xxxxxxxxxxxxxx

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
Index: Linux-2.6.X/Documentation/pagg.txt
diff -c Linux-2.6.X/Documentation/pagg.txt:1.1.4.1 Linux-2.6.X/Documentation/pagg.txt:1.1.4.1.4.1
*** Linux-2.6.X/Documentation/pagg.txt:1.1.4.1  Fri May 21 11:33:14 2004
--- Linux-2.6.X/Documentation/pagg.txt  Fri May 21 12:03:40 2004
***************
*** 32,37 ****
--- 32,48 ----
  used, for example, by other kernel modules that wish to do advanced CPU
  placement on multi-processor systems (just one example).
  
+ The set_user function has been modified to support an optional callout
+ that can be run when a process in a pagg list changes its real uid.
+ 
+ The sys_setresgid, sys_setregid and sys_setgid functions have been modified
+ to support optional callouts that can be run when a process in a pagg list changes
+ its real gid.
+ 
+ The set_cpus_allowed function has been modified to support an optional callout
+ that can be run when a process in a pagg list changes its cpu affinity.  It could be
+ used, for example, to implement CPU sets.
+ 
  Additional details concerning this implementation of the process aggregates
  infrastructure are described in the sections that follow.
  
***************
*** 51,56 ****
--- 62,69 ----
  -  kernel/Makefile
  -  kernel/exit.c
  -  kernel/fork.c
+ -  kernel/sched.c
+ -  kernel/sys.c
  -  fs/exec.c
  -  init/Kconfig
  
***************
*** 96,101 ****
--- 109,117 ----
       void  *data;                            /* Module specific data */
       struct list_head entry;               /* List connection */
       void    (*exec)(struct task_struct *, struct pagg *); /* exec func ptr */
+     void    (*setruid)(struct task_struct *, struct pagg *); /* setruid func ptr */
+     void    (*setrgid)(struct task_struct *, struct pagg *); /* setrgid func ptr */
+     void    (*setcpuaffinity)(struct task_struct *, struct pagg *); /* setcpuaffinity func ptr */
  
  The pagg structure provides the process' reference to the PAGG
  containers provided by the PAGG modules.  The attach function pointer
***************
*** 104,110 ****
  the referenced PAGG container that the process is exiting or otherwise
  detaching from the container.  The exec function pointer is used when a
  process in the pagg container exec's a new process.  This is optional and
! may be set to NULL if it is not needed by the pagg module.
  
  The pagg_hook structure provides the reference to the module that
  implements a type of PAGG container.  In addition to the function pointers
--- 120,134 ----
  the referenced PAGG container that the process is exiting or otherwise
  detaching from the container.  The exec function pointer is used when a
  process in the pagg container exec's a new process.  This is optional and
! may be set to NULL if it is not needed by the pagg module. The setruid
! function pointer is used when a process in the pagg container changes its
! real uid.  This is optional and may be set to NULL if it is not needed by the
! pagg module. The setrgid function pointer is used when a process in the
! pagg container changes its real gid.  This is optional and may be set to
! NULL if it is not needed by the pagg module. The setcpuaffinity function
! pointer is used when a process in the pagg container changes its cpu
! affinity.  This is optional and may be set to NULL if it is not needed by the
! pagg module.
  
  The pagg_hook structure provides the reference to the module that
  implements a type of PAGG container.  In addition to the function pointers
***************
*** 129,134 ****
--- 153,167 ----
                 Call detach_pagg_list function with current task_struct 
  -  sys_execve:  (fs/exec.c)
       /* When a process in a pagg exec's, an optional callout can be run.  This
+         is implemented with an optional function pointer in the pagg_hook.  */
+ -  set_user:  (kernel/sys.c)
+      /* When a process in a pagg sets its real uid, an optional callout can be run.  This
+         is implemented with an optional function pointer in the pagg_hook.  */
+ -  sys_setresgid, sys_setregid and sys_setgid:  (kernel/sys.c)
+      /* When a process in a pagg sets its real gid, an optional callout can be run.  This
+         is implemented with an optional function pointer in the pagg_hook.  */
+ -  set_cpus_allowed:  (kernel/sched.c)
+      /* When a process in a pagg changes cpu affinity, an optional callout can be run.  This
          is implemented with an optional function pointer in the pagg_hook.  */
  
  2.6 New Functions
Index: Linux-2.6.X/include/linux/pagg.h
diff -c Linux-2.6.X/include/linux/pagg.h:1.1.4.1 Linux-2.6.X/include/linux/pagg.h:1.1.4.1.4.1
*** Linux-2.6.X/include/linux/pagg.h:1.1.4.1    Fri May 21 11:33:14 2004
--- Linux-2.6.X/include/linux/pagg.h    Fri May 21 12:03:40 2004
***************
*** 121,126 ****
--- 121,138 ----
   *                     in the pagg container exec's a new process. This
   *                     is optional and may be set to NULL if it is not 
   *                     needed by the pagg module.
+  *     setruid:        Function pointer to function used when a process
+  *                     in the pagg container changes its real uid. This
+  *                     is optional and may be set to NULL if it is not 
+  *                     needed by the pagg module.
+  *     setrgid:        Function pointer to function used when a process
+  *                     in the pagg container changes its real gid. This
+  *                     is optional and may be set to NULL if it is not 
+  *                     needed by the pagg module.
+  *     setcpuaffinity: Function pointer to function used when a process
+  *                     in the pagg container changes its cpu affinity. This
+  *                     is optional and may be set to NULL if it is not 
+  *                     needed by the pagg module.
   */
  struct pagg_hook {
         struct module  *module;
***************
*** 131,136 ****
--- 143,151 ----
         int            (*attach)(struct task_struct *, struct pagg *, void*);
         int            (*detach)(struct task_struct *, struct pagg *);
         void           (*exec)(struct task_struct *, struct pagg *);
+        void           (*setruid)(struct task_struct *, struct pagg *);
+        void           (*setrgid)(struct task_struct *, struct pagg *);
+        void           (*setcpuaffinity)(struct task_struct *, struct pagg *);
  };
  
  
***************
*** 145,150 ****
--- 160,168 ----
                         struct task_struct *from_task);
  extern int __pagg_detach(struct task_struct *task);
  extern int __pagg_exec(struct task_struct *task);
+ extern int __pagg_setruid(struct task_struct *task);
+ extern int __pagg_setrgid(struct task_struct *task);
+ extern int __pagg_setcpuaffinity(struct task_struct *task);
  
  /* macros used when a child process must inherit attachment to pagg
   * containers from the parent.
***************
*** 182,187 ****
--- 200,235 ----
                __pagg_exec(task);                                      \
  } while(0)
  
+ /* 
+  * macro used when a process changes real uid.
+  *
+  */
+ #define pagg_setruid(task)                                            \
+ do {                                                                  \
+       if (!list_empty(&task->pagg_list.head))                         \
+               __pagg_setruid(task);                                   \
+ } while(0)
+ 
+ /* 
+  * macro used when a process changes real gid.
+  *
+  */
+ #define pagg_setrgid(task)                                            \
+ do {                                                                  \
+       if (!list_empty(&task->pagg_list.head))                         \
+               __pagg_setrgid(task);                                   \
+ } while(0)
+ 
+ /* 
+  * macro used when a process changes cpu affinity.
+  *
+  */
+ #define pagg_setcpuaffinity(task)                                     \
+ do {                                                                  \
+       if (!list_empty(&task->pagg_list.head))                         \
+               __pagg_setcpuaffinity(task);                            \
+ } while(0)
+ 
  /* The static inlines commented out for now with the ifdef below */
  #ifdef NOTDEFINED
  
***************
*** 217,222 ****
--- 265,300 ----
        if (!list_empty(&task->pagg_list.head))
                __pagg_exec(task);
  }
+ 
+ /* 
+  * function used when a process changes real uid.
+  *
+  */
+ static inline void pagg_setruid(struct task_struct *task)
+ {
+       if (!list_empty(&task->pagg_list.head))
+               __pagg_setruid(task);
+ }
+ 
+ /* 
+  * function used when a process changes real gid.
+  *
+  */
+ static inline void pagg_setrgid(struct task_struct *task)
+ {
+       if (!list_empty(&task->pagg_list.head))
+               __pagg_setrgid(task);
+ }
+ 
+ /* 
+  * function used when a process changes cpu affinity.
+  *
+  */
+ static inline void pagg_setcpuaffinity(struct task_struct *task)
+ {
+       if (!list_empty(&task->pagg_list.head))
+               __pagg_setcpuaffinity(task);
+ }
  #endif /* NOT-DEFINED just comment out the block above for now */
  
  /*
***************
*** 240,245 ****
--- 318,326 ----
  #define pagg_attach(ct, pt)  do { } while(0)
  #define pagg_detach(t)  do {  } while(0)     
  #define pagg_exec(t)  do {  } while(0)     
+ #define pagg_setruid(t)  do {  } while(0)     
+ #define pagg_setrgid(t)  do {  } while(0)     
+ #define pagg_setcpuaffinity(t)  do {  } while(0)     
  
  #endif /* CONFIG_PAGG */
  
Index: Linux-2.6.X/kernel/pagg.c
diff -c Linux-2.6.X/kernel/pagg.c:1.1.4.1 Linux-2.6.X/kernel/pagg.c:1.1.4.1.2.1.2.1
*** Linux-2.6.X/kernel/pagg.c:1.1.4.1   Fri May 21 11:33:14 2004
--- Linux-2.6.X/kernel/pagg.c   Fri May 21 12:03:40 2004
***************
*** 202,207 ****
--- 202,304 ----
  
        /* Okay, we can insert into the pagg hook list */
        list_add_tail(&pagg_hook_new->entry, &pagg_hook_list);
+       /* Now we can call the initialiser function (if present) for each task */
+       if (pagg_hook_new->init != NULL) {
+               int num_inited = 0;
+ 
+               /* Because of internal race conditions we can't guarantee
+                * getting every task in just one pass so we just keep going 
+                * until we don't find any uninitialised tasks.  The inefficiency
+                * of this should be tempered by the fact that this happens
+                * at most once for each registered client.
+                */
+               do {
+                       struct task_struct *p = NULL;
+                       int *live_pids;
+                       int live_pids_sz;
+                       int i, nump;
+                       int failed_pid_mallocs = 0;
+                       int failed_pagg_mallocs = 0;
+ 
+ retry_malloc:
+                       live_pids_sz = nr_threads + 16;
+                       live_pids = kmalloc(sizeof(int) * live_pids_sz, GFP_KERNEL);
+                       if (live_pids == NULL) {
+                               /* This should be changed to abort the registration
+                                * and undo anything that's been done.  Undoing the
+                                * mess may be difficult so we'll just retry for the
+                                * time being.
+                                */
+                               if (failed_pid_mallocs < 10) {
+                                       failed_pid_mallocs++;
+                                       yield();
+                                       goto retry_malloc;
+                               } else {
+                                       /* we can't return an error value here
+                                        * as it would cause the module load to
+                                        * fail while we (possibly) still hold
+                                        * malloced memory.  So just warn that
+                                        * initialisation has failed.  This is
+                                        * no worse than completely ignoring
+                                        * the initialisation function.
+                                        */
+                                       printk(KERN_WARNING "Insufficient memory"
+                                               " to initialise"
+                                               " PAGG support (name=%s)\n",
+                                               pagg_hook_new->name);
+                                       break;
+                               }
+                       }
+                       read_lock(&tasklist_lock);
+                       if (nr_threads > live_pids_sz) {
+                               read_unlock(&tasklist_lock);
+                               kfree(live_pids);
+                               goto retry_malloc;
+                       }
+                       nump = 0;
+                       for_each_process(p) {
+                               live_pids[nump] = p->pid;
+                               nump++;
+                       }
+                       read_unlock(&tasklist_lock);
+                       num_inited = 0;
+                       for (i = 0; i < nump; i++) {
+                               read_lock(&tasklist_lock);
+                               if (likely((p = find_task_by_pid(live_pids[i])) != NULL))
+                                       get_task_struct(p);
+                               read_unlock(&tasklist_lock);
+                               if (likely(p != NULL)) {
+                                       struct pagg *paggp;
+ 
+                                       down_read(&p->pagg_list.sem);
+                                       paggp = pagg_get(p, pagg_hook_new->name);
+                                       up_read(&p->pagg_list.sem);
+ 
+                                       if (paggp == NULL) {
+                                               down_write(&p->pagg_list.sem);
+                                               paggp = pagg_alloc(p, pagg_hook_new);
+                                               if (paggp != NULL)
+                                                       pagg_hook_new->init(p, paggp);
+                                               else
+                                                       failed_pagg_mallocs++;
+                                               up_write(&p->pagg_list.sem);
+                                               num_inited++;
+                                       }
+                                       put_task_struct(p);
+                               }
+                       }
+                       kfree(live_pids);
+                       if (failed_pagg_mallocs > 10) {
+                               /* we can't return an error value here
+                                * for the same reason as above.
+                                */
+                               printk(KERN_WARNING "Insufficient memory"
+                                       " to initialise PAGG support (name=%s)\n",
+                                       pagg_hook_new->name);
+                               break;
+                       }
+               } while (num_inited > 0);
+       }
        up_write(&pagg_hook_list_sem);
  
        printk(KERN_INFO "Registering PAGG support for (name=%s)\n",
***************
*** 392,397 ****
--- 489,560 ----
        list_for_each_entry(pagg, &task->pagg_list.head, entry) {
                if (pagg->hook->exec) /* conditional because it's optional */
                        pagg->hook->exec(task, pagg);
+       }
+ 
+       up_read(&task->pagg_list.sem); /* unlock the pagg list */
+       return 0;
+ }
+ 
+ 
+ /*
+  * __pagg_setruid
+  *
+  * Used to process a task's pagg list when it changes real user id.
+  *
+  */
+ int __pagg_setruid(struct task_struct *task) 
+ {
+       struct pagg     *pagg;
+ 
+       down_read(&task->pagg_list.sem); /* lock the pagg list */
+ 
+       list_for_each_entry(pagg, &task->pagg_list.head, entry) {
+               if (pagg->hook->setruid) /* conditional because it's optional */
+                       pagg->hook->setruid(task, pagg);
+       }
+ 
+       up_read(&task->pagg_list.sem); /* unlock the pagg list */
+       return 0;
+ }
+ 
+ 
+ /*
+  * __pagg_setrgid
+  *
+  * Used to process a task's pagg list when it changes real group id.
+  *
+  */
+ int __pagg_setrgid(struct task_struct *task) 
+ {
+       struct pagg     *pagg;
+ 
+       down_read(&task->pagg_list.sem); /* lock the pagg list */
+ 
+       list_for_each_entry(pagg, &task->pagg_list.head, entry) {
+               if (pagg->hook->setrgid) /* conditional because it's optional */
+                       pagg->hook->setrgid(task, pagg);
+       }
+ 
+       up_read(&task->pagg_list.sem); /* unlock the pagg list */
+       return 0;
+ }
+ 
+ 
+ /*
+  * __pagg_setcpuaffinity
+  *
+  * Used to process a task's pagg list when it changes its cpu affinity.
+  *
+  */
+ int __pagg_setcpuaffinity(struct task_struct *task) 
+ {
+       struct pagg     *pagg;
+ 
+       down_read(&task->pagg_list.sem); /* lock the pagg list */
+ 
+       list_for_each_entry(pagg, &task->pagg_list.head, entry) {
+               if (pagg->hook->setcpuaffinity) /* conditional because it's optional */
+                       pagg->hook->setcpuaffinity(task, pagg);
        }
  
        up_read(&task->pagg_list.sem); /* unlock the pagg list */
Index: Linux-2.6.X/kernel/sched.c
diff -c Linux-2.6.X/kernel/sched.c:1.1.1.7 Linux-2.6.X/kernel/sched.c:1.1.1.7.12.1
*** Linux-2.6.X/kernel/sched.c:1.1.1.7  Thu May  6 18:43:49 2004
--- Linux-2.6.X/kernel/sched.c  Fri May 21 12:03:40 2004
***************
*** 2722,2735 ****
  int set_cpus_allowed(task_t *p, cpumask_t new_mask)
  {
        unsigned long flags;
-       int ret = 0;
        migration_req_t req;
        runqueue_t *rq;
  
        rq = task_rq_lock(p, &flags);
        if (any_online_cpu(new_mask) == NR_CPUS) {
!               ret = -EINVAL;
!               goto out;
        }
  
        if (__set_cpus_allowed(p, new_mask, &req)) {
--- 2722,2734 ----
  int set_cpus_allowed(task_t *p, cpumask_t new_mask)
  {
        unsigned long flags;
        migration_req_t req;
        runqueue_t *rq;
  
        rq = task_rq_lock(p, &flags);
        if (any_online_cpu(new_mask) == NR_CPUS) {
!               task_rq_unlock(rq, &flags);
!               return -EINVAL;
        }
  
        if (__set_cpus_allowed(p, new_mask, &req)) {
***************
*** 2737,2747 ****
                task_rq_unlock(rq, &flags);
                wake_up_process(rq->migration_thread);
                wait_for_completion(&req.done);
                return 0;
        }
- out:
        task_rq_unlock(rq, &flags);
!       return ret;
  }
  
  EXPORT_SYMBOL_GPL(set_cpus_allowed);
--- 2736,2747 ----
                task_rq_unlock(rq, &flags);
                wake_up_process(rq->migration_thread);
                wait_for_completion(&req.done);
+               pagg_setcpuaffinity(p);
                return 0;
        }
        task_rq_unlock(rq, &flags);
!       pagg_setcpuaffinity(p);
!       return 0;
  }
  
  EXPORT_SYMBOL_GPL(set_cpus_allowed);
Index: Linux-2.6.X/kernel/sys.c
diff -c Linux-2.6.X/kernel/sys.c:1.1.1.6 Linux-2.6.X/kernel/sys.c:1.1.1.6.16.1
*** Linux-2.6.X/kernel/sys.c:1.1.1.6    Thu May  6 18:43:49 2004
--- Linux-2.6.X/kernel/sys.c    Fri May 21 12:03:40 2004
***************
*** 593,598 ****
--- 593,599 ----
        current->fsgid = new_egid;
        current->egid = new_egid;
        current->gid = new_rgid;
+       pagg_setrgid(current);
        return 0;
  }
  
***************
*** 618,623 ****
--- 619,625 ----
                        wmb();
                }
                current->gid = current->egid = current->sgid = current->fsgid = gid;
+               pagg_setrgid(current);
        }
        else if ((gid == current->gid) || (gid == current->sgid))
        {
***************
*** 656,661 ****
--- 658,664 ----
                wmb();
        }
        current->uid = new_ruid;
+       pagg_setruid(current);
        return 0;
  }
  
***************
*** 854,861 ****
                current->egid = egid;
        }
        current->fsgid = current->egid;
!       if (rgid != (gid_t) -1)
                current->gid = rgid;
        if (sgid != (gid_t) -1)
                current->sgid = sgid;
        return 0;
--- 857,866 ----
                current->egid = egid;
        }
        current->fsgid = current->egid;
!       if (rgid != (gid_t) -1) {
                current->gid = rgid;
+               pagg_setrgid(current);
+       }
        if (sgid != (gid_t) -1)
                current->sgid = sgid;
        return 0;