Linux Process Aggregates (PAGG)
Kernel Design Document

Sam Watters
Last Update: June 19, 2000

1. Introduction

A job is a group of related processes, all descended from a point of entry (POE) process and identified by a unique job ID. A job can contain multiple process groups, sessions, and processes. The job acts as a process containment mechanism: a process is not allowed to escape from the job container. This allows resource limits to be extended from the process level to the job level, which in turn allows enhanced resource management features to be incorporated into the system. Additionally, the job allows accounting information to be accumulated for all processes that executed within the job container. This gives users and administrators increased capabilities for system scheduling and workload planning.

The job has the following characteristics:

· A job is an inescapable container. A process cannot leave the job container, nor can a new process be created outside the job, without explicit action, that is, a system call with root privilege.
· Each new process inherits the job ID and limits from its parent process.
· All point of entry processes (job initiators) create a new job and set the job limits appropriately.
· Users can raise and lower their own job limits within maximum values specified by the system administrator.
· The job initiator performs authentication and security checks.
· The process control initialization process (init(1M)) and start-up scripts called by init are not part of a job. Likewise, system daemons are usually not part of a job.

SGI is porting the Comprehensive System Accounting (CSA) package from IRIX. CSA performs job level accounting, as opposed to the more familiar process level accounting. As a result, CSA requires that a job container be made available on Linux. In addition, future work on resource limitation capabilities on clusters heralds the need for a job container that is usable across a cluster.
Finally, having a job container available, specifically one that is capable of operating in a cluster environment, provides the means for improved application monitoring and control.

This document describes what needs to be done to implement the base kernel portion of the Linux job container. Linux jobs will be similar to jobs as implemented on IRIX 6.5.7. However, the initial work will focus on providing a job container for the CSA job accounting project on Linux. Work on resource limits for jobs has been deferred until the project is more mature.

2. Requirements

To provide support for the job container on Linux, the following issues should be addressed:

1. Minimize the impact of jobs on the kernel, both in performance and in changes to the kernel code.
2. Point of entry processes need to create new job containers and then be attached to the job.
3. Only processes with root privilege can create a new job or create a process that is not a member of a job.
4. When a process that is attached to a job forks, the child must also inherit the attachment to the job.
5. When a process that is attached to a job exits, the job container must be updated to reflect the process that is exiting.
6. Each job ID is unique on a system, and the job container must be capable of being used within a cluster system.

2.1. Minimize impact on the kernel

While addressing the first issue, it is also important that any changes to the Linux kernel be applicable beyond the current need of implementing a job container. Changes made to the kernel should be generalized so that the job container implementation can be modularized from the base kernel. In addition, it should be easy for other developers to extend this job container solution or provide other process containers that might benefit the community at large.

Borrowing the process aggregate concept found in IRIX 6.5 and implementing that concept in the Linux kernel provides a generalized mechanism for providing process containers. The process aggregate, or PAGG, consists of a series of functions for registering and unregistering support for PAGGs with the kernel. This is similar to the support currently provided within Linux that allows for dynamic support of filesystems, block and character devices, symbol tables, network devices, serial devices, and execution domains. Implementation of the PAGG provides developers the basic hooks necessary to implement kernel modules for specific process containers, such as the job container.

2.2. Point of entry processes create containers

PAM session modules provide the necessary hook for creating job containers at point of entry processes. A system call must be available to create the new container.

2.3. Only root processes create new jobs

While executing the PAM modules, the point of entry process is running as uid 0. The system call for creating a new container must check that the calling process is running as uid 0. Since this is only a requirement for the job container implementation, this check should be made in the job kernel module so that the system call can be generalized for any PAGG container.

2.4. Child processes inherit container attachment from parent

The fork(2) system call will need to be altered. If a process is attached to any PAGG containers and that process forks, the child process should also be attached to the same PAGG containers. The PAGG containers should be updated to indicate that a new process has been attached.

2.5. Exiting processes update container on exit

The exit notification function in the kernel will need to be altered. If a process is attached to any PAGG containers and that process is exiting, the PAGG containers should be updated to indicate that a process has detached from the container.

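
The inheritance and exit requirements in sections 2.4 and 2.5 can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: the types container, attachment, and process and the functions attach, inherit_on_fork, and detach_on_exit are hypothetical names invented for the illustration, and a simple member count stands in for whatever bookkeeping a real container would do.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these names exist in the kernel. */
struct container {
    int members;                  /* number of attached processes */
};

struct attachment {
    struct container *container;  /* container this process belongs to */
    struct attachment *next;      /* next attachment of the same process */
};

struct process {
    struct attachment *paggs;     /* list of container attachments */
};

/* Attach a process to one container and update the container. */
static int attach(struct process *p, struct container *c)
{
    struct attachment *a = malloc(sizeof(*a));
    if (a == NULL)
        return -1;
    a->container = c;
    a->next = p->paggs;
    p->paggs = a;
    c->members++;
    return 0;
}

/* Fork-time hook: the child joins every container of its parent.
 * On failure the child keeps the attachments made so far, so the
 * caller can release them with detach_on_exit(). */
static int inherit_on_fork(const struct process *parent, struct process *child)
{
    assert(child->paggs == NULL);
    for (const struct attachment *a = parent->paggs; a != NULL; a = a->next)
        if (attach(child, a->container) != 0)
            return -1;
    return 0;
}

/* Exit-time hook: update each container, then drop the attachment. */
static void detach_on_exit(struct process *p)
{
    while (p->paggs != NULL) {
        struct attachment *a = p->paggs;
        p->paggs = a->next;
        a->container->members--;
        free(a);
    }
}
```

In this model a parent attached to one container would call inherit_on_fork for each new child, raising the member count by one, and each exiting process would call detach_on_exit, lowering it again.
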
2.6. Container usable on clustered systems

The job ID on any single host must be unique. It is also necessary for the job to be usable for multi-host, or cluster, jobs. As a result, the job ID should be made unique within a cluster. This can be implemented wholly within the job kernel module and, as a result, requires no kernel modification.

3. Kernel Design

This section describes the files and data structures that need to be modified to implement PAGGs. New files and data structures are also introduced.

3.1. Modified Files

The following files require changes to implement PAGGs:

· Documentation/Configure.help
· arch/i386/config.in
· include/asm-i386/unistd.h
· include/linux/sched.h
· arch/i386/kernel/entry.S
· kernel/Makefile
· kernel/exit.c
· kernel/fork.c
· kernel/ksyms.c

These changes only implement PAGGs for i386 architectures. When testing volunteers appear for other architectures, support will be added for those additional architectures.

3.2. New Files

The following files will be added to implement PAGGs:

· include/linux/pagg.h
· kernel/pagg.c

3.3. Modified Data Structures

The following existing data structures need to be altered to implement PAGGs:

· struct task_struct: (include/linux/sched.h)

    struct pagg_task_s *pagg;            /* List of pagg containers */

The new member in task_struct, pagg, points to a linked list of pagg_task_s structures.

3.4. New Data Structures

The following new data structures will be introduced to implement PAGGs.

· struct pagg_task_s: (include/linux/pagg.h)

    char *name;                          /* PAGG module name */
    int (*attach)(struct task_struct *,  /* Function to attach */
                  void *, struct pagg_task_s *);
    int (*detach)(struct task_struct *,  /* Function to detach */
                  struct pagg_task_s *);
    void *data;                          /* Task specific data */
    struct pagg_task_s *prev;            /* Ptr to prev container */
    struct pagg_task_s *next;            /* Ptr to next container */

· struct pagg_module_s: (include/linux/pagg.h)

    char *name;                          /* PAGG module name */
    int (*attach)(struct task_struct *,  /* Function to attach */
                  void *, struct pagg_task_s *);
    int (*detach)(struct task_struct *,  /* Function to detach */
                  struct pagg_task_s *);
    int (*init)(struct task_struct *,    /* Load task init func. */
    int (*do_paggctl)(int, void *);      /* Function for paggctl */
    void *data;                          /* Module specific data */
    struct module *module;               /* Ptr to PAGG module */
    struct pagg_module_s *prev;          /* Ptr to prev container */
    struct pagg_module_s *next;          /* Ptr to next container */

The pagg_task_s structure provides the process's reference to the PAGG containers provided by the modules. The attach function pointer is the function used to update the referenced PAGG container when the process is being attached. The detach function pointer is used to update the referenced PAGG container when the process is exiting or otherwise detaching from the container.

The pagg_module_s structure provides the reference to the PAGG module that implements a type of PAGG container. In addition to the function pointers described for pagg_task_s, this structure provides two additional function pointers. The init function pointer is optional and is used to attach currently running processes to a default PAGG container. If the init function is not defined, then it is assumed that NULL represents the default PAGG container for that module. The do_paggctl function provides this module's interface for the paggctl system call.
If paggctl is called using this module's name, this function will be used, passing it a request code and data pointer.

The pagg_module_s structures will be stored in a simple hash table to provide quick lookup for the paggctl system call.

3.5. Modified Functions

The following functions require changes to implement PAGGs:

· do_fork: (kernel/fork.c)

    /* execute the following pseudocode before adding the child to the run queue */
    If parent process pagg list is not empty
        Call attach_pagg function with child task_struct as argument

· do_exit: (kernel/exit.c)

    /* execute the following pseudocode prior to schedule call */
    If current process pagg list is not empty
        Call detach_pagg function with current task_struct

3.6. New Functions

The following new functions will be added to implement PAGGs:

· int register_pagg(struct pagg_module_s *); (kernel/pagg.c)

    Add module entry into table of pagg modules
    If module provides init function
        Foreach task
            Add initial pagg container as defined by module
    Else
        The default container is NULL

· int unregister_pagg(struct pagg_module_s *); (kernel/pagg.c)

    Find module entry in table of pagg modules
    Foreach task
        Detach each task from containers provided by module

· int attach_pagg(struct task_struct *); (kernel/pagg.c)

    /* Assumes task pagg list points to the paggs that it attaches to */
    While another pagg container reference
        Make copy of pagg container reference & insert into new list
        Attach task to pagg container using new container reference
        Get next pagg container reference
    Make task pagg list use the new pagg list

· int detach_pagg(struct task_struct *); (kernel/pagg.c)

    While another pagg container reference
        Detach task from pagg container using reference

3.7. New System Calls

The following new system call will be added to implement a control interface for PAGG modules:

· int sys_paggctl(const char *, int, void *); (kernel/pagg.c)

    If requested name is invalid
        Return -EINVAL
    If requested module name not found in pagg module table
        Return -ENOSYS
    If requested module does not provide do_paggctl function
        Return -ENOSYS
    Else
        Call pagg module do_paggctl function
        Return result

The paggctl system call provides the necessary interface for controlling the function of the pagg container modules.
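
The dispatch logic of the paggctl pseudocode can be sketched in user space as follows. This is a hedged illustration, not the kernel implementation. The structure here is a trimmed version of pagg_module_s with only the two fields the dispatch needs. The fixed-size table with linear search, PAGG_TABLE_SIZE, find_pagg, job_paggctl, and the -EEXIST and -ENOMEM return values of register_pagg are all assumptions made for the example. The design itself calls for a hash table and does not specify error codes for registration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PAGG_TABLE_SIZE 8   /* hypothetical fixed table size */

/* Trimmed, user-space version of pagg_module_s: only the fields used here. */
struct pagg_module_s {
    const char *name;                /* PAGG module name */
    int (*do_paggctl)(int, void *);  /* Function for paggctl */
};

static struct pagg_module_s *pagg_table[PAGG_TABLE_SIZE];

/* Linear lookup by name; stands in for the hash table in the design. */
static struct pagg_module_s *find_pagg(const char *name)
{
    for (size_t i = 0; i < PAGG_TABLE_SIZE; i++)
        if (pagg_table[i] != NULL && strcmp(pagg_table[i]->name, name) == 0)
            return pagg_table[i];
    return NULL;
}

/* Add a module to the table. Error codes are assumptions of this sketch. */
static int register_pagg(struct pagg_module_s *mod)
{
    if (mod == NULL || mod->name == NULL || mod->name[0] == '\0')
        return -EINVAL;
    if (find_pagg(mod->name) != NULL)
        return -EEXIST;
    for (size_t i = 0; i < PAGG_TABLE_SIZE; i++) {
        if (pagg_table[i] == NULL) {
            pagg_table[i] = mod;
            return 0;
        }
    }
    return -ENOMEM;
}

/* Dispatch as in the pseudocode: validate, look up, check handler, call. */
static int paggctl(const char *name, int request, void *data)
{
    if (name == NULL || name[0] == '\0')
        return -EINVAL;
    struct pagg_module_s *mod = find_pagg(name);
    if (mod == NULL)
        return -ENOSYS;
    if (mod->do_paggctl == NULL)
        return -ENOSYS;
    return mod->do_paggctl(request, data);
}

/* Hypothetical handler: stores the request code where data points. */
static int job_paggctl(int request, void *data)
{
    assert(data != NULL);
    *(int *)data = request;
    return 0;
}
```

With a module named "job" registered using job_paggctl as its handler, a call such as paggctl("job", 42, &value) reaches the handler. An unknown module name, or a registered module without a do_paggctl function, yields -ENOSYS as in the pseudocode.
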