From erikj@subway.americas.sgi.com Thu Jan 6 01:51:43 2005
Received: with ECARTIS (v1.0.0; list pagg); Thu, 06 Jan 2005 01:51:48 -0800 (PST)
Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19])
	by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j069pc8p003739
	for ; Thu, 6 Jan 2005 01:51:42 -0800
Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15])
	by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j06NGaEK019505
	for ; Thu, 6 Jan 2005 15:16:36 -0800
Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204])
	by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j06LoUCK5905423
	for ; Thu, 6 Jan 2005 15:50:30 -0600 (CST)
Received: from subway.americas.sgi.com (subway.americas.sgi.com [128.162.236.152])
	by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j06LoUtC19563511
	for ; Thu, 6 Jan 2005 15:50:30 -0600 (CST)
Received: from subway.americas.sgi.com (localhost [127.0.0.1])
	by subway.americas.sgi.com (SGI-8.12.5/8.12.5/erikj-IRIX6519-news) with ESMTP id j06LoUB4009921
	for ; Thu, 6 Jan 2005 15:50:30 -0600 (CST)
Received: from localhost (erikj@localhost)
	by subway.americas.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id j06LoUUE009916
	for ; Thu, 6 Jan 2005 15:50:30 -0600 (CST)
Date: Thu, 6 Jan 2005 15:50:30 -0600
From: Erik Jacobson
To: pagg@oss.sgi.com
Subject: New PAGG patch for 2.6.10, new functionality
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1
X-Virus-Status: Clean
X-archive-position: 59
X-ecartis-version: Ecartis v1.0.0
Sender: pagg-bounce@oss.sgi.com
Errors-to: pagg-bounce@oss.sgi.com
X-original-sender: erikj@subway.americas.sgi.com
Precedence: bulk
X-list: pagg

Hi there.

I just uploaded a new PAGG patch for 2.6.10.
It includes a request to slightly change how the attach function pointer
of the PAGG hook is managed.

Note that we may be posting another PAGG patch soon with some other
changes.

We now make it so the PAGG user can decide if a new process will actually
be grouped or not by looking at the attach function pointer return
value.

The attach function, pointed to by the PAGG hook and run by pagg_attach,
can have these return values:

<0 Error which is propagated back to copy_process so
   the fork fails.

=0 success, attach to same container as parent

>0 success, but don't attach to a container

It's also important to note that, as of now, if a negative value is
returned by the attach function pointer, the value will be passed up
through copy_process as a fork failure.


Find the 'linux-2.6.10-pagg.patch' patch at the PAGG web site.
http://oss.sgi.com/projects/pagg/
Click on "Download" on the left.

Thank you.

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

From kaigai@ak.jp.nec.com Thu Jan 6 16:11:21 2005
Received: with ECARTIS (v1.0.0; list pagg); Thu, 06 Jan 2005 16:11:27 -0800 (PST)
Received: from tyo201.gate.nec.co.jp (TYO201.gate.nec.co.jp [202.32.8.214])
	by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j070BKw9005272
	for ; Thu, 6 Jan 2005 16:11:20 -0800
Received: from mailgate4.nec.co.jp (mailgate53.nec.co.jp [10.7.69.184])
	by tyo201.gate.nec.co.jp (8.11.7/3.7W01080315) with ESMTP id j07CAkY24205;
	Fri, 7 Jan 2005 21:10:46 +0900 (JST)
Received: (from root@localhost)
	by mailgate4.nec.co.jp (8.11.7/3.7W-MAILGATE-NEC) id j07CAku29095;
	Fri, 7 Jan 2005 21:10:46 +0900 (JST)
Received: from mailsv.bs1.fc.nec.co.jp (venus.hpc.bs1.fc.nec.co.jp [10.34.77.164])
	by mailsv5.nec.co.jp (8.11.7/3.7W-MAILSV4-NEC) with ESMTP id j07CAj401273;
	Fri, 7 Jan 2005 21:10:45 +0900 (JST)
Received: from mailsv.linux.bs1.fc.nec.co.jp (IDENT:postfix@namesv2.linux.bs1.fc.nec.co.jp [10.34.125.2])
	by mailsv.bs1.fc.nec.co.jp (8.12.10/3.7W-HPC5.2F(mailsv)04081615) with ESMTP id
	j07C4QIK009217; Fri, 7 Jan 2005 21:04:30 +0900 (JST)
Received: from [10.34.125.249] (sanma.linux.bs1.fc.nec.co.jp [10.34.125.249])
	by mailsv.linux.bs1.fc.nec.co.jp (Postfix) with ESMTP id 1EE9A30984;
	Fri, 7 Jan 2005 21:10:41 +0900 (JST)
Message-ID: <41DE7C69.3030008@ak.jp.nec.com>
Date: Fri, 07 Jan 2005 21:11:21 +0900
From: Kaigai Kohei
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: ja, en-us, en
MIME-Version: 1.0
To: Limin Gu
Cc: linux-kernel@vger.kernel.org, Jan Engelhardt , holt@sgi.com,
	jeffrey.hundstad@mnsu.edu, schwab@suse.de, rusty@rustcorp.com.au,
	chrisw@osdl.org, pagg@oss.sgi.com
Subject: Re: [RFC][PATCH] a revised job patch (with jobfs)
References: <200412160006.iBG06Zj25577@dbear.engr.sgi.com>
In-Reply-To: <200412160006.iBG06Zj25577@dbear.engr.sgi.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1
X-Virus-Status: Clean
X-archive-position: 60
X-ecartis-version: Ecartis v1.0.0
Sender: pagg-bounce@oss.sgi.com
Errors-to: pagg-bounce@oss.sgi.com
X-original-sender: kaigai@ak.jp.nec.com
Precedence: bulk
X-list: pagg

Hi, Limin Gu

This JOB-fs approach is so ambitious, I think.
I tried to apply your JOB-fs patch to 2.6.9,
and noticed the following problems.

(1) The JOB-fs patch needs include/linux/jobctl.h and include/linux/job_acct.h.
But these are contained in linux-2.6.9-job.patch, not in the JOB-fs patch.
Since those patches conflict, we need to extract jobctl.h and job_acct.h
from linux-2.6.9-job.patch.

(2) The return value of mkdir() under /jids is strange.
The 'jids' directory has a mkdir() method implemented by jobfs_mkdir().
Since jobfs_mkdir() returns the result of job_create() transparently,
my mkdir operations always failed.
----------------
/*
 * job_create - create a new job and attach the calling process to it.
 * @jid: new job id
 * @user: job owner
 * @options: not used
 *
 * return 0 on job is DISABLE, -errno on failure, 1 on success
 */
----------------
This is the description of job_create(). It returns 1 on success,
but the VFS layer recognizes that as a failure.

I modified this as follows:
--- job.c 2005-01-06 20:03:47.000000000 +0900
+++ kaigai_job.c 2005-01-07 20:16:55.518703400 +0900
@@ -1505,5 +1505,5 @@
 return -EINVAL;
 ret = job_create(jid, current->uid, 0);
- return ret;
+ return (ret==1) ? 0 : ((ret==0) ? -EINVAL : ret); // Dirty?
 }

(3) We can not make a JOB by using the /bin/mkdir command.
When I execute '/bin/mkdir' from a shell, a new process is fork()'ed and execve()'ed.
This process calls the mkdir() system call, which creates a JOB that contains only
the process itself.
Then '/bin/mkdir' exits, and the JOB it created contains no process.
So the JOB is destroyed soon after.
To avoid the problem, we need a 'create_job' command which calls mkdir() and
execve('/bin/bash') in the one process.
Or the pagg+job framework needs to allow the existence of an empty JOB.

(4) "echo '123' > hid" fails with -EPERM.
When we open 'hid' with the O_TRUNC flag, the operation returns -EPERM.
The setattr() method of 'hid' is called further down the sys_open() path.
* sys_open() -> filp_open() -> open_namei() -> may_open()
    -> do_truncate() (When O_TRUNC was appended)
    -> notify_change()
    -> inode's setattr() (It always returns -EPERM.)
If we can't use 'echo', it's pretty inconvenient.

And, would you have this discussion on PAGG-ML also ?
Because LKML has huge traffic, I have not noticed job-fs for two weeks. orz

Thanks.
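As a rough illustration of the 'create_job' idea in (3), here is a minimal
userland sketch. It assumes, as described above, that jobfs creates a job
for the calling process when that process calls mkdir() under the jids
directory. The name create_job_and_exec and any path passed to it are
hypothetical; they are not part of the job patch or of libjob.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create the job directory and then exec the
 * target program in the SAME process, so the job never becomes empty
 * between mkdir() and the start of the program.
 *
 * Returns -1 if mkdir() or execv() fails; does not return on success.
 */
static int create_job_and_exec(const char *job_dir, char *const argv[])
{
	assert(job_dir != NULL && argv != NULL && argv[0] != NULL);

	if (mkdir(job_dir, 0755) != 0) {
		perror("mkdir");
		return -1;
	}

	execv(argv[0], argv);	/* only returns on failure */
	perror("execv");
	return -1;
}
```

A caller would pass a directory under the jobfs mount point (wherever that
is mounted on the system) and an argv such as { "/bin/bash", NULL }.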

--
Linux Promotion Center, NEC
KaiGai Kohei

From limin@dbear.engr.sgi.com Thu Jan 6 22:05:42 2005
Received: with ECARTIS (v1.0.0; list pagg); Thu, 06 Jan 2005 22:05:48 -0800 (PST)
Received: from omx1.americas.sgi.com (omx1-ext.sgi.com [192.48.179.11])
	by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0765eEY027390
	for ; Thu, 6 Jan 2005 22:05:40 -0800
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2])
	by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j07I5WxT019881
	for ; Fri, 7 Jan 2005 12:05:33 -0600
Received: from dbear.engr.sgi.com (dbear.engr.sgi.com [163.154.18.85])
	by cthulhu.engr.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id j07I5RtW6000809;
	Fri, 7 Jan 2005 10:05:27 -0800 (PST)
Received: (from limin@localhost)
	by dbear.engr.sgi.com (8.11.0/8.11.0) id j07I5R428218;
	Fri, 7 Jan 2005 10:05:27 -0800
From: Limin Gu
Message-Id: <200501071805.j07I5R428218@dbear.engr.sgi.com>
Subject: Re: [RFC][PATCH] a revised job patch (with jobfs)
To: kaigai@ak.jp.nec.com (Kaigai Kohei)
Date: Fri, 7 Jan 2005 10:05:27 -0800 (PST)
Cc: linux-kernel@vger.kernel.org, jengelh@linux01.gwdg.de (Jan Engelhardt),
	holt@sgi.com, jeffrey.hundstad@mnsu.edu, schwab@suse.de,
	rusty@rustcorp.com.au, chrisw@osdl.org, pagg@oss.sgi.com,
	limin@dbear.engr.sgi.com (Limin Gu)
In-Reply-To: <41DE7C69.3030008@ak.jp.nec.com> from "Kaigai Kohei" at Jan 07, 2005 09:11:21 PM
X-Mailer: ELM [version 2.5 PL3]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1
X-Virus-Status: Clean
X-archive-position: 61
X-ecartis-version: Ecartis v1.0.0
Sender: pagg-bounce@oss.sgi.com
Errors-to: pagg-bounce@oss.sgi.com
X-original-sender: limin@dbear.engr.sgi.com
Precedence: bulk
X-list: pagg

>
> Hi, Limin Gu
>
> This JOB-fs approach is so ambitious, I think.
> I tried to apply your JOB-fs patch to 2.6.9,
> and noticed the following problems.

Hi KaiGai Kohei,

Thank you for your interest in job. I am planning to post a slightly
modified version of the job patch against 2.6.10 pretty soon, and also
a pointer to the job userland library and commands.

>
> (1) The JOB-fs patch needs include/linux/jobctl.h and include/linux/job_acct.h.
> But these are contained in linux-2.6.9-job.patch, not in the JOB-fs patch.
> Since those patches conflict, we need to extract jobctl.h and job_acct.h
> from linux-2.6.9-job.patch.

The jobfs patch I posted last time should not need include/linux/jobctl.h
and include/linux/job_acct.h; it only needed include/linux/job.h. I don't
know why you had that problem.

Your questions 2-4 below are all related to how to use job from userland.
We provide an extensive job library (libjob.so) for C code, and several
job commands for the shell environment. Users should use the library and
commands instead of using /bin/mkdir and echo directly. The reason is that
we want to keep the job library and commands the same as before, and we
want jobfs to be as simple as possible.

>
> (2) The return value of mkdir() under /jids is strange.

Yes, the return value is hacked to return the newly created jid.

> The 'jids' directory has a mkdir() method implemented by jobfs_mkdir().
> Since jobfs_mkdir() returns the result of job_create() transparently,
> my mkdir operations always failed.
> ----------------
> /*
>  * job_create - create a new job and attach the calling process to it.
>  * @jid: new job id
>  * @user: job owner
>  * @options: not used
>  *
>  * return 0 on job is DISABLE, -errno on failure, 1 on success
>  */
> ----------------
> This is the description of job_create(). It returns 1 on success,
> but the VFS layer recognizes that as a failure.
>
> I modified this as follows:
> --- job.c 2005-01-06 20:03:47.000000000 +0900
> +++ kaigai_job.c 2005-01-07 20:16:55.518703400 +0900
> @@ -1505,5 +1505,5 @@
>  return -EINVAL;
>  ret = job_create(jid, current->uid, 0);
> - return ret;
> + return (ret==1) ? 0 : ((ret==0) ? -EINVAL : ret); // Dirty?
>  }
>
> (3) We can not make a JOB by using the /bin/mkdir command.

We provide a job_create() library call for job creation. We also provide
a library, pam_job.so, that allows job creation through PAM modules. For
example, if you add the line "account optional /lib/security/pam_job.so"
to the /etc/pam.d/rlogin file, every rlogin will create a new job, and all
the processes from that login are contained in the same job, unless
somebody with proper permission decides to detach them (processes or the
job).

> When I execute '/bin/mkdir' from a shell, a new process is fork()'ed and execve()'ed.
> This process calls the mkdir() system call, which creates a JOB that contains only
> the process itself.
> Then '/bin/mkdir' exits, and the JOB it created contains no process.
> So the JOB is destroyed soon after.
> To avoid the problem, we need a 'create_job' command which calls mkdir() and
> execve('/bin/bash') in the one process.
> Or the pagg+job framework needs to allow the existence of an empty JOB.
>
> (4) "echo '123' > hid" fails with -EPERM.

We have a job_sethid() library call and a jsethid command available.

> When we open 'hid' with the O_TRUNC flag, the operation returns -EPERM.
> The setattr() method of 'hid' is called further down the sys_open() path.
> * sys_open() -> filp_open() -> open_namei() -> may_open()
>     -> do_truncate() (When O_TRUNC was appended)
>     -> notify_change()
>     -> inode's setattr() (It always returns -EPERM.)
> If we can't use 'echo', it's pretty inconvenient.
>
> And, would you have this discussion on PAGG-ML also ?
> Because LKML has huge traffic, I have not noticed job-fs for two weeks. orz
> Thanks.

Good idea. Thanks!
I am planning to post the new job patch (with the jobfs implementation)
and a new job userland rpm, i.e. the job library and commands that work
with jobfs instead of the current ioctl calls, today on
oss.sgi.com/projects/pagg. Let me know of any problems.

I appreciate your time.

--Limin

> --
> Linux Promotion Center, NEC
> KaiGai Kohei
>

From erikj@subway.americas.sgi.com Fri Jan 7 02:14:19 2005
Received: with ECARTIS (v1.0.0; list pagg); Fri, 07 Jan 2005 02:14:25 -0800 (PST)
Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19])
	by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j07AEJpW013995
	for ; Fri, 7 Jan 2005 02:14:19 -0800
Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15])
	by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j07NdPDr003445
	for ; Fri, 7 Jan 2005 15:39:26 -0800
Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204])
	by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j07MEBCK5976726
	for ; Fri, 7 Jan 2005 16:14:11 -0600 (CST)
Received: from subway.americas.sgi.com (subway.americas.sgi.com [128.162.236.152])
	by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j07MEAtC19705778
	for ; Fri, 7 Jan 2005 16:14:10 -0600 (CST)
Received: from subway.americas.sgi.com (localhost [127.0.0.1])
	by subway.americas.sgi.com (SGI-8.12.5/8.12.5/erikj-IRIX6519-news) with ESMTP id j07MEAB4077360
	for ; Fri, 7 Jan 2005 16:14:10 -0600 (CST)
Received: from localhost (erikj@localhost)
	by subway.americas.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id j07MEAnP077398
	for ; Fri, 7 Jan 2005 16:14:10 -0600 (CST)
Date: Fri, 7 Jan 2005 16:14:10 -0600
From: Erik Jacobson
To: pagg@oss.sgi.com
Subject: Re: New PAGG patch for 2.6.10, new functionality
In-Reply-To:
Message-ID:
References:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1
X-Virus-Status: Clean
X-archive-position: 62
X-ecartis-version: Ecartis v1.0.0
Sender: pagg-bounce@oss.sgi.com
Errors-to: pagg-bounce@oss.sgi.com
X-original-sender: erikj@subway.americas.sgi.com
Precedence: bulk
X-list: pagg

I've posted another 2.6.10 PAGG patch on OSS. It's named
linux-2.6.10-pagg.patch-2

This one implements some of the changes Kingsley Cheung requested.
See BugZilla #382: http://oss.sgi.com/bugzilla/show_bug.cgi?id=382

One request we didn't act on was moving pagg_attach in copy_process to be
near the bottom of the function.

The last PAGG patch I sent out fixed a bug where the return value of
pagg_attach wasn't being handled within copy_process. As of yesterday, if
pagg_attach fails, it will cause the fork to fail. This seems "right" to me.

However, if we move it to the bottom of copy_process as requested, there
isn't a good way to fail the fork, because the task is too far along in its
life and is probably already too visible to the rest of the system by then.
Even if this weren't true, we'd need to add a new fork_cleanup section,
which is a bit invasive.

If we hadn't changed the call to pagg_attach in copy_process to handle
pagg_attach errors yesterday, it probably would have worked fine to move
pagg_attach as requested.

I'm open to suggestions on this. I'm going to send this patch to LKML
along with Limin, who is posting a new JOB patch soon. If we don't have
any ideas on how to handle this here, we could try a broader audience.

The current location of pagg_attach is right after sched_fork(p) in
copy_process. The location requested is right above fork_out:

Thank you.

On Thu, 6 Jan 2005, Erik Jacobson wrote:

> Hi there.
>
> I just uploaded a new PAGG patch for 2.6.10. It includes a request to
> slightly change how the attach function pointer of the PAGG hook is
> managed.
>
> Note that we may be posting another PAGG patch soon with some other
> changes.
>
> We now make it so the PAGG user can decide if a new process will actually
> be grouped or not by looking at the attach function pointer return
> value.
>
> The attach function, pointed to by the PAGG hook and run by pagg_attach,
> can have these return values:
>
> <0 Error which is propagated back to copy_process so
>    the fork fails.
>
> =0 success, attach to same container as parent
>
> >0 success, but don't attach to a container
>
> It's also important to note that, as of now, if a negative value is
> returned by the attach function pointer, the value will be passed up
> through copy_process as a fork failure.
>
>
> Find the 'linux-2.6.10-pagg.patch' patch at the PAGG web site.
> http://oss.sgi.com/projects/pagg/
> Click on "Download" on the left.
>
> Thank you.
>
> --
> Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota
>

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

From limin@dbear.engr.sgi.com Fri Jan 7 04:15:00 2005
Received: with ECARTIS (v1.0.0; list pagg); Fri, 07 Jan 2005 04:15:11 -0800 (PST)
Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19])
	by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j07CExUg022156
	for ; Fri, 7 Jan 2005 04:14:59 -0800
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2])
	by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j081e7sl026384
	for ; Fri, 7 Jan 2005 17:40:07 -0800
Received: from dbear.engr.sgi.com (dbear.engr.sgi.com [163.154.18.85])
	by cthulhu.engr.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id j080EqtW6075818
	for ; Fri, 7 Jan 2005 16:14:52 -0800 (PST)
Received: (from limin@localhost)
	by dbear.engr.sgi.com (8.11.0/8.11.0) id j080EqD28799;
	Fri, 7 Jan 2005 16:14:52 -0800
From: Limin Gu
Message-Id: <200501080014.j080EqD28799@dbear.engr.sgi.com>
Subject: New job patch for 2.6.10. New implementations(jobfs)!
To: pagg@oss.sgi.com
Date: Fri, 7 Jan 2005 16:14:52 -0800 (PST)
Cc: limin@dbear.engr.sgi.com (Limin Gu)
X-Mailer: ELM [version 2.5 PL3]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1
X-Virus-Status: Clean
X-archive-position: 63
X-ecartis-version: Ecartis v1.0.0
Sender: pagg-bounce@oss.sgi.com
Errors-to: pagg-bounce@oss.sgi.com
X-original-sender: limin@dbear.engr.sgi.com
Precedence: bulk
X-list: pagg

Hi there,

I uploaded a new JOB patch for 2.6.10. This new job patch makes a major
change to the interface job uses for kernel/user communication. A new small
virtual filesystem called jobfs is implemented to replace the binary ioctl
call interface.

I also uploaded job-1.5.0-0.2.i386.rpm and job-1.5.0-0.2.src.rpm, the
job userland library and command package. The job library is the middle
layer between the job kernel module and job applications; it communicates
with the kernel through the jobfs virtual filesystem, and it provides the
SAME interface to job applications through /usr/lib/libjob.so. So your
current job applications should work without any change.

You can get the new patch and new rpm at
ftp://oss.sgi.com/projects/pagg/download/

linux-2.6.10-job.patch-1 should be applied after linux-2.6.10-pagg.patch
(or linux-2.6.10-pagg.patch-2).

Visit http://oss.sgi.com/projects/pagg/ for more information about PAGG
and JOB.

Please check them out and have fun, and let me know if you have any problems.

Thanks,

Limin Gu - Linux System Software - Silicon Graphics

From erikj@subway.americas.sgi.com Mon Jan 10 01:14:46 2005
Received: with ECARTIS (v1.0.0; list pagg); Mon, 10 Jan 2005 01:14:54 -0800 (PST)
Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19])
	by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0A9EhHN009003
	for ; Mon, 10 Jan 2005 01:14:46 -0800
Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15])
	by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j0AMeG0U025571
	for ; Mon, 10 Jan 2005 14:40:16 -0800
Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204])
	by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j0ALDWCK6164784;
	Mon, 10 Jan 2005 15:13:33 -0600 (CST)
Received: from subway.americas.sgi.com (subway.americas.sgi.com [128.162.236.152])
	by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j0ALDTtC17552067;
	Mon, 10 Jan 2005 15:13:30 -0600 (CST)
Received: from subway.americas.sgi.com (localhost [127.0.0.1])
	by subway.americas.sgi.com (SGI-8.12.5/8.12.5/erikj-IRIX6519-news) with ESMTP id j0ALDTB4234408;
	Mon, 10 Jan 2005 15:13:29 -0600 (CST)
Received: from localhost (erikj@localhost)
	by subway.americas.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id j0ALDTLQ234407;
	Mon, 10 Jan 2005 15:13:29 -0600 (CST)
Date: Mon, 10 Jan 2005 15:13:29 -0600
From: Erik Jacobson
To: linux-kernel@vger.kernel.org
cc: pagg@oss.sgi.com, limin@sgi.com
Subject: [Patch] Process Aggregates (PAGG)
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1
X-Virus-Status: Clean
X-archive-position: 64
X-ecartis-version: Ecartis v1.0.0
Sender: pagg-bounce@oss.sgi.com
Errors-to: pagg-bounce@oss.sgi.com
X-original-sender: erikj@subway.americas.sgi.com
Precedence: bulk
X-list: pagg

Process aggregates (PAGG) is a tool for building
kernel modules that need to keep track of processes. Kernel modules wishing
to use PAGG simply register with PAGG. A structure is provided to the PAGG
infrastructure during registration that tells PAGG which functions to run
at various times in the life of a process (fork, exec, exit, etc).

By default, a new child process will have the same PAGG associations as
the parent. In this way, PAGG can be used to develop process containers.
One example is Linux Job (to be posted separately). Another is CSA.

For information on PAGG or Job, see this web page:
http://oss.sgi.com/projects/pagg/

More information on CSA can be found here:
http://oss.sgi.com/projects/csa/

Some recent changes to the patch:
- Thanks to Kingsley Cheung, there is now improved handling for threads
- If pagg_attach (called in copy_process) fails, it is propagated as a fork
  failure. Failures are no longer ignored. See the comments in pagg.h
  for the pagg_hook structure for details on how return values from the
  attach function pointer are interpreted.
- The attach function pointer (referenced in the pagg_hook) now has the
  ability to tell the PAGG infrastructure that a given child should not
  necessarily inherit the parent's PAGG association. This allows more
  flexibility in how to group processes together.

Thank you.
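
To make the attach return-value convention above concrete, here is a small
sketch in plain userland C. It is not the code in kernel/pagg.c; the enum,
the function names and the example policy (a member limit, and leaving
kernel threads ungrouped) are all hypothetical and only illustrate the
three cases: negative fails the fork, zero inherits the parent's container,
positive succeeds without attaching.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical labels for the three cases described above. */
enum attach_action {
	ATTACH_FAIL_FORK,	/* attach returned < 0: fork fails with that error */
	ATTACH_INHERIT,		/* attach returned 0: child joins the parent's container */
	ATTACH_SKIP		/* attach returned > 0: fork succeeds, child not attached */
};

/* Classify an attach return value the way the text above describes. */
static enum attach_action classify_attach(int ret)
{
	if (ret < 0)
		return ATTACH_FAIL_FORK;
	if (ret == 0)
		return ATTACH_INHERIT;
	return ATTACH_SKIP;
}

/*
 * Hypothetical policy for an attach callback: refuse new members once a
 * container is full, leave kernel threads ungrouped, otherwise inherit.
 */
static int example_attach_policy(int members, int max_members, int is_kthread)
{
	assert(max_members > 0);

	if (members >= max_members)
		return -EAGAIN;	/* propagated so the fork fails */
	if (is_kthread)
		return 1;	/* success, but do not attach */
	return 0;		/* success, attach like the parent */
}
```

A real PAGG module would make this decision inside its attach function and
simply return the value; the classification step stands in for what the
infrastructure does with it.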

Signed-off-by: Erik Jacobson

---

 Documentation/pagg.txt    |   32 ++
 fs/exec.c                 |    2
 include/linux/init_task.h |    2
 include/linux/pagg.h      |  210 +++++++++++++++++++
 include/linux/sched.h     |    7
 init/Kconfig              |    8
 kernel/Makefile           |    1
 kernel/exit.c             |    4
 kernel/fork.c             |   14 +
 kernel/pagg.c             |  491 ++++++++++++++++++++++++++++++++++++++++++++++
 10 files changed, 771 insertions(+)

Index: linux/Documentation/pagg.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux/Documentation/pagg.txt 2005-01-07 10:34:45.192767969 -0600
@@ -0,0 +1,32 @@
+Linux Process Aggregates (PAGG)
+-------------------------------
+
+The process aggregates infrastructure, or PAGG, provides a generalized
+mechanism for providing arbitrary process groups in Linux. PAGG consists
+of a series of functions for registering and unregistering support
+for new types of process aggregation containers with the kernel.
+This is similar to the support currently provided within Linux that
+allows for dynamic support of filesystems, block and character devices,
+symbol tables, network devices, serial devices, and execution domains.
+This implementation of PAGG provides developers the basic hooks necessary
+to implement kernel modules for specific process containers, such as
+the job container.
+
+The do_fork function in the kernel was altered to support PAGG. If a
+process is attached to any PAGG containers and subsequently forks a
+child process, the child process will also be attached to the same PAGG
+containers. The PAGG containers involved during the fork are notified
+that a new process has been attached. The notification is accomplished
+via a callback function provided by the PAGG module.
+
+The do_exit function in the kernel has also been altered. If a process
+is attached to any PAGG containers and that process is exiting, the PAGG
+containers are notified that a process has detached from the container.
+The notification is accomplished via a callback function provided by
+the PAGG module.
+
+The sys_execve function has been modified to support an optional callout
+that can be run when a process in a pagg list does an exec. It can be
+used, for example, by other kernel modules that wish to do advanced CPU
+placement on multi-processor systems (just one example).
+
Index: linux/fs/exec.c
===================================================================
--- linux.orig/fs/exec.c 2004-12-24 15:34:31.000000000 -0600
+++ linux/fs/exec.c 2005-01-07 10:34:45.200579646 -0600
@@ -47,6 +47,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -1153,6 +1154,7 @@
 		retval = search_binary_handler(bprm,regs);
 		if (retval >= 0) {
 			free_arg_pages(bprm);
+			pagg_exec(current);
 			/* execve success */
 			security_bprm_free(bprm);
Index: linux/include/linux/init_task.h
===================================================================
--- linux.orig/include/linux/init_task.h 2004-12-24 15:33:52.000000000 -0600
+++ linux/include/linux/init_task.h 2005-01-07 10:34:45.212297161 -0600
@@ -2,6 +2,7 @@
 #define _LINUX__INIT_TASK_H
 #include
+#include
 #define INIT_FILES \
 { \
@@ -112,6 +113,7 @@
 	.proc_lock = SPIN_LOCK_UNLOCKED, \
 	.switch_lock = SPIN_LOCK_UNLOCKED, \
 	.journal_info = NULL, \
+	INIT_TASK_PAGG(tsk) \
 }
Index: linux/include/linux/pagg.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux/include/linux/pagg.h 2005-01-07 10:34:45.223038217 -0600
@@ -0,0 +1,210 @@
+/*
+ * PAGG (Process Aggregates) interface
+ *
+ *
+ * Copyright (c) 2000-2002, 2004 Silicon Graphics, Inc. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ *
+ * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA 94043, or:
+ *
+ * http://www.sgi.com
+ *
+ * For further information regarding this notice, see:
+ *
+ * http://oss.sgi.com/projects/GenInfo/NoticeExplan
+ */
+
+/*
+ * Data structure definitions and function prototypes used to implement
+ * process aggregates (paggs).
+ *
+ * Paggs provides a generalized way to implement process groupings or
+ * containers. Modules use these functions to register with the kernel as
+ * providers of process aggregation containers. The pagg data structures
+ * define the callback functions and data access pointers back into the
+ * pagg modules.
+ */
+
+#ifndef _LINUX_PAGG_H
+#define _LINUX_PAGG_H
+
+#include
+
+#ifdef CONFIG_PAGG
+
+#define PAGG_NAMELN 32 /* Max chars in PAGG module name */
+
+
+/**
+ * INIT_PAGG_LIST - used to initialize a pagg_list structure after declaration
+ * @_l: Task struct to init the pagg_list and semaphore in
+ *
+ */
+#define INIT_PAGG_LIST(_l) \
+do { \
+	INIT_LIST_HEAD(&(_l)->pagg_list); \
+	init_rwsem(&(_l)->pagg_sem); \
+} while(0)
+
+
+/*
+ * Used by task_struct to manage list of pagg attachments for the process.
+ * Each pagg provides the link between the process and the
+ * correct pagg container.
+ *
+ * STRUCT MEMBERS:
+ * hook: Reference to pagg module structure. That struct
+ *       holds the name key and function pointers.
+ * data: Opaque data pointer - defined by pagg modules.
+ * entry: List pointers
+ */
+struct pagg {
+	struct pagg_hook *hook;
+	void *data;
+	struct list_head entry;
+};
+
+/*
+ * Used by pagg modules to define the callback functions into the
+ * module.
+ *
+ * STRUCT MEMBERS:
+ * name: The name of the pagg container type provided by
+ *       the module. This will be set by the pagg module.
+ * attach: Function pointer to function used when attaching
+ *       a process to the pagg container referenced by
+ *       this struct.
+ *       Return codes from the attach function pointer have
+ *       these meanings:
+ *       <0 Error which is propagated back to copy_process so
+ *          the fork fails.
+ *       =0 success, attach to same container as parent
+ *       >0 success, but don't attach to a container
+ *
+ * detach: Function pointer to function used when detaching
+ *       a process from the pagg container referenced by
+ *       this struct.
+ * init: Function pointer to initialization function. This
+ *       function is used when the module is loaded to attach
+ *       existing processes to a default container as defined by
+ *       the pagg module. This is optional and may be set to
+ *       NULL if it is not needed by the pagg module.
+ * data: Opaque data pointer - defined by pagg modules.
+ * module: Pointer to kernel module struct. Used to increment &
+ *       decrement the use count for the module.
+ * entry: List pointers
+ * exec: Function pointer to function used when a process
+ *       in the pagg container exec's a new process. This
+ *       is optional and may be set to NULL if it is not
+ *       needed by the pagg module.
+ * refcnt: Keep track of user count of the pagg hook
+ */
+struct pagg_hook {
+	struct module *module;
+	char *name; /* Name Key - restricted to 32 characters */
+	void *data; /* Opaque module specific data */
+	struct list_head entry; /* List pointers */
+	atomic_t refcnt; /* usage counter */
+	int (*init)(struct task_struct *, struct pagg *);
+	int (*attach)(struct task_struct *, struct pagg *, void*);
+	void (*detach)(struct task_struct *, struct pagg *);
+	void (*exec)(struct task_struct *, struct pagg *);
+};
+
+
+/* Kernel service functions for providing PAGG support */
+extern struct pagg *pagg_get(struct task_struct *task, char *key);
+extern struct pagg *pagg_alloc(struct task_struct *task,
+		struct pagg_hook *pt);
+extern void pagg_free(struct pagg *pagg);
+extern int pagg_hook_register(struct pagg_hook *pt_new);
+extern int pagg_hook_unregister(struct pagg_hook *pt_old);
+extern int __pagg_attach(struct task_struct *to_task,
+		struct task_struct *from_task);
+extern void __pagg_detach(struct task_struct *task);
+extern int __pagg_exec(struct task_struct *task);
+
+/**
+ * pagg_attach - child inherits attachment to pagg containers of its parent
+ * @child: child task - to inherit
+ * @parent: parent task - child inherits pagg containers from this parent
+ *
+ * function used when a child process must inherit attachment to pagg
+ * containers from the parent. Return code is propagated as a fork fail.
+ *
+ */
+static inline int pagg_attach(struct task_struct *child,
+			      struct task_struct *parent)
+{
+	INIT_PAGG_LIST(child);
+	if (!list_empty(&parent->pagg_list))
+		return __pagg_attach(child, parent);
+
+	return 0;
+}
+
+
+/**
+ * pagg_detach - Detach a process from a pagg container it is a member of
+ * @task: The task the pagg will be detached from
+ *
+ */
+static inline void pagg_detach(struct task_struct *task)
+{
+	if (!list_empty(&task->pagg_list))
+		__pagg_detach(task);
+}
+
+/**
+ * pagg_exec - Used when a process exec's
+ * @task: The process doing the exec
+ *
+ */
+static inline void pagg_exec(struct task_struct *task)
+{
+	if (!list_empty(&task->pagg_list))
+		__pagg_exec(task);
+}
+
+/**
+ * INIT_TASK_PAGG - Used in INIT_TASK to set the head and sem of pagg_list
+ * @tsk: The task to work with
+ *
+ * Macro used in INIT_TASK to set the head and sem of pagg_list.
+ * If CONFIG_PAGG is off, it is defined as an empty macro below.
+ *
+ */
+#define INIT_TASK_PAGG(tsk) \
+	.pagg_list = LIST_HEAD_INIT(tsk.pagg_list), \
+	.pagg_sem = __RWSEM_INITIALIZER(tsk.pagg_sem),
+
+#else /* CONFIG_PAGG */
+
+/*
+ * Replacement macros used when PAGG (Process Aggregates) support is not
+ * compiled into the kernel.
+ */
+#define INIT_TASK_PAGG(tsk)
+#define INIT_PAGG_LIST(l)	do { } while(0)
+#define pagg_attach(ct, pt)	(0)	/* must yield a value: fork.c tests the result */
+#define pagg_detach(t)		do { } while(0)
+#define pagg_exec(t)		do { } while(0)
+
+#endif /* CONFIG_PAGG */
+
+#endif /* _LINUX_PAGG_H */
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h	2004-12-24 15:33:59.000000000 -0600
+++ linux/include/linux/sched.h	2005-01-07 10:34:45.228896975 -0600
@@ -664,6 +664,13 @@
 	struct mempolicy *mempolicy;
 	short il_next;		/* could be shared with used_math */
 #endif
+
+#ifdef CONFIG_PAGG
+/* List of pagg (process aggregate) attachments */
+	struct list_head pagg_list;
+	struct rw_semaphore pagg_sem;
+#endif
+
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
Index: linux/init/Kconfig
===================================================================
--- linux.orig/init/Kconfig	2004-12-24 15:35:24.000000000 -0600
+++ linux/init/Kconfig	2005-01-07 10:34:45.240614491 -0600
@@ -138,6 +138,14 @@
 	  for processing it. A preliminary version of these tools is available
 	  at .
 
+config PAGG
+	bool "Support for process aggregates (PAGGs)"
+	help
+	  Say Y here if you will be loading modules which provide support
+	  for process aggregate containers.  Examples of such modules include the
+	  Linux Jobs module and the Linux Array Sessions module.  If you will not
+	  be using such modules, say N.
+
 config SYSCTL
 	bool "Sysctl support"
 	---help---
Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile	2004-12-24 15:34:26.000000000 -0600
+++ linux/kernel/Makefile	2005-01-07 10:34:45.253308466 -0600
@@ -18,6 +18,7 @@
 obj-$(CONFIG_PM) += power/
 obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
 obj-$(CONFIG_COMPAT) += compat.o
+obj-$(CONFIG_PAGG) += pagg.o
 obj-$(CONFIG_IKCONFIG) += configs.o
 obj-$(CONFIG_IKCONFIG_PROC) += configs.o
 obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c	2004-12-24 15:33:59.000000000 -0600
+++ linux/kernel/fork.c	2005-01-07 10:34:45.260143684 -0600
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -128,6 +129,9 @@
 	init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
 	init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
+
+	/* Initialize the pagg list in pid 0 before it can clone itself. */
+	INIT_PAGG_LIST(current);
 }
 
 static struct task_struct *dup_task_struct(struct task_struct *orig)
@@ -941,6 +945,15 @@
 	sched_fork(p);
 
 	/*
+	 * call pagg modules to properly attach new process to the same
+	 * process aggregate containers as the parent process.  Fail the fork
+	 * on error.
+	 */
+	retval = pagg_attach(p, current);
+	if (retval)
+		goto bad_fork_cleanup_namespace;
+
+	/*
 	 * Ok, make it visible to the rest of the system.
 	 * We dont wake it up yet.
 	 */
@@ -1029,6 +1042,7 @@
 	return p;
 
 bad_fork_cleanup_namespace:
+	pagg_detach(p);
 	exit_namespace(p);
 bad_fork_cleanup_keys:
 	exit_keys(p);
Index: linux/kernel/pagg.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/kernel/pagg.c	2005-01-07 14:45:41.883934931 -0600
@@ -0,0 +1,491 @@
+/*
+ * PAGG (Process Aggregates) interface
+ *
+ *
+ * Copyright (c) 2000-2004 Silicon Graphics, Inc.  All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA 94043, or:
+ *
+ * http://www.sgi.com
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* list of pagg hook entries that reference the "module" implementations */
+static LIST_HEAD(pagg_hook_list);
+static DECLARE_RWSEM(pagg_hook_list_sem);
+
+
+/**
+ * pagg_get - get a pagg given a search key
+ * @task: We examine the pagg_list from the given task
+ * @key: Key name of pagg we wish to retrieve
+ *
+ * Given a pagg_list list structure, this function will return
+ * a pointer to the pagg struct that matches the search
+ * key.  If the key is not found, the function will return NULL.
+ *
+ * The caller should hold at least a read lock on the pagg_list
+ * for task using down_read(&task->pagg_sem).
+ *
+ */
+struct pagg *
+pagg_get(struct task_struct *task, char *key)
+{
+	struct pagg *pagg;
+
+	list_for_each_entry(pagg, &task->pagg_list, entry) {
+		if (!strcmp(pagg->hook->name,key))
+			return pagg;
+	}
+	return NULL;
+}
+
+
+/**
+ * pagg_alloc - Insert a new pagg into the pagg_list for a task
+ * @task: Task we want to insert the pagg into
+ * @pagg_hook: Pagg hook to associate with the new pagg
+ *
+ * Given a task and a pagg hook, this function will allocate
+ * a new pagg structure, initialize the settings, and insert the pagg into
+ * the pagg_list for the task.
+ *
+ * The caller for this function should hold at least a read lock on the
+ * pagg_hook_list_sem - or ensure that the pagg hook entry cannot be
+ * removed.  If this function was called from the pagg module (usually the
+ * case), then the caller need not hold this lock.  The caller should hold
+ * a write lock on the task's pagg_sem.  This can be locked using
+ * down_write(&task->pagg_sem)
+ *
+ */
+struct pagg *
+pagg_alloc(struct task_struct *task, struct pagg_hook *pagg_hook)
+{
+	struct pagg *pagg;
+
+	pagg = kmalloc(sizeof(struct pagg), GFP_KERNEL);
+	if (!pagg)
+		return NULL;
+
+	pagg->hook = pagg_hook;
+	pagg->data = NULL;
+	atomic_inc(&pagg_hook->refcnt);  /* Increase hook's reference count */
+	list_add_tail(&pagg->entry, &task->pagg_list);
+	return pagg;
+}
+
+
+/**
+ * pagg_free - Delete pagg from the list and free its memory
+ * @pagg: The pagg to free
+ *
+ * This function will ensure the pagg is deleted from
+ * the list of pagg entries for the task.  Finally, the memory for the
+ * pagg is discarded.
+ *
+ * The caller of this function should hold a write lock on the pagg_sem
+ * for the task.  This can be locked using down_write(&task->pagg_sem).
+ *
+ * Prior to calling pagg_free, the pagg should have been detached from the
+ * pagg container represented by this pagg.
+ * That is usually done using
+ * p->hook->detach(task, pagg);
+ *
+ */
+void
+pagg_free(struct pagg *pagg)
+{
+	atomic_dec(&pagg->hook->refcnt); /* decr the reference count on the hook */
+	list_del(&pagg->entry);
+	kfree(pagg);
+}
+
+
+/**
+ * get_pagg_hook - Get the pagg hook matching the requested name
+ * @key: The name of the pagg hook to get
+ *
+ * Given a pagg hook name key, this function will return a pointer
+ * to the pagg_hook struct that matches the name.
+ *
+ * You should hold either the write or read lock for pagg_hook_list_sem
+ * before using this function.  This will ensure that the pagg_hook_list
+ * does not change while iterating through the list entries.
+ *
+ */
+static struct pagg_hook *
+get_pagg_hook(char *key)
+{
+	struct pagg_hook *pagg_hook;
+
+	list_for_each_entry(pagg_hook, &pagg_hook_list, entry) {
+		if (!strcmp(pagg_hook->name, key)) {
+			return pagg_hook;
+		}
+	}
+	return NULL;
+}
+
+/**
+ * remove_client_paggs_from_all_tasks - Remove all paggs associated with hook
+ * @php: Pagg hook associated with paggs to purge
+ *
+ * Given a pagg hook, this function will remove all paggs associated with that
+ * pagg hook from all tasks calling the provided function on each pagg.
+ *
+ * If there is a detach function associated with the pagg, it is called
+ * before the pagg is freed.
+ *
+ * This is meant to be used by pagg_hook_register and pagg_hook_unregister
+ *
+ */
+static void
+remove_client_paggs_from_all_tasks(struct pagg_hook *php)
+{
+	if (php == NULL)
+		return;
+
+	/* Because of internal race conditions we can't guarantee
+	 * getting every task in just one pass so we just keep going
+	 * until there are no tasks with paggs from this hook attached.
+	 * The inefficiency of this should be tempered by the fact that this
+	 * happens at most once for each registered client.
+	 */
+	while (atomic_read(&php->refcnt) != 0) {
+		struct task_struct *g = NULL, *p = NULL;
+
+		read_lock(&tasklist_lock);
+		do_each_thread(g, p) {
+			struct pagg *paggp;
+			int task_exited;
+
+			get_task_struct(p);
+			read_unlock(&tasklist_lock);
+			down_write(&p->pagg_sem);
+			paggp = pagg_get(p, php->name);
+			if (paggp != NULL) {
+				(void)php->detach(p, paggp);
+				pagg_free(paggp);
+			}
+			up_write(&p->pagg_sem);
+			read_lock(&tasklist_lock);
+
+			/* If a PAGG got removed from the list while we're going through
+			 * each process, the tasks list for the process would be empty.  In
+			 * that case, break out of this for_each_thread so we can do it
+			 * again. */
+			task_exited = list_empty(&p->sibling);
+			put_task_struct(p);
+			if (task_exited)
+				goto endloop;
+		} while_each_thread(g, p);
+	endloop:
+		read_unlock(&tasklist_lock);
+	}
+}
+
+/**
+ * pagg_hook_register - Register a new pagg hook and enter it into the list
+ * @pagg_hook_new: The new pagg hook to register
+ *
+ * Used to register a new pagg hook and enter it into the pagg_hook_list.
+ * The service name for a pagg hook is restricted to 32 characters.
+ *
+ * If an "init()" function is supplied in the hook being registered then a
+ * pagg will be attached to all existing tasks and the supplied "init()"
+ * function will be applied to it.  If any call to the supplied "init()"
+ * function returns a non zero result the registration will be aborted.  As
+ * part of the abort process, all paggs belonging to the new client will be
+ * removed from all tasks and the supplied "detach()" function will be
+ * called on them.
+ *
+ * If a memory error is encountered, the pagg hook is unregistered and any
+ * tasks that have been attached to the initial pagg container are detached
+ * from that container.
+ *
+ */
+int
+pagg_hook_register(struct pagg_hook *pagg_hook_new)
+{
+	struct pagg_hook *pagg_hook = NULL;
+
+	/* Add new pagg module to access list */
+	if (!pagg_hook_new)
+		return -EINVAL;			/* error */
+	if (!list_empty(&pagg_hook_new->entry))
+		return -EINVAL;			/* error */
+	if (pagg_hook_new->name == NULL || strlen(pagg_hook_new->name) > PAGG_NAMELN)
+		return -EINVAL;			/* error */
+	if (!pagg_hook_new->attach || !pagg_hook_new->detach)
+		return -EINVAL;			/* error */
+
+	/* Try to insert new hook entry into the pagg hook list */
+	down_write(&pagg_hook_list_sem);
+
+	pagg_hook = get_pagg_hook(pagg_hook_new->name);
+
+	if (pagg_hook) {
+		up_write(&pagg_hook_list_sem);
+		printk(KERN_WARNING "Attempt to register duplicate"
+		       " PAGG support (name=%s)\n", pagg_hook_new->name);
+		return -EBUSY;
+	}
+
+	/* Okay, we can insert into the pagg hook list */
+	list_add_tail(&pagg_hook_new->entry, &pagg_hook_list);
+	/* set the ref count to zero */
+	atomic_set(&pagg_hook_new->refcnt, 0);
+
+	/* Now we can call the initializer function (if present) for each task */
+	if (pagg_hook_new->init != NULL) {
+		struct task_struct *g = NULL, *p = NULL;
+		int init_result = 0;
+
+		/* Because of internal race conditions we can't guarantee
+		 * getting every task in just one pass so we just keep going
+		 * until we don't find any uninitialized tasks.  The inefficiency
+		 * of this should be tempered by the fact that this happens
+		 * at most once for each registered client.
+		 */
+		read_lock(&tasklist_lock);
+	repeat:
+		do_each_thread(g, p) {
+			struct pagg *paggp;
+			int task_exited;
+
+			get_task_struct(p);
+			read_unlock(&tasklist_lock);
+			down_write(&p->pagg_sem);
+			paggp = pagg_get(p, pagg_hook_new->name);
+			if (!paggp && !(p->flags & PF_EXITING)) {
+				paggp = pagg_alloc(p, pagg_hook_new);
+				if (paggp != NULL)
+					init_result = pagg_hook_new->init(p, paggp);
+				else
+					init_result = -ENOMEM;
+			}
+			up_write(&p->pagg_sem);
+			read_lock(&tasklist_lock);
+			/* Like in remove_client_paggs_from_all_tasks, if the task
+			 * disappeared on us while we were going through the
+			 * for_each_thread loop, we need to start over with that loop.
+			 * That's why we have the list_empty here */
+			task_exited = list_empty(&p->sibling);
+			put_task_struct(p);
+			if (init_result != 0)
+				goto endloop;
+			if (task_exited)
+				goto repeat;
+		} while_each_thread(g, p);
+	endloop:
+		read_unlock(&tasklist_lock);
+
+		/*
+		 * if anything went wrong during initialisation abandon the
+		 * registration process
+		 */
+		if (init_result != 0) {
+			remove_client_paggs_from_all_tasks(pagg_hook_new);
+			list_del_init(&pagg_hook_new->entry);
+			up_write(&pagg_hook_list_sem);
+
+			printk(KERN_WARNING "Registering PAGG support for"
+			       " (name=%s) failed\n", pagg_hook_new->name);
+
+			return init_result; /* hook init function error result */
+		}
+	}
+
+	up_write(&pagg_hook_list_sem);
+
+	printk(KERN_INFO "Registering PAGG support for (name=%s)\n",
+	       pagg_hook_new->name);
+
+	return 0;					/* success */
+
+}
+
+/**
+ * pagg_hook_unregister - Unregister pagg hook and remove it from the list
+ * @pagg_hook_old: The hook to unregister and remove
+ *
+ * Used to unregister pagg hooks and remove them from the pagg_hook_list.
+ * Once the pagg hook entry in the pagg_hook_list is found, paggs associated
+ * with the hook (if any) will have their detach function called and will
+ * be detached.
+ *
+ */
+int
+pagg_hook_unregister(struct pagg_hook *pagg_hook_old)
+{
+	struct pagg_hook *pagg_hook;
+
+	/* Check the validity of the arguments */
+	if (!pagg_hook_old)
+		return -EINVAL;			/* error */
+	if (list_empty(&pagg_hook_old->entry))
+		return -EINVAL;			/* error */
+	if (pagg_hook_old->name == NULL)
+		return -EINVAL;			/* error */
+
+	down_write(&pagg_hook_list_sem);
+
+	pagg_hook = get_pagg_hook(pagg_hook_old->name);
+
+	if (pagg_hook && pagg_hook == pagg_hook_old) {
+		remove_client_paggs_from_all_tasks(pagg_hook);
+		list_del_init(&pagg_hook->entry);
+		up_write(&pagg_hook_list_sem);
+
+		printk(KERN_INFO "Unregistering PAGG support for"
+		       " (name=%s)\n", pagg_hook_old->name);
+
+		return 0;				/* success */
+	}
+
+	up_write(&pagg_hook_list_sem);
+
+	printk(KERN_WARNING "Attempt to unregister PAGG support (name=%s)"
+	       " failed - not found\n", pagg_hook_old->name);
+
+	return -EINVAL;					/* error */
+}
+
+
+/**
+ * __pagg_attach - Attach a new task to the same containers of its parent
+ * @to_task: The child task that will inherit the parent's containers
+ * @from_task: The parent task
+ *
+ * Used to attach a new task to the same pagg containers to which its parent
+ * is attached.
+ *
+ * The "from" argument is the parent task.  The "to" argument is the child
+ * task.
+ *
+ * See the attach description in linux/include/linux/pagg.h for details on
+ * how to handle return codes from the attach function pointer.
+ *
+ */
+int
+__pagg_attach(struct task_struct *to_task, struct task_struct *from_task)
+{
+	struct pagg *from_pagg;
+	int ret;
+
+	/* lock the parent's pagg_list we are copying from */
+	down_read(&from_task->pagg_sem); /* read lock the pagg list */
+
+	list_for_each_entry(from_pagg, &from_task->pagg_list, entry) {
+		struct pagg *to_pagg = NULL;
+
+		to_pagg = pagg_alloc(to_task, from_pagg->hook);
+		if (!to_pagg) {
+			ret=-ENOMEM;
+			goto error_return;
+		}
+		ret = to_pagg->hook->attach(to_task, to_pagg, from_pagg->data);
+
+		if (ret < 0) {
+			/* Propagates to copy_process as a fork failure */
+			goto error_return;
+		}
+		else if (ret > 0) {
+			/* Success, but attach function pointer doesn't want grouping */
+			pagg_free(to_pagg);
+		}
+	}
+
+	up_read(&from_task->pagg_sem); /* unlock the pagg list */
+
+	return 0;					/* success */
+
+ error_return:
+	/*
+	 * Clean up all the pagg attachments made on behalf of the new
+	 * task.  Set new task pagg ptr to NULL for return.
+	 */
+	up_read(&from_task->pagg_sem); /* unlock the pagg list */
+	__pagg_detach(to_task);
+	return ret;					/* failure */
+}
+
+/**
+ * __pagg_detach - Detach a task from all pagg containers it is attached to
+ * @task: Task to detach from pagg containers
+ *
+ * Used to detach a task from all pagg containers to which it is attached.
+ *
+ */
+void
+__pagg_detach(struct task_struct *task)
+{
+	struct pagg *pagg;
+	struct pagg *paggtmp;
+
+	/* Remove ref. to paggs from task immediately */
+	down_write(&task->pagg_sem); /* write lock pagg list */
+
+	list_for_each_entry_safe(pagg, paggtmp, &task->pagg_list, entry) {
+		pagg->hook->detach(task, pagg);
+		pagg_free(pagg);
+	}
+
+	up_write(&task->pagg_sem); /* write unlock the pagg list */
+
+	return;   /* 0 = success, else return last code for failure */
+}
+
+
+/**
+ * __pagg_exec - Execute callback when a process in a container execs
+ * @task: We go through the pagg list in the given task
+ *
+ * Used when a process that is in a pagg container does an exec.
+ *
+ * The "from" argument is the task.  The "name" argument is the name
+ * of the process being exec'ed.
+ *
+ */
+int
+__pagg_exec(struct task_struct *task)
+{
+	struct pagg *pagg;
+
+	down_read(&task->pagg_sem); /* lock the pagg list */
+
+	list_for_each_entry(pagg, &task->pagg_list, entry) {
+		if (pagg->hook->exec) /* conditional because it's optional */
+			pagg->hook->exec(task, pagg);
+	}
+
+	up_read(&task->pagg_sem); /* unlock the pagg list */
+	return 0;
+}
+
+
+EXPORT_SYMBOL(pagg_get);
+EXPORT_SYMBOL(pagg_alloc);
+EXPORT_SYMBOL(pagg_free);
+EXPORT_SYMBOL(pagg_hook_register);
+EXPORT_SYMBOL(pagg_hook_unregister);
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c	2004-12-24 15:35:27.000000000 -0600
+++ linux/kernel/exit.c	2005-01-07 10:34:45.275767038 -0600
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -826,6 +827,9 @@
 		module_put(tsk->binfmt->module);
 
 	tsk->exit_code = code;
+
+	pagg_detach(tsk);
+
 	exit_notify(tsk);
 #ifdef CONFIG_NUMA
 	mpol_free(tsk->mempolicy);

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

From kingsley@sw.oz.au Mon Jan 10 03:40:27 2005
Received: with ECARTIS (v1.0.0; list pagg); Mon, 10 Jan 2005 03:40:31 -0800 (PST)
Received: from smtp.sw.oz.au (IDENT:FWUSER@mail.aurema.com [203.31.96.1]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0ABePfl020036 for ; Mon, 10 Jan 2005 03:40:26 -0800
Received: from kingsley.sw.oz.au (kingsley.sw.oz.au [192.41.203.97]) by smtp.sw.oz.au with ESMTP id j0ANbqTo023479; Tue, 11 Jan 2005 10:37:52 +1100 (EST)
Received: from kingsley.sw.oz.au (localhost.localdomain [127.0.0.1]) by kingsley.sw.oz.au (8.13.1/8.12.10) with ESMTP id j0ANbqFc002081; Tue, 11 Jan 2005 10:37:52 +1100
Received: (from kingsley@localhost) by kingsley.sw.oz.au (8.13.1/8.13.1/Submit) id j0ANbpWS002080; Tue, 11 Jan 2005 10:37:51 +1100
Date: Tue, 11 Jan 2005 10:37:50 +1100
From: Kingsley Cheung
To: Erik Jacobson
Cc: pagg@oss.sgi.com
Subject: Re: New PAGG patch for 2.6.10, new functionality
Message-ID: <20050110233750.GC26466@aurema.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.1i
X-Scanned-By: MIMEDefang 2.48 on 192.41.203.35
X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1
X-Virus-Status: Clean
X-archive-position: 65
X-ecartis-version: Ecartis v1.0.0
Sender: pagg-bounce@oss.sgi.com
Errors-to: pagg-bounce@oss.sgi.com
X-original-sender: kingsley@aurema.com
Precedence: bulk
X-list: pagg

On Thu, Jan 06, 2005 at 03:50:30PM -0600, Erik Jacobson wrote:
> Hi there.
>
> I just uploaded a new PAGG patch for 2.6.10.  It includes a request to
> slightly change how the attach function pointer of the PAGG hook is
> managed.
>
> Note that we may be posting another PAGG patch soon with some other
> changes.
>
> We now make it so the PAGG user can decide if a new process will actually
> be grouped or not by looking at the attach function pointer return
> value.
>
> The attach function, pointed to by the PAGG hook and run by pagg_attach,
> can have these return values:
>
> <0 Error which is propagated back to copy_process so
>    the fork fails.
>
> =0 success, attach to same container as parent
>
> >0 success, but don't attach to a container
>
> It's also important to note that, as of now, if a negative value is
> returned by the attach function pointer, the value will be passed up
> through copy_process as a fork failure.

Erik,

One thought has come to mind.  Was there a reason why similar
semantics weren't applied to pagg_init?  I would have thought it would
make things consistent with pagg_attach.  With error returns like:

   <0 Error which is propagated back to copy_process so
      the registration function fails completely.

   =0 success, attach to same container as parent

   >0 success, but don't attach to a container

That way processes can be ignored by pagg_init just as they can be by
pagg_attach.

Thanks,
--
Kingsley

From erikj@subway.americas.sgi.com Mon Jan 10 19:12:37 2005
Received: with ECARTIS (v1.0.0; list pagg); Mon, 10 Jan 2005 19:12:42 -0800 (PST)
Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0B3Cara023722 for ; Mon, 10 Jan 2005 19:12:37 -0800
Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j0BGcGlS004691 for ; Tue, 11 Jan 2005 08:38:16 -0800
Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j0BFBSCK6215496; Tue, 11 Jan 2005 09:11:28 -0600 (CST)
Received: from subway.americas.sgi.com (subway.americas.sgi.com [128.162.236.152]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j0BFBRtC19909940; Tue, 11 Jan 2005 09:11:28 -0600 (CST)
Received: from subway.americas.sgi.com (localhost [127.0.0.1]) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/erikj-IRIX6519-news) with ESMTP id j0BFBRB4273548; Tue, 11 Jan 2005 09:11:27 -0600 (CST)
Received: from localhost (erikj@localhost) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id j0BFBPLg273561; Tue, 11 Jan 2005 09:11:27 -0600 (CST)
Date: Tue, 11 Jan 2005 09:11:25 -0600
From: Erik Jacobson
To: Kingsley Cheung
cc: pagg@oss.sgi.com
Subject: Re: New PAGG patch for 2.6.10, new functionality
In-Reply-To: <20050110233750.GC26466@aurema.com>
Message-ID:
References: <20050110233750.GC26466@aurema.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1
X-Virus-Status: Clean
X-archive-position: 66
X-ecartis-version: Ecartis v1.0.0
Sender: pagg-bounce@oss.sgi.com
Errors-to: pagg-bounce@oss.sgi.com
X-original-sender: erikj@subway.americas.sgi.com
Precedence: bulk
X-list: pagg

Just to make sure I understand --

You'd use this in case you want to be notified about each task but you
don't necessarily want paggs allocated for each task as you have no
desire to group them.  Is that right?

If so, I just didn't think of that use.  I thought people who wanted the init
function would also want pagg associations.  Could a person who wants
notification of all processes use something like
for_each_process/for_each_task instead?

So is the reason you suggest this just to be consistent like you said in
your note?  Or is there a use you had in mind that I didn't think of?

One thing I'd say is that it isn't possible to be fully consistent with
attach anyway.  If the init function pointer fails, it isn't like we can
propagate the error to fork like we do in attach.  And I doubt we'd want to
kill the running task :)

Let me know your thoughts on this...

On Tue, 11 Jan 2005, Kingsley Cheung wrote:

> On Thu, Jan 06, 2005 at 03:50:30PM -0600, Erik Jacobson wrote:
> > Hi there.
> >
> > I just uploaded a new PAGG patch for 2.6.10.  It includes a request to
> > slightly change how the attach function pointer of the PAGG hook is
> > managed.
> >
> > Note that we may be posting another PAGG patch soon with some other
> > changes.
> >
> > We now make it so the PAGG user can decide if a new process will actually
> > be grouped or not by looking at the attach function pointer return
> > value.
> >
> > The attach function, pointed to by the PAGG hook and run by pagg_attach,
> > can have these return values:
> >
> > <0 Error which is propagated back to copy_process so
> >    the fork fails.
> >
> > =0 success, attach to same container as parent
> >
> > >0 success, but don't attach to a container
> >
> > It's also important to note that, as of now, if a negative value is
> > returned by the attach function pointer, the value will be passed up
> > through copy_process as a fork failure.
>
> Erik,
>
> One thought has come to mind.  Was there a reason why similar
> semantics weren't applied to pagg_init?  I would have thought it would
> make things consistent with pagg_attach.  With error returns like:
>
> <0 Error which is propagated back to copy_process so
>    the registration function fails completely.
>
> =0 success, attach to same container as parent
>
> >0 success, but don't attach to a container
>
> That way processes can be ignored by pagg_init just as they can be by
> pagg_attach.
>
> Thanks,
> --
> Kingsley
>

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

From kingsley@aurema.com Tue Jan 11 14:37:23 2005
Received: with ECARTIS (v1.0.0; list pagg); Tue, 11 Jan 2005 14:37:26 -0800 (PST)
Received: from smtp.sw.oz.au (IDENT:FWUSER@alt.aurema.com [203.217.18.57]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0BMbLWK007614 for ; Tue, 11 Jan 2005 14:37:22 -0800
Received: from smtp.sw.oz.au (localhost [127.0.0.1]) by smtp.sw.oz.au with ESMTP id j0BMYPTo026627; Wed, 12 Jan 2005 09:34:25 +1100 (EST)
Received: (from kingsley@localhost) by smtp.sw.oz.au id j0BMYOvQ026626; Wed, 12 Jan 2005 09:34:24 +1100 (EST)
Date: Wed, 12 Jan 2005 09:34:24 +1100
From: kingsley@aurema.com
To: Erik Jacobson
Cc: pagg@oss.sgi.com
Subject: Re: New PAGG patch for 2.6.10, new functionality
Message-ID: <20050111223424.GA14765@aurema.com>
References: <20050110233750.GC26466@aurema.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.2.1i
X-Scanned-By: MIMEDefang 2.48 on 192.41.203.35
X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1
X-Virus-Status: Clean
X-archive-position: 67
X-ecartis-version: Ecartis v1.0.0
Sender: pagg-bounce@oss.sgi.com
Errors-to: pagg-bounce@oss.sgi.com
X-original-sender: kingsley@aurema.com
Precedence: bulk
X-list: pagg

On Tue, Jan 11, 2005 at 09:11:25AM -0600, Erik Jacobson wrote:
> Just to make sure I understand --
>
> You'd use this in case you want to be notified about each task but you
> don't necessarily want paggs allocated for each task as you have no
> desire to group them.  Is that right?

Yes.
>
> If so, I just didn't think of that use.  I thought people who wanted the init
> function would also want pagg associations.  Could a person who wants
> notification of all processes use something like
> for_each_process/for_each_task instead?

I see.  (I'm not sure what you mean by using
for_each_process/for_each_task for all processes though).

I had the impression that the only difference between init and attach
was that init served to catch all existing tasks and attach all tasks
during a fork.

As for skipping pagg associations, my thought was that there might be
users who would choose to skip a task simply because that task did not
interest them based on some set criteria.  Some of the existing tasks
in the system at the time of the user's registration could fall under
the category of being skipped.  For example, "do not do a pagg
association for all real time tasks in both init and attach".  If only
attach allowed for associations to be skipped then users would not be
able to apply their criteria for existing tasks.
>
> So is the reason you suggest this just to be consistent like you said in
> your note?  Or is there a use you had in mind that I didn't think of?

I'm not sure - perhaps I wasn't clear enough earlier.  Does the above
make my reasoning clearer?
>
> One thing I'd say is that it isn't possible to be fully consistent with
> attach anyway.  If the init function pointer fails, it isn't like we can
> propagate the error to fork like we do in attach.
> And I doubt we'd want to
> kill the running task :)

Absolutely ;)  Full consistency isn't possible.  I was only thinking
about skipping pagg associations.  A failure during init would have
to be different to a failure in attach.  Hence the explanation of what
happens with a <0 error return: the registration function fails but
the task is not killed.

   <0 Error which is propagated back to copy_process so
      the registration function fails completely.

   =0 success, attach to same container as parent

   >0 success, but don't attach to a container
>
> Let me know your thoughts on this...

Thanks for listening,
--
Kingsley

From erikj@subway.americas.sgi.com Wed Jan 12 07:28:28 2005
Received: with ECARTIS (v1.0.0; list pagg); Wed, 12 Jan 2005 07:28:35 -0800 (PST)
Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0CFSRsZ028989 for ; Wed, 12 Jan 2005 07:28:27 -0800
Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j0CGsNh4016296 for ; Wed, 12 Jan 2005 08:54:23 -0800
Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j0CFRQCK6284990; Wed, 12 Jan 2005 09:27:26 -0600 (CST)
Received: from subway.americas.sgi.com (subway.americas.sgi.com [128.162.236.152]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j0CFRQtC17670241; Wed, 12 Jan 2005 09:27:26 -0600 (CST)
Received: from subway.americas.sgi.com (localhost [127.0.0.1]) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/erikj-IRIX6519-news) with ESMTP id j0CFROB4326799; Wed, 12 Jan 2005 09:27:24 -0600 (CST)
Received: from localhost (erikj@localhost) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id j0CFRNKU328140; Wed, 12 Jan 2005 09:27:24 -0600 (CST)
Date: Wed, 12 Jan 2005 09:27:23 -0600
From: Erik Jacobson
To:
kingsley@aurema.com cc: pagg@oss.sgi.com Subject: Re: New PAGG patch for 2.6.10, new functionality In-Reply-To: <20050111223424.GA14765@aurema.com> Message-ID: References: <20050110233750.GC26466@aurema.com> <20050111223424.GA14765@aurema.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 68 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@subway.americas.sgi.com Precedence: bulk X-list: pagg Sounds worth doing. I'll have a new patch tomorrow (or I hope to). Thanks!
-- Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota From erikj@subway.americas.sgi.com Thu Jan 13 06:44:46 2005 Received: with ECARTIS (v1.0.0; list pagg); Thu, 13 Jan 2005 06:44:52 -0800 (PST) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0DEikLq017770 for ; Thu, 13 Jan 2005 06:44:46 -0800 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j0DGAoth000547 for ; Thu, 13 Jan 2005 08:10:50 -0800 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j0DEijCK6355255; Thu, 13 Jan 2005 08:44:45 -0600 (CST) Received: from subway.americas.sgi.com (subway.americas.sgi.com [128.162.236.152]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j0DEijtC19999320; Thu, 13 Jan 2005 08:44:45 -0600 (CST) Received: from subway.americas.sgi.com (localhost [127.0.0.1]) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/erikj-IRIX6519-news) with ESMTP id j0DEiiB4411960; Thu, 13 Jan 2005 08:44:44 -0600 (CST) Received: from localhost (erikj@localhost) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id j0DEih23411834; Thu, 13 Jan 2005 08:44:44 -0600 (CST) Date: Thu, 13 Jan 2005 08:44:43 -0600 From: Erik Jacobson To: kingsley@aurema.com cc: pagg@oss.sgi.com Subject: Re: New PAGG patch for 2.6.10, new functionality In-Reply-To: Message-ID: References: <20050110233750.GC26466@aurema.com> <20050111223424.GA14765@aurema.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 69 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to:
pagg-bounce@oss.sgi.com X-original-sender: erikj@subway.americas.sgi.com Precedence: bulk X-list: pagg I copied linux-2.6.10-pagg.patch-3 to the ftp site just now. This implements what you requested. http://oss.sgi.com/projects/pagg/ Click 'Download' on the left. Thanks!
-- Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota From kingsley@aurema.com Thu Jan 13 20:37:05 2005 Received: with ECARTIS (v1.0.0; list pagg); Thu, 13 Jan 2005 20:37:09 -0800 (PST) Received: from smtp.sw.oz.au (IDENT:FWUSER@alt.aurema.com [203.217.18.57]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0E4b3qg012645 for ; Thu, 13 Jan 2005 20:37:04 -0800 Received: from smtp.sw.oz.au (localhost [127.0.0.1]) by smtp.sw.oz.au with ESMTP id j0E4XJTo010928; Fri, 14 Jan 2005 15:33:19 +1100 (EST) Received: (from kingsley@localhost) by smtp.sw.oz.au id j0E4XHbY010924; Fri, 14 Jan 2005 15:33:17 +1100 (EST) Date: Fri, 14 Jan 2005 15:33:16 +1100 From: kingsley@aurema.com To: Erik Jacobson Cc: pagg@oss.sgi.com Subject: Re: New PAGG patch for 2.6.10, new functionality Message-ID: <20050114043316.GA8955@aurema.com> References: <20050110233750.GC26466@aurema.com> <20050111223424.GA14765@aurema.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Scanned-By: MIMEDefang 2.48 on 192.41.203.35 X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 70 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: kingsley@aurema.com Precedence: bulk X-list: pagg On Thu, Jan 13, 2005 at 08:44:43AM -0600, Erik Jacobson wrote: > I copied linux-2.6.10-pagg.patch-3 to the ftp site just now. This > implements what you requested. > > http://oss.sgi.com/projects/pagg/ > Click 'Download' on the left. > > Thanks! Eric, It looks good. Much appreciated and many thanks!
-- Kingsley From kingsley@aurema.com Thu Jan 13 23:37:42 2005 Received: with ECARTIS (v1.0.0; list pagg); Thu, 13 Jan 2005 23:37:47 -0800 (PST) Received: from smtp.sw.oz.au (IDENT:FWUSER@alt.aurema.com [203.217.18.57]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0E7bfuf022455 for ; Thu, 13 Jan 2005 23:37:41 -0800 Received: from smtp.sw.oz.au (localhost [127.0.0.1]) by smtp.sw.oz.au with ESMTP id j0E7X2To016533; Fri, 14 Jan 2005 18:33:02 +1100 (EST) Received: (from kingsley@localhost) by smtp.sw.oz.au id j0E7X19i016503; Fri, 14 Jan 2005 18:33:01 +1100 (EST) Date: Fri, 14 Jan 2005 18:33:01 +1100 From: kingsley@aurema.com To: Erik Jacobson Cc: pagg@oss.sgi.com Subject: Re: New PAGG patch for 2.6.10, new functionality Message-ID: <20050114073301.GA15596@aurema.com> References: <20050110233750.GC26466@aurema.com> <20050111223424.GA14765@aurema.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Scanned-By: MIMEDefang 2.48 on 192.41.203.35 X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 71 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: kingsley@aurema.com Precedence: bulk X-list: pagg On Thu, Jan 13, 2005 at 08:44:43AM -0600, Erik Jacobson wrote: > I copied linux-2.6.10-pagg.patch-3 to the ftp site just now. This > implements what you requested. > > http://oss.sgi.com/projects/pagg/ > Click 'Download' on the left. > > Thanks! Eric, I've noticed one minor issue with the implementation for skipping pagg associations during pagg_init. If the register function finds that a task was taken off the task list during registration it traverses the list from the beginning. Tasks that were skipped would therefore be looked at again. Still, it's not a big issue. 
I suppose clients should be able to handle looking at skipped tasks a few times. Thanks, -- Kingsley From erikj@subway.americas.sgi.com Tue Jan 18 12:42:22 2005 Received: with ECARTIS (v1.0.0; list pagg); Tue, 18 Jan 2005 12:42:35 -0800 (PST) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0IKgKlB020819 for ; Tue, 18 Jan 2005 12:42:22 -0800 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j0IM9BNL024670 for ; Tue, 18 Jan 2005 14:09:11 -0800 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j0IKfJF3242591 for ; Tue, 18 Jan 2005 14:41:19 -0600 (CST) Received: from subway.americas.sgi.com (subway.americas.sgi.com [128.162.236.152]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j0IKfJtC20281718 for ; Tue, 18 Jan 2005 14:41:19 -0600 (CST) Received: from subway.americas.sgi.com (localhost [127.0.0.1]) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/erikj-IRIX6519-news) with ESMTP id j0IKfJB4628246 for ; Tue, 18 Jan 2005 14:41:19 -0600 (CST) Received: from localhost (erikj@localhost) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id j0IKfIkS628250 for ; Tue, 18 Jan 2005 14:41:18 -0600 (CST) Date: Tue, 18 Jan 2005 14:41:18 -0600 From: Erik Jacobson To: pagg@oss.sgi.com Subject: PAGG in Open Source projects? 
Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 72 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@subway.americas.sgi.com Precedence: bulk X-list: pagg Are any of you using PAGG in open source projects? One of the reasons PAGG has had trouble being accepted is because we can't point to enough open source users. Here at SGI, we have a few different open source packages making use of it. However, only one PAGG user so far has gone through community review (Job). We think we might be able to improve our case for including PAGG in the kernel if other open source projects are using PAGG. Please send me a note if you have something we can describe as a user of PAGG. -- Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota From erikj@subway.americas.sgi.com Wed Jan 19 08:25:48 2005 Received: with ECARTIS (v1.0.0; list pagg); Wed, 19 Jan 2005 08:25:53 -0800 (PST) Received: from omx1.americas.sgi.com (omx1-ext.sgi.com [192.48.179.11]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0JGPlNa027399 for ; Wed, 19 Jan 2005 08:25:48 -0800 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j0JGPlxT012390 for ; Wed, 19 Jan 2005 10:25:47 -0600 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j0JGPlF3299024; Wed, 19 Jan 2005 10:25:47 -0600 (CST) Received: from subway.americas.sgi.com (subway.americas.sgi.com [128.162.236.152]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j0JGPktC20506841; Wed, 19 Jan 2005 10:25:46 -0600 (CST) Received: from 
subway.americas.sgi.com (localhost [127.0.0.1]) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/erikj-IRIX6519-news) with ESMTP id j0JGPkB4691449; Wed, 19 Jan 2005 10:25:46 -0600 (CST) Received: from localhost (erikj@localhost) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id j0JGPiYW691239; Wed, 19 Jan 2005 10:25:46 -0600 (CST) Date: Wed, 19 Jan 2005 10:25:43 -0600 From: Erik Jacobson To: kingsley@aurema.com cc: pagg@oss.sgi.com Subject: Re: New PAGG patch for 2.6.10, new functionality In-Reply-To: <20050114073301.GA15596@aurema.com> Message-ID: References: <20050110233750.GC26466@aurema.com> <20050111223424.GA14765@aurema.com> <20050114073301.GA15596@aurema.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 73 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@subway.americas.sgi.com Precedence: bulk X-list: pagg

> I've noticed one minor issue with the implementation for skipping pagg
> associations during pagg_init. If the register function finds that a
> task was taken off the task list during registration it traverses the
> list from the beginning. Tasks that were skipped would therefore be
> looked at again. Still, it's not a big issue. I suppose clients
> should be able to handle looking at skipped tasks a few times.

Hi. I was looking at this a bit today.

I'm not quite sure how to improve this. I could add a comment about this :)

I suppose we could have a list of already skipped tasks and not even
try them again if they were skipped once. But I'm not sure if that is
too ugly?

Unless I hear more feedback, I'm just going to add a comment in the
existing comments for the init function pointer in pagg.h.
Something like: The implementation of pagg_hook_register causes us to evaluate some tasks more than once in some cases. See the comments in pagg_hook_register for why. Therefore, if the init function pointer returns >0, which means that it doesn't want a pagg association, that init function must be prepared to possibly look at the same "skipped" task more than once. -- Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota From erikj@subway.americas.sgi.com Wed Jan 19 11:42:11 2005 Received: with ECARTIS (v1.0.0; list pagg); Wed, 19 Jan 2005 11:42:17 -0800 (PST) Received: from omx1.americas.sgi.com (omx1-ext.sgi.com [192.48.179.11]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0JJgBNA007220 for ; Wed, 19 Jan 2005 11:42:11 -0800 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j0JJgAxT020654 for ; Wed, 19 Jan 2005 13:42:10 -0600 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id j0JJgAF3310237 for ; Wed, 19 Jan 2005 13:42:10 -0600 (CST) Received: from subway.americas.sgi.com (subway.americas.sgi.com [128.162.236.152]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id j0JJgAtC20539642 for ; Wed, 19 Jan 2005 13:42:10 -0600 (CST) Received: from subway.americas.sgi.com (localhost [127.0.0.1]) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/erikj-IRIX6519-news) with ESMTP id j0JJg9B4696675 for ; Wed, 19 Jan 2005 13:42:09 -0600 (CST) Received: from localhost (erikj@localhost) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id j0JJg9Xa696760 for ; Wed, 19 Jan 2005 13:42:09 -0600 (CST) Date: Wed, 19 Jan 2005 13:42:09 -0600 From: Erik Jacobson To: pagg@oss.sgi.com Subject: New PAGG patch for 2.6.10, build fix Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; 
charset=US-ASCII X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 74 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@subway.americas.sgi.com Precedence: bulk X-list: pagg I just uploaded a new PAGG patch for 2.6.10. When we updated pagg_attach to have a return value, I neglected to fix up the pagg_attach macro used when CONFIG_PAGG isn't set. The end result was, if you had the PAGG patch in place but didn't configure PAGG on for your kernel, you would get a build failure. This is now fixed. I also updated the pagg hook init function pointer comments per our earlier discussion. Find the 'linux-2.6.10-pagg.patch-4' patch at the PAGG web site. http://oss.sgi.com/projects/pagg/ Click on "Download" on the left. Thank you. -- Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota From kingsley@aurema.com Thu Jan 20 14:41:54 2005 Received: with ECARTIS (v1.0.0; list pagg); Thu, 20 Jan 2005 14:41:59 -0800 (PST) Received: from smtp.sw.oz.au (IDENT:FWUSER@alt.aurema.com [203.217.18.57]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0KMfo0Q027184 for ; Thu, 20 Jan 2005 14:41:52 -0800 Received: from smtp.sw.oz.au (localhost [127.0.0.1]) by smtp.sw.oz.au with ESMTP id j0KMXXil008600; Fri, 21 Jan 2005 09:33:33 +1100 (EST) Received: (from kingsley@localhost) by smtp.sw.oz.au id j0KMXWQa008595; Fri, 21 Jan 2005 09:33:32 +1100 (EST) Date: Fri, 21 Jan 2005 09:33:32 +1100 From: kingsley@aurema.com To: Erik Jacobson Cc: pagg@oss.sgi.com Subject: Re: New PAGG patch for 2.6.10, new functionality Message-ID: <20050120223332.GA6869@aurema.com> References: <20050110233750.GC26466@aurema.com> <20050111223424.GA14765@aurema.com> <20050114073301.GA15596@aurema.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i X-Scanned-By: 
MIMEDefang 2.48 on 192.41.203.35 X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 75 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: kingsley@aurema.com Precedence: bulk X-list: pagg

On Wed, Jan 19, 2005 at 10:25:43AM -0600, Erik Jacobson wrote:
> > I've noticed one minor issue with the implementation for skipping pagg
> > associations during pagg_init. If the register function finds that a
> > task was taken off the task list during registration it traverses the
> > list from the beginning. Tasks that were skipped would therefore be
> > looked at again. Still, it's not a big issue. I suppose clients
> > should be able to handle looking at skipped tasks a few times.
>
> Hi. I was looking at this a bit today.
>
> I'm not quite sure how to improve this. I could add a comment about this :)
>
> I suppose we could have a list of already skipped tasks and not even
> try them again if they were skipped once. But I'm not sure if that is
> too ugly?

Yes, I think so. It would make the registration implementation more
complicated.

>
> Unless I hear more feedback, I'm just going to add a comment in the
> existing comments for the init function pointer in pagg.h. Something like:
>
> The implementation of pagg_hook_register causes us to evaluate some tasks
> more than once in some cases. See the comments in pagg_hook_register for
> why. Therefore, if the init function pointer returns >0, which means that it
> doesn't want a pagg association, that init function must be prepared to
> possibly look at the same "skipped" task more than once.

I think a comment is adequate.
The above sounds good enough to me ;) -- Kingsley From kaigai@ak.jp.nec.com Thu Jan 27 04:40:18 2005 Received: with ECARTIS (v1.0.0; list pagg); Thu, 27 Jan 2005 04:40:26 -0800 (PST) Received: from tyo201.gate.nec.co.jp (TYO201.gate.nec.co.jp [202.32.8.214]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0RCeHwO017989 for ; Thu, 27 Jan 2005 04:40:17 -0800 Received: from mailgate3.nec.co.jp (mailgate53.nec.co.jp [10.7.69.161] (may be forged)) by tyo201.gate.nec.co.jp (8.11.7/3.7W01080315) with ESMTP id j0RCd8e29321; Thu, 27 Jan 2005 21:39:08 +0900 (JST) Received: (from root@localhost) by mailgate3.nec.co.jp (8.11.7/3.7W-MAILGATE-NEC) id j0RCd8Q08033; Thu, 27 Jan 2005 21:39:08 +0900 (JST) Received: from mailsv.bs1.fc.nec.co.jp (venus.hpc.bs1.fc.nec.co.jp [10.34.77.164]) by mailsv.nec.co.jp (8.11.7/3.7W-MAILSV-NEC) with ESMTP id j0RCd7t18739; Thu, 27 Jan 2005 21:39:07 +0900 (JST) Received: from mailsv.linux.bs1.fc.nec.co.jp (IDENT:postfix@namesv2.linux.bs1.fc.nec.co.jp [10.34.125.2]) by mailsv.bs1.fc.nec.co.jp (8.12.10/3.7W-HPC5.2F(mailsv)04081615) with ESMTP id j0RCVLIK018856; Thu, 27 Jan 2005 21:31:22 +0900 (JST) Received: from [10.34.125.249] (sanma.linux.bs1.fc.nec.co.jp [10.34.125.249]) by mailsv.linux.bs1.fc.nec.co.jp (Postfix) with ESMTP id 67E2130984; Thu, 27 Jan 2005 21:39:05 +0900 (JST) Message-ID: <41F8E117.5030501@ak.jp.nec.com> Date: Thu, 27 Jan 2005 21:39:51 +0900 From: Kaigai Kohei User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: ja, en-us, en MIME-Version: 1.0 To: Erik Jacobson Cc: pagg@oss.sgi.com, Limin Gu , Paul Jackson , lse-tech@lists.sourceforge.net, guillaume.thouvenin@bull.net Subject: CpuSet on PAGG (Re: PAGG in Open Source projects?) 
References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 76 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: kaigai@ak.jp.nec.com Precedence: bulk X-list: pagg

Hi,

Erik Jacobson wrote:
> Are any of you using PAGG in open source projects?

Currently, Job (and CSA) is the only known PAGG user. But I think we
can use the PAGG framework as a general-purpose fork()/exit() event
handling mechanism. For example, CpuSet is a typical candidate for this.

> One of the reasons PAGG has had trouble being accepted is because we can't
> point to enough open source users. Here at SGI, we have a few different
> open source packages making use of it. However, only one PAGG user so far
> has gone through community review (Job).
>
> We think we might be able to improve our case for including PAGG in the
> kernel if other open source projects are using PAGG.

Indeed, I tried to implement CpuSet on top of PAGG, and some
modifications to PAGG are needed.

[1/3] linux-2.6.11-rc2-mm1-pagg.patch
  This patch adapts linux-2.6.10-pagg.patch-4 to 2.6.11-rc2-mm1.
  The original PAGG patch does not apply cleanly to the -mm kernel,
  so I fixed it up.

[2/3] linux-2.6.11-rc2-mm1-pagg_on_RCU
  When we call pagg_get(), we must hold the task->pagg_sem read-semaphore.
  This makes it difficult to refer to the PAGG object in interrupt
  context or while holding any type of spinlock.
  This patch makes it possible to refer to the PAGG object without any
  locking. (The CpuSet patch needs lockless references.)

[3/3] linux-2.6.11-rc2-mm1-CpuSet_by_PAGG.patch
  We can use PAGG as a fork()/exit() event handling framework for
  generic purposes. Some functions, such as CpuSet, fit the PAGG
  framework, I think.
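
As a rough illustration of the fork-time hook model that this series builds on, the attach return convention documented in pagg.h (<0 fails the fork, =0 attaches the child, >0 succeeds without attaching) can be sketched in user space. This is only a sketch under stated assumptions, not the kernel code: `sim_hook`, `sim_task`, `sim_fork_attach`, `sim_skip_realtime` and `sim_always_fail` are hypothetical names made up for this example, and the fixed-size array stands in for the real per-task pagg list.

```c
/* User-space sketch only; none of these names belong to the PAGG patch. */
#include <assert.h>
#include <stddef.h>

#define SIM_MAX_HOOKS 4

struct sim_task;

/* Hypothetical stand-in for a hook: a name plus an attach callback. */
struct sim_hook {
	const char *name;
	/* <0: fail the fork, 0: attach the child, >0: succeed but skip the child */
	int (*attach)(const struct sim_task *parent, const struct sim_task *child);
};

/* Hypothetical stand-in for a task and the hooks it is attached to. */
struct sim_task {
	int pid;
	int realtime;	/* example criterion a hook might use to skip a task */
	const struct sim_hook *hooks[SIM_MAX_HOOKS];
	size_t nhooks;
};

/* Example policy: group every child except real-time ones. */
int sim_skip_realtime(const struct sim_task *parent, const struct sim_task *child)
{
	(void)parent;
	return child->realtime ? 1 : 0;
}

/* Example policy: always report an error, so the simulated fork fails. */
int sim_always_fail(const struct sim_task *parent, const struct sim_task *child)
{
	(void)parent;
	(void)child;
	return -1;
}

/*
 * Walk the hooks the parent is attached to and ask each one what to do
 * with the child. Returns 0 on success, or the first negative value
 * returned by a hook, in which case the child ends up with no attachments.
 */
int sim_fork_attach(const struct sim_task *parent, struct sim_task *child)
{
	size_t i;

	child->nhooks = 0;
	for (i = 0; i < parent->nhooks; i++) {
		const struct sim_hook *hook = parent->hooks[i];
		int ret = hook->attach(parent, child);

		if (ret < 0) {
			child->nhooks = 0;	/* drop partial attachments */
			return ret;		/* caller treats this as a failed fork */
		}
		if (ret > 0)
			continue;		/* success, but no association wanted */
		assert(child->nhooks < SIM_MAX_HOOKS);
		child->hooks[child->nhooks++] = hook;
	}
	return 0;
}
```

A caller would fill in a parent `sim_task` with one or more hooks and pass a fresh child to `sim_fork_attach`. A child for which every hook returns 0 inherits all of the parent's hooks, a real-time child is left out by `sim_skip_realtime`, and any negative return aborts the whole operation.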
We want to use Job(and CSA) or CpuSet without specific patches. And, it's so important to adopt the PAGG framework into the stock kernel. Thanks. -- Linux Promotion Center, NEC KaiGai Kohei From kaigai@ak.jp.nec.com Thu Jan 27 04:47:01 2005 Received: with ECARTIS (v1.0.0; list pagg); Thu, 27 Jan 2005 04:47:09 -0800 (PST) Received: from tyo202.gate.nec.co.jp (TYO202.gate.nec.co.jp [202.32.8.202]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0RCkwtM018240 for ; Thu, 27 Jan 2005 04:46:59 -0800 Received: from mailgate4.nec.co.jp (mailgate54.nec.co.jp [10.7.69.197]) by tyo202.gate.nec.co.jp (8.11.7/3.7W01080315) with ESMTP id j0RCjAM10492; Thu, 27 Jan 2005 21:45:10 +0900 (JST) Received: (from root@localhost) by mailgate4.nec.co.jp (8.11.7/3.7W-MAILGATE-NEC) id j0RCjA300628; Thu, 27 Jan 2005 21:45:10 +0900 (JST) Received: from mailsv.bs1.fc.nec.co.jp (venus.hpc.bs1.fc.nec.co.jp [10.34.77.164]) by mailsv.nec.co.jp (8.11.7/3.7W-MAILSV-NEC) with ESMTP id j0RCj9t23168; Thu, 27 Jan 2005 21:45:09 +0900 (JST) Received: from mailsv.linux.bs1.fc.nec.co.jp (IDENT:postfix@namesv2.linux.bs1.fc.nec.co.jp [10.34.125.2]) by mailsv.bs1.fc.nec.co.jp (8.12.10/3.7W-HPC5.2F(mailsv)04081615) with ESMTP id j0RCbOIK018917; Thu, 27 Jan 2005 21:37:25 +0900 (JST) Received: from [10.34.125.249] (sanma.linux.bs1.fc.nec.co.jp [10.34.125.249]) by mailsv.linux.bs1.fc.nec.co.jp (Postfix) with ESMTP id 8694C30984; Thu, 27 Jan 2005 21:45:08 +0900 (JST) Message-ID: <41F8E283.9020900@ak.jp.nec.com> Date: Thu, 27 Jan 2005 21:45:55 +0900 From: Kaigai Kohei User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: ja, en-us, en MIME-Version: 1.0 To: Erik Jacobson Cc: Kaigai Kohei , pagg@oss.sgi.com, Limin Gu , Paul Jackson , lse-tech@lists.sourceforge.net, guillaume.thouvenin@bull.net Subject: [1/3] Re: CpuSet on PAGG (Re: PAGG in Open Source projects?) 
References: <41F8E117.5030501@ak.jp.nec.com> In-Reply-To: <41F8E117.5030501@ak.jp.nec.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 77 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: kaigai@ak.jp.nec.com Precedence: bulk X-list: pagg

[1/3] linux-2.6.11-rc2-mm1-pagg.patch
  This patch adapts linux-2.6.10-pagg.patch-4 to 2.6.11-rc2-mm1.
  The original PAGG patch does not apply cleanly to the -mm kernel,
  so I fixed it up.

--
Linux Promotion Center, NEC
KaiGai Kohei

diff -rpNU3 linux-2.6.11-rc2-mm1/Documentation/pagg.txt linux-2.6.11-rc2-mm1.pagg/Documentation/pagg.txt
--- linux-2.6.11-rc2-mm1/Documentation/pagg.txt 1970-01-01 09:00:00.000000000 +0900
+++ linux-2.6.11-rc2-mm1.pagg/Documentation/pagg.txt 2005-01-25 15:13:18.000000000 +0900
@@ -0,0 +1,32 @@
+Linux Process Aggregates (PAGG)
+-------------------------------
+
+The process aggregates infrastructure, or PAGG, provides a generalized
+mechanism for providing arbitrary process groups in Linux. PAGG consists
+of a series of functions for registering and unregistering support
+for new types of process aggregation containers with the kernel.
+This is similar to the support currently provided within Linux that
+allows for dynamic support of filesystems, block and character devices,
+symbol tables, network devices, serial devices, and execution domains.
+This implementation of PAGG provides developers the basic hooks necessary
+to implement kernel modules for specific process containers, such as
+the job container.
+
+The do_fork function in the kernel was altered to support PAGG. If a
+process is attached to any PAGG containers and subsequently forks a
+child process, the child process will also be attached to the same PAGG
+containers.
The PAGG containers involved during the fork are notified +that a new process has been attached. The notification is accomplished +via a callback function provided by the PAGG module. + +The do_exit function in the kernel has also been altered. If a process +is attached to any PAGG containers and that process is exiting, the PAGG +containers are notified that a process has detached from the container. +The notification is accomplished via a callback function provided by +the PAGG module. + +The sys_execve function has been modified to support an optional callout +that can be run when a process in a pagg list does an exec. It can be +used, for example, by other kernel modules that wish to do advanced CPU +placement on multi-processor systems (just one example). + diff -rpNU3 linux-2.6.11-rc2-mm1/fs/exec.c linux-2.6.11-rc2-mm1.pagg/fs/exec.c --- linux-2.6.11-rc2-mm1/fs/exec.c 2005-01-25 14:56:17.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/fs/exec.c 2005-01-25 15:13:18.000000000 +0900 @@ -49,6 +49,7 @@ #include #include #include +#include #include #include @@ -1192,6 +1193,7 @@ int do_execve(char * filename, retval = search_binary_handler(bprm,regs); if (retval >= 0) { free_arg_pages(bprm); + pagg_exec(current); /* execve success */ security_bprm_free(bprm); diff -rpNU3 linux-2.6.11-rc2-mm1/include/linux/init_task.h linux-2.6.11-rc2-mm1.pagg/include/linux/init_task.h --- linux-2.6.11-rc2-mm1/include/linux/init_task.h 2005-01-25 14:56:18.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/include/linux/init_task.h 2005-01-25 15:37:43.000000000 +0900 @@ -2,6 +2,7 @@ #define _LINUX__INIT_TASK_H #include +#include #define INIT_FILES \ { \ @@ -112,6 +113,7 @@ extern struct group_info init_groups; .switch_lock = SPIN_LOCK_UNLOCKED, \ .journal_info = NULL, \ .cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers), \ + INIT_TASK_PAGG(tsk) \ .private_pages = LIST_HEAD_INIT(tsk.private_pages), \ .private_pages_count = 0, \ } diff -rpNU3 linux-2.6.11-rc2-mm1/include/linux/pagg.h 
linux-2.6.11-rc2-mm1.pagg/include/linux/pagg.h --- linux-2.6.11-rc2-mm1/include/linux/pagg.h 1970-01-01 09:00:00.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/include/linux/pagg.h 2005-01-25 15:13:18.000000000 +0900 @@ -0,0 +1,223 @@ +/* + * PAGG (Process Aggregates) interface + * + * + * Copyright (c) 2000-2002, 2004 Silicon Graphics, Inc. All Rights Reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * + * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane, + * Mountain View, CA 94043, or: + * + * http://www.sgi.com + * + * For further information regarding this notice, see: + * + * http://oss.sgi.com/projects/GenInfo/NoticeExplan + */ + +/* + * Data structure definitions and function prototypes used to implement + * process aggregates (paggs). + * + * Paggs provides a generalized way to implement process groupings or + * containers. Modules use these functions to register with the kernel as + * providers of process aggregation containers. The pagg data structures + * define the callback functions and data access pointers back into the + * pagg modules. 
+ */ + +#ifndef _LINUX_PAGG_H +#define _LINUX_PAGG_H + +#include + +#ifdef CONFIG_PAGG + +#define PAGG_NAMELN 32 /* Max chars in PAGG module name */ + + +/** + * INIT_PAGG_LIST - used to initialize a pagg_list structure after declaration + * @_l: Task struct to init the pagg_list and semaphore in + * + */ +#define INIT_PAGG_LIST(_l) \ +do { \ + INIT_LIST_HEAD(&(_l)->pagg_list); \ + init_rwsem(&(_l)->pagg_sem); \ +} while(0) + + +/* + * Used by task_struct to manage list of pagg attachments for the process. + * Each pagg provides the link between the process and the + * correct pagg container. + * + * STRUCT MEMBERS: + * hook: Reference to pagg module structure. That struct + * holds the name key and function pointers. + * data: Opaque data pointer - defined by pagg modules. + * entry: List pointers + */ +struct pagg { + struct pagg_hook *hook; + void *data; + struct list_head entry; +}; + +/* + * Used by pagg modules to define the callback functions into the + * module. + * + * STRUCT MEMBERS: + * name: The name of the pagg container type provided by + * the module. This will be set by the pagg module. + * attach: Function pointer to function used when attaching + * a process to the pagg container referenced by + * this struct. + * Return codes from the attach function pointer have + * These meanings: + * <0 Error which is propagated back to copy_process so + * the fork fails. + * =0 success, attach to same container as parent + * >0 success, but don't attach to a container + * + * detach: Function pointer to function used when detaching + * a process to the pagg container referenced by + * this struct. + * init: Function pointer to initialization function. This + * function is used when the module is loaded to attach + * existing processes to a default container as defined by + * the pagg module. This is optional and may be set to + * NULL if it is not needed by the pagg module. + * + * Note: The return values are managed the same way as in + * attach above. 
Except, of course, an error doesn't + * result in a fork failure. + * + * Note: The implementation of pagg_hook_register causes + * us to evaluate some tasks more than once in some cases. + * See the comments in pagg_hook_register for why. + * Therefore, if the init function pointer returns >0, + * which means that it doesn't want a pagg association, + * that init function must be prepared to possibly look at + * the same "skipped" task more than once. + * + * data: Opaque data pointer - defined by pagg modules. + * module: Pointer to kernel module struct. Used to increment & + * decrement the use count for the module. + * entry: List pointers + * exec: Function pointer to function used when a process + * in the pagg container exec's a new process. This + * is optional and may be set to NULL if it is not + * needed by the pagg module. + * refcnt: Keep track of user count of the pagg hook + */ +struct pagg_hook { + struct module *module; + char *name; /* Name Key - restricted to 32 characters */ + void *data; /* Opaque module specific data */ + struct list_head entry; /* List pointers */ + atomic_t refcnt; /* usage counter */ + int (*init)(struct task_struct *, struct pagg *); + int (*attach)(struct task_struct *, struct pagg *, void*); + void (*detach)(struct task_struct *, struct pagg *); + void (*exec)(struct task_struct *, struct pagg *); +}; + + +/* Kernel service functions for providing PAGG support */ +extern struct pagg *pagg_get(struct task_struct *task, char *key); +extern struct pagg *pagg_alloc(struct task_struct *task, + struct pagg_hook *pt); +extern void pagg_free(struct pagg *pagg); +extern int pagg_hook_register(struct pagg_hook *pt_new); +extern int pagg_hook_unregister(struct pagg_hook *pt_old); +extern int __pagg_attach(struct task_struct *to_task, + struct task_struct *from_task); +extern void __pagg_detach(struct task_struct *task); +extern int __pagg_exec(struct task_struct *task); + +/** + * pagg_attach - child inherits attachment to pagg 
containers of its parent + * @child: child task - to inherit + * @parent: parenet task - child inherits pagg containers from this parent + * + * function used when a child process must inherit attachment to pagg + * containers from the parent. Return code is propagated as a fork fail. + * + */ +static inline int pagg_attach(struct task_struct *child, + struct task_struct *parent) +{ + INIT_PAGG_LIST(child); + if (!list_empty(&parent->pagg_list)) + return __pagg_attach(child, parent); + + return 0; +} + + +/** + * pagg_detach - Detach a process from a pagg container it is a member of + * @task: The task the pagg will be detached from + * + */ +static inline void pagg_detach(struct task_struct *task) +{ + if (!list_empty(&task->pagg_list)) + __pagg_detach(task); +} + +/** + * pagg_exec - Used when a process exec's + * @task: The process doing the exec + * + */ +static inline void pagg_exec(struct task_struct *task) +{ + if (!list_empty(&task->pagg_list)) + __pagg_exec(task); +} + +/** + * INIT_TASK_PAGG - Used in INIT_TASK to set the head and sem of pagg_list + * @tsk: The task work with + * + * Marco Used in INIT_TASK to set the head and sem of pagg_list. + * If CONFIG_PAGG is off, it is defined as an empty macro below. + * + */ +#define INIT_TASK_PAGG(tsk) \ + .pagg_list = LIST_HEAD_INIT(tsk.pagg_list), \ + .pagg_sem = __RWSEM_INITIALIZER(tsk.pagg_sem), + +#else /* CONFIG_PAGG */ + +/* + * Replacement macros used when PAGG (Process Aggregates) support is not + * compiled into the kernel. 
+ */ +#define INIT_TASK_PAGG(tsk) +#define INIT_PAGG_LIST(l) do { } while(0) +#define pagg_attach(ct, pt) ({ 0; }) +#define pagg_detach(t) do { } while(0) +#define pagg_exec(t) do { } while(0) + +#endif /* CONFIG_PAGG */ + +#endif /* _LINUX_PAGG_H */ diff -rpNU3 linux-2.6.11-rc2-mm1/include/linux/sched.h linux-2.6.11-rc2-mm1.pagg/include/linux/sched.h --- linux-2.6.11-rc2-mm1/include/linux/sched.h 2005-01-25 14:56:18.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/include/linux/sched.h 2005-01-25 15:36:35.000000000 +0900 @@ -729,6 +729,12 @@ struct task_struct { int cpuset_mems_generation; #endif +#ifdef CONFIG_PAGG +/* List of pagg (process aggregate) attachments */ + struct list_head pagg_list; + struct rw_semaphore pagg_sem; +#endif + struct list_head private_pages; /* per-process private pages */ int private_pages_count; }; diff -rpNU3 linux-2.6.11-rc2-mm1/init/Kconfig linux-2.6.11-rc2-mm1.pagg/init/Kconfig --- linux-2.6.11-rc2-mm1/init/Kconfig 2005-01-25 14:56:18.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/init/Kconfig 2005-01-25 15:13:18.000000000 +0900 @@ -138,6 +138,14 @@ config BSD_PROCESS_ACCT_V3 for processing it. A preliminary version of these tools is available at . +config PAGG + bool "Support for process aggregates (PAGGs)" + help + Say Y here if you will be loading modules which provide support + for process aggregate containers. Examples of such modules include the + Linux Jobs module and the Linux Array Sessions module. If you will not + be using such modules, say N. 
+ config SYSCTL bool "Sysctl support" ---help--- diff -rpNU3 linux-2.6.11-rc2-mm1/kernel/Makefile linux-2.6.11-rc2-mm1.pagg/kernel/Makefile --- linux-2.6.11-rc2-mm1/kernel/Makefile 2005-01-25 14:56:18.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/kernel/Makefile 2005-01-25 15:13:18.000000000 +0900 @@ -21,6 +21,7 @@ obj-$(CONFIG_KEXEC) += kexec.o obj-$(CONFIG_LTT) += ltt-core.o obj-$(CONFIG_COMPAT) += compat.o obj-$(CONFIG_CPUSETS) += cpuset.o +obj-$(CONFIG_PAGG) += pagg.o obj-$(CONFIG_IKCONFIG) += configs.o obj-$(CONFIG_IKCONFIG_PROC) += configs.o obj-$(CONFIG_STOP_MACHINE) += stop_machine.o diff -rpNU3 linux-2.6.11-rc2-mm1/kernel/exit.c linux-2.6.11-rc2-mm1.pagg/kernel/exit.c --- linux-2.6.11-rc2-mm1/kernel/exit.c 2005-01-25 14:56:18.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/kernel/exit.c 2005-01-25 15:13:18.000000000 +0900 @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -837,6 +838,9 @@ fastcall NORET_TYPE void do_exit(long co module_put(tsk->binfmt->module); tsk->exit_code = code; + + pagg_detach(tsk); + exit_notify(tsk); #ifdef CONFIG_NUMA mpol_free(tsk->mempolicy); diff -rpNU3 linux-2.6.11-rc2-mm1/kernel/fork.c linux-2.6.11-rc2-mm1.pagg/kernel/fork.c --- linux-2.6.11-rc2-mm1/kernel/fork.c 2005-01-25 14:56:18.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/kernel/fork.c 2005-01-25 15:13:18.000000000 +0900 @@ -42,6 +42,7 @@ #include #include #include +#include #include #include @@ -131,6 +132,9 @@ void __init fork_init(unsigned long memp init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = max_threads/2; init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2; + + /* Initialize the pagg list in pid 0 before it can clone itself. */ + INIT_PAGG_LIST(current); } static struct task_struct *dup_task_struct(struct task_struct *orig) @@ -981,6 +985,15 @@ static task_t *copy_process(unsigned lon sched_fork(p); /* + * call pagg modules to properly attach new process to the same + * process aggregate containers as the parent process. 
Fail the fork + * on error. + */ + retval = pagg_attach(p, current); + if (retval) + goto bad_fork_cleanup_namespace; + + /* * Ok, make it visible to the rest of the system. * We dont wake it up yet. */ @@ -1087,6 +1100,7 @@ fork_out: return p; bad_fork_cleanup_namespace: + pagg_detach(p); exit_namespace(p); bad_fork_cleanup_keys: exit_keys(p); diff -rpNU3 linux-2.6.11-rc2-mm1/kernel/pagg.c linux-2.6.11-rc2-mm1.pagg/kernel/pagg.c --- linux-2.6.11-rc2-mm1/kernel/pagg.c 1970-01-01 09:00:00.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/kernel/pagg.c 2005-01-25 15:13:18.000000000 +0900 @@ -0,0 +1,496 @@ +/* + * PAGG (Process Aggregates) interface + * + * + * Copyright (c) 2000-2004 Silicon Graphics, Inc. All Rights Reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane, + * Mountain View, CA 94043, or: + * + * http://www.sgi.com + */ + +#include +#include +#include +#include +#include +#include + +/* list of pagg hook entries that reference the "module" implementations */ +static LIST_HEAD(pagg_hook_list); +static DECLARE_RWSEM(pagg_hook_list_sem); + + +/** + * pagg_get - get a pagg given a search key + * @task: We examine the pagg_list from the given task + * @key: Key name of pagg we wish to retrieve + * + * Given a pagg_list list structure, this function will return + * a pointer to the pagg struct that matches the search + * key. If the key is not found, the function will return NULL. + * + * The caller should hold at least a read lock on the pagg_list + * for task using down_read(&task->pagg_list.sem). + * + */ +struct pagg * +pagg_get(struct task_struct *task, char *key) +{ + struct pagg *pagg; + + list_for_each_entry(pagg, &task->pagg_list, entry) { + if (!strcmp(pagg->hook->name,key)) + return pagg; + } + return NULL; +} + + +/** + * pagg_alloc - Insert a new pagg in to the pagg_list for a task + * @task: Task we want to insert the pagg in to + * @pagg_hook: Pagg hook to associate with the new pagg + * + * Given a task and a pagg hook, this function will allocate + * a new pagg structure, initialize the settings, and insert the pagg into + * the pagg_list for the task. + * + * The caller for this function should hold at least a read lock on the + * pagg_hook_list_sem - or ensure that the pagg hook entry cannot be + * removed. If this function was called from the pagg module (usually the + * case), then the caller need not hold this lock. The caller should hold + * a write lock on for the tasks pagg_sem. 
This can be locked using + * down_write(&task->pagg_sem) + * + */ +struct pagg * +pagg_alloc(struct task_struct *task, struct pagg_hook *pagg_hook) +{ + struct pagg *pagg; + + pagg = kmalloc(sizeof(struct pagg), GFP_KERNEL); + if (!pagg) + return NULL; + + pagg->hook = pagg_hook; + pagg->data = NULL; + atomic_inc(&pagg_hook->refcnt); /* Increase hook's reference count */ + list_add_tail(&pagg->entry, &task->pagg_list); + return pagg; +} + + +/** + * pagg_free - Delete pagg from the list and free its memory + * @pagg: The pagg to free + * + * This function will ensure the pagg is deleted form + * the list of pagg entries for the task. Finally, the memory for the + * pagg is discarded. + * + * The caller of this function should hold a write lock on the pagg_sem + * for the task. This can be locked using down_write(&task->pagg_sem). + * + * Prior to calling pagg_free, the pagg should have been detached from the + * pagg container represented by this pagg. That is usually done using + * p->hook->detach(task, pagg); + * + */ +void +pagg_free(struct pagg *pagg) +{ + atomic_dec(&pagg->hook->refcnt); /* decr the reference count on the hook */ + list_del(&pagg->entry); + kfree(pagg); +} + + +/** + * get_pagg_hook - Get the pagg hook matching the requested name + * @key: The name of the pagg hook to get + * + * Given a pagg hook name key, this function will return a pointer + * to the pagg_hook struct that matches the name. + * + * You should hold either the write or read lock for pagg_hook_list_sem + * before using this function. This will ensure that the pagg_hook_list + * does not change while iterating through the list entries. 
+ * + */ +static struct pagg_hook * +get_pagg_hook(char *key) +{ + struct pagg_hook *pagg_hook; + + list_for_each_entry(pagg_hook, &pagg_hook_list, entry) { + if (!strcmp(pagg_hook->name, key)) { + return pagg_hook; + } + } + return NULL; +} + +/** + * remove_client_paggs_from_all_tasks - Remove all paggs associated with hook + * @php: Pagg hook associated with paggs to purge + * + * Given a pagg hook, this function will remove all paggs associated with that + * pagg hook from all tasks calling the provided function on each pagg. + * + * If there is a detach function associated with the pagg, it is called + * before the pagg is freed. + * + * This is meant to be used by pagg_hook_register and pagg_hook_unregister + * + */ +static void +remove_client_paggs_from_all_tasks(struct pagg_hook *php) +{ + if (php == NULL) + return; + + /* Because of internal race conditions we can't gaurantee + * getting every task in just one pass so we just keep going + * until there are no tasks with paggs from this hook attached. + * The inefficiency of this should be tempered by the fact that this + * happens at most once for each registered client. + */ + while (atomic_read(&php->refcnt) != 0) { + struct task_struct *g = NULL, *p = NULL; + + read_lock(&tasklist_lock); + do_each_thread(g, p) { + struct pagg *paggp; + int task_exited; + + get_task_struct(p); + read_unlock(&tasklist_lock); + down_write(&p->pagg_sem); + paggp = pagg_get(p, php->name); + if (paggp != NULL) { + (void)php->detach(p, paggp); + pagg_free(paggp); + } + up_write(&p->pagg_sem); + read_lock(&tasklist_lock); + + /* If a PAGG got removed from the list while we're going through + * each process, the tasks list for the process would be empty. In + * that case, break out of this for_each_thread so we can do it + * again. 
*/ + task_exited = list_empty(&p->sibling); + put_task_struct(p); + if (task_exited) + goto endloop; + } while_each_thread(g, p); + endloop: + read_unlock(&tasklist_lock); + } +} + +/** + * pagg_hook_register - Register a new pagg hook and enter it the list + * @pagg_hook_new: The new pagg hook to register + * + * Used to register a new pagg hook and enter it into the pagg_hook_list. + * The service name for a pagg hook is restricted to 32 characters. + * + * If an "init()" function is supplied in the hook being registered then a + * pagg will be attached to all existing tasks and the supplied "init()" + * function will be applied to it. If any call to the supplied "init()" + * function returns a non zero result the registration will be aborted. As + * part of the abort process, all paggs belonging to the new client will be + * removed from all tasks and the supplied "detach()" function will be + * called on them. + * + * If a memory error is encountered, the pagg hook is unregistered and any + * tasks that have been attached to the initial pagg container are detached + * from that container. 
+ * + */ +int +pagg_hook_register(struct pagg_hook *pagg_hook_new) +{ + struct pagg_hook *pagg_hook = NULL; + + /* Add new pagg module to access list */ + if (!pagg_hook_new) + return -EINVAL; /* error */ + if (!list_empty(&pagg_hook_new->entry)) + return -EINVAL; /* error */ + if (pagg_hook_new->name == NULL || strlen(pagg_hook_new->name) > PAGG_NAMELN) + return -EINVAL; /* error */ + if (!pagg_hook_new->attach || !pagg_hook_new->detach) + return -EINVAL; /* error */ + + /* Try to insert new hook entry into the pagg hook list */ + down_write(&pagg_hook_list_sem); + + pagg_hook = get_pagg_hook(pagg_hook_new->name); + + if (pagg_hook) { + up_write(&pagg_hook_list_sem); + printk(KERN_WARNING "Attempt to register duplicate" + " PAGG support (name=%s)\n", pagg_hook_new->name); + return -EBUSY; + } + + /* Okay, we can insert into the pagg hook list */ + list_add_tail(&pagg_hook_new->entry, &pagg_hook_list); + /* set the ref count to zero */ + atomic_set(&pagg_hook_new->refcnt, 0); + + /* Now we can call the initializer function (if present) for each task */ + if (pagg_hook_new->init != NULL) { + struct task_struct *g = NULL, *p = NULL; + int init_result = 0; + + /* Because of internal race conditions we can't guarantee + * getting every task in just one pass so we just keep going + * until we don't find any unitialized tasks. The inefficiency + * of this should be tempered by the fact that this happens + * at most once for each registered client. 
+ */ + read_lock(&tasklist_lock); + repeat: + do_each_thread(g, p) { + struct pagg *paggp; + int task_exited; + + get_task_struct(p); + read_unlock(&tasklist_lock); + down_write(&p->pagg_sem); + paggp = pagg_get(p, pagg_hook_new->name); + if (!paggp && !(p->flags & PF_EXITING)) { + paggp = pagg_alloc(p, pagg_hook_new); + if (paggp != NULL) { + init_result = pagg_hook_new->init(p, paggp); + + /* Success, but init function pointer doesn't want grouping */ + if (init_result > 0) + pagg_free(paggp); + } + else + init_result = -ENOMEM; + } + up_write(&p->pagg_sem); + read_lock(&tasklist_lock); + /* Like in remove_client_paggs_from_all_tasks, if the task + * disappeared on us while we were going through the + * for_each_thread loop, we need to start over with that loop. + * That's why we have the list_empty here */ + task_exited = list_empty(&p->sibling); + put_task_struct(p); + if (init_result < 0) + goto endloop; + if (task_exited) + goto repeat; + } while_each_thread(g, p); + endloop: + read_unlock(&tasklist_lock); + + /* + * if anything went wrong during initialisation abandon the + * registration process + */ + if (init_result < 0) { + remove_client_paggs_from_all_tasks(pagg_hook_new); + list_del_init(&pagg_hook_new->entry); + up_write(&pagg_hook_list_sem); + + printk(KERN_WARNING "Registering PAGG support for" + " (name=%s) failed\n", pagg_hook_new->name); + + return init_result; /* hook init function error result */ + } + } + + up_write(&pagg_hook_list_sem); + + printk(KERN_INFO "Registering PAGG support for (name=%s)\n", + pagg_hook_new->name); + + return 0; /* success */ + +} + +/** + * pagg_hook_unregister - Unregister pagg hook and remove it from the list + * @pagg_hook_old: The hook to unregister and remove + * + * Used to unregister pagg hooks and remove them from the pagg_hook_list. + * Once the pagg hook entry in the pagg_hook_list is found, paggs associated + * with the hook (if any) will have their detach function called and will + * be detached. 
+ * + */ +int +pagg_hook_unregister(struct pagg_hook *pagg_hook_old) +{ + struct pagg_hook *pagg_hook; + + /* Check the validity of the arguments */ + if (!pagg_hook_old) + return -EINVAL; /* error */ + if (list_empty(&pagg_hook_old->entry)) + return -EINVAL; /* error */ + if (pagg_hook_old->name == NULL) + return -EINVAL; /* error */ + + down_write(&pagg_hook_list_sem); + + pagg_hook = get_pagg_hook(pagg_hook_old->name); + + if (pagg_hook && pagg_hook == pagg_hook_old) { + remove_client_paggs_from_all_tasks(pagg_hook); + list_del_init(&pagg_hook->entry); + up_write(&pagg_hook_list_sem); + + printk(KERN_INFO "Unregistering PAGG support for" + " (name=%s)\n", pagg_hook_old->name); + + return 0; /* success */ + } + + up_write(&pagg_hook_list_sem); + + printk(KERN_WARNING "Attempt to unregister PAGG support (name=%s)" + " failed - not found\n", pagg_hook_old->name); + + return -EINVAL; /* error */ +} + + +/** + * __pagg_attach - Attach a new task to the same containers of its parent + * @to_task: The child task that will inherit the parent's containers + * @from_task: The parent task + * + * Used to attach a new task to the same pagg containers to which it's parent + * is attached. + * + * The "from" argument is the parent task. The "to" argument is the child + * task. + * + * See the attach decription in linux/include/linux/pagg.h for details on + * how to handle return codes from the attach function pointer. 
+ * + */ +int +__pagg_attach(struct task_struct *to_task, struct task_struct *from_task) +{ + struct pagg *from_pagg; + int ret; + + /* lock the parents pagg_list we are copying from */ + down_read(&from_task->pagg_sem); /* read lock the pagg list */ + + list_for_each_entry(from_pagg, &from_task->pagg_list, entry) { + struct pagg *to_pagg = NULL; + + to_pagg = pagg_alloc(to_task, from_pagg->hook); + if (!to_pagg) { + ret=-ENOMEM; + goto error_return; + } + ret = to_pagg->hook->attach(to_task, to_pagg, from_pagg->data); + + if (ret < 0) { + /* Propagates to copy_process as a fork failure */ + goto error_return; + } + else if (ret > 0) { + /* Success, but attach function pointer doesn't want grouping */ + pagg_free(to_pagg); + } + } + + up_read(&from_task->pagg_sem); /* unlock the pagg list */ + + return 0; /* success */ + + error_return: + /* + * Clean up all the pagg attachments made on behalf of the new + * task. Set new task pagg ptr to NULL for return. + */ + up_read(&from_task->pagg_sem); /* unlock the pagg list */ + __pagg_detach(to_task); + return ret; /* failure */ +} + +/** + * __pagg_detach - Detach a task from all pagg containers it is attached to + * @task: Task to detach from pagg containers + * + * Used to detach a task from all pagg containers to which it is attached. + * + */ +void +__pagg_detach(struct task_struct *task) +{ + struct pagg *pagg; + struct pagg *paggtmp; + + /* Remove ref. to paggs from task immediately */ + down_write(&task->pagg_sem); /* write lock pagg list */ + + list_for_each_entry_safe(pagg, paggtmp, &task->pagg_list, entry) { + pagg->hook->detach(task, pagg); + pagg_free(pagg); + } + + up_write(&task->pagg_sem); /* write unlock the pagg list */ + + return; /* 0 = success, else return last code for failure */ +} + + +/** + * __pagg_exec - Execute callback when a process in a container execs + * @task: We go through the pagg list in the given task + * + * Used to when a process that is in a pagg container does an exec. 
+ * + * The "from" argument is the task. The "name" argument is the name + * of the process being exec'ed. + * + */ +int +__pagg_exec(struct task_struct *task) +{ + struct pagg *pagg; + + down_read(&task->pagg_sem); /* lock the pagg list */ + + list_for_each_entry(pagg, &task->pagg_list, entry) { + if (pagg->hook->exec) /* conditional because it's optional */ + pagg->hook->exec(task, pagg); + } + + up_read(&task->pagg_sem); /* unlock the pagg list */ + return 0; +} + + +EXPORT_SYMBOL(pagg_get); +EXPORT_SYMBOL(pagg_alloc); +EXPORT_SYMBOL(pagg_free); +EXPORT_SYMBOL(pagg_hook_register); +EXPORT_SYMBOL(pagg_hook_unregister); From kaigai@ak.jp.nec.com Thu Jan 27 04:47:47 2005 Received: with ECARTIS (v1.0.0; list pagg); Thu, 27 Jan 2005 04:47:53 -0800 (PST) Received: from tyo201.gate.nec.co.jp (TYO201.gate.nec.co.jp [202.32.8.214]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0RClkvr018268 for ; Thu, 27 Jan 2005 04:47:46 -0800 Received: from mailgate4.nec.co.jp (mailgate54.nec.co.jp [10.7.69.193]) by tyo201.gate.nec.co.jp (8.11.7/3.7W01080315) with ESMTP id j0RCkZe05939; Thu, 27 Jan 2005 21:46:35 +0900 (JST) Received: (from root@localhost) by mailgate4.nec.co.jp (8.11.7/3.7W-MAILGATE-NEC) id j0RCkZc11564; Thu, 27 Jan 2005 21:46:35 +0900 (JST) Received: from mailsv.bs1.fc.nec.co.jp (venus.hpc.bs1.fc.nec.co.jp [10.34.77.164]) by mailsv5.nec.co.jp (8.11.7/3.7W-MAILSV4-NEC) with ESMTP id j0RCkY129398; Thu, 27 Jan 2005 21:46:34 +0900 (JST) Received: from mailsv.linux.bs1.fc.nec.co.jp (IDENT:postfix@namesv2.linux.bs1.fc.nec.co.jp [10.34.125.2]) by mailsv.bs1.fc.nec.co.jp (8.12.10/3.7W-HPC5.2F(mailsv)04081615) with ESMTP id j0RCcnIK018933; Thu, 27 Jan 2005 21:38:49 +0900 (JST) Received: from [10.34.125.249] (sanma.linux.bs1.fc.nec.co.jp [10.34.125.249]) by mailsv.linux.bs1.fc.nec.co.jp (Postfix) with ESMTP id 1968930984; Thu, 27 Jan 2005 21:46:34 +0900 (JST) Message-ID: <41F8E2D8.8050701@ak.jp.nec.com> Date: Thu, 27 Jan 2005 21:47:20 +0900 From: Kaigai Kohei User-Agent: 
Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: ja, en-us, en MIME-Version: 1.0 To: Erik Jacobson Cc: Kaigai Kohei , pagg@oss.sgi.com, Limin Gu , Paul Jackson , lse-tech@lists.sourceforge.net, guillaume.thouvenin@bull.net Subject: [2/3] Re: CpuSet on PAGG (Re: PAGG in Open Source projects?) References: <41F8E117.5030501@ak.jp.nec.com> In-Reply-To: <41F8E117.5030501@ak.jp.nec.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 78 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: kaigai@ak.jp.nec.com Precedence: bulk X-list: pagg

[2/3] linux-2.6.11-rc2-mm1-pagg_on_RCU

When we call pagg_get(), we must hold the task->pagg_sem read semaphore. This makes it difficult to reference the PAGG object in interrupt context or while holding any type of spinlock. This patch makes it possible to reference the PAGG object without any locking. (The CpuSet patch needs lockless references.)

Notice:
- task_struct->pagg_sem was replaced by pagg_lock (spinlock_t).
- We must call pagg_get() under rcu_read_lock(), and the existence of the returned PAGG object is guaranteed until rcu_read_unlock().
- We must call pagg_alloc() and pagg_free() under spin_lock(&task->pagg_lock) to serialize updates to the list.
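
The locking rules in the notice above can be sketched in a small userspace model. This is only an illustration, not kernel code and not part of the patch: every model_* name is a hypothetical stand-in, the "locks" are plain counters checked with assert(), and the model frees an entry immediately, whereas the patch adds a struct rcu_head to struct pagg, presumably so the real free can be deferred until readers are done.

```c
/* Userspace model of the pagg-on-RCU locking rules; NOT kernel code.
 * All model_* names are hypothetical stand-ins for illustration. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for rcu_read_lock()/rcu_read_unlock() and
 * spin_lock()/spin_unlock(&task->pagg_lock). They only track state
 * so that a rule violation trips an assert(). */
static int model_rcu_depth;
static int model_lock_held;

static void model_rcu_read_lock(void)   { model_rcu_depth++; }
static void model_rcu_read_unlock(void) { assert(model_rcu_depth > 0); model_rcu_depth--; }
static void model_spin_lock(void)       { assert(!model_lock_held); model_lock_held = 1; }
static void model_spin_unlock(void)     { assert(model_lock_held); model_lock_held = 0; }

struct model_pagg {
	const char *name;          /* plays the role of pagg->hook->name */
	void *data;                /* opaque per-module data */
	struct model_pagg *next;
};

struct model_task {
	struct model_pagg *pagg_list;
};

/* Reader side: only legal inside a read-side critical section. */
static struct model_pagg *model_pagg_get(struct model_task *t, const char *key)
{
	struct model_pagg *p;

	assert(model_rcu_depth > 0);
	for (p = t->pagg_list; p != NULL; p = p->next)
		if (strcmp(p->name, key) == 0)
			return p;
	return NULL;
}

/* Writer side: only legal with the per-task lock held.
 * The model inserts at the head of the list for brevity. */
static struct model_pagg *model_pagg_alloc(struct model_task *t, const char *name)
{
	struct model_pagg *p;

	assert(model_lock_held);
	p = malloc(sizeof(*p));
	if (p == NULL)
		return NULL;
	p->name = name;
	p->data = NULL;
	p->next = t->pagg_list;
	t->pagg_list = p;
	return p;
}

/* Writer side: unlink and free. The model has no concurrent readers,
 * so it frees at once instead of deferring the free. */
static void model_pagg_free(struct model_task *t, struct model_pagg *victim)
{
	struct model_pagg **pp;

	assert(model_lock_held);
	for (pp = &t->pagg_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			free(victim);
			return;
		}
	}
}

/* Typical reader: copy out what is needed before leaving the
 * read-side section, because the entry is only guaranteed to exist
 * until the matching unlock. Returns 1 if found, 0 otherwise. */
static int model_lookup_data(struct model_task *t, const char *key, void **out)
{
	struct model_pagg *p;
	int found = 0;

	model_rcu_read_lock();
	p = model_pagg_get(t, key);
	if (p != NULL) {
		*out = p->data;
		found = 1;
	}
	model_rcu_read_unlock();
	return found;
}
```

A caller would take the lock around model_pagg_alloc() and model_pagg_free(), and use model_lookup_data() for reads. The point of the reader helper is that the returned pointer must not be used after the read-side section ends.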
-- 
Linux Promotion Center, NEC
KaiGai Kohei

diff -rpNU3 linux-2.6.11-rc2-mm1.pagg/include/linux/pagg.h linux-2.6.11-rc2-mm1.pagg.rcu/include/linux/pagg.h
--- linux-2.6.11-rc2-mm1.pagg/include/linux/pagg.h	2005-01-27 17:02:10.000000000 +0900
+++ linux-2.6.11-rc2-mm1.pagg.rcu/include/linux/pagg.h	2005-01-27 17:09:40.000000000 +0900
@@ -44,6 +44,7 @@
 #define _LINUX_PAGG_H

 #include
+#include

 #ifdef CONFIG_PAGG

@@ -57,8 +58,8 @@
  */
 #define INIT_PAGG_LIST(_l)						\
 do {									\
-	INIT_LIST_HEAD(&(_l)->pagg_list);				\
-	init_rwsem(&(_l)->pagg_sem);					\
+	INIT_LIST_HEAD(&(_l)->pagg_list);				\
+	spin_lock_init(&(_l)->pagg_lock);				\
 } while(0)


@@ -74,9 +75,10 @@ do { \
  * entry: List pointers
  */
 struct pagg {
-	struct pagg_hook *hook;
-	void *data;
-	struct list_head entry;
+	struct pagg_hook *hook;
+	void *data;
+	struct list_head entry;
+	struct rcu_head rhead;
 };

 /*
@@ -147,52 +149,10 @@ extern struct pagg *pagg_alloc(struct ta
 extern void pagg_free(struct pagg *pagg);
 extern int pagg_hook_register(struct pagg_hook *pt_new);
 extern int pagg_hook_unregister(struct pagg_hook *pt_old);
-extern int __pagg_attach(struct task_struct *to_task,
+extern int pagg_attach(struct task_struct *to_task,
 			 struct task_struct *from_task);
-extern void __pagg_detach(struct task_struct *task);
-extern int __pagg_exec(struct task_struct *task);
-
-/**
- * pagg_attach - child inherits attachment to pagg containers of its parent
- * @child: child task - to inherit
- * @parent: parenet task - child inherits pagg containers from this parent
- *
- * function used when a child process must inherit attachment to pagg
- * containers from the parent. Return code is propagated as a fork fail.
- *
- */
-static inline int pagg_attach(struct task_struct *child,
-			      struct task_struct *parent)
-{
-	INIT_PAGG_LIST(child);
-	if (!list_empty(&parent->pagg_list))
-		return __pagg_attach(child, parent);
-
-	return 0;
-}
-
-
-/**
- * pagg_detach - Detach a process from a pagg container it is a member of
- * @task: The task the pagg will be detached from
- *
- */
-static inline void pagg_detach(struct task_struct *task)
-{
-	if (!list_empty(&task->pagg_list))
-		__pagg_detach(task);
-}
-
-/**
- * pagg_exec - Used when a process exec's
- * @task: The process doing the exec
- *
- */
-static inline void pagg_exec(struct task_struct *task)
-{
-	if (!list_empty(&task->pagg_list))
-		__pagg_exec(task);
-}
+extern void pagg_detach(struct task_struct *task);
+extern int pagg_exec(struct task_struct *task);

 /**
  * INIT_TASK_PAGG - Used in INIT_TASK to set the head and sem of pagg_list
@@ -204,7 +164,7 @@ static inline void pagg_exec(struct task
  */
 #define INIT_TASK_PAGG(tsk) \
 	.pagg_list = LIST_HEAD_INIT(tsk.pagg_list),	\
-	.pagg_sem = __RWSEM_INITIALIZER(tsk.pagg_sem),
+	.pagg_lock = SPIN_LOCK_UNLOCKED,

 #else /* CONFIG_PAGG */

diff -rpNU3 linux-2.6.11-rc2-mm1.pagg/include/linux/sched.h linux-2.6.11-rc2-mm1.pagg.rcu/include/linux/sched.h
--- linux-2.6.11-rc2-mm1.pagg/include/linux/sched.h	2005-01-27 17:02:10.000000000 +0900
+++ linux-2.6.11-rc2-mm1.pagg.rcu/include/linux/sched.h	2005-01-27 17:08:46.000000000 +0900
@@ -732,7 +732,7 @@ struct task_struct {
 #ifdef CONFIG_PAGG
 /* List of pagg (process aggregate) attachments */
 	struct list_head pagg_list;
-	struct rw_semaphore pagg_sem;
+	spinlock_t pagg_lock;
 #endif

 	struct list_head private_pages;	/* per-process private pages */
diff -rpNU3 linux-2.6.11-rc2-mm1.pagg/kernel/pagg.c linux-2.6.11-rc2-mm1.pagg.rcu/kernel/pagg.c
--- linux-2.6.11-rc2-mm1.pagg/kernel/pagg.c	2005-01-27 17:02:10.000000000 +0900
+++ linux-2.6.11-rc2-mm1.pagg.rcu/kernel/pagg.c	2005-01-27 17:35:53.000000000 +0900
@@ -45,16 +45,15 @@ static DECLARE_RWSEM(pagg_hook_list_sem)
  * a pointer to the pagg struct that matches the search
  * key. If the key is not found, the function will return NULL.
  *
- * The caller should hold at least a read lock on the pagg_list
- * for task using down_read(&task->pagg_list.sem).
- *
+ * The caller must be under the rcu_read_lock(), and the existence
+ * of the object which is returned is guaranteed until rcu_read_unlock().
  */
 struct pagg *
 pagg_get(struct task_struct *task, char *key)
 {
 	struct pagg *pagg;

-	list_for_each_entry(pagg, &task->pagg_list, entry) {
+	list_for_each_entry_rcu(pagg, &task->pagg_list, entry) {
 		if (!strcmp(pagg->hook->name,key))
 			return pagg;
 	}
@@ -74,24 +73,36 @@ pagg_get(struct task_struct *task, char
  * The caller for this function should hold at least a read lock on the
  * pagg_hook_list_sem - or ensure that the pagg hook entry cannot be
  * removed. If this function was called from the pagg module (usually the
- * case), then the caller need not hold this lock. The caller should hold
- * a write lock on for the tasks pagg_sem. This can be locked using
- * down_write(&task->pagg_sem)
+ * case), then the caller need not hold this lock. The caller must hold
+ * the task's pagg_lock spin lock. This can be locked using
+ * spin_lock(&task->pagg_lock)
  *
  */
-struct pagg *
-pagg_alloc(struct task_struct *task, struct pagg_hook *pagg_hook)
+static struct pagg *
+__pagg_alloc(struct task_struct *task, struct pagg_hook *pagg_hook)
 {
 	struct pagg *pagg;

 	pagg = kmalloc(sizeof(struct pagg), GFP_KERNEL);
 	if (!pagg)
 		return NULL;
-
 	pagg->hook = pagg_hook;
 	pagg->data = NULL;
+	INIT_LIST_HEAD(&pagg->entry);
 	atomic_inc(&pagg_hook->refcnt);	/* Increase hook's reference count */
-	list_add_tail(&pagg->entry, &task->pagg_list);
+
+	return pagg;
+}
+
+struct pagg *
+pagg_alloc(struct task_struct *task, struct pagg_hook *pagg_hook)
+{
+	struct pagg *pagg;
+
+	pagg = __pagg_alloc(task, pagg_hook);
+	if (!pagg)
+		return NULL;
+	list_add_tail_rcu(&pagg->entry, &task->pagg_list);
 	return pagg;
 }

@@ -100,24 +111,37 @@ pagg_alloc(struct task_struct *task, str
  * pagg_free - Delete pagg from the list and free its memory
  * @pagg: The pagg to free
  *
- * This function will ensure the pagg is deleted form
+ * This function will ensure the pagg is deleted from
  * the list of pagg entries for the task. Finally, the memory for the
  * pagg is discarded.
  *
- * The caller of this function should hold a write lock on the pagg_sem
- * for the task. This can be locked using down_write(&task->pagg_sem).
+ * The caller of this function must hold the pagg_lock spin lock
+ * for the task. This can be locked using spin_lock(&task->pagg_lock).
  *
  * Prior to calling pagg_free, the pagg should have been detached from the
  * pagg container represented by this pagg. That is usually done using
  * p->hook->detach(task, pagg);
  *
  */
+static void
+rcu_pagg_free(struct rcu_head *rhead)
+{
+	struct pagg *pg = container_of(rhead, struct pagg, rhead);
+	kfree(pg);
+}
+
+static void
+__pagg_free(struct pagg *pagg)
+{
+	atomic_dec(&pagg->hook->refcnt); /* decr the reference count on the hook */
+	call_rcu(&pagg->rhead, rcu_pagg_free);
+}
+
 void
 pagg_free(struct pagg *pagg)
 {
-	atomic_dec(&pagg->hook->refcnt); /* decr the reference count on the hook */
-	list_del(&pagg->entry);
-	kfree(pagg);
+	list_del_rcu(&pagg->entry);
+	__pagg_free(pagg);
 }


@@ -181,13 +205,20 @@ remove_client_paggs_from_all_tasks(struc

 		get_task_struct(p);
 		read_unlock(&tasklist_lock);
-		down_write(&p->pagg_sem);
+
+		rcu_read_lock();
+		spin_lock(&p->pagg_lock);
 		paggp = pagg_get(p, php->name);
 		if (paggp != NULL) {
+			list_del_rcu(&paggp->entry);
+			spin_unlock(&p->pagg_lock);
 			(void)php->detach(p, paggp);
-			pagg_free(paggp);
+			__pagg_free(paggp);
+		} else {
+			spin_unlock(&p->pagg_lock);
 		}
-		up_write(&p->pagg_sem);
+		rcu_read_unlock();
+
 		read_lock(&tasklist_lock);
 		/* If a PAGG got removed from the list while we're going through
@@ -275,21 +306,24 @@ pagg_hook_register(struct pagg_hook *pag

 		get_task_struct(p);
 		read_unlock(&tasklist_lock);
-		down_write(&p->pagg_sem);
+
+		spin_lock(&p->pagg_lock);
 		paggp = pagg_get(p, pagg_hook_new->name);
 		if (!paggp && !(p->flags & PF_EXITING)) {
-			paggp = pagg_alloc(p, pagg_hook_new);
+			paggp = __pagg_alloc(p, pagg_hook_new);
 			if (paggp != NULL) {
 				init_result = pagg_hook_new->init(p, paggp);
-
-				/* Success, but init function pointer doesn't want grouping */
-				if (init_result > 0)
-					pagg_free(paggp);
-			}
-			else
+				if (init_result == 0) {
+					list_add_tail_rcu(&paggp->entry, &p->pagg_list);
+				} else {
+					__pagg_free(paggp);
+				}
+			} else {
 				init_result = -ENOMEM;
+			}
 		}
-		up_write(&p->pagg_sem);
+		spin_unlock(&p->pagg_lock);
+
 		read_lock(&tasklist_lock);
 		/* Like in remove_client_paggs_from_all_tasks, if the task
 		 * disappeared on us while we were going through the
@@ -388,41 +422,46 @@ pagg_hook_unregister(struct pagg_hook *p
  * The "from" argument is the parent task. The "to" argument is the child
  * task.
  *
- * See the attach decription in linux/include/linux/pagg.h for details on
- * how to handle return codes from the attach function pointer.
- *
+ * The child task must not be referenced yet.
  */
 int
-__pagg_attach(struct task_struct *to_task, struct task_struct *from_task)
+pagg_attach(struct task_struct *to_task, struct task_struct *from_task)
 {
 	struct pagg *from_pagg;
 	int ret;

-	/* lock the parents pagg_list we are copying from */
-	down_read(&from_task->pagg_sem);	/* read lock the pagg list */
+	INIT_PAGG_LIST(to_task);
+
+	rcu_read_lock();
+	if (list_empty(&from_task->pagg_list)) {
+		rcu_read_unlock();
+		return 0;
+	}
+
+	/* lock the parents pagg_list we are copying from */

 	list_for_each_entry(from_pagg, &from_task->pagg_list, entry) {
 		struct pagg *to_pagg = NULL;

-		to_pagg = pagg_alloc(to_task, from_pagg->hook);
+		to_pagg = __pagg_alloc(to_task, from_pagg->hook);
 		if (!to_pagg) {
 			ret=-ENOMEM;
 			goto error_return;
 		}
 		ret = to_pagg->hook->attach(to_task, to_pagg, from_pagg->data);
-
-		if (ret < 0) {
+		if (likely(ret==0)) {
+			/* Success, and PAGG will be chained */
+			list_add_tail_rcu(&to_pagg->entry, &to_task->pagg_list);
+		} else if (ret < 0) {
 			/* Propagates to copy_process as a fork failure */
 			goto error_return;
-		}
-		else if (ret > 0) {
+		} else {
 			/* Success, but attach function pointer doesn't want grouping */
-			pagg_free(to_pagg);
+			__pagg_free(to_pagg);
 		}
 	}

-	up_read(&from_task->pagg_sem);	/* unlock the pagg list */
-
+	rcu_read_unlock();
 	return 0; /* success */

 error_return:
@@ -430,8 +469,8 @@ __pagg_attach(struct task_struct *to_tas
 	 * Clean up all the pagg attachments made on behalf of the new
 	 * task. Set new task pagg ptr to NULL for return.
 	 */
-	up_read(&from_task->pagg_sem);	/* unlock the pagg list */
-	__pagg_detach(to_task);
+	rcu_read_unlock();
+	pagg_detach(to_task);
 	return ret; /* failure */
 }

@@ -443,21 +482,28 @@ __pagg_attach(struct task_struct *to_tas
  *
  */
 void
-__pagg_detach(struct task_struct *task)
+pagg_detach(struct task_struct *task)
 {
 	struct pagg *pagg;
 	struct pagg *paggtmp;

-	/* Remove ref. to paggs from task immediately */
-	down_write(&task->pagg_sem);	/* write lock pagg list */
+	rcu_read_lock();
+	if (list_empty(&task->pagg_list))
+		goto out;

+	spin_lock(&task->pagg_lock);
 	list_for_each_entry_safe(pagg, paggtmp, &task->pagg_list, entry) {
-		pagg->hook->detach(task, pagg);
-		pagg_free(pagg);
-	}
+		list_del_rcu(&pagg->entry);
+		spin_unlock(&task->pagg_lock);

-	up_write(&task->pagg_sem);	/* write unlock the pagg list */
+		pagg->hook->detach(task, pagg);
+		__pagg_free(pagg);
+		spin_lock(&task->pagg_lock);
+	}
+	spin_unlock(&task->pagg_lock);

+out:
+	rcu_read_unlock();
 	return; /* 0 = success, else return last code for failure */
 }

@@ -473,18 +519,20 @@ __pagg_detach(struct task_struct *task)
  *
  */
 int
-__pagg_exec(struct task_struct *task)
+pagg_exec(struct task_struct *task)
 {
 	struct pagg *pagg;

-	down_read(&task->pagg_sem);	/* lock the pagg list */
+	rcu_read_lock();
+	if (list_empty(&task->pagg_list))
+		goto out;

-	list_for_each_entry(pagg, &task->pagg_list, entry) {
+	list_for_each_entry_rcu(pagg, &task->pagg_list, entry) {
 		if (pagg->hook->exec) /* conditional because it's optional */
 			pagg->hook->exec(task, pagg);
 	}
-
-	up_read(&task->pagg_sem);	/* unlock the pagg list */
+ out:
+	rcu_read_unlock();

 	return 0;
 }

From kaigai@ak.jp.nec.com Thu Jan 27 04:48:29 2005 Received: with ECARTIS (v1.0.0; list pagg); Thu, 27 Jan 2005 04:48:35 -0800 (PST) Received: from tyo201.gate.nec.co.jp (TYO201.gate.nec.co.jp [202.32.8.214]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0RCmSYS018297 for ; Thu, 27 Jan 2005 04:48:29 -0800 Received: from mailgate4.nec.co.jp (mailgate54.nec.co.jp
[10.7.69.193]) by tyo201.gate.nec.co.jp (8.11.7/3.7W01080315) with ESMTP id j0RClIe06487; Thu, 27 Jan 2005 21:47:18 +0900 (JST) Received: (from root@localhost) by mailgate4.nec.co.jp (8.11.7/3.7W-MAILGATE-NEC) id j0RClIY12174; Thu, 27 Jan 2005 21:47:18 +0900 (JST) Received: from mailsv.bs1.fc.nec.co.jp (venus.hpc.bs1.fc.nec.co.jp [10.34.77.164]) by mailsv3.nec.co.jp (8.11.7/3.7W-MAILSV4-NEC) with ESMTP id j0RClI224374; Thu, 27 Jan 2005 21:47:18 +0900 (JST) Received: from mailsv.linux.bs1.fc.nec.co.jp (IDENT:postfix@namesv2.linux.bs1.fc.nec.co.jp [10.34.125.2]) by mailsv.bs1.fc.nec.co.jp (8.12.10/3.7W-HPC5.2F(mailsv)04081615) with ESMTP id j0RCdXIK018936; Thu, 27 Jan 2005 21:39:33 +0900 (JST) Received: from [10.34.125.249] (sanma.linux.bs1.fc.nec.co.jp [10.34.125.249]) by mailsv.linux.bs1.fc.nec.co.jp (Postfix) with ESMTP id BB0DF30984; Thu, 27 Jan 2005 21:47:17 +0900 (JST) Message-ID: <41F8E304.1070401@ak.jp.nec.com> Date: Thu, 27 Jan 2005 21:48:04 +0900 From: Kaigai Kohei User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: ja, en-us, en MIME-Version: 1.0 To: Erik Jacobson Cc: Kaigai Kohei , pagg@oss.sgi.com, Limin Gu , Paul Jackson , lse-tech@lists.sourceforge.net, guillaume.thouvenin@bull.net Subject: [3/3] Re: CpuSet on PAGG (Re: PAGG in Open Source projects?) References: <41F8E117.5030501@ak.jp.nec.com> In-Reply-To: <41F8E117.5030501@ak.jp.nec.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 79 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: kaigai@ak.jp.nec.com Precedence: bulk X-list: pagg

[3/3] linux-2.6.11-rc2-mm1-CpuSet_by_PAGG.patch

We can use PAGG as the fork()/exit() event handling framework for generic purposes. Some functions, such as CpuSet, fit the PAGG framework, I think.
-- 
Linux Promotion Center, NEC
KaiGai Kohei

From pj@sgi.com Thu Jan 27 08:18:01 2005 Received: with ECARTIS (v1.0.0; list pagg); Thu, 27 Jan 2005 08:18:07 -0800 (PST) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0RGHuq3028620 for ; Thu, 27 Jan 2005 08:17:57 -0800 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by omx2.sgi.com
(8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j0RHk6Bt017199 for ; Thu, 27 Jan 2005 09:46:06 -0800 Received: from vpn2 (mtv-vpn-hw-pj-2.corp.sgi.com [134.15.25.219]) by cthulhu.engr.sgi.com (SGI-8.12.5/8.12.5) with SMTP id j0RGHrgW15751579; Thu, 27 Jan 2005 08:17:53 -0800 (PST) Date: Thu, 27 Jan 2005 08:17:53 -0800 From: Paul Jackson To: Kaigai Kohei Cc: erikj@subway.americas.sgi.com, pagg@oss.sgi.com, limin@dbear.engr.sgi.com, lse-tech@lists.sourceforge.net, guillaume.thouvenin@bull.net Subject: Re: CpuSet on PAGG (Re: PAGG in Open Source projects?) Message-Id: <20050127081753.5a9d16af.pj@sgi.com> In-Reply-To: <41F8E117.5030501@ak.jp.nec.com> References: <41F8E117.5030501@ak.jp.nec.com> Organization: SGI X-Mailer: Sylpheed version 0.9.12 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 80 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: pj@sgi.com Precedence: bulk X-list: pagg Kaigai Kohei wrote: > Indeed, I tried to include the CpuSet into PAGG. Could you describe more what CpuSet patch this is that you are including in PAGG? I have a cpuset patch in Andrew Morton's *-mm patch series for several months now, but I have not thought that it was a good candidate customer of PAGG, for the main reason that my cpuset patch requires other kernel changes, in the kernel memory allocator, and in the other calls that manipulate scheduling (sched_setaffinity) and memory placement (mbind, set_mempolicy), as well as in the /proc file system. See the added kernel files include/linux/cpuset.h and kernel/cpuset.c, for the central portions of the cpuset patch in any *-mm release of the last few months. 
My understanding of PAGG is that it is especially useful in supporting loadable modules that require to construct some grouping of the tasks on a system, and that require to take some actions on key task events such as fork and exit. Since the cpuset's that I know require several additional specialized hooks not provided by PAGG, I have concluded that PAGG is not a valuable base for cpusets. I have also concluded that cpuset's is not a potential loadable module -- too many kernel hooks required. Are you referring to these cpusets, or about some other facility by that name? If you are referring to these same cpusets, then what benefit do you consider that PAGG provides to these cpusets? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson 1.650.933.1373, 1.925.600.0401 From kaigai@ak.jp.nec.com Fri Jan 28 04:42:34 2005 Received: with ECARTIS (v1.0.0; list pagg); Fri, 28 Jan 2005 04:42:42 -0800 (PST) Received: from tyo201.gate.nec.co.jp (TYO201.gate.nec.co.jp [202.32.8.214]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0SCgXct031706 for ; Fri, 28 Jan 2005 04:42:34 -0800 Received: from mailgate3.nec.co.jp (mailgate53.nec.co.jp [10.7.69.160] (may be forged)) by tyo201.gate.nec.co.jp (8.11.7/3.7W01080315) with ESMTP id j0SCf7e27977; Fri, 28 Jan 2005 21:41:07 +0900 (JST) Received: (from root@localhost) by mailgate3.nec.co.jp (8.11.7/3.7W-MAILGATE-NEC) id j0SCf7718077; Fri, 28 Jan 2005 21:41:07 +0900 (JST) Received: from mailsv.bs1.fc.nec.co.jp (venus.hpc.bs1.fc.nec.co.jp [10.34.77.164]) by mailsv3.nec.co.jp (8.11.7/3.7W-MAILSV4-NEC) with ESMTP id j0SCf6228118; Fri, 28 Jan 2005 21:41:06 +0900 (JST) Received: from mailsv.linux.bs1.fc.nec.co.jp (IDENT:postfix@namesv2.linux.bs1.fc.nec.co.jp [10.34.125.2]) by mailsv.bs1.fc.nec.co.jp (8.12.10/3.7W-HPC5.2F(mailsv)04081615) with ESMTP id j0SCXEIK027372; Fri, 28 Jan 2005 21:33:17 +0900 (JST) Received: from [10.34.125.249] (sanma.linux.bs1.fc.nec.co.jp [10.34.125.249]) by mailsv.linux.bs1.fc.nec.co.jp 
(Postfix) with ESMTP id 2366A30806; Fri, 28 Jan 2005 21:41:03 +0900 (JST) Message-ID: <41FA330A.2030303@ak.jp.nec.com> Date: Fri, 28 Jan 2005 21:41:46 +0900 From: Kaigai Kohei User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: ja, en-us, en MIME-Version: 1.0 To: Paul Jackson Cc: erikj@subway.americas.sgi.com, pagg@oss.sgi.com, limin@dbear.engr.sgi.com, lse-tech@lists.sourceforge.net, guillaume.thouvenin@bull.net Subject: Re: CpuSet on PAGG (Re: PAGG in Open Source projects?) References: <41F8E117.5030501@ak.jp.nec.com> <20050127081753.5a9d16af.pj@sgi.com> In-Reply-To: <20050127081753.5a9d16af.pj@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 81 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: kaigai@ak.jp.nec.com Precedence: bulk X-list: pagg

Hi, Paul. Thanks for your comments.

I also understand that CpuSet requires more kernel changes, such as those to sched_setaffinity() and so on, than PAGG provides. But my main subject is not this point. The purpose of those patches is to restrain incrementation of hook functions in fork() or exit(). I used PAGG for this, as a common event handling framework.

Currently CpuSet is the most widely known example, and Job+CSA and CKRM also require a fork()/exit() event handling mechanism. (CKRM uses ckrm_cb_fork(), PAGG uses pagg_attach())

Of course, the advanced features above cannot implement all of their own functions without some kernel modifications. I have little motivation to implement CpuSet (or Job+CSA, CKRM) as a kernel-loadable module. The main motivation is that those advanced features use a common fork()/exit() event handling framework, and it will make to restrain the unregulated hook functions in fork()/exit().
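
The idea of one shared fork()/exit() call site that dispatches to every registered client can be modeled in user space roughly as follows. This is only a sketch under stated assumptions: the names (task_event_handler, task_event_register, task_event_fork, task_event_exit, acct_on_fork, acct_on_exit), the fixed-size table, and the integer task ids are all hypothetical and are not part of PAGG, CKRM, or CpuSet, and there is no locking at all.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLERS 8	/* arbitrary limit for this sketch */

/* One client of the (hypothetical) framework, e.g. accounting or cpusets. */
struct task_event_handler {
	const char *name;
	int  (*on_fork)(int parent_id, int child_id);	/* <0 makes the fork fail */
	void (*on_exit)(int task_id);			/* may be NULL */
};

static const struct task_event_handler *handlers[MAX_HANDLERS];
static size_t nr_handlers;

/* Register a client; returns 0 on success, -1 if the table is full. */
int task_event_register(const struct task_event_handler *h)
{
	assert(h != NULL);
	if (nr_handlers == MAX_HANDLERS)
		return -1;
	handlers[nr_handlers++] = h;
	return 0;
}

/* The single call that would sit in the fork path. */
int task_event_fork(int parent_id, int child_id)
{
	for (size_t i = 0; i < nr_handlers; i++) {
		int ret;

		if (!handlers[i]->on_fork)
			continue;
		ret = handlers[i]->on_fork(parent_id, child_id);
		if (ret < 0)
			return ret;	/* first failure aborts the fork */
	}
	return 0;
}

/* The single call that would sit in the exit path. */
void task_event_exit(int task_id)
{
	for (size_t i = 0; i < nr_handlers; i++)
		if (handlers[i]->on_exit)
			handlers[i]->on_exit(task_id);
}

/* Example client: counts events, standing in for an accounting module. */
int forks_seen, exits_seen;

int acct_on_fork(int parent_id, int child_id)
{
	(void)parent_id;
	(void)child_id;
	forks_seen++;
	return 0;
}

void acct_on_exit(int task_id)
{
	(void)task_id;
	exits_seen++;
}
```

A client would fill in one struct task_event_handler and call task_event_register() once. The fork and exit paths then each carry a single call, however many clients are registered, which is the property being argued for here.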

I merely chose PAGG as a common event handling framework. But what I really wanted is a Common fork()/exit() event handling framework. It may be called PAGG, or not.

Thanks.

Paul Jackson wrote:
> I have a cpuset patch in Andrew Morton's *-mm patch series for several
> months now, but I have not thought that it was a good candidate customer
> of PAGG, for the main reason that my cpuset patch requires other kernel
> changes, in the kernel memory allocator, and in the other calls that
> manipulate scheduling (sched_setaffinity) and memory placement (mbind,
> set_mempolicy), as well as in the /proc file system. See the added
> kernel files include/linux/cpuset.h and kernel/cpuset.c, for the central
> portions of the cpuset patch in any *-mm release of the last few months.
>
> My understanding of PAGG is that it is especially useful in supporting
> loadable modules that require to construct some grouping of the tasks on
> a system, and that require to take some actions on key task events such
> as fork and exit. Since the cpuset's that I know require several
> additional specialized hooks not provided by PAGG, I have concluded that
> PAGG is not a valuable base for cpusets. I have also concluded that
> cpuset's is not a potential loadable module -- too many kernel hooks
> required.
>
> Are you referring to these cpusets, or about some other facility by
> that name?
>
> If you are referring to these same cpusets, then what benefit do you
> consider that PAGG provides to these cpusets?

-- Linux Promotion Center, NEC KaiGai Kohei From pj@sgi.com Fri Jan 28 05:08:12 2005 Received: with ECARTIS (v1.0.0; list pagg); Fri, 28 Jan 2005 05:08:19 -0800 (PST) Received: from omx1.americas.sgi.com (omx1-ext.sgi.com [192.48.179.11]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0SD8Avo032471 for ; Fri, 28 Jan 2005 05:08:11 -0800 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j0SD8AxT029251 for ; Fri, 28 Jan 2005 07:08:10 -0600 Received: from vpn2 (mtv-vpn-hw-pj-2.corp.sgi.com [134.15.25.219]) by cthulhu.engr.sgi.com (SGI-8.12.5/8.12.5) with SMTP id j0SD88gW16121301; Fri, 28 Jan 2005 05:08:08 -0800 (PST) Date: Fri, 28 Jan 2005 05:08:07 -0800 From: Paul Jackson To: Kaigai Kohei Cc: erikj@subway.americas.sgi.com, pagg@oss.sgi.com, limin@dbear.engr.sgi.com, lse-tech@lists.sourceforge.net, guillaume.thouvenin@bull.net Subject: Re: CpuSet on PAGG (Re: PAGG in Open Source projects?) Message-Id: <20050128050807.24018fb3.pj@sgi.com> In-Reply-To: <41FA330A.2030303@ak.jp.nec.com> References: <41F8E117.5030501@ak.jp.nec.com> <20050127081753.5a9d16af.pj@sgi.com> <41FA330A.2030303@ak.jp.nec.com> Organization: SGI X-Mailer: Sylpheed version 0.9.12 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 82 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: pj@sgi.com Precedence: bulk X-list: pagg Thank-you for your informative response. Kaigai wrote: > But what I really wanted is a Common fork()/exit() event handling framework. Could you expand on this a bit? 
Especially since you acknowledge that loadable modules are not particularly essential to your work, I am curious as to what else you find valuable in such a fork/exit framework. > it will make to restrain the unregulated hook functions in fork()/exit(). I will confess to not quite making sense of this statement - sorry. Thanks for your reply so far. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson 1.650.933.1373, 1.925.600.0401 From guillaume.thouvenin@bull.net Fri Jan 28 05:28:25 2005 Received: with ECARTIS (v1.0.0; list pagg); Fri, 28 Jan 2005 05:28:30 -0800 (PST) Received: from ecfrec.frec.bull.fr (ecfrec.frec.bull.fr [129.183.4.8]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0SDSNKK001483 for ; Fri, 28 Jan 2005 05:28:24 -0800 Received: from localhost (localhost [127.0.0.1]) by ecfrec.frec.bull.fr (Postfix) with ESMTP id 9D30019D910; Fri, 28 Jan 2005 14:28:17 +0100 (CET) Received: from ecfrec.frec.bull.fr ([127.0.0.1]) by localhost (ecfrec.frec.bull.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 25303-04; Fri, 28 Jan 2005 14:28:15 +0100 (CET) Received: from ecn002.frec.bull.fr (ecn002.frec.bull.fr [129.183.4.6]) by ecfrec.frec.bull.fr (Postfix) with ESMTP id 211EF19D904; Fri, 28 Jan 2005 14:28:14 +0100 (CET) Received: from frecb000711.frec.bull.fr ([129.183.101.50]) by ecn002.frec.bull.fr (Lotus Domino Release 5.0.12) with ESMTP id 2005012814360982:1171 ; Fri, 28 Jan 2005 14:36:09 +0100 Subject: Re: CpuSet on PAGG (Re: PAGG in Open Source projects?) 
From: Guillaume Thouvenin
To: Kaigai Kohei
Cc: Paul Jackson , erikj@subway.americas.sgi.com, pagg@oss.sgi.com, limin@dbear.engr.sgi.com, LSE-Tech , guillaume.thouvenin@bull.net
In-Reply-To: <41FA330A.2030303@ak.jp.nec.com>
References: <41F8E117.5030501@ak.jp.nec.com> <20050127081753.5a9d16af.pj@sgi.com> <41FA330A.2030303@ak.jp.nec.com>
Date: Fri, 28 Jan 2005 14:27:44 +0100
Message-Id: <1106918864.8419.52.camel@frecb000711.frec.bull.fr>
Mime-Version: 1.0
X-Mailer: Evolution 2.0.2
X-MIMETrack: Itemize by SMTP Server on ECN002/FR/BULL(Release 5.0.12 |February 13, 2003) at 28/01/2005 14:36:09, Serialize by Router on ECN002/FR/BULL(Release 5.0.12 |February 13, 2003) at 28/01/2005 14:36:44, Serialize complete at 28/01/2005 14:36:44
Content-Transfer-Encoding: 7bit
Content-Type: text/plain
X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1
X-Virus-Scanned: by amavisd-new at frec.bull.fr
X-Virus-Status: Clean
X-archive-position: 83
X-ecartis-version: Ecartis v1.0.0
Sender: pagg-bounce@oss.sgi.com
Errors-to: pagg-bounce@oss.sgi.com
X-original-sender: guillaume.thouvenin@bull.net
Precedence: bulk
X-list: pagg

On Fri, 2005-01-28 at 21:41 +0900, Kaigai Kohei wrote:
> But my main subject is not this point. The purpose of those patches is
> to restrain incrementation of hook functions in fork() or exit().
> I used PAGG for this, as a common event handling framework.

I agree with this point. It seems that several applications need hook functions in fork() and/or exit(); examples include CSA, ELSA, CKRM, CpuSet, LSM and Dprobes. Thus, if I need a hook in fork() for my accounting application (ELSA, for example) and I don't want to add my own hook, PAGG is a solution. AFAIU, I can't use LSM hooks because LSM is a security framework, I can't use Dprobes because it's a debugging framework, and the hooks used by CpuSet and CKRM don't allow any registration.
There was also another project called kernelhooks (the former GKHI I think) but I don't know if it's still maintained... Best, Guillaume From kaigai@ak.jp.nec.com Mon Jan 31 01:40:24 2005 Received: with ECARTIS (v1.0.0; list pagg); Mon, 31 Jan 2005 01:40:32 -0800 (PST) Received: from tyo202.gate.nec.co.jp (TYO202.gate.nec.co.jp [202.32.8.202]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0V9eNjf028927 for ; Mon, 31 Jan 2005 01:40:24 -0800 Received: from mailgate3.nec.co.jp (mailgate53.nec.co.jp [10.7.69.162] (may be forged)) by tyo202.gate.nec.co.jp (8.11.7/3.7W01080315) with ESMTP id j0V9crM07760; Mon, 31 Jan 2005 18:38:53 +0900 (JST) Received: (from root@localhost) by mailgate3.nec.co.jp (8.11.7/3.7W-MAILGATE-NEC) id j0V9cre19416; Mon, 31 Jan 2005 18:38:53 +0900 (JST) Received: from mailsv.bs1.fc.nec.co.jp (venus.hpc.bs1.fc.nec.co.jp [10.34.77.164]) by mailsv4.nec.co.jp (8.11.7/3.7W-MAILSV4-NEC) with ESMTP id j0V9crE29785; Mon, 31 Jan 2005 18:38:53 +0900 (JST) Received: from mailsv.linux.bs1.fc.nec.co.jp (IDENT:postfix@namesv2.linux.bs1.fc.nec.co.jp [10.34.125.2]) by mailsv.bs1.fc.nec.co.jp (8.12.10/3.7W-HPC5.2F(mailsv)04081615) with ESMTP id j0V9UoIK011586; Mon, 31 Jan 2005 18:30:51 +0900 (JST) Received: from [10.34.125.249] (sanma.linux.bs1.fc.nec.co.jp [10.34.125.249]) by mailsv.linux.bs1.fc.nec.co.jp (Postfix) with ESMTP id C127B30987; Mon, 31 Jan 2005 18:38:51 +0900 (JST) Message-ID: <41FDFCDC.8080504@ak.jp.nec.com> Date: Mon, 31 Jan 2005 18:39:40 +0900 From: Kaigai Kohei User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: ja, en-us, en MIME-Version: 1.0 To: Paul Jackson Cc: erikj@subway.americas.sgi.com, pagg@oss.sgi.com, limin@dbear.engr.sgi.com, lse-tech@lists.sourceforge.net, guillaume.thouvenin@bull.net Subject: Re: CpuSet on PAGG (Re: PAGG in Open Source projects?) 
References: <41F8E117.5030501@ak.jp.nec.com> <20050127081753.5a9d16af.pj@sgi.com> <41FA330A.2030303@ak.jp.nec.com> <20050128050807.24018fb3.pj@sgi.com>
In-Reply-To: <20050128050807.24018fb3.pj@sgi.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1
X-Virus-Status: Clean
X-archive-position: 84
X-ecartis-version: Ecartis v1.0.0
Sender: pagg-bounce@oss.sgi.com
Errors-to: pagg-bounce@oss.sgi.com
X-original-sender: kaigai@ak.jp.nec.com
Precedence: bulk
X-list: pagg

Thanks for your comments.

>>But what I really wanted is a Common fork()/exit() event handling framework.
>
> Could you expand on this a bit? Especially since you acknowledge that loadable
> modules are not particularly essential to your work, I am curious as to what
> else you find valuable in such a fork/exit framework.

I also think it is best if we can implement some advanced features (CpuSet, CSA+Job, CKRM, etc.) as kernel loadable modules. But even for a feature that can't be implemented as a kernel loadable module, using common hooks in fork()/exit() is better than patching fork.c or exit.c for each feature, because we need not modify kernel/fork.c or kernel/exit.c directly.

For example, we must individually add cpuset_fork() for CpuSet, pagg_attach() for PAGG (CSA+Job), and ckrm_cb_fork() for CKRM to kernel/fork.c when we try to use those advanced features. In this case, we need to patch three points in kernel/fork.c. But if we have a common-purpose hook in kernel/fork.c, those advanced features do not need to modify kernel/fork.c directly. They only have to register their own event handler for the fork hook.

In short, my motivation is to integrate the hooks that are plugged randomly into kernel/fork.c and so on.

Thanks.
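The single registration point described above might look roughly like the following user-space C sketch. It is not taken from the PAGG patch or from any of the projects named in this thread: `struct task`, `fork_hook_register()`, `fork_hook_run()` and the two demo handlers are hypothetical names invented for illustration, and the sketch deliberately omits the locking (or RCU) that a real kernel implementation would need. The only convention borrowed from the list is that a negative return value from a handler is passed back so the fork fails.

```c
#include <stddef.h>

/* Stand-in for the kernel's task_struct; hypothetical, only what the sketch needs. */
struct task {
    int pid;
    int cpuset_refs;   /* illustrative per-feature state */
    int job_id;        /* illustrative per-feature state */
};

/* A fork handler sees the parent and the new child; < 0 means "fail the fork". */
typedef int (*fork_handler_t)(struct task *parent, struct task *child);

#define MAX_FORK_HOOKS 8

static fork_handler_t fork_hooks[MAX_FORK_HOOKS];
static size_t nr_fork_hooks;

/* Register a handler. Returns 0 on success, -1 if fn is NULL or the table is full. */
int fork_hook_register(fork_handler_t fn)
{
    if (fn == NULL || nr_fork_hooks == MAX_FORK_HOOKS)
        return -1;
    fork_hooks[nr_fork_hooks++] = fn;
    return 0;
}

/*
 * The one call that would sit in the fork path instead of three
 * feature-specific calls. Handlers run in registration order; the
 * first negative return value stops the walk and is returned.
 */
int fork_hook_run(struct task *parent, struct task *child)
{
    for (size_t i = 0; i < nr_fork_hooks; i++) {
        int ret = fork_hooks[i](parent, child);
        if (ret < 0)
            return ret;
    }
    return 0;
}

/* Demo handler standing in for a cpuset-like feature. */
int demo_cpuset_fork(struct task *parent, struct task *child)
{
    (void)parent;
    child->cpuset_refs = 1;
    return 0;
}

/* Demo handler standing in for a job/accounting feature. */
int demo_job_fork(struct task *parent, struct task *child)
{
    child->job_id = parent->job_id;
    return 0;
}
```

With this shape, each feature calls `fork_hook_register()` once at initialization, and the fork path contains only the `fork_hook_run()` call. Whether the indirection is worth its cost is exactly the question debated in the replies.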
-- Linux Promotion Center, NEC KaiGai Kohei From guillaume.thouvenin@bull.net Mon Jan 31 02:30:50 2005 Received: with ECARTIS (v1.0.0; list pagg); Mon, 31 Jan 2005 02:30:55 -0800 (PST) Received: from ecfrec.frec.bull.fr (ecfrec.frec.bull.fr [129.183.4.8]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0VAUn3G032710 for ; Mon, 31 Jan 2005 02:30:50 -0800 Received: from localhost (localhost [127.0.0.1]) by ecfrec.frec.bull.fr (Postfix) with ESMTP id F2B0C19D907; Mon, 31 Jan 2005 11:30:43 +0100 (CET) Received: from ecfrec.frec.bull.fr ([127.0.0.1]) by localhost (ecfrec.frec.bull.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 31703-04; Mon, 31 Jan 2005 11:30:41 +0100 (CET) Received: from ecn002.frec.bull.fr (ecn002.frec.bull.fr [129.183.4.6]) by ecfrec.frec.bull.fr (Postfix) with ESMTP id 8CD7819D906; Mon, 31 Jan 2005 11:30:41 +0100 (CET) Received: from frecb000711.frec.bull.fr ([129.183.101.50]) by ecn002.frec.bull.fr (Lotus Domino Release 5.0.12) with ESMTP id 2005013111375482:571 ; Mon, 31 Jan 2005 11:37:54 +0100 Subject: Re: CpuSet on PAGG (Re: PAGG in Open Source projects?) 
From: Guillaume Thouvenin To: Kaigai Kohei Cc: Paul Jackson , erikj@subway.americas.sgi.com, pagg@oss.sgi.com, limin@dbear.engr.sgi.com, LSE-Tech , guillaume.thouvenin@bull.net In-Reply-To: <41FDFCDC.8080504@ak.jp.nec.com> References: <41F8E117.5030501@ak.jp.nec.com> <20050127081753.5a9d16af.pj@sgi.com> <41FA330A.2030303@ak.jp.nec.com> <20050128050807.24018fb3.pj@sgi.com> <41FDFCDC.8080504@ak.jp.nec.com> Date: Mon, 31 Jan 2005 11:29:26 +0100 Message-Id: <1107167366.8473.20.camel@frecb000711.frec.bull.fr> Mime-Version: 1.0 X-Mailer: Evolution 2.0.2 X-MIMETrack: Itemize by SMTP Server on ECN002/FR/BULL(Release 5.0.12 |February 13, 2003) at 31/01/2005 11:37:54, Serialize by Router on ECN002/FR/BULL(Release 5.0.12 |February 13, 2003) at 31/01/2005 11:39:13, Serialize complete at 31/01/2005 11:39:13 Content-Transfer-Encoding: 7bit Content-Type: text/plain X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Scanned: by amavisd-new at frec.bull.fr X-Virus-Status: Clean X-archive-position: 85 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: guillaume.thouvenin@bull.net Precedence: bulk X-list: pagg On Mon, 2005-01-31 at 18:39 +0900, Kaigai Kohei wrote: > For example, we must append individually cpuset_fork() for CpuSet, > pagg_attach() for PAGG(CSA+Job), ckrm_cb_fork() for CKRM in kernel/fork.c > when we try to use those advanced features. > In this case, we need to patch into three points in kernel/fork.c. > But if we have a common purpose hook in kernel/fork.c, those advanced > features does not need to modify kernel/fork.c directly. > They have only to register their own event handler for the fork-hook. Thus in this case, the interesting aspect of PAGG is not the "container" aspect but it's the "hook manager" aspect, right? PAGG has only a callback for "exec" and not for "fork". 
So, if you want to use PAGG as a common hook in "fork" for several applications, we need to add a new hook (like pagg_fork in kernel/fork.c:do_fork()).

Is it possible to split PAGG into two pieces:
 1. The container manager part
 2. The hook manager part

Thanks,
Guillaume

From pj@sgi.com Mon Jan 31 03:07:21 2005
Received: with ECARTIS (v1.0.0; list pagg); Mon, 31 Jan 2005 03:07:27 -0800 (PST)
Received: from omx1.americas.sgi.com (omx1-ext.sgi.com [192.48.179.11]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0VB7JjR006024 for ; Mon, 31 Jan 2005 03:07:19 -0800
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by omx1.americas.sgi.com (8.12.10/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j0VB7IxT011497 for ; Mon, 31 Jan 2005 05:07:18 -0600
Received: from vpn2 (mtv-vpn-hw-pj-2.corp.sgi.com [134.15.25.219]) by cthulhu.engr.sgi.com (SGI-8.12.5/8.12.5) with SMTP id j0VB7GgW17494730; Mon, 31 Jan 2005 03:07:16 -0800 (PST)
Date: Mon, 31 Jan 2005 03:07:15 -0800
From: Paul Jackson
To: Kaigai Kohei
Cc: erikj@subway.americas.sgi.com, pagg@oss.sgi.com, limin@dbear.engr.sgi.com, lse-tech@lists.sourceforge.net, guillaume.thouvenin@bull.net
Subject: Re: CpuSet on PAGG (Re: PAGG in Open Source projects?)
Message-Id: <20050131030715.05cbb981.pj@sgi.com> In-Reply-To: <41FDFCDC.8080504@ak.jp.nec.com> References: <41F8E117.5030501@ak.jp.nec.com> <20050127081753.5a9d16af.pj@sgi.com> <41FA330A.2030303@ak.jp.nec.com> <20050128050807.24018fb3.pj@sgi.com> <41FDFCDC.8080504@ak.jp.nec.com> Organization: SGI X-Mailer: Sylpheed version 1.0.0 (GTK+ 1.2.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 86 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: pj@sgi.com Precedence: bulk X-list: pagg Thank-you, Kaigai Kohei, for taking the time to explain your motivation for preferring PAGG to hook fork/exit. In your initial post a few days ago, you wrote: > We want to use Job(and CSA) or CpuSet without specific patches. > And, it's so important to adopt the PAGG framework into the stock kernel. I agree that it would be nice if cpusets didn't require a specific kernel patch. It would make the job of getting cpusets accepted into mainstream Linux much easier. However I think that there is no way that you can use cpusets without the specific cpuset patch. Even with your patches to allow the cpuset fork/exit hooks to be done using PAGG, you still have the other, more specialized, cpuset hooks to consider. > (CpuSet-patch needs lockless references.) I am a little surprised that the fork/exit cpuset hooks must be lockless. Are you talking about the cpuset patch that is in Andrew Morton's *-mm kernels of the last few months, or some other CpuSet patch? The cpuset_fork() just does an atomic_inc() of a reference count, so doesn't care what locks are held when it is called. But the cpuset_exit() code can grab the cpuset semaphore if the last task using a cpuset exits, when one needs to consider invoking notify_on_release. 
I thought it was ok to nest semaphores inside semaphores (so long as you respect an order, so that you can't deadlock), so I don't understand why you needed to replace that pagg semaphore with an rcu section.

In your patch 3 of 3, you wrote:
> Some functions, like as CpuSet, fit the PAGG framework, I think.

I am sure that my colleagues at SGI who are supporting PAGG hope that you are right. However I still don't see it.

Call-by-string-name dynamically evaluated invocations are not necessarily better or worse than simple, hard coded, directly linked function calls. They _are_ more expensive, by far, and more complex and obscure, which impairs the ease of both reading and debugging code. They _have_ to provide some balancing benefit to be justified.

If something can be made entirely a loadable module, requiring no specific patches (to use your nice phrase) then that might be such a benefit.

Until you can dynamically plug each of the following hooks:

    int cpuset_init(void);
    void cpuset_init_smp(void);
    void cpuset_fork(struct task_struct *p);
    void cpuset_exit(struct task_struct *p);
    const cpumask_t cpuset_cpus_allowed(const struct task_struct *p);
    void cpuset_init_current_mems_allowed(void);
    void cpuset_update_current_mems_allowed(void);
    void cpuset_restrict_to_mems_allowed(unsigned long *nodes);
    int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl);
    int cpuset_zone_allowed(struct zone *z);
    struct file_operations proc_cpuset_operations;
    char *cpuset_task_status_allowed(struct task_struct *task, char *buffer);

I think that you will require a cpuset specific patch.

Am I missing something ??

Aside -- if you do value cpusets, please put in a good word for them with Andrew Morton, on lkml perhaps. He will _not_ further the advance of cpusets unless others outside SGI ask for them eagerly.

--
I won't rest till it's the best ...
Programmer, Linux Scalability Paul Jackson 1.650.933.1373, 1.925.600.0401 From pj@sgi.com Mon Jan 31 03:35:03 2005 Received: with ECARTIS (v1.0.0; list pagg); Mon, 31 Jan 2005 03:35:10 -0800 (PST) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0VBZ2MQ006922 for ; Mon, 31 Jan 2005 03:35:03 -0800 Received: from nodin.corp.sgi.com (nodin.corp.sgi.com [192.26.51.193]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id j0VD3kxC027127 for ; Mon, 31 Jan 2005 05:03:46 -0800 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by nodin.corp.sgi.com (SGI-8.12.5/8.12.10/SGI_generic_relay-1.2) with ESMTP id j0VBZ2tj28207918 for ; Mon, 31 Jan 2005 03:35:02 -0800 (PST) Received: from vpn2 (mtv-vpn-hw-pj-2.corp.sgi.com [134.15.25.219]) by cthulhu.engr.sgi.com (SGI-8.12.5/8.12.5) with SMTP id j0VBY0gW17478500; Mon, 31 Jan 2005 03:34:00 -0800 (PST) Date: Mon, 31 Jan 2005 03:34:00 -0800 From: Paul Jackson To: Guillaume Thouvenin Cc: kaigai@ak.jp.nec.com, erikj@subway.americas.sgi.com, pagg@oss.sgi.com, limin@dbear.engr.sgi.com, lse-tech@lists.sourceforge.net, guillaume.thouvenin@bull.net Subject: Re: CpuSet on PAGG (Re: PAGG in Open Source projects?) 
Message-Id: <20050131033400.3e14c4d3.pj@sgi.com>
In-Reply-To: <1107167366.8473.20.camel@frecb000711.frec.bull.fr>
References: <41F8E117.5030501@ak.jp.nec.com> <20050127081753.5a9d16af.pj@sgi.com> <41FA330A.2030303@ak.jp.nec.com> <20050128050807.24018fb3.pj@sgi.com> <41FDFCDC.8080504@ak.jp.nec.com> <1107167366.8473.20.camel@frecb000711.frec.bull.fr>
Organization: SGI
X-Mailer: Sylpheed version 1.0.0 (GTK+ 1.2.10; i686-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1
X-Virus-Status: Clean
X-archive-position: 87
X-ecartis-version: Ecartis v1.0.0
Sender: pagg-bounce@oss.sgi.com
Errors-to: pagg-bounce@oss.sgi.com
X-original-sender: pj@sgi.com
Precedence: bulk
X-list: pagg

Guillaume writes:
> Is it possible to split PAGG into two pieces:
> 1. The container manager part
> 2. The hook manager part

But the "container manager" is simply the projection of the hook manager onto the set of tasks.

In other, less obfuscated terms, I mean that two tasks are in the same "container" iff they have the same hooks.

So with the current implementation, I doubt they split. We have two ways of looking at one mechanism, not two mechanisms. Or at least, that's my understanding of PAGG (I could easily be mistaken here - beware).

I could see some refactoring being of benefit here, however. I'd be tempted to consider, if these were my projects:

 1) Implementing the 'container manager' using a simple integer id field in the task struct, some sort of "job id" (jid), with associated getjid system call, similar to gettid and getpid, and "Jid: " /proc/*/status field.

 2) Continuing to work with the others doing system accounting data collection to integrate CSA, as I see others already doing.

 3) For any resource management aspects, work with CKRM.
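Item 1 above can be pictured with a small user-space C sketch. Nothing here comes from an actual patch: `struct demo_task`, `demo_fork()`, `demo_getjid()` and `demo_same_container()` are hypothetical names, and the assumption that a child inherits its parent's jid at fork time is an illustration choice rather than something stated in the message.

```c
/* Stand-in for a task struct that carries a plain integer job id (hypothetical). */
struct demo_task {
    int pid;
    int jid;   /* tasks sharing a jid are treated as one "container" */
};

/* Assumption for this sketch: the child simply inherits the parent's jid. */
void demo_fork(const struct demo_task *parent, struct demo_task *child,
               int child_pid)
{
    child->pid = child_pid;
    child->jid = parent->jid;
}

/* Analogue of the proposed getjid call: report the task's job id. */
int demo_getjid(const struct demo_task *t)
{
    return t->jid;
}

/* Container membership reduces to an integer comparison; no hook list is consulted. */
int demo_same_container(const struct demo_task *a, const struct demo_task *b)
{
    return a->jid == b->jid;
}
```

The point of the sketch is only that grouping by an integer field needs no hook manager at all, which is why the two concerns could be separated under this proposal.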
The hook manager is an implementation intended to support loadable kernel modules doing some of this work. I will be surprised if these can be done this way, and do not share the enthusiasm of my SGI colleagues for this mechanism. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson 1.650.933.1373, 1.925.600.0401