From kingsley@sw.oz.au Mon Dec 13 15:23:39 2004 Received: with ECARTIS (v1.0.0; list pagg); Mon, 13 Dec 2004 15:23:47 -0800 (PST) Received: from smtp.sw.oz.au (IDENT:FWUSER@alt.aurema.com [203.217.18.57]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id iBDNNHKF031388 for ; Mon, 13 Dec 2004 15:23:38 -0800 Received: from kingsley.sw.oz.au (kingsley.sw.oz.au [192.41.203.97]) by smtp.sw.oz.au with ESMTP id iBDNLwSP018941; Tue, 14 Dec 2004 10:21:58 +1100 (EST) Received: from kingsley.sw.oz.au (localhost.localdomain [127.0.0.1]) by kingsley.sw.oz.au (8.13.1/8.12.10) with ESMTP id iBDNLwo1017604; Tue, 14 Dec 2004 10:21:58 +1100 Received: (from kingsley@localhost) by kingsley.sw.oz.au (8.13.1/8.13.1/Submit) id iBDNLvpL017603; Tue, 14 Dec 2004 10:21:57 +1100 Date: Tue, 14 Dec 2004 10:21:56 +1100 From: Kingsley Cheung To: erikj@subway.americas.sgi.com Cc: pagg@oss.sgi.com Subject: [patch 1/2] Re: PAGG Threading Issues Message-ID: <20041213232156.GD21157@aurema.com> References: <20041108033648.GB18308@aurema.com> <20041108230039.GE18308@aurema.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="17pEHd4RhPHOinZp" Content-Disposition: inline In-Reply-To: <20041108230039.GE18308@aurema.com> User-Agent: Mutt/1.4.1i X-Scanned-By: MIMEDefang 2.48 on 192.41.203.35 X-Virus-Scanned: ClamAV 0.80/624/Thu Dec 9 11:01:06 2004 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 55 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: kingsley@aurema.com Precedence: bulk X-list: pagg --17pEHd4RhPHOinZp Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi Erik, I've had a chance to look at the issue and have come up with two patches. This email has the first of two patches. This patch addresses the first problem described below by using the "thread" macros during registration and deregistration. It also makes a check to skip exiting processes. On Tue, Nov 09, 2004 at 10:00:39AM +1100, Kingsley Cheung wrote: > Hi Erik, > > Some other issues as well. Please correct me if I'm wrong :) > > 1) I've noticed that pagg_detach() has been moved from > __put_task_struct() to do_exit() between the pagg patches for Linux > 2.6.8. This occurred as described at > http://oss.sgi.com/archives/pagg/2004-08/msg00002.html > > Anyway, I think the move might have raised a problem with the > registration code. Below is the relevant segment of code I'd like to > consider: > > get_task_struct(p); > read_unlock(&tasklist_lock); > down_write(&p->pagg_sem); > paggp = pagg_get(p, pagg_hook_new->name); > if (paggp == NULL) { > paggp = pagg_alloc(p, pagg_hook_new); > if (paggp != NULL) > init_result = pagg_hook_new->init(p, paggp); > else > init_result = -ENOMEM; > } > up_write(&p->pagg_sem); > read_lock(&tasklist_lock); > /* Like in remove_client_paggs_from_all_tasks, if the task > * disappeared on us while we were going through the > * for_each_process loop, we need to start over with that loop. > * That's why we have the list_empty here */ > task_exited = list_empty(&p->tasks); > put_task_struct(p); > > Now the tasks on the task list include those that are zombies. This > occurs during exit_notify(), which sets their state to ZOMBIE but > leaves them on the task list. Therefore the above code will allocate > paggs for zombie tasks as well. [Note that task_exited won't trigger > as the task is still on the task list.] > > Now unfortunately, by moving pagg_detach() out of __put_task_struct(), > when the zombie has its task structure released, the pagg that was > allocated during registration is no longer detached. Thus we'll could > have a memory leak once the task structure is dropped and > detach_pagg() is not invoked. > > In fact the leak could happen not just for zombies, but for tasks just > about to exit. The race occurs in do_exit(), in which the > detach_pagg() could be called before we allocate the pagg. > > Perhaps the quickest solution is to place a detach_pagg() call back > into __put_task_struct(). Another suggestion might be to check for > the PF_EXITING flag when allocating the pagg. Thanks, -- Kingsley --17pEHd4RhPHOinZp Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="linux-2.6.9-pagg-threading.patch" Index: linux/kernel/pagg.c diff -u -r1.1.2.2 pagg.c --- linux/kernel/pagg.c 15 Nov 2004 01:54:58 -0000 1.1.2.2 +++ linux/kernel/pagg.c 8 Dec 2004 06:25:22 -0000 @@ -172,11 +172,12 @@ * happens at most once for each registered client. */ while (atomic_read(&php->refcnt) != 0) { - struct task_struct *p = NULL; + struct task_struct *g = NULL, *p = NULL; read_lock(&tasklist_lock); - for_each_process(p) { + do_each_thread(g, p) { struct pagg *paggp; + int task_exited; get_task_struct(p); read_unlock(&tasklist_lock); @@ -191,15 +192,14 @@ /* If a PAGG got removed from the list while we're going through * each process, the tasks list for the process would be empty. In - * that case, break out of this for_each_process so we can do it + * that case, break out of this for_each_thread so we can do it * again. */ - if (list_empty(&p->tasks)) { - put_task_struct(p); - break; - } else - put_task_struct(p); - - } + task_exited = list_empty(&p->sibling); + put_task_struct(p); + if (task_exited) + goto endloop; + } while_each_thread(g, p); + endloop: read_unlock(&tasklist_lock); } } @@ -258,8 +258,8 @@ /* Now we can call the initializer function (if present) for each task */ if (pagg_hook_new->init != NULL) { + struct task_struct *g = NULL, *p = NULL; int init_result = 0; - int task_exited = 0; /* Because of internal race conditions we can't gaurantee * getting every task in just one pass so we just keep going @@ -267,38 +267,38 @@ * of this should be tempered by the fact that this happens * at most once for each registered client. */ - do { - struct task_struct *p = NULL; - - read_lock(&tasklist_lock); - for_each_process(p) { - struct pagg *paggp; + read_lock(&tasklist_lock); + repeat: + do_each_thread(g, p) { + struct pagg *paggp; + int task_exited; - get_task_struct(p); - read_unlock(&tasklist_lock); - down_write(&p->pagg_sem); - paggp = pagg_get(p, pagg_hook_new->name); - if (paggp == NULL) { - paggp = pagg_alloc(p, pagg_hook_new); - if (paggp != NULL) - init_result = pagg_hook_new->init(p, paggp); - else - init_result = -ENOMEM; - } - up_write(&p->pagg_sem); - read_lock(&tasklist_lock); - /* Like in remove_client_paggs_from_all_tasks, if the task - * disappeared on us while we were going through the - * for_each_process loop, we need to start over with that loop. - * That's why we have the list_empty here */ - task_exited = list_empty(&p->tasks); - put_task_struct(p); - if ((init_result != 0) || task_exited) { - break; - } - } + get_task_struct(p); read_unlock(&tasklist_lock); - } while ((init_result == 0) && task_exited); + down_write(&p->pagg_sem); + paggp = pagg_get(p, pagg_hook_new->name); + if (!paggp && !(p->flags & PF_EXITING)) { + paggp = pagg_alloc(p, pagg_hook_new); + if (paggp != NULL) + init_result = pagg_hook_new->init(p, paggp); + else + init_result = -ENOMEM; + } + up_write(&p->pagg_sem); + read_lock(&tasklist_lock); + /* Like in remove_client_paggs_from_all_tasks, if the task + * disappeared on us while we were going through the + * for_each_thread loop, we need to start over with that loop. + * That's why we have the list_empty here */ + task_exited = list_empty(&p->sibling); + put_task_struct(p); + if (init_result != 0) + goto endloop; + if (task_exited) + goto repeat; + } while_each_thread(g, p); + endloop: + read_unlock(&tasklist_lock); /* * if anything went wrong during initialisation abandon the --17pEHd4RhPHOinZp-- From kingsley@sw.oz.au Mon Dec 13 17:29:37 2004 Received: with ECARTIS (v1.0.0; list pagg); Mon, 13 Dec 2004 17:29:44 -0800 (PST) Received: from smtp.sw.oz.au (IDENT:FWUSER@alt.aurema.com [203.217.18.57]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id iBE1TF21004414 for ; Mon, 13 Dec 2004 17:29:36 -0800 Received: from kingsley.sw.oz.au (kingsley.sw.oz.au [192.41.203.97]) by smtp.sw.oz.au with ESMTP id iBE1SSSP014681; Tue, 14 Dec 2004 12:28:28 +1100 (EST) Received: from kingsley.sw.oz.au (localhost.localdomain [127.0.0.1]) by kingsley.sw.oz.au (8.13.1/8.12.10) with ESMTP id iBE1SS93018196; Tue, 14 Dec 2004 12:28:28 +1100 Received: (from kingsley@localhost) by kingsley.sw.oz.au (8.13.1/8.13.1/Submit) id iBE1SSQ4018195; Tue, 14 Dec 2004 12:28:28 +1100 Date: Tue, 14 Dec 2004 12:28:28 +1100 From: Kingsley Cheung To: Erik Jacobson Cc: pagg@oss.sgi.com Subject: [patch 2/2] Re: PAGG Threading Issues Message-ID: <20041214012828.GE21157@aurema.com> References: <20041108033648.GB18308@aurema.com> <20041108230039.GE18308@aurema.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="CblX+4bnyfN0pR09" Content-Disposition: inline In-Reply-To: <20041108230039.GE18308@aurema.com> User-Agent: Mutt/1.4.1i X-Scanned-By: MIMEDefang 2.48 on 192.41.203.35 X-Virus-Scanned: ClamAV 0.80/627/Sun Dec 12 11:53:11 2004 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 56 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: kingsley@aurema.com Precedence: bulk X-list: pagg --CblX+4bnyfN0pR09 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi Erik, This is the second of the patches. I believe it is in the best interest of PAGG to be remain as container for Linux tasks and to let PAGG clients decide whether they are interested in thread groups (equivalent to processes) or individual threads. To achieve this we need to ensure we only attach individual tasks _after_ their thread group has been established in the fork code. For this reason this patch moves the fork attach to the end of copy_process(). Since this change can mean that we race with the registration procedure, the attach code also checks to ensure that tasks are not attached twice. Another change the patch makes is to continue attachments for other clients in pagg_attach even if the attach callback fails for one client. IMHO each client should be permitted to operate independently of each other. On Tue, Nov 09, 2004 at 10:00:39AM +1100, Kingsley Cheung wrote: > 2) Is pagg intended to be a container for Linux tasks or thread > groups? I'm avoiding the use of the word "process" as that could be > confusing in this context. > > The reason I'm asking is because throughout the pagg registration code > the for_each_process() macro is used. This happens in > remove_client_paggs_from_all_tasks() and pagg_hook_register(). The > problem is, however, since 2.6 for_each_process() only traverses > thread group leader tasks. Thus the registration and deregistration > code only considers thread group leaders. > > In contrast the pagg_attach() and pagg_detach() hooks are placed to > catch all tasks in the fork and exit code. With the way the > registration and deregistration code is behaving, this means that we > won't catch all existing tasks and we won't clean up after all tasks > properly for multi-threaded applications. > > So what is pagg intended for? To catch every single task? In that > case the do_each_thread() and while_each_thread() macros should be > used. Or is pagg a container for thread groups only? The changes for > that would require much more thought... Thanks, -- Kingsley --CblX+4bnyfN0pR09 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="linux-2.6.9-pagg-attach.patch" --- linux-2.6.5-7.97ZP2RC3/kernel/fork.c 2004-12-03 10:37:15.000000000 +1100 +++ linux-2.6.5-7.97ZP2RC4/kernel/fork.c 2004-12-08 16:59:26.480431272 +1100 @@ -1051,12 +1051,6 @@ sched_fork(p); /* - * call pagg modules to properly attach new process to the same - * process aggregate containers as the parent process. - */ - pagg_attach(p, current); - - /* * Ok, make it visible to the rest of the system. * We dont wake it up yet. */ @@ -1131,13 +1125,20 @@ write_unlock_irq(&tasklist_lock); retval = 0; + /* + * call pagg modules to properly attach new process to the + * same process aggregate containers as the parent process. + * We do this after its thread group has been assigned to + * permit clients to distinguish thread groups clearly. + */ + pagg_attach(p, current); + fork_out: if (retval) return ERR_PTR(retval); return p; bad_fork_cleanup_namespace: - pagg_detach(p); exit_namespace(p); bad_fork_cleanup_mm: exit_mm(p); --- linux-2.6.5-7.97ZP2RC3/kernel/pagg.c 2004-12-03 10:37:15.000000000 +1100 +++ linux-2.6.5-7.97ZP2RC4/kernel/pagg.c 2004-12-08 17:04:08.512555872 +1100 @@ -261,7 +261,7 @@ struct task_struct *g = NULL, *p = NULL; int init_result = 0; - /* Because of internal race conditions we can't gaurantee + /* Because of internal race conditions we can't guarantee * getting every task in just one pass so we just keep going * until we don't find any unitialized tasks. The inefficiency * of this should be tempered by the fact that this happens @@ -389,32 +389,33 @@ { struct pagg *from_pagg; - /* lock the parents pagg_list we are copying from */ - down_read(&from_task->pagg_sem); /* read lock the pagg list */ + /* Lock the parents pagg_list we are copying from. We will be + * on the task list already and may have competition with + * registering clients, so lock our own list. + */ + down_read(&from_task->pagg_sem); + down_write(&to_task->pagg_sem); list_for_each_entry(from_pagg, &from_task->pagg_list, entry) { struct pagg *to_pagg = NULL; - to_pagg = pagg_alloc(to_task, from_pagg->hook); - if (!to_pagg) { - goto error_return; - } - if (to_pagg->hook->attach(to_task, to_pagg, from_pagg->data) != 0 ) - goto error_return; - } - - up_read(&from_task->pagg_sem); /* unlock the pagg list */ + /* Raced with allocation from registration. */ + if (pagg_get(to_task, from_pagg->hook->name) != NULL) + continue; + + /* Break on memory allocation failure. */ + if (!(to_pagg = pagg_alloc(to_task, from_pagg->hook))) + break; - return; /* success */ + /* Do not detach all attachments on error. This + * behaviour is inconsistent with registration. + */ + if (to_pagg->hook->attach(to_task, to_pagg, from_pagg->data)) + pagg_free(to_pagg); + } - error_return: - /* - * Clean up all the pagg attachments made on behalf of the new - * task. Set new task pagg ptr to NULL for return. - */ + up_write(&to_task->pagg_sem); up_read(&from_task->pagg_sem); /* unlock the pagg list */ - __pagg_detach(to_task); - return; /* failure */ } /** @@ -437,9 +438,9 @@ pagg->hook->detach(task, pagg); pagg_free(pagg); } - + up_write(&task->pagg_sem); /* write unlock the pagg list */ - + return; /* 0 = success, else return last code for failure */ } @@ -459,7 +460,6 @@ { struct pagg *pagg; - /* lock the parents pagg_list we are copying from */ down_read(&task->pagg_sem); /* lock the pagg list */ list_for_each_entry(pagg, &task->pagg_list, entry) { --CblX+4bnyfN0pR09-- From erikj@subway.americas.sgi.com Tue Dec 21 11:25:17 2004 Received: with ECARTIS (v1.0.0; list pagg); Tue, 21 Dec 2004 11:25:24 -0800 (PST) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id iBLJOudo004839 for ; Tue, 21 Dec 2004 11:25:17 -0800 Received: from flecktone.americas.sgi.com (flecktone.americas.sgi.com [198.149.16.15]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id iBLKlEHN030598 for ; Tue, 21 Dec 2004 12:47:15 -0800 Received: from thistle-e236.americas.sgi.com (thistle-e236.americas.sgi.com [128.162.236.204]) by flecktone.americas.sgi.com (8.12.9/8.12.10/SGI_generic_relay-1.2) with ESMTP id iBLJOUCK4807032; Tue, 21 Dec 2004 13:24:30 -0600 (CST) Received: from subway.americas.sgi.com (subway.americas.sgi.com [128.162.236.152]) by thistle-e236.americas.sgi.com (8.12.9/SGI-server-1.8) with ESMTP id iBLJOTtC18328778; Tue, 21 Dec 2004 13:24:29 -0600 (CST) Received: from subway.americas.sgi.com (localhost [127.0.0.1]) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/erikj-IRIX6519-news) with ESMTP id iBLJOR0i673717; Tue, 21 Dec 2004 13:24:27 -0600 (CST) Received: from localhost (erikj@localhost) by subway.americas.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id iBLJOQmH970929; Tue, 21 Dec 2004 13:24:27 -0600 (CST) Date: Tue, 21 Dec 2004 13:24:26 -0600 From: Erik Jacobson To: kingsley@aurema.com cc: pagg@oss.sgi.com Subject: Re: PAGG Threading Issues (was Re: [patch] Registration Check for Compulsory Hooks) In-Reply-To: <20041110064053.GB6092@aurema.com> Message-ID: References: <20041108033648.GB18308@aurema.com> <20041108230039.GE18308@aurema.com> <20041110064053.GB6092@aurema.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: ClamAV 0.80/627/Sun Dec 12 11:53:11 2004 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 57 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: erikj@subway.americas.sgi.com Precedence: bulk X-list: pagg Hi. We got your patches and I added them to our Bugzilla bug. I'm sorry for the long delay. We've been up against some deadlines. We can discuss the patches in the bug. http://oss.sgi.com/bugzilla On Wed, 10 Nov 2004 kingsley@aurema.com wrote: > On Tue, Nov 09, 2004 at 03:02:24PM -0600, Erik Jacobson wrote: > > I think you make some good points here and I honestly need to do some > > research (and probably talk to some co-workers) before I can get you a > > good response. We've got some deadlines we're up against right now. > > > > Would it be ok if I defer this for a couple days? > > Sure thing, that's fine with me. > > > > > If you want to be sure the ball isn't dropped, you could file a Bugzilla > > bug: > > > > http://oss.sgi.com/bugzilla/ > > > > Okay, I might do that. > > > In any case, I hope to get back to you in a few days. If you need a > > response sooner let me know and I'll try to shuffle some things around. > > > > On Tue, 9 Nov 2004, Kingsley Cheung wrote: > > > > > Hi Erik, > > > > > > Some other issues as well. Please correct me if I'm wrong :) > (snip) > > Thanks, > -- > Kingsley > -- Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota From limin@dbear.engr.sgi.com Fri Dec 24 15:19:12 2004 Received: with ECARTIS (v1.0.0; list pagg); Fri, 24 Dec 2004 15:19:19 -0800 (PST) Received: from omx2.sgi.com (omx2-ext.sgi.com [192.48.171.19]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id iBONIqGS022610 for ; Fri, 24 Dec 2004 15:19:12 -0800 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by omx2.sgi.com (8.12.11/8.12.9/linux-outbound_gateway-1.1) with ESMTP id iBP0hWL6000654 for ; Fri, 24 Dec 2004 16:43:32 -0800 Received: from dbear.engr.sgi.com (dbear.engr.sgi.com [163.154.18.85]) by cthulhu.engr.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id iBONKMFw1707012; Fri, 24 Dec 2004 15:20:22 -0800 (PST) Received: (from limin@localhost) by dbear.engr.sgi.com (8.11.0/8.11.0) id iBONKMf02768; Fri, 24 Dec 2004 15:20:22 -0800 From: Limin Gu Message-Id: <200412242320.iBONKMf02768@dbear.engr.sgi.com> Subject: New job patch for 2.6.9 and job rpms are available now To: pagg@oss.sgi.com Date: Fri, 24 Dec 2004 15:20:22 -0800 (PST) Cc: limin@dbear.engr.sgi.com (Limin Gu), jlan@dbear.engr.sgi.com (Jay Lan), erikj@dbear.engr.sgi.com (Erik Jacobson), jh@dbear.engr.sgi.com (John Hesterberg) X-Mailer: ELM [version 2.5 PL3] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.80/638/Tue Dec 21 14:41:34 2004 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 58 X-ecartis-version: Ecartis v1.0.0 Sender: pagg-bounce@oss.sgi.com Errors-to: pagg-bounce@oss.sgi.com X-original-sender: limin@dbear.engr.sgi.com Precedence: bulk X-list: pagg Hi There, Merry Christmas and Happy New Year! Here is your Christmas gift ;-) Find the 'linux-2.6.9-job.patch', 'job-1.5.0-0.1.i386.rpm' and 'job-1.5.0-0.1.src.rpm' at this loaction: ftp://oss.sgi.com/projects/pagg/download As usual job depends on pagg, linux-2.6.9-job.patch should apply to 2.6.9 kernel after you have already applied linux-2.6.9-pagg.patch-2. New feature and changes in linux-2.6.9-job.patch: - A new ioctl call job_sys_attachpid() is added. This function allows a process which does not belong to any job to attach itself to the specified existing job. The user needs CAP_SYS_RESOURCE capability for this to succeed. - Job header filenames were confusing, so the following change has been made. include/linux/paggctl.h -> include/linux/jobctl.h (job ioctl call headers) include/linux/job.h -> include/linux/job_acct.h (job and its acct subscribers) - Job code cleanup based on the LKML community feedback, including purging unused defines, removing lock debug macros, removing unnecessary initialization, fixing oboslete codes, converting to kernel-doc comment style. - Fix a small bug: a missed unlock in case kmalloc fails in job_attach(). New feature in job-1.5.0-0.1 rpm: - A new job library call job_attachpid(). - Also a new job command jattach. - Updated manpages to reflect above changes. The userland job-1.5.0-0.1 rpm should only work with the new linux-2.6.9-job.patch. Please visit http://oss.sgi.com/projects/pagg for general information about PAGG and job. Happy Holidays! Limin Gu - Linux System Software - Silicon Graphics