From erikj@sgi.com Mon Dec 5 13:53:51 2005
From: Erik Jacobson
To: pagg@oss.sgi.com
Subject: Possible reduced Job functionality coming - comments requested
Date: Mon, 5 Dec 2005 13:50:39 -0600

Some functionality in Linux Job may go away in order to make use of a
possible replacement for PAGG/pnotify in the kernel (task notifiers from
Jack Steiner, if you follow lse-tech).

There is opposition (not from me) to having locks in the places that
PAGG/pnotify had them, so operations that allow an arbitrary process to
operate on the Job data of another arbitrary process will no longer be
available. This means these library calls may go away:

  job_detachjid
  job_detachpid
  job_attachpid

These two job commands make use of the above functions and would also be
purged:

  jattach
  jdetach

Further, due to the same locking issues, looking up a JID given a PID will
not be as efficient. If these operations are in any 'hot paths',
performance will be reduced. I believe job_getjid calls will be the main
issue here.

It appears I may need to add two new library calls: a job_detach call that
detaches the current process from a job, and a job_attach function that
attaches the current process to a supplied job. These functions would
operate only on the current running process, not on other processes on the
system.

I need to know, ASAP, of any customers, companies, or community people
making use of functionality that may be purged or made less efficient. If
the functionality proposed for removal is known to be used by customers or
the community, that would be justification for us to push for locking
similar to PAGG/pnotify's.

Please let me know.
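[Illustrative sketch: the mail gives no prototypes for the proposed
self-only calls, so the signatures, the jid_t typedef, and the stub bodies
below are assumptions modeled on the general style of the existing libjob
calls - not SGI's actual interface.]

#include <stdio.h>
#include <stdint.h>

typedef uint64_t jid_t;                /* assumed job-ID type */

/* Proposed: attach the *calling* process to the job identified by jid.
 * Stubbed here; a real version would call into the kernel job module. */
static int job_attach(jid_t jid)
{
        (void)jid;
        return 0;
}

/* Proposed: detach the *calling* process from its job. Stubbed here. */
static int job_detach(void)
{
        return 0;
}

int main(void)
{
        jid_t jid = 42;                /* hypothetical job ID */

        /* Unlike job_attachpid()/job_detachpid(), there is no target-PID
         * argument: a process can attach or detach only itself, which is
         * what removes the need for cross-task locking. */
        if (job_attach(jid) != 0)
                perror("job_attach");
        if (job_detach() != 0)
                perror("job_detach");
        return 0;
}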
--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

From gh@us.ibm.com Thu Dec 15 11:53:36 2005
From: Gerrit Huizenga
To: Hubertus Franke, ckrm-tech@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org, lse-tech@lists.sourceforge.net,
    vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel,
    pagg@oss.sgi.com
Subject: Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Thu, 15 Dec 2005 11:49:34 -0800

On Thu, 15 Dec 2005 09:35:57 EST, Hubertus Franke wrote:
> This patchset is a followup to the posting by Serge.
> http://marc.theaimsgroup.com/?l=linux-kernel&m=113200410620972&w=2
>
> In this patchset we are providing the pid virtualization mentioned in
> Serge's posting.
>
> > I'm part of a project implementing checkpoint/restart of processes.
> > After a process or group of processes is checkpointed, killed, and
> > restarted, the changing of pids could confuse them. There are many
> > other such issues, but we wanted to start with pids.
> >
> > This patchset introduces functions to access task->pid and ->tgid,
> > and updates ->pid accessors to use the functions. This is in
> > preparation for a subsequent patchset which will separate the kernel
> > and virtualized pidspaces. This will allow us to virtualize pids
> > from the users' point of view, so that, for instance, a checkpointed
> > set of processes could be restarted with particular pids. Even though
> > their kernel pids may already be in use by new processes, the
> > checkpointed processes can be started in a new user pidspace with
> > their old virtual pids. This also gives vserver a simpler way to fake
> > vserver init processes as pid 1. Note that this does not change the
> > kernel's internal idea of pids, only what users see.
> >
> > The first 12 patches change all locations which access ->pid and
> > ->tgid to use the inlined functions.
> > The last patch actually introduces task_pid() and task_tgid(), and
> > renames ->pid and ->tgid to __pid and __tgid to make sure any
> > uncaught users error out.
> >
> > Does something like this, presumably after much working over, seem
> > mergeable?
>
> These patches build on top of Serge's posted patches (if necessary we
> can repost them here).
>
> PID virtualization is based on the concept of a container. The
> ultimate goal is to checkpoint/restart containers.
>
> The mechanism to start a container is to
> 'echo "container_name" > /proc/container', which creates a new
> container and associates the calling process with it. All subsequently
> forked tasks then belong to that container. There is a separate pid
> space associated with each container. Only processes/tasks belonging
> to the same container "see" each other. The exception is an implied
> default system container that has a global view.
>
> The following patches accomplish four things:
> 1) identify the locations at the user/kernel boundary where pids and
>    related ids (pgrp, session ids, ...) need to be (de-)virtualized,
>    and call appropriate (de-)virtualization functions.
> 2) provide the virtualization implementation in these functions.
> 3) implement a container object and a simple /proc interface to
>    create one.
> 4) provide a per-container /proc filesystem.
>
> -- Hubertus Franke (frankeh@watson.ibm.com)
> -- Cedric Le Goater (clg@fr.ibm.com)
> -- Serge E Hallyn (serue@us.ibm.com)
> -- Dave Hansen (haveblue@us.ibm.com)

I think this is actually quite interesting in a number of ways - it might
actually be a way of cleanly addressing several current out-of-tree
problems, several of which are independently (occasionally) striving for
mainline adoption: vserver, openvz, cluster checkpoint/restart.

I think perhaps this could also be the basis for a CKRM "class" grouping
as well. Rather than maintaining an independent class affiliation for
tasks, why not have a class devolve (evolve?) into a "container" as
described here. The container provides much of the same grouping
capabilities as a class as far as I can see. The right information would
be available for scheduling and IO resource management. The memory
component of CKRM is perhaps a bit tricky still, but an overall strategy
(can I use that word here? ;-) might be to use these "containers" as the
single intrinsic grouping mechanism for vserver, openvz, application
checkpoint/restart, resource management, and possibly others?

Opinions, especially from the CKRM folks? This might even be useful to
the PAGG folks as a grouping mechanism, similar to their jobs or
containers.

"This patchset solves multiple problems".

gerrit
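[Illustrative sketch of the accessor change quoted above; the actual patch
is not shown in this thread, so the types here are simplified.]

#include <stdio.h>
#include <sys/types.h>

/* Renaming the fields to __pid/__tgid makes any remaining direct
 * task->pid access fail to compile, so no unconverted user slips
 * through to the later pidspace-separation patches. */
struct task_struct {
        pid_t __pid;        /* was ->pid  */
        pid_t __tgid;       /* was ->tgid */
};

static inline pid_t task_pid(const struct task_struct *task)
{
        return task->__pid;
}

static inline pid_t task_tgid(const struct task_struct *task)
{
        return task->__tgid;
}

int main(void)
{
        struct task_struct t = { .__pid = 1234, .__tgid = 1234 };

        /* printf("%d", t.pid) would now be a compile error */
        printf("pid=%d tgid=%d\n", (int)task_pid(&t), (int)task_tgid(&t));
        return 0;
}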
From haveblue@us.ibm.com Thu Dec 15 12:06:46 2005
From: Dave Hansen
To: Gerrit Huizenga
Cc: Hubertus Franke, ckrm-tech@lists.sourceforge.net, Linux Kernel Mailing
    List, LSE, vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel,
    pagg@oss.sgi.com
Subject: Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Thu, 15 Dec 2005 12:02:41 -0800

On Thu, 2005-12-15 at 11:49 -0800, Gerrit Huizenga wrote:
> I think perhaps this could also be the basis for a CKRM "class"
> grouping as well. Rather than maintaining an independent class
> affiliation for tasks, why not have a class devolve (evolve?) into
> a "container" as described here.

Wasn't one of the grand schemes of CKRM to be able to have application
instances be shared? For instance, running a single DB2, Oracle, or Apache
server, and still accounting for all of the classes separately. If so,
that wouldn't work with a scheme that requires process separation.

But sharing application instances is probably mostly (only) important for
databases anyway. I would imagine that most of the overhead in a server
like an Apache instance is in the page cache for content, plus a bit for
Apache's executables themselves. The container schemes should be able to
share page cache in both cases. The main issues would be managing multiple
configurations, and the increased overhead from having more processes
around than with a single server.

There might also be some serious restrictions on containerized
applications. For instance, taking a running application, moving it out of
one container, and into another might not be feasible. Is this something
that is common or desired in the current CKRM framework?

-- Dave
From gh@us.ibm.com Thu Dec 15 12:16:12 2005
From: Gerrit Huizenga
To: Dave Hansen
Cc: Hubertus Franke, ckrm-tech@lists.sourceforge.net, Linux Kernel Mailing
    List, LSE, vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel,
    pagg@oss.sgi.com
Subject: Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Thu, 15 Dec 2005 12:12:15 -0800

On Thu, 15 Dec 2005 12:02:41 PST, Dave Hansen wrote:
> Wasn't one of the grand schemes of CKRM to be able to have application
> instances be shared? For instance, running a single DB2, Oracle, or
> Apache server, and still accounting for all of the classes separately.
> If so, that wouldn't work with a scheme that requires process
> separation.

Yes, it is. However, that may be a sub-case where a single, large server
application actually jumps around from container to container. I consider
that a detail (well, our DB2 folks don't, but I'm all for solving one
problem at a time ;-) and we can work some of that out later. They are
less concerned about the application being shared or being part of
multiple "classes" simultaneously than about being appropriately resource
constrained based on the (large) transactions that they are handling on
behalf of a user. So, if it were possible to jump from one container to
another dynamically, then the appropriate resource management stuff could
be handled at some other level.

> There might also be some serious restrictions on containerized
> applications. For instance, taking a running application, moving it out
> of one container, and into another might not be feasible. Is this
> something that is common or desired in the current CKRM framework?
Desired, but primarily for large server applications. And I don't see much
in this patch set that makes that infeasible. If containers are going to
work, you are going to have to have a mechanism to get applications into
them and to move them anyway, right? While it would be nice if that were
dirt-cheap, if it isn't, applications may have to adapt their usage based
on the cost. Not a big deal as I see it.

gerrit

From frankeh@watson.ibm.com Thu Dec 15 14:06:16 2005
From: Hubertus Franke
To: Gerrit Huizenga
Cc: ckrm-tech@lists.sourceforge.net, linux-kernel@vger.kernel.org,
    lse-tech@lists.sourceforge.net, vserver@list.linux-vserver.org,
    Andrew Morton, Rik van Riel, pagg@oss.sgi.com
Subject: Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Thu, 15 Dec 2005 17:02:20 -0500

On Thu, 2005-12-15 at 11:49 -0800, Gerrit Huizenga wrote:
> On Thu, 15 Dec 2005 09:35:57 EST, Hubertus Franke wrote:
> > PID virtualization is based on the concept of a container. The
> > ultimate goal is to checkpoint/restart containers.
> >
> > The mechanism to start a container is to
> > 'echo "container_name" > /proc/container', which creates a new
> > container and associates the calling process with it. All subsequently
> > forked tasks then belong to that container. There is a separate pid
> > space associated with each container. Only processes/tasks belonging
> > to the same container "see" each other. The exception is an implied
> > default system container that has a global view.
> >
> > The following patches accomplish four things:
> > 1) identify the locations at the user/kernel boundary where pids and
> >    related ids (pgrp, session ids, ...) need to be (de-)virtualized,
> >    and call appropriate (de-)virtualization functions.
> > 2) provide the virtualization implementation in these functions.
> > 3) implement a container object and a simple /proc interface to
> >    create one.
> > 4) provide a per-container /proc filesystem.
> >
> > -- Hubertus Franke (frankeh@watson.ibm.com)
> > -- Cedric Le Goater (clg@fr.ibm.com)
> > -- Serge E Hallyn (serue@us.ibm.com)
> > -- Dave Hansen (haveblue@us.ibm.com)
>
> I think this is actually quite interesting in a number of ways - it
> might actually be a way of cleanly addressing several current
> out-of-tree problems, several of which are independently (occasionally)
> striving for mainline adoption: vserver, openvz, cluster
> checkpoint/restart.

Indeed, the entire set might be able to benefit with respect to pid
virtualization. I think we are quite open to embracing a larger set of
applications of pid virtualization.

> I think perhaps this could also be the basis for a CKRM "class"
> grouping as well. Rather than maintaining an independent class
> affiliation for tasks, why not have a class devolve (evolve?) into
> a "container" as described here. The container provides much of
> the same grouping capabilities as a class as far as I can see. The
> right information would be available for scheduling and IO resource
> management. The memory component of CKRM is perhaps a bit tricky
> still, but an overall strategy (can I use that word here? ;-) might
> be to use these "containers" as the single intrinsic grouping mechanism
> for vserver, openvz, application checkpoint/restart, resource
> management, and possibly others?
>
> Opinions, especially from the CKRM folks? This might even be useful
> to the PAGG folks as a grouping mechanism, similar to their jobs or
> containers.

Not being too alien to the CKRM concept: yes, there is some nice synergy
here, as well as with PAGG and SGI's jobs. CKRM provides resource
constraints and runtime enforcement based on some grouping of processes.
Similar to containers, class membership is inherited (if that's still the
case since I last looked at it) until explicitly changed. Containers in
particular provide another dimension, namely the ability to constrain the
"visibility" of resources and objects - in this particular case, pids as
the first resource so used.

> "This patchset solves multiple problems".
> gerrit

-- Hubertus Franke

From matthltc@us.ibm.com Thu Dec 15 15:02:42 2005
From: Matt Helsley
To: Dave Hansen
Cc: Gerrit Huizenga, Hubertus Franke, CKRM-Tech, Linux Kernel Mailing
    List, LSE, vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel,
    pagg@oss.sgi.com
Subject: Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Thu, 15 Dec 2005 14:52:49 -0800

On Thu, 2005-12-15 at 12:02 -0800, Dave Hansen wrote:
> Wasn't one of the grand schemes of CKRM to be able to have application
> instances be shared? For instance, running a single DB2, Oracle, or
> Apache server, and still accounting for all of the classes separately.
> If so, that wouldn't work with a scheme that requires process
> separation.

f-series CKRM manages tasks via the task struct -- this means it manages
each thread, not each process. Since, generally speaking, each thread is
assigned the same class as the main thread, this effectively manages
processes. So yes, separate DB2, Oracle, Apache, etc. threads could be
assigned to different classes. This is definitely something a strict
container could not do.

> But, sharing the application instances is probably mostly (only)
> important for databases anyway.

I wouldn't say only for databases. Human-interaction-bound processes can
share instances (gnome-terminal). Granted, these probably would never
need to span a container or a class...

> I would imagine that most of the overhead in a server like an Apache
> instance is for the page cache for content, as well as a bit for
> Apache's executables themselves. The container schemes should be able
> to share page cache for both cases.
> The main issues would be managing multiple configurations, and the
> increased overhead from having more processes around than with a single
> server.
>
> There might also be some serious restrictions on containerized
> applications. For instance, taking a running application, moving it out
> of one container, and into another might not be feasible. Is this
> something that is common or desired in the current CKRM framework?
>
> -- Dave

Yes, being able to move a process from one class to another is important.
This can happen as a consequence of the system administrator deciding to
change the distribution of resources without having to restart services.
The change in distribution can be done by changing the shares of a class,
manually moving processes between classes, making or deleting classes, or
a combination of these operations.

Cheers,
    -Matt Helsley

From matthltc@us.ibm.com Thu Dec 15 18:32:38 2005
From: Matt Helsley
To: Gerrit Huizenga
Cc: Hubertus Franke, CKRM-Tech, LKML, lse-tech@lists.sourceforge.net,
    vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel,
    pagg@oss.sgi.com
Subject: Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Thu, 15 Dec 2005 18:20:52 -0800

On Thu, 2005-12-15 at 11:49 -0800, Gerrit Huizenga wrote:
> On Thu, 15 Dec 2005 09:35:57 EST, Hubertus Franke wrote:
> > This patchset is a followup to the posting by Serge.
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=113200410620972&w=2
> >
> > In this patchset we are providing the pid virtualization mentioned in
> > Serge's posting.
> >
> > > I'm part of a project implementing checkpoint/restart of processes.
> > > After a process or group of processes is checkpointed, killed, and
> > > restarted, the changing of pids could confuse them. There are many
> > > other such issues, but we wanted to start with pids.
> > >
> > > This patchset introduces functions to access task->pid and ->tgid,
> > > and updates ->pid accessors to use the functions. This is in
> > > preparation for a subsequent patchset which will separate the kernel
> > > and virtualized pidspaces.
> > > [... remainder of the patchset description snipped; quoted in full
> > > earlier in the thread ...]
>
> I think perhaps this could also be the basis for a CKRM "class"
> grouping as well. Rather than maintaining an independent class
> affiliation for tasks, why not have a class devolve (evolve?) into
> a "container" as described here. The container provides much of
> the same grouping capabilities as a class as far as I can see. The
> right information would be available for scheduling and IO resource
> management. The memory component of CKRM is perhaps a bit tricky
> still, but an overall strategy (can I use that word here? ;-) might
> be to use these "containers" as the single intrinsic grouping mechanism
> for vserver, openvz, application checkpoint/restart, resource
> management, and possibly others?
>
> Opinions, especially from the CKRM folks? This might even be useful
> to the PAGG folks as a grouping mechanism, similar to their jobs or
> containers.
>
> "This patchset solves multiple problems".
>
> gerrit

CKRM classes seem too different from containers to merge the two concepts:

- Classes don't assign class-unique pids to tasks.
- Tasks can move between classes.
- Tasks move between classes without any need for checkpoint/restart.
- Classes show up in a filesystem interface rather than using a file in
  /proc to create them. (A trivial interface difference.)
- There are no "visibility boundaries" to enforce between tasks in
  different classes.
- Classes are hierarchical.
- Unless I am mistaken, a container groups processes (can one thread run
  in container A and another in container B?) while a class groups tasks.
  Since a task represents a thread or a process, one thread could be in
  class A and another in class B.
Cheers,
    -Matt Helsley

From gh@us.ibm.com Thu Dec 15 19:32:55 2005
From: Gerrit Huizenga
To: Matt Helsley
Cc: Hubertus Franke, CKRM-Tech, LKML, lse-tech@lists.sourceforge.net,
    vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel,
    pagg@oss.sgi.com
Subject: Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Thu, 15 Dec 2005 19:28:48 -0800

On Thu, 15 Dec 2005 18:20:52 PST, Matt Helsley wrote:
> On Thu, 2005-12-15 at 11:49 -0800, Gerrit Huizenga wrote:
> > On Thu, 15 Dec 2005 09:35:57 EST, Hubertus Franke wrote:
> > > [... container overview snipped; quoted in full earlier in the
> > > thread ...]
> >
> > I think perhaps this could also be the basis for a CKRM "class"
> > grouping as well. Rather than maintaining an independent class
> > affiliation for tasks, why not have a class devolve (evolve?) into
> > a "container" as described here. The container provides much of
> > the same grouping capabilities as a class as far as I can see. The
> > right information would be available for scheduling and IO resource
> > management. The memory component of CKRM is perhaps a bit tricky
> > still, but an overall strategy (can I use that word here? ;-) might
> > be to use these "containers" as the single intrinsic grouping
> > mechanism for vserver, openvz, application checkpoint/restart,
> > resource management, and possibly others?
> >
> > Opinions, especially from the CKRM folks?
> > This might even be useful to the PAGG folks as a grouping mechanism,
> > similar to their jobs or containers.
> >
> > "This patchset solves multiple problems".
> >
> > gerrit
>
> CKRM classes seem too different from containers to merge the two
> concepts:

I agree that the implementations of pid virtualization and classes have
different characteristics. However, you bring up interesting points about
the differences... but I question whether or not they are relevant to an
implementation of resource management. I'm going out on a limb here,
looking at a possibly radical change which might simplify things so there
is only one grouping mechanism in the kernel. I could be wrong, but...

> - Classes don't assign class-unique pids to tasks.

What part of this is important to resource management? A container ID is
like a class ID. Yes, I think container IDs are assigned to processes
rather than tasks, but is that really all that important?

> - Tasks can move between classes.

In the pid virtualization, I would think that tasks can move between
containers as well, although it isn't all that useful for most things.
For instance, checkpoint/restart needs to checkpoint a process and all of
its threads if it wants to restart it. So there may be restrictions on
what you can checkpoint/restart. Vserver probably wants isolation at a
process boundary, rather than a task boundary. Most resource management,
e.g. Java, probably doesn't care about task vs. process.

> - Tasks move between classes without any need for checkpoint/restart.

That *should* be possible with a generalized container solution. For
instance, just as with classes, you have to move things into containers
in the first place. And you could in theory have a classification engine
that helped choose which container to put a task/process in at
creation/instantiation/significant events...

> - Classes show up in a filesystem interface rather than using a file
>   in /proc to create them. (A trivial interface difference.)

Yep - I would expect there will probably be a /proc or /configfs
interface to containers at some point. No significant difference there.

> - There are no "visibility boundaries" to enforce between tasks in
>   different classes.

Are there with virtualized pids? There *can* be - e.g. ps can
distinguish, but it is possible for tasks to interact across container
boundaries. Not ideal for vserver or checkpoint/restart, for instance (it
makes c/r a little harder or more limited - signals heading outside the
container may "disappear" across a checkpoint/restart, but for apps that
c/r, that probably isn't all that likely).

> - Classes are hierarchical.

Conceptually they are. But are they in the CKRM f series? I thought that
was one area for simplification. And how important is that *really* for
most applications?

> - Unless I am mistaken, a container groups processes (can one thread run
>   in container A and another in container B?) while a class groups
>   tasks. Since a task represents a thread or a process, one thread could
>   be in class A and another in class B.

Definitely useful, and one question is whether pid virtualization is
container isolation, or simply virtualization to enable container
isolation. If it is an enabling technology, perhaps it doesn't have that
restriction and could be used either way, based on resource management
needs or on vserver or c/r needs...

Debate away...
;-)

gerrit

From haveblue@us.ibm.com Fri Dec 16 09:39:20 2005
From: Dave Hansen
To: Gerrit Huizenga
Cc: Matt Helsley, Hubertus Franke, CKRM-Tech, LKML, LSE,
    vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel,
    pagg@oss.sgi.com
Subject: Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Fri, 16 Dec 2005 09:35:19 -0800

On Thu, 2005-12-15 at 19:28 -0800, Gerrit Huizenga wrote:
> In the pid virtualization, I would think that tasks can move between
> containers as well,

I don't think tasks can not be permitted to move between containers. As a
simple exercise, imagine that you have two processes with the same pid,
one in container A and one in container B. You wish to have them both run
in container A. They can't both have the same pid. What do you do?

I've been talking a lot lately about how important filesystem isolation
between containers is to implementing containers properly. Isolating the
filesystem namespaces makes it much easier to do things like fs-based
shared memory during a checkpoint/resume. If we want to allow tasks to
move around, we'll have to throw out this entire concept. That means that
a _lot_ of things get a notch closer to the too-costly-to-implement
category.

-- Dave
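[Illustrative toy model of the collision Dave describes - none of this is
the patchset's code. Each container owns a pid space, and a vpid already
handed to userspace cannot be renumbered, so a move fails whenever the
destination already uses that vpid.]

#include <stdio.h>

#define MAX_PIDS 8

/* Toy per-container pid space: used[i] records whether vpid i is taken. */
struct pidspace {
        char name;
        int used[MAX_PIDS];
};

/* Try to move a task with virtual pid vpid from src to dst. The vpid was
 * already observed by userspace (think sys_getpid()), so it cannot be
 * changed without breaking the task. */
static int move_task(struct pidspace *src, struct pidspace *dst, int vpid)
{
        if (dst->used[vpid]) {
                printf("cannot move vpid %d from %c to %c: pid in use\n",
                       vpid, src->name, dst->name);
                return -1;
        }
        src->used[vpid] = 0;
        dst->used[vpid] = 1;
        return 0;
}

int main(void)
{
        struct pidspace a = { 'A', { 0 } };
        struct pidspace b = { 'B', { 0 } };

        a.used[1] = 1;          /* container A has a task with vpid 1 */
        b.used[1] = 1;          /* ...and so does container B */

        move_task(&b, &a, 1);   /* fails: both tasks claim pid 1 */
        return 0;
}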
From gh@us.ibm.com Fri Dec 16 12:49:46 2005
From: Gerrit Huizenga
To: Dave Hansen
Cc: Matt Helsley, Hubertus Franke, CKRM-Tech, LKML, LSE,
    vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel,
    pagg@oss.sgi.com
Subject: Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Fri, 16 Dec 2005 12:45:42 -0800

On Fri, 16 Dec 2005 09:35:19 PST, Dave Hansen wrote:
> I don't think tasks can not be permitted to move between containers. As
> a simple exercise, imagine that you have two processes with the same
> pid, one in container A and one in container B. You wish to have them
> both run in container A. They can't both have the same pid. What do
> you do?
>
> I've been talking a lot lately about how important filesystem isolation
> between containers is to implementing containers properly. Isolating
> the filesystem namespaces makes it much easier to do things like
> fs-based shared memory during a checkpoint/resume. If we want to allow
> tasks to move around, we'll have to throw out this entire concept. That
> means that a _lot_ of things get a notch closer to the
> too-costly-to-implement category.

Interesting... So how do tasks get *into* a container? And can they ever
get back "out" of a container? Are most processes on the system initially
not in a container? And can they then be stuffed into a container? And
can containers then be moved around or be isolated from each other?

And is pid virtualization the point where this happens? Or is that a
slightly higher level? In other words, is pid virtualization the full
implementation of container isolation? Or is it a significant element on
which additional policy, restrictions, and usage models can be built?
gerrit

From haveblue@us.ibm.com Fri Dec 16 13:14:37 2005
From: Dave Hansen
To: Gerrit Huizenga
Cc: Matt Helsley, Hubertus Franke, CKRM-Tech, LKML, LSE,
    vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel,
    pagg@oss.sgi.com
Subject: Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Fri, 16 Dec 2005 13:10:54 -0800

On Fri, 2005-12-16 at 12:45 -0800, Gerrit Huizenga wrote:
> Interesting... So how do tasks get *into* a container?

Only by inheritance.

> And can they ever get back "out" of a container?

No. Think of the pids again. Even the things "outside" of a container,
like the real init, have to have unique pids. What if the process's pid
is the same as one in use in the default container?

> Are most processes on the system initially not in a container? And can
> they then be stuffed into a container? And can containers then be moved
> around or be isolated from each other?

The current idea is that processes are assigned at fork-time. The
isolation is for the lifetime of the process.

> And is pid virtualization the point where this happens? Or is that a
> slightly higher level? In other words, is pid virtualization the full
> implementation of container isolation? Or is it a significant element
> on which additional policy, restrictions, and usage models can be
> built?

pid virtualization is simply the piece that's easiest to understand, and
the one that demonstrates the largest number of issues. It is a small
piece of the puzzle, but an important one.

-- Dave
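[Illustrative sketch of the "only by inheritance" / fork-time assignment
just described; none of these names are from the actual patches.]

#include <stdio.h>

/* At fork, the child simply inherits the parent's container reference
 * and gets a vpid unique within that container. There is no
 * "setcontainer()" anywhere: inheritance is the only way in. */
struct container {
        int refcount;
        int next_vpid;                /* toy allocator, per-container */
};

struct task {
        struct container *container;  /* set at fork, never changes */
        int vpid;                     /* pid as seen inside the container */
};

static struct container *get_container(struct container *c)
{
        c->refcount++;
        return c;
}

static int alloc_vpid(struct container *c)
{
        return c->next_vpid++;        /* unique within this container only */
}

static void copy_container(struct task *child, const struct task *parent)
{
        child->container = get_container(parent->container);
        child->vpid = alloc_vpid(child->container);
}

int main(void)
{
        struct container root = { .refcount = 1, .next_vpid = 1 };
        struct task parent = { .container = &root };
        struct task child;

        parent.vpid = alloc_vpid(&root);   /* parent gets vpid 1 */
        copy_container(&child, &parent);   /* child lands in the same space */
        printf("child vpid=%d, container refcount=%d\n",
               child.vpid, root.refcount);
        return 0;
}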
From frankeh@watson.ibm.com Fri Dec 16 15:43:44 2005
From: Hubertus Franke
To: Dave Hansen
Cc: Gerrit Huizenga, Matt Helsley, CKRM-Tech, LKML, LSE,
    vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel,
    pagg@oss.sgi.com
Subject: Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Fri, 16 Dec 2005 18:40:04 -0500

On Fri, 2005-12-16 at 13:10 -0800, Dave Hansen wrote:
> On Fri, 2005-12-16 at 12:45 -0800, Gerrit Huizenga wrote:
> > Interesting... So how do tasks get *into* a container?
>
> Only by inheritance.

That is only true today. There is no reason (other than introducing some
heavy code complexity - I haven't thought that through) why we can't at
some point move a process group/tree into a container. The reason is that
for the global container, V=R in pid space terms (read: vpid == real
pid). Moving an entire group into a container requires assigning new
kernel pids to each task while keeping the vpid part constant. Lots of
kpid-related references, though... I don't know whether that's worth the
trouble, particularly at this stage.

> > And can they ever get back "out" of a container?
>
> No. Think of the pids again. Even the things "outside" of a container,
> like the real init, have to have unique pids. What if the process's pid
> is the same as one in use in the default container?

Correct - but look at my answer above: moving from the global space into
a container can be accomplished because in a fresh container all pids are
available, so we can simply reoccupy the same vpids in the new pidspace.
That keeps all user-level "references" and pid values valid. The only way
we could EVER go back is if we could guarantee that the pids in the
global space are still free, hence they would have to be reserved. NO
WAY... particularly if migration is involved later on.

> > Are most processes on the system initially not in a container? And
> > can they then be stuffed into a container? And can containers then be
> > moved around or be isolated from each other?
>
> The current idea is that processes are assigned at fork-time.
> The isolation is for the lifetime of the process.
>
> > And is pid virtualization the point where this happens? Or is that a
> > slightly higher level? In other words, is pid virtualization the full
> > implementation of container isolation? Or is it a significant element
> > on which additional policy, restrictions, and usage models can be
> > built?
>
> pid virtualization is simply the piece that's easiest to understand,
> and the one that demonstrates the largest number of issues. It is a
> small piece of the puzzle, but an important one.

Ditto..

> -- Dave

-- Hubertus Franke

From frankeh@watson.ibm.com Fri Dec 16 15:51:27 2005
From: Hubertus Franke
To: Dave Hansen
Cc: Gerrit Huizenga, Matt Helsley, CKRM-Tech, LKML, LSE,
    vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel,
    pagg@oss.sgi.com
Subject: Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Fri, 16 Dec 2005 18:47:44 -0500

On Fri, 2005-12-16 at 09:35 -0800, Dave Hansen wrote:
> On Thu, 2005-12-15 at 19:28 -0800, Gerrit Huizenga wrote:
> > In the pid virtualization, I would think that tasks can move between
> > containers as well,
>
> I don't think tasks can not be permitted to move between containers. As
> a simple exercise, imagine that you have two processes with the same
> pid, one in container A and one in container B. You wish to have them
> both run in container A. They can't both have the same pid. What do
> you do?

Dave, I think you meant "I don't think tasks can be permitted"... Anyway,
you make the constraints very clear: unless one can guarantee that the
pidspaces don't have any overlap in vpid usage, there is NO WAY that we
can allow this. Otherwise vpids that have been handed out to userspace
(think sys_getpid()) would need to be revoked (think coherence here).
That violates the transparency requirement.

> I've been talking a lot lately about how important filesystem isolation
> between containers is to implementing containers properly.
> Isolating the filesystem namespaces makes it much easier to do things
> like fs-based shared memory during a checkpoint/resume. If we want to
> allow tasks to move around, we'll have to throw out this entire
> concept. That means that a _lot_ of things get a notch closer to the
> too-costly-to-implement category.

Not only that: as the pid example already shows, while at the surface
these might seem desirable features (particularly since they came up with
respect to the CKRM discussion), there are significant technical
limitations to them.

-- Hubertus Franke

From matthltc@us.ibm.com Fri Dec 16 17:29:30 2005
From: Matt Helsley
To: Hubertus Franke
Cc: Dave Hansen, Gerrit Huizenga, CKRM-Tech, LKML, LSE,
    vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel,
    pagg@oss.sgi.com
Subject: Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Fri, 16 Dec 2005 17:18:18 -0800

On Fri, 2005-12-16 at 18:47 -0500, Hubertus Franke wrote:
> On Fri, 2005-12-16 at 09:35 -0800, Dave Hansen wrote:
> > I've been talking a lot lately about how important filesystem
> > isolation between containers is to implementing containers properly.
> > Isolating the filesystem namespaces makes it much easier to do things
> > like fs-based shared memory during a checkpoint/resume. If we want to
> > allow tasks to move around, we'll have to throw out this entire
> > concept. That means that a _lot_ of things get a notch closer to the
> > too-costly-to-implement category.
>
> Not only that: as the pid example already shows, while at the surface
> these might seem desirable features (particularly since they came up
> with respect to the CKRM discussion), there are significant technical
> limitations to them.

Perhaps merging the container and CKRM process grouping functionality is
not a good idea. However, I think CKRM could be made minimally consistent
with containers using a few small modifications.
I suspect all that is necessary is:

1) Expanding the pid syntax accepted and reported when accessing the
   members file to include an optional container id:

	# classify init in container 0 to a class
	echo 0:1 >> ${RCFS}/class_foo/members
	echo :1 >> ${RCFS}/class_foo/members

	# while in container 0, classify init in container 0 to a class
	echo 1 >> ${RCFS}/class_foo/members

	# while in container 0, classify init in container 3 to a class
	echo 3:1 >> ${RCFS}/class_foo/bar_class/members

   Then pids in container 0 would show up as cid:pid:

	$ cat ${RCFS}/class_foo/members
	0:1
	5:2
	...
	3:4

   Processes listing members in container n would only see the pid, and
   only pids in that container.

2) Limiting the pids and container ids accepted as input to the members
   file from processes doing classification from within containers:

	# classify init in the current container to a class
	echo :1 >> ${RCFS}/class_foo/members
	echo 1 >> ${RCFS}/class_foo/members

	# returns an error when not in container 0
	echo 0:1 >> ${RCFS}/class_foo/members

	# returns an error when not in container 1
	echo 1:1 >> ${RCFS}/class_foo/members
	...

(Incidentally, these kinds of details are what I was referring to
earlier in this thread as "visibility boundaries".)

I think this would be sufficient to make CKRM and containers play
nicely with each other. I suspect further kernel-enforced constraints
between CKRM and containers may constitute policy and not
functionality. I also suspect that with the right userspace
classification engine a wide variety of useful container resource
management policies could be enforced based on these simple
modifications.

Cheers,
	-Matt Helsley
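A rough illustration of how the cid:pid syntax above might be parsed,
with rule (2)'s restriction applied: callers outside container 0 may
only name their own container. This is a user-space sketch under those
assumptions; parse_member() and its rules are hypothetical, not CKRM
code.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Parse "cid:pid", ":pid" or "pid"; a missing cid means the caller's
     * own container.  Returns -1 when rule (2) forbids the request. */
    static int parse_member(const char *tok, int caller_cid, int *cid, int *pid)
    {
        const char *colon = strchr(tok, ':');

        if (!colon) {                /* bare "pid" */
            *cid = caller_cid;
            *pid = atoi(tok);
        } else {
            *cid = (colon == tok) ? caller_cid : atoi(tok);
            *pid = atoi(colon + 1);
        }
        /* Only the system container (cid 0) may classify other containers. */
        if (caller_cid != 0 && *cid != caller_cid)
            return -1;
        return 0;
    }

    int main(void)
    {
        const char *inputs[] = { "0:1", ":1", "1", "3:1" };
        int cid, pid, i;

        for (i = 0; i < 4; i++) {    /* pretend the caller runs in container 3 */
            if (parse_member(inputs[i], 3, &cid, &pid) == 0)
                printf("%-4s -> cid=%d pid=%d\n", inputs[i], cid, pid);
            else
                printf("%-4s -> rejected\n", inputs[i]);
        }
        return 0;
    }

Run from a hypothetical container 3, this accepts ":1", "1", and "3:1"
but rejects "0:1", matching the error cases listed in (2).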
From matthltc@us.ibm.com Fri Dec 16 17:49:22 2005
From: Matt Helsley <matthltc@us.ibm.com>
To: Gerrit Huizenga
Cc: Hubertus Franke, CKRM-Tech, LKML, lse-tech@lists.sourceforge.net, vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel, pagg@oss.sgi.com
Subject: Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Fri, 16 Dec 2005 17:38:08 -0800

On Thu, 2005-12-15 at 19:28 -0800, Gerrit Huizenga wrote:
> On Thu, 15 Dec 2005 18:20:52 PST, Matt Helsley wrote:
> > On Thu, 2005-12-15 at 11:49 -0800, Gerrit Huizenga wrote:
> > > On Thu, 15 Dec 2005 09:35:57 EST, Hubertus Franke wrote:
> > > > PID Virtualization is based on the concept of a container.
> > > > The ultimate goal is to checkpoint/restart containers.
> > > >
> > > > The mechanism to start a container is to
> > > > 'echo "container_name" > /proc/container', which creates a new
> > > > container and associates the calling process with it. All
> > > > subsequently forked tasks then belong to that container.
> > > > There is a separate pid space associated with each container.
> > > > Only processes/tasks belonging to the same container "see" each
> > > > other. The exception is an implied default system container that
> > > > has a global view.
> > >
> > > I think perhaps this could also be the basis for a CKRM "class"
> > > grouping as well. Rather than maintaining an independent class
> > > affiliation for tasks, why not have a class devolve (evolve?) into
> > > a "container" as described here. The container provides much of
> > > the same grouping capabilities as a class as far as I can see. The
> > > right information would be available for scheduling and IO resource
> > > management. The memory component of CKRM is perhaps a bit tricky
> > > still, but an overall strategy (can I use that word here? ;-) might
> > > be to use these "containers" as the single intrinsic grouping
> > > mechanism for vserver, openvz, application checkpoint/restart,
> > > resource management, and possibly others?
> > >
> > > Opinions, especially from the CKRM folks? This might even be useful
> > > to the PAGG folks as a grouping mechanism, similar to their jobs or
> > > containers.
> > >
> > > "This patchset solves multiple problems".
> > >
> > > gerrit
> >
> > CKRM classes seem too different from containers to merge the two
> > concepts:
>
> I agree that the implementations of pid virtualization and classes have
> different characteristics. However, you bring up interesting points
> about the differences... But I question whether or not they are
> relevant to an implementation of resource management. I'm going out
> on a limb here, looking at a possibly radical change which might
> simplify things so there is only one grouping mechanism in the kernel.
> I could be wrong but...
>
> > - Classes don't assign class-unique pids to tasks.
>
> What part of this is important to resource management? A container
> ID is like a class ID. Yes, I think container IDs are assigned to
> processes rather than tasks, but is that really all that important?

Perhaps you misunderstood my point. Upon inserting a task into a
container you must assign it a pid unique within the container.
Inserting a task into a class requires no analogous operation. While
there is no conflict here, neither is there commonality.

> For instance, checkpoint/restart needs to checkpoint a process and all
> of its threads if it wants to restart it. So there may be restrictions
> on what you can checkpoint/restart. Vserver probably wants isolation
> at a process boundary, rather than a task boundary. Most resource
> management, e.g. Java, probably doesn't care about task vs. process.

I really don't see how Java itself is a good example of most resource
management. As I see it, Java tries to present a runtime environment
for applications, and it is the applications administrators are
concerned with. A process could allocate different roles to each
thread or dole out uniform pieces of work to each thread. Being able
to manage the resource usage of these threads could be useful -- so
while Java may not "care" about task vs. process, an administrator
might.
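Matt's task-vs-process distinction can be made concrete: since CKRM
classifies individual tasks, two threads of one process could sit in
different classes, while a container would hold the entire thread
group. A speculative user-space sketch, assuming a members file per
class; the /rcfs paths are hypothetical.

    /* build: cc -o two_classes two_classes.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Append this thread's kernel task id to a (hypothetical) members file. */
    static void classify_self(const char *members_path)
    {
        pid_t tid = (pid_t)syscall(SYS_gettid);
        FILE *f = fopen(members_path, "a");

        if (f) {
            fprintf(f, "%d\n", (int)tid);
            fclose(f);
        }
    }

    static void *worker(void *arg)
    {
        classify_self(arg);          /* second thread -> class B */
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        classify_self("/rcfs/class_a/members");   /* main thread -> class A */
        pthread_create(&t, NULL, worker, "/rcfs/class_b/members");
        pthread_join(t, NULL);
        return 0;
    }

No analogous split is possible with a container: both tids above would
necessarily share one pidspace.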
> > - Tasks move between classes without any need for checkpoint/restart.
>
> That *should* be possible with a generalized container solution.
> For instance, just like with classes, you have to move things into
> containers in the first place. And you could, in theory, have a
> classification engine that helped choose which container to put a
> task/process in at creation/instantiation/significant event...

Since arbitrary movement (in time, source, and destination) is not
possible, the classification analogy does not fit. This is one very
big difference between classes and containers, and it suggests that
merging the two might not be best.

> > - There are no "visibility boundaries" to enforce between tasks in
> >   different classes.
>
> Are there in virtualized pids? There *can* be - e.g. ps can
> distinguish, but it is possible for tasks to interact across container
> boundaries.

Right. I didn't say they were entirely invisible to each other. If
they were entirely visible to each other, then these boundaries I'm
talking about wouldn't exist, and a container would be more similar to
a class. These boundaries are probably delineated in miscellaneous
areas of the kernel like getpid(), kill(), any /proc file that shows a
set of pids, etc. Each of these would have to correctly limit the set
of pids displayed and/or accepted as input. A CKRM class, on the
other hand, has no such boundaries to present to userspace and hence
does not alter code in such diverse places. I think this is a
consequence of the fact that it doesn't virtualize resources for the
purposes of checkpoint/restart (esp. well-known and user-visible
resources like pids, filehandles, etc.).

> > - Classes are hierarchical.
>
> Conceptually they are. But are they in the CKRM f series? I thought
> that was one area for simplification. And how important is that
> *really* for most applications?

Hierarchy still exists in the f-series. It's something Chandra has
been considering removing in order to simplify the code. I think
hierarchy offers administrators a chance to better organize their
classes. I think the goal should be to enable administrators to let
users manage a class and/or subclasses of their own -- though
implementing rcfs via configfs currently limits config items to root.
Perhaps this could be useful for CKRM inside containers if each
container had a virtual root user id of its own, with a corresponding
non-zero id in container 0...

> > - Unless I am mistaken, a container groups processes (can one thread
> >   run in container A and another in container B?) while a class
> >   groups tasks. Since a task represents a thread or a process, one
> >   thread could be in class A and another in class B.
>
> Definitely useful, and one question is whether pid virtualization is

Above you suggested that most resource management ("e.g. Java")
doesn't care about process vs. threads. Here you say it could be
useful.

> container isolation, or simply virtualization to enable container
> isolation. If it is an enabling technology, perhaps it doesn't have
> that restriction and could be used either way based on resource
> management needs or based on vserver or c/r needs...

I thought that the point of pid virtualization was to enable
checkpoint/restart and that, as a consequence, moving processes to
other containers is impossible.
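The "visibility boundaries" above have a simple shape: every interface
that reports or accepts pids must filter on the caller's container,
with the system container seeing everything. A toy user-space model of
that filtering follows; the task table and the cid:vpid output
convention are assumptions for illustration, not kernel code.

    #include <stdio.h>

    struct task {
        int pid;    /* kernel pid */
        int vpid;   /* pid as seen inside its container */
        int cid;    /* container id; 0 is the system container */
    };

    static const struct task tasks[] = {
        { 301, 1, 1 }, { 302, 2, 1 }, { 401, 1, 2 },
    };

    /* A /proc-style listing: container 0 sees every task as cid:vpid,
     * any other caller sees only the vpids of its own container. */
    static void list_pids(int caller_cid)
    {
        int i, n = sizeof(tasks) / sizeof(tasks[0]);

        for (i = 0; i < n; i++) {
            if (caller_cid == 0)
                printf("%d:%d\n", tasks[i].cid, tasks[i].vpid);
            else if (tasks[i].cid == caller_cid)
                printf("%d\n", tasks[i].vpid);
        }
    }

    int main(void)
    {
        puts("view from container 0:");
        list_pids(0);
        puts("view from container 1:");
        list_pids(1);
        return 0;
    }

The same caller-relative check would have to appear in every pid-bearing
interface -- getpid(), kill(), /proc -- which is why a class, having no
such boundaries, touches far less code.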
> Debate away... ;-)
>
> gerrit

The strongest dissimilarity between the two that I can see is the lack
of task movement between containers. The core similarity is the
ability to group. However, they don't group quite the same things --
from what I can see, containers group _trees of tasks_ with process
(thread group) granularity, while classes group _tasks_ with thread
granularity.

At the very least, I think we need to know the full extent of the
isolation and interaction that are planned/necessary for containers
before further considering any merge proposals.

Cheers,
	-Matt Helsley

From frankeh@watson.ibm.com Fri Dec 16 19:07:16 2005
From: Hubertus Franke <frankeh@watson.ibm.com>
To: Matt Helsley
Cc: Dave Hansen, Gerrit Huizenga, CKRM-Tech, LKML, LSE, vserver@list.linux-vserver.org, Andrew Morton, Rik van Riel, pagg@oss.sgi.com
Subject: Re: [Lse-tech] Re: [ckrm-tech] Re: [RFC][patch 00/21] PID Virtualization: Overview and Patches
Date: Fri, 16 Dec 2005 22:03:32 -0500

On Fri, 2005-12-16 at 17:18 -0800, Matt Helsley wrote:
> On Fri, 2005-12-16 at 18:47 -0500, Hubertus Franke wrote:
> > On Fri, 2005-12-16 at 09:35 -0800, Dave Hansen wrote:
> > > I've been talking a lot lately about how important filesystem
> > > isolation between containers is to implement containers properly.
> > > Isolating the filesystem namespaces makes it much easier to do
> > > things like fs-based shared memory during a checkpoint/resume. If
> > > we want to allow tasks to move around, we'll have to throw out this
> > > entire concept. That means that a _lot_ of things get a notch
> > > closer to the too-costly-to-implement category.
> >
> > Not only that: as the example of pids already shows, while on the
> > surface these might seem like desirable features (particularly since
> > they came up wrt the CKRM discussion), there are significant
> > technical limitations to them.
>
> Perhaps merging the container process grouping functionality is not a
> good idea.
> However, I think CKRM could be made minimally consistent with
> containers using a few small modifications. I suspect all that is
> necessary is:
>
> I think this would be sufficient to make CKRM and containers play
> nicely with each other. I suspect further kernel-enforced constraints
> between CKRM and containers may constitute policy and not
> functionality.
>

I think that as a first step, mutual coexistence is already quite
useful. Once I containerize applications, having the ability to
actually constrain and manage the resources consumed by that
application would be a real plus. In that sense a container and a
CKRM class coincide. So even enforcing that "alignment" at a higher
level, through some awareness in the classification engine for
instance, would be quite useful.

Are they the same kernel object? No -- because of the life cycle
management of a process, namely that once moved into a container it
stays there...

> Cheers,
> 	-Matt Helsley

Prost ... Hubertus Franke
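One way to picture the coexistence Hubertus describes: a userspace
classification engine that simply mirrors container membership into
CKRM classes, using the cid:pid members syntax from Matt's proposal.
The sketch below is speculative; classify() and the /rcfs paths are
hypothetical, not an existing interface.

    #include <stdio.h>

    /* Append "cid:pid" to a class's members file (run from container 0,
     * which may name any container under the proposed rules). */
    static int classify(const char *rcfs, const char *class_name,
                        int cid, int pid)
    {
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path), "%s/%s/members", rcfs, class_name);
        f = fopen(path, "a");
        if (!f)
            return -1;
        fprintf(f, "%d:%d\n", cid, pid);
        fclose(f);
        return 0;
    }

    int main(void)
    {
        /* e.g. constrain container 3 by putting its init into class_foo */
        return classify("/rcfs", "class_foo", 3, 1) ? 1 : 0;
    }

Because the engine lives in userspace, this keeps the two kernel
objects separate while still aligning each container with one class --
the coexistence-first approach suggested above.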