Re: [pcp] hotproc rfc

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] hotproc rfc
From: Martins Innus <minnus@xxxxxxxxxxx>
Date: Mon, 12 May 2014 10:53:20 -0400
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1139662762.4765310.1399862104653.JavaMail.zimbra@xxxxxxxxxx>
References: <536D28B4.6010504@xxxxxxxxxxx> <1139662762.4765310.1399862104653.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

Nathan,

On 5/11/14 10:35 PM, Nathan Scott wrote:
[...]
On Linux, the cgroup concept could provide a more flexible mechanism.  The
kernel support means the "hot process" evaluation side of things becomes
optional (IOW, one could deem processes worthy of logging or whatever-one-
would-otherwise-use-hotproc-metrics-for simply by ensuring the processes
to be monitored start in a specific cgroup, using existing cgroup tools).
Which, for some potential use-cases, is a big win cos walking the complete
list of processes can be prohibitively expensive, for very large process
counts.  A handy mechanism for obtaining only the processes in a specific
cgroup is provided by the kernel already, and the Linux pmdaproc already
knows how to use it.

I'm not too familiar with actually modifying cgroups, but we use them here extensively with various software that supports them natively: slurm, various MPIs, etc. I had not thought about using cgroups to manage a set of hotprocs. As I've started to look into this, it's not clear to me whether moving processes around between cgroups would break these existing tools. Slurm, for instance, starts in a cgroup and then spawns tasks which live under that same hierarchy. If one, but not all, of these new processes becomes "hot", I can imagine there would be some ramifications to moving it to a different cgroup without its siblings, parent, etc. This will require some testing. Let me know if I've misunderstood or if you have thoughts on this.
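
For reference, a minimal sketch of the kernel mechanism mentioned above for getting only the processes in one cgroup, assuming a mounted cgroup hierarchy. Each cgroup directory has a cgroup.procs file with one PID per line; the kernel does not promise that the list is sorted or free of duplicates, so the sketch does both. The function name and the example path are hypothetical, and this is not how pmdaproc itself is written.

```python
from pathlib import Path


def pids_in_cgroup(cgroup_dir):
    """Return the sorted, de-duplicated PIDs listed in <cgroup_dir>/cgroup.procs."""
    procs_file = Path(cgroup_dir) / "cgroup.procs"
    # One PID per line; split() also copes with a trailing newline or an empty file.
    return sorted({int(pid) for pid in procs_file.read_text().split()})


# Hypothetical usage (the path depends on how the hierarchy is mounted):
#   pids_in_cgroup("/sys/fs/cgroup/cpu/hotproc")
```

Moving a process is the reverse operation, writing its PID into the target cgroup's cgroup.procs, which is exactly the step that could pull a slurm task away from its siblings.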

[...]
the "process hotness" evaluation functions, we could use the existing proc
PMDA (-r option, and/or the proc.control.perclient.cgroups metric) for the
metrics, and a separate hunk of code for the cgroup evaluation.

Yeah, I had planned to use a similar methodology to segregate the "hot" processes.

Possibly
pmie could be used for this classification side of things, then we'd not
need a new tool for that either - worth experimenting with I think.

There are other good advantages to using cgroups for this task too.  The
kernel tracks memory utilisation, CPU utilisation, hardware performance
counters, and now it seems even some I/O stats, at the cgroup level.  So,
hopefully we can get a richer set of stats, and more cheaply, than the
traditional pmdahotproc provided.

Those would be great to have, and would likely provide a lot of good metrics for validating whether you have the right "hot" processes.
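
As a rough illustration of how cheap these per-cgroup stats are to collect, here is a sketch that reads two of the accounting files, assuming a cgroup v1 layout with the cpuacct and memory controllers mounted. cpuacct.usage holds cumulative CPU time in nanoseconds and memory.usage_in_bytes holds current memory usage; the function names and the directory arguments are hypothetical.

```python
from pathlib import Path


def read_cgroup_counter(cgroup_dir, filename):
    """Read a cgroup accounting file that contains a single integer."""
    return int((Path(cgroup_dir) / filename).read_text().strip())


def cgroup_summary(cpuacct_dir, memory_dir):
    """Collect CPU time and memory usage for one cgroup (cgroup v1 file names)."""
    return {
        # Cumulative CPU time consumed by all tasks in the cgroup, in nanoseconds.
        "cpu_ns": read_cgroup_counter(cpuacct_dir, "cpuacct.usage"),
        # Current memory usage of the cgroup, in bytes.
        "mem_bytes": read_cgroup_counter(memory_dir, "memory.usage_in_bytes"),
    }
```

Two file reads per cgroup, with no walk over /proc, is what makes this attractive for very large process counts.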


And then there's the whole PMDA source code & namespace management issues
that you'll have come across in looking into pmdahotproc so far ... it'd
be nice to dodge all of that and simply have the one pmdaproc, if we can.

Right.  My goal in all this was to try to get it into one PMDA.

Thanks

Martins
