pcp
[Top] [All Lists]

Re: [pcp] hotproc rfc

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] hotproc rfc
From: Martins Innus <minnus@xxxxxxxxxxx>
Date: Wed, 17 Dec 2014 13:27:56 -0500
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <905561536.62723741.1412657864635.JavaMail.zimbra@xxxxxxxxxx>
References: <536D28B4.6010504@xxxxxxxxxxx> <1139662762.4765310.1399862104653.JavaMail.zimbra@xxxxxxxxxx> <54230FAF.2080201@xxxxxxxxxxx> <905561536.62723741.1412657864635.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
Nathan,
    I think this is in a pretty good state now.  You can grab from here:

https://github.com/ubccr/pcp/tree/hotproc_review

I left it in it's own branch for now in case you think there is more work to do before merging. The git history is pretty ugly as I have been updating it to keep up with dev for a while. But it should merge cleanly. I just include the diffstat against dev here:

 qa/800                                  |   71 +++
 qa/800.out                              |   69 +++
 qa/group                                |    1 +
 src/pmdas/linux_proc/GNUmakefile        |   24 +-
 src/pmdas/linux_proc/clusters.h         |   14 +-
 src/pmdas/linux_proc/config.c           |  570 ++++++++++++++++++++++
 src/pmdas/linux_proc/config.h           |   59 +++
 src/pmdas/linux_proc/contexts.c         |   16 +
 src/pmdas/linux_proc/contexts.h         |    1 +
 src/pmdas/linux_proc/error.c            |   40 ++
 src/pmdas/linux_proc/gram.y             |  168 +++++++
 src/pmdas/linux_proc/gram_node.c        |  201 ++++++++
 src/pmdas/linux_proc/gram_node.h        |   69 +++
 src/pmdas/linux_proc/hotproc.c          |   32 ++
 src/pmdas/linux_proc/hotproc.h          |   70 +++
 src/pmdas/linux_proc/indom.h            |    4 +-
 src/pmdas/linux_proc/lex.l              |  114 +++++
 src/pmdas/linux_proc/pmda.c             |  442 ++++++++++++++++--
 src/pmdas/linux_proc/pmdahotproc.1      |  315 +++++++++++++
 src/pmdas/linux_proc/proc_dynamic.c     |  201 ++++-----
src/pmdas/linux_proc/proc_pid.c | 786 +++++++++++++++++++++++++++++--
 src/pmdas/linux_proc/proc_pid.h         |   12 +
 src/pmdas/linux_proc/root_proc          |   44 ++
 src/pmdas/linux_proc/samplehotproc.conf |    4 +
 25 files changed, 3129 insertions(+), 209 deletions(-)


Requires Ken's dbpmda changes which are not included here.

Comments below.

On 10/7/14 12:57 AM, Nathan Scott wrote:
I think the approach of merging the two PMDAs has worked well.  Since
hotproc was originally written, the dynamic metrics concept has come
along and I think it'll be a better match here than any namespace file
script approach.  Some more notes in the review - but, if I can help
with coding this aspect, just let me know (I have some of this fresh
in-mind from other recent hacking).
It has now been all updated to use my dynamic metric changes.


There is already a pmStore
interface to allow dynamic changes (we should add some access controls
to those, see "have_access" in proc_store - use of that is missing for
the new metrics).  In this case, checking for zero uid attribute would
be a good option, I think (in the absence of any more involved ACLs,
which could be added later).
Just did the root check for now.

have some metrics available or should it be disabled (timer doesn't run,
etc)
if the user does not provide a config file?  I have provided a
Yes, disabled and no-values by default I think.
I provide a sample config, but hotproc is disabled by default. You can move the config in place and restart the pmda or use pmstore to set a config.

sample that is a simplification of what we are currently using.
That should be named sample.conf and hotproc.conf should not
exist by default, since we have no way of knowing up-front what
any individual site might want to use.  pmdashping does something
like this, btw, might provide a reference - oh, the old pmdahotproc
Install script implements the same logic too actually.  This could
be added into the pmdaproc Install.
I did not do this yet. Since it would break the interface to "Install" in case someone is scripting against that. I can add if you think the benefit outweighs possible breakage. Right now, you would just move the config in place and restart the pmda.

5. Some stats I couldn't find per process yet: syscalls (systemtap seems
to be the only way to get these?) ,  and  schedwait (doesn't seem to be
available at all).
Yeah, IRIX had a fair bit more per-process instrumentation.
Schedwait is there, but no syscalls. I left some commented code in for that since my hope is to get syscalls in there at some point, somehow.

Good stuff.  Other missing things would be docs and QA testing.  The
original man page for pmdahotproc(1) is currently over in man/retired
in the pcp git tree - maybe go through that and pull out the relevant
bits (config format, pmstore interface, etc)?

 From a QA POV, I'm pretty sure there were some hotproc tests originally
- but, I cannot find any currently in the tree.  I've sent a note to our
friends at SGI to see if they can help us out there.  I'll send through
details once I hear back, but in general we should start thinking about
automated tests now that we have the basics in place, to support the next
rounds of development (and ongoing, into the future, of course).
Man page and some QA included. I looked around and couldn't find any of the original QA scripts. Any word from SGI?



diff --git a/src/pmdas/linux_proc/config.c b/src/pmdas/linux_proc/config.c
new file mode 100644
index 0000000..498b5f9
--- /dev/null
+++ b/src/pmdas/linux_proc/config.c
@@ -0,0 +1,561 @@
[...]
+       (void)fprintf(stderr, "%s: Unable to open configuration file \"%s\": 
%s\n",
+           pmProgname, hotproc_configfile, osstrerror());
+       exit(1);

This, and quite a few other cases of calling exit() here, should
be reworked to not fail quite so drastically.
OK, these should all be fixed.

+read_test_var(char *line, config_vars *vars)

There's helper code here for testing, we should look into how to
automate its use.
I'd like to rework config.c completely in the future ( remove tmp file writing, etc), but some of this is entwined with this apparent testing code. I'll hold off for a little bit in case any of that turns up, and do it all at once.

diff --git a/src/pmdas/linux_proc/proc_pid.c b/src/pmdas/linux_proc/proc_pid.c
index c0bab14..6a7e4ec 100644
--- a/src/pmdas/linux_proc/proc_pid.c
+++ b/src/pmdas/linux_proc/proc_pid.c

(there's several different coding styles in this file, makes it
a bit jarring to read for me)
I tried to fix this up. Some tab/space issues and formatting due to my copy and past of some of the original code I think.


I would expect more QA needs and other fixes as other people test.

Thanks for the help.

Martins

<Prev in Thread] Current Thread [Next in Thread>