Re: Suggested way of monitoring processes?

To: Alan Bailey <abailey@xxxxxxxxxxxxx>
Subject: Re: Suggested way of monitoring processes?
From: Ken McDonell <kenmcd@xxxxxxxxxxxxxxxxx>
Date: Fri, 3 Nov 2000 08:53:48 +1100
Cc: pcp@xxxxxxxxxxx
In-reply-to: <Pine.LNX.4.10.10011021318410.11300-100000@osage.ncsa.uiuc.edu>
Reply-to: kenmcd@xxxxxxx
Sender: owner-pcp@xxxxxxxxxxx
The shping PMDA is an out-of-the-box solution for this type of monitoring
(it is a variant of your option 1).  At the end you'll find the help text
and some sample values.

Unfortunately this is part of the value-add part of PCP that is not open
sourced ... it is in pcp-pro for Linux and SC4-PCP for IRIX.

The creative de Bono solution is to use (or acquire) an SGI workstation
running IRIX 6.5.5 or later ... install pcp_eoe.sw.espping ... the
espping PMDA is a clone of the shping PMDA that could be hijacked to
do what you want.

I would be willing to consider the case for moving the shping PMDA to
open source if I had some justification from the community ... if this
is of interest to you, drop me a note addressing questions like:
    - why you want it?
    - how would you use it?
    - is any SGI hardware involved?
    - would this be a make-or-break issue for you embracing PCP in your
      environment?

On Thu, 2 Nov 2000, Alan Bailey wrote:

> I'm looking to monitor a few things that aren't currently a part of the
> PCP pmdas.  I want to check to make sure that a few important processes
> are running on the host being logged.  These would be things like inetd,
> sshd, and other processes.  I figure there might be two ways of doing
> this.
> 
> 1. I could write my own pmdas for these, that just returns a 0 or 1
> depending on if the process is running.  This would take some time (for
> me) because I haven't done work with processes in C before or with writing
> pmdas.
> 
> 2. I could utilize the current proc.psinfo.pid pmda that returns the whole
> process tree, and see if they are running by parsing through that.
> 
> I also want to do a similar thing with NFS mounts, to make sure that
> remote disks are properly mounted.
> 
> I'm leaning toward 1 just to keep everything being monitored on the same
> level, and so there isn't another layer needed (like in 2).  However, it
> might be a lot of work.  Any suggestions, or offers to write the pmda?
> ;-)
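
As a rough illustration of the kind of command that could sit behind
such a check: all that is needed is something that exits 0 when the
process is present and non-zero otherwise.  The sketch below is only
that, a sketch.  The function names has_proc and is_running are made up
for this example, and it assumes a ps(1) that accepts "-e -o comm=" and
prints bare command names (some systems print a full path or truncate
long names).

```shell
#!/bin/sh
# Hypothetical helpers, not part of PCP.

# has_proc NAME: read process names on stdin, one per line, and
# exit 0 if NAME matches a whole line exactly, non-zero otherwise.
has_proc() {
    grep -Fqx -- "$1"
}

# is_running NAME: assumes "ps -e -o comm=" lists one command name
# per line for every process on the host.
is_running() {
    ps -e -o comm= | has_proc "$1"
}

# Example use:  is_running sshd; echo $?
```

Matching the whole line keeps "sshd" from matching "sshd-keygen" or
similar names.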

$ pminfo -tT shping
shping.status [command execution status for shping PMDA]
Help:
As each command is executed, the success or failure is encoded in
shping.status, using the following values:

   -1   PMDA is initializing and command has not been run yet
    0   command completed and exit status was 0
    1   command completed and exit status was non-zero
    2   command was run but terminated by a signal
    3   command was run but did not complete (usually a timeout)
    4   command was not run due to some system error or resource
        availability

shping.error [command execution error code for shping PMDA]
Help:
As each command is executed, if there is a problem, the error
code or cause is stored in shping.error.

The interpretation of the value for shping.error depends on
shping.status as follows:

    If shping.status is 1 (the command was run but returned a non-zero
    exit status) then shping.error is the exit status.

    If shping.status is 2 (the command was run but was terminated by
    a signal) then shping.error is the signal number.

    If shping.status is 3 (the command did not complete) then
    shping.error is a PCP error code: see pmerr(1).  Of particular
    relevance is -1008 (PM_ERR_TIMEOUT) when the command failed to
    complete in the time specified by shping.control.timeout.

    If shping.status is 4 (the command was not run) then shping.error
    is the value of errno.

    Otherwise shping.error will be zero.

shping.cmd [commands run by shping PMDA]
Help:
The text of each sh(1) command run by the shping PMDA.

shping.time.real [elapsed time for a command]
Help:
This metric records the elapsed time in milliseconds for the most recent
execution of each command to be run by the shping PMDA.

Care should be used when interpreting the value if the corresponding
value for shping.status is non-zero, as the command may not have run to
completion.  If the command timed out, shping.time.real will be -1.

shping.time.cpu_usr [user mode CPU time for a command]
Help:
This metric records the user mode CPU time in milliseconds for the most
recent execution of each command to be run by the shping PMDA.

Care should be used when interpreting the value if the corresponding
value for shping.status is non-zero, as the command may not have run to
completion.  If the command timed out, shping.time.cpu_usr will be -1.

shping.time.cpu_sys [system mode CPU time for a command]
Help:
This metric records the system mode CPU time in milliseconds for the most
recent execution of each command to be run by the shping PMDA.

Care should be used when interpreting the value if the corresponding
value for shping.status is non-zero, as the command may not have run to
completion.  If the command timed out, shping.time.cpu_sys will be -1.

shping.control.numcmd [number of commands in the group to be run by the shping PMDA]
Help:
number of commands in the group to be run by the shping PMDA

shping.control.cycles [number of times the command group has been run by the shping PMDA]
Help:
number of times the command group has been run by the shping PMDA

shping.control.cycletime [shping PMDA cycle time]
Help:
All commands run by the shping PMDA are executed one after another
in a group, and the group is run once per "cycle" time.  This metric
reports the cycle time in seconds.

The cycle time may be changed dynamically by modifying this metric
with pmstore(1).

shping.control.timeout [shping PMDA timeout period]
Help:
The number of seconds the shping PMDA is willing to wait before
considering a single command to have timed out and killing it off.

The time out interval may be changed dynamically by modifying this
metric with pmstore(1).

shping.control.debug [shping PMDA debug flag]
Help:
The debug flag for the shping PMDA (see pmdbg(1)).  All trace and
diagnostic files are created in /var/adm/pcplog (unless $PCP_LOGDIR
is set in the environment, see PMAPI(3)).

The debug flags DBG_TRACE_APPL0 (2048) and DBG_TRACE_APPL1 (4096)
may be used as follows:

DBG_TRACE_APPL0 - additional trace messages associated with the running
                  of each command appear in shping.log

DBG_TRACE_APPL1 - the standard output and standard error of each command
                  is appended to shping.out (instead of the default
                  /dev/null)

The debug flags may be changed dynamically by modifying this
metric with pmstore(1), e.g.
        $ pmstore shping.control.debug 6144
would enable both of the diagnostic traces associated with
DBG_TRACE_APPL0 and DBG_TRACE_APPL1.
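
The 6144 in that example is just the two flag values OR-ed together.
A small sketch of the arithmetic, using shell variable names chosen
here for illustration (APPL0, APPL1 and both are not PCP names):

```shell
#!/bin/sh
APPL0=2048    # value given above for DBG_TRACE_APPL0
APPL1=4096    # value given above for DBG_TRACE_APPL1

# Combine the flags with a bitwise OR.
both=$((APPL0 | APPL1))
echo "$both"    # prints 6144

# The result is what would be handed to pmstore, as in the help text:
#   pmstore shping.control.debug "$both"
```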

$ pminfo -f shping

shping.status
    inst [0 or "null"] value 0
    inst [1 or "date"] value 0
    inst [2 or "sum"] value 0
    inst [3 or "cc"] value 0
    inst [4 or "dns"] value 0
    inst [5 or "dns-self"] value 0
    inst [6 or "dns-err"] value 0
    inst [7 or "ypserv"] value 1
    inst [8 or "rpcbind"] value 0
    inst [9 or "smtp"] value 0
    inst [10 or "nntp"] value 0
    inst [11 or "hippi"] value 0
    inst [12 or "autofsd"] value 0

shping.error
    inst [0 or "null"] value 0
    inst [1 or "date"] value 0
    inst [2 or "sum"] value 0
    inst [3 or "cc"] value 0
    inst [4 or "dns"] value 0
    inst [5 or "dns-self"] value 0
    inst [6 or "dns-err"] value 0
    inst [7 or "ypserv"] value 1
    inst [8 or "rpcbind"] value 0
    inst [9 or "smtp"] value 0
    inst [10 or "nntp"] value 0
    inst [11 or "hippi"] value 0
    inst [12 or "autofsd"] value 0

shping.cmd
    inst [0 or "null"] value "exit 0"
    inst [1 or "date"] value "/sbin/date"
    inst [2 or "sum"] value "sum /unix"
    inst [3 or "cc"] value "cd /tmp; rm -f $$.[oc] $$; echo "main(){printf(\"g'day world\\\\n\");}" >/tmp/$$.c; cc -o $$ $$.c; ./$$; rm -f $$.[oc] $$"
    inst [4 or "dns"] value "nslookup - 134.14.52.130 </dev/null"
    inst [5 or "dns-self"] value "nslookup `/usr/bsd/hostname`"
    inst [6 or "dns-err"] value "nslookup foo.bar.no.host.com"
    inst [7 or "ypserv"] value "ypcat hosts | grep `/usr/bsd/hostname`"
    inst [8 or "rpcbind"] value "/usr/etc/rpcinfo -p"
    inst [9 or "smtp"] value "( echo "expn root" ; echo quit ) | telnet localhost 25 | cat"
    inst [10 or "nntp"] value "( echo "listgroup comp.sys.sgi"; echo quit ) | telnet tokyo.engr.sgi.com 119 | cat"
    inst [11 or "hippi"] value "/usr/pcp/bin/hipprobe"
    inst [12 or "autofsd"] value "/usr/pcp/bin/autofsd-probe"

shping.time.real
    inst [0 or "null"] value 16.388
    inst [1 or "date"] value 26.872
    inst [2 or "sum"] value 276.36899
    inst [3 or "cc"] value 534.67603
    inst [4 or "dns"] value 52.859001
    inst [5 or "dns-self"] value 47.425999
    inst [6 or "dns-err"] value 38.957001
    inst [7 or "ypserv"] value 58.299999
    inst [8 or "rpcbind"] value 60.308998
    inst [9 or "smtp"] value 1241.401
    inst [10 or "nntp"] value 5799.2271
    inst [11 or "hippi"] value 89.153999
    inst [12 or "autofsd"] value 39.005001

shping.time.cpu_usr
    inst [0 or "null"] value 1.796
    inst [1 or "date"] value 3.2119999
    inst [2 or "sum"] value 46.474998
    inst [3 or "cc"] value 89.279999
    inst [4 or "dns"] value 3.8770001
    inst [5 or "dns-self"] value 5.3049998
    inst [6 or "dns-err"] value 3.7950001
    inst [7 or "ypserv"] value 6.257
    inst [8 or "rpcbind"] value 5.1869998
    inst [9 or "smtp"] value 3.8759999
    inst [10 or "nntp"] value 4.0170002
    inst [11 or "hippi"] value 15.375
    inst [12 or "autofsd"] value 3.7490001

shping.time.cpu_sys
    inst [0 or "null"] value 9.7119999
    inst [1 or "date"] value 16.097
    inst [2 or "sum"] value 188.849
    inst [3 or "cc"] value 208.659
    inst [4 or "dns"] value 19.781
    inst [5 or "dns-self"] value 30.115999
    inst [6 or "dns-err"] value 19.612
    inst [7 or "ypserv"] value 37.375
    inst [8 or "rpcbind"] value 22.334999
    inst [9 or "smtp"] value 21.353001
    inst [10 or "nntp"] value 21.399
    inst [11 or "hippi"] value 55.088001
    inst [12 or "autofsd"] value 22.391001

shping.control.numcmd
    value 13

shping.control.cycles
    value 6963

shping.control.cycletime
    value 120

shping.control.timeout
    value 20

shping.control.debug
    value 0
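
To read a shping.status / shping.error pair such as the "ypserv" one
above (status 1, error 1), the rules in the help text can be turned
into a small lookup.  This is only a sketch: describe_shping is a
made-up helper name, and the message wording is mine, not PCP's.

```shell
#!/bin/sh
# describe_shping STATUS ERROR
# Hypothetical helper: prints a one-line reading of a status/error
# pair, following the shping.status and shping.error help text.
describe_shping() {
    status=$1
    error=$2
    case "$status" in
        -1) echo "not run yet (PMDA initializing)" ;;
        0)  echo "ok" ;;
        1)  echo "exit status $error" ;;
        2)  echo "terminated by signal $error" ;;
        3)  if [ "$error" = "-1008" ]; then
                echo "timed out (PM_ERR_TIMEOUT)"
            else
                echo "did not complete, PCP error $error"
            fi ;;
        4)  echo "not run, errno $error" ;;
        *)  echo "unknown status $status" ;;
    esac
}

describe_shping 1 1    # the ypserv sample: prints "exit status 1"
```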

