The shping PMDA is an out-of-the-box solution for this type of monitoring
(it is a variant of your option 1). The help text and some sample values
are at the end of this message.

Unfortunately, shping belongs to the value-add part of PCP that is not
open source ... it is in pcp-pro for Linux and SC4-PCP for IRIX.

The creative de Bono solution is to use (or acquire) an SGI workstation
(running IRIX 6.5.5 or later) ... install pcp_eoe.sw.espping ... the
espping PMDA is a clone of the shping PMDA that could be hijacked to
do what you want.

I would be willing to consider the case for moving the shping PMDA to
open source if I had some justification from the community ... if this
is of interest to you, drop me a note addressing questions like:
- why do you want it?
- how would you use it?
- is any SGI hardware involved?
- would this be a make-or-break issue for you embracing PCP in your
environment?
On Thu, 2 Nov 2000, Alan Bailey wrote:
> I'm looking to monitor a few things that aren't currently a part of the
> PCP pmdas. I want to check to make sure that a few important processes
> are running on the host being logged. These would be things like inetd,
> sshd, and other processes. I figure there might be two ways of doing
> this.
>
> 1. I could write my own pmdas for these, that just returns a 0 or 1
> depending on if the process is running. This would take some time (for
> me) because I haven't done work with processes in C before or with writing
> pmdas.
>
> 2. I could utilize the current proc.psinfo.pid pmda that returns the whole
> process tree, and see if they are running by parsing through that.
>
> I also want to do a similar thing with NFS mounts, to make sure that
> remote disks are properly mounted.
>
> I'm leaning toward 1 just to keep everything being monitored on the same
> level, and so there isn't another layer needed (like in 2). However, it
> might be a lot of work. Any suggestions, or offers to write the pmda?
> ;-)
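
As a rough sketch of the kind of command involved: shping records whether
each sh(1) command exited with status 0, so a process check only needs to
exit 0 when the process is present. The name proc_running below is made up
for illustration (it is not part of PCP), and the sketch assumes a ps(1)
that accepts the POSIX options "-e -o comm=" and prints bare command names;
adjust for your platform.

```shell
# Hypothetical helper (not part of PCP): exit 0 if at least one process
# has exactly this command name, non-zero otherwise.
# Assumes a POSIX-style ps(1) that supports "-e -o comm=" and prints the
# bare command name (some systems print a full path instead).
proc_running() {
    ps -e -o comm= | grep -qx -- "$1"
}

# Example: report on the daemons mentioned in the original question.
for name in inetd sshd; do
    if proc_running "$name"; then
        echo "$name: running"
    else
        echo "$name: NOT running"
    fi
done
```

A command of this shape would show up as shping.status 0 when the process
is found and 1 (with the exit status in shping.error) when it is not.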
$ pminfo -tT shping
shping.status [command execution status for shping PMDA]
Help:
As each command is executed, the success or failure is encoded in
shping.status, using the following values:
-1 PMDA is initializing and command has not been run yet
0 command completed and exit status was 0
1 command completed and exit status was non-zero
2 command was run but terminated by a signal
3 command was run but did not complete (usually a timeout)
4 command was not run due to some system error or resource
availability
shping.error [command execution error code for shping PMDA]
Help:
As each command is executed, if there is a problem, the error
code or cause is stored in shping.error.
The interpretation of the value for shping.error depends on
shping.status as follows:
If shping.status is 1 (the command was run but returned a non-zero
exit status) then shping.error is the exit status.
If shping.status is 2 (the command was run but was terminated by
a signal) then shping.error is the signal number.
If shping.status is 3 (the command did not complete) then
shping.error is a PCP error code: see pmerr(1). Of particular
relevance is -1008 (PM_ERR_TIMEOUT) when the command failed to
complete in the time specified by shping.control.timeout.
If shping.status is 4 (the command was not run) then shping.error
is the value of errno.
Otherwise shping.error will be zero.
shping.cmd [commands run by shping PMDA]
Help:
The text of each sh(1) command run by the shping PMDA.
shping.time.real [elapsed time for a command]
Help:
This metric records the elapsed time in milliseconds for the most recent
execution of each command to be run by the shping PMDA.
Care should be used when interpreting the value if the corresponding
value for shping.status is non-zero, as the command may not have run to
completion. If the command timed out, shping.time.real will be -1.
shping.time.cpu_usr [user mode CPU time for a command]
Help:
This metric records the user mode CPU time in milliseconds for the most
recent execution of each command to be run by the shping PMDA.
Care should be used when interpreting the value if the corresponding
value for shping.status is non-zero, as the command may not have run to
completion. If the command timed out, shping.time.cpu_usr will be -1.
shping.time.cpu_sys [system mode CPU time for a command]
Help:
This metric records the system mode CPU time in milliseconds for the most
recent execution of each command to be run by the shping PMDA.
Care should be used when interpreting the value if the corresponding
value for shping.status is non-zero, as the command may not have run to
completion. If the command timed out, shping.time.cpu_sys will be -1.
shping.control.numcmd [number of commands in the group to be run by the shping PMDA]
Help:
number of commands in the group to be run by the shping PMDA
shping.control.cycles [number of times the command group has been run by the shping PMDA]
Help:
number of times the command group has been run by the shping PMDA
shping.control.cycletime [shping PMDA cycle time]
Help:
All commands run by the shping PMDA are executed one after another
in a group, and the group is run once per "cycle" time. This metric
reports the cycle time in seconds.
The cycle time may be changed dynamically by modifying this metric
with pmstore(1).
shping.control.timeout [shping PMDA timeout period]
Help:
The number of seconds the shping PMDA is willing to wait before
considering a single command to have timed out and killing it off.
The time out interval may be changed dynamically by modifying this
metric with pmstore(1).
shping.control.debug [shping PMDA debug flag]
Help:
The debug flag for the shping PMDA (see pmdbg(1)). All trace and
diagnostic files are created in /var/adm/pcplog (unless $PCP_LOGDIR
is set in the environment, see PMAPI(3)).
The debug flags DBG_TRACE_APPL0 (2048) and DBG_TRACE_APPL1 (4096)
may be used as follows:
DBG_TRACE_APPL0 - additional trace messages associated with the running
of each command appear in shping.log
DBG_TRACE_APPL1 - the standard output and standard error of each command
is appended to shping.out (instead of the default
/dev/null)
The debug flags may be changed dynamically by modifying this
metric with pmstore(1), e.g.
$ pmstore shping.control.debug 6144
would enable both of the diagnostic traces associated with
DBG_TRACE_APPL0 and DBG_TRACE_APPL1.
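
The value 6144 in the pmstore example above is just the two flag values
combined. A small sketch of the arithmetic, using only the numbers given in
the help text; debug_flags is an illustrative name, not a PCP command:

```shell
# Flag values as given in the shping.control.debug help text.
DBG_TRACE_APPL0=2048
DBG_TRACE_APPL1=4096

# Combine any number of flag values with bitwise OR and print the result.
# For flags that occupy distinct bits this is the same as adding them.
debug_flags() {
    flags=0
    for f in "$@"; do
        flags=$((flags | f))
    done
    echo "$flags"
}

debug_flags "$DBG_TRACE_APPL0" "$DBG_TRACE_APPL1"   # prints 6144
```

Passing a single flag gives the value for enabling just one of the two
traces.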
$ pminfo -f shping
shping.status
inst [0 or "null"] value 0
inst [1 or "date"] value 0
inst [2 or "sum"] value 0
inst [3 or "cc"] value 0
inst [4 or "dns"] value 0
inst [5 or "dns-self"] value 0
inst [6 or "dns-err"] value 0
inst [7 or "ypserv"] value 1
inst [8 or "rpcbind"] value 0
inst [9 or "smtp"] value 0
inst [10 or "nntp"] value 0
inst [11 or "hippi"] value 0
inst [12 or "autofsd"] value 0
shping.error
inst [0 or "null"] value 0
inst [1 or "date"] value 0
inst [2 or "sum"] value 0
inst [3 or "cc"] value 0
inst [4 or "dns"] value 0
inst [5 or "dns-self"] value 0
inst [6 or "dns-err"] value 0
inst [7 or "ypserv"] value 1
inst [8 or "rpcbind"] value 0
inst [9 or "smtp"] value 0
inst [10 or "nntp"] value 0
inst [11 or "hippi"] value 0
inst [12 or "autofsd"] value 0
shping.cmd
inst [0 or "null"] value "exit 0"
inst [1 or "date"] value "/sbin/date"
inst [2 or "sum"] value "sum /unix"
inst [3 or "cc"] value "cd /tmp; rm -f $$.[oc] $$; echo "main(){printf(\"g'day world\\\\n\");}" >/tmp/$$.c; cc -o $$ $$.c; ./$$; rm -f $$.[oc] $$"
inst [4 or "dns"] value "nslookup - 134.14.52.130 </dev/null"
inst [5 or "dns-self"] value "nslookup `/usr/bsd/hostname`"
inst [6 or "dns-err"] value "nslookup foo.bar.no.host.com"
inst [7 or "ypserv"] value "ypcat hosts | grep `/usr/bsd/hostname`"
inst [8 or "rpcbind"] value "/usr/etc/rpcinfo -p"
inst [9 or "smtp"] value "( echo "expn root" ; echo quit ) | telnet localhost 25 | cat"
inst [10 or "nntp"] value "( echo "listgroup comp.sys.sgi"; echo quit ) | telnet tokyo.engr.sgi.com 119 | cat"
inst [11 or "hippi"] value "/usr/pcp/bin/hipprobe"
inst [12 or "autofsd"] value "/usr/pcp/bin/autofsd-probe"
shping.time.real
inst [0 or "null"] value 16.388
inst [1 or "date"] value 26.872
inst [2 or "sum"] value 276.36899
inst [3 or "cc"] value 534.67603
inst [4 or "dns"] value 52.859001
inst [5 or "dns-self"] value 47.425999
inst [6 or "dns-err"] value 38.957001
inst [7 or "ypserv"] value 58.299999
inst [8 or "rpcbind"] value 60.308998
inst [9 or "smtp"] value 1241.401
inst [10 or "nntp"] value 5799.2271
inst [11 or "hippi"] value 89.153999
inst [12 or "autofsd"] value 39.005001
shping.time.cpu_usr
inst [0 or "null"] value 1.796
inst [1 or "date"] value 3.2119999
inst [2 or "sum"] value 46.474998
inst [3 or "cc"] value 89.279999
inst [4 or "dns"] value 3.8770001
inst [5 or "dns-self"] value 5.3049998
inst [6 or "dns-err"] value 3.7950001
inst [7 or "ypserv"] value 6.257
inst [8 or "rpcbind"] value 5.1869998
inst [9 or "smtp"] value 3.8759999
inst [10 or "nntp"] value 4.0170002
inst [11 or "hippi"] value 15.375
inst [12 or "autofsd"] value 3.7490001
shping.time.cpu_sys
inst [0 or "null"] value 9.7119999
inst [1 or "date"] value 16.097
inst [2 or "sum"] value 188.849
inst [3 or "cc"] value 208.659
inst [4 or "dns"] value 19.781
inst [5 or "dns-self"] value 30.115999
inst [6 or "dns-err"] value 19.612
inst [7 or "ypserv"] value 37.375
inst [8 or "rpcbind"] value 22.334999
inst [9 or "smtp"] value 21.353001
inst [10 or "nntp"] value 21.399
inst [11 or "hippi"] value 55.088001
inst [12 or "autofsd"] value 22.391001
shping.control.numcmd
value 13
shping.control.cycles
value 6963
shping.control.cycletime
value 120
shping.control.timeout
value 20
shping.control.debug
value 0
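
To read the sample values, shping.status and shping.error have to be
interpreted together, as the help text describes. A minimal sketch of that
decoding follows; describe_result and its message wording are made up for
illustration, and only the status codes and the -1008 timeout value come
from the help text:

```shell
# describe_result STATUS ERROR
# Hypothetical helper: turn one shping.status / shping.error pair into a
# short message, following the rules in the help text.
describe_result() {
    status=$1
    error=$2
    case "$status" in
        -1) echo "not run yet (PMDA initializing)" ;;
        0)  echo "ok (exit status 0)" ;;
        1)  echo "failed with exit status $error" ;;
        2)  echo "terminated by signal $error" ;;
        3)  if [ "$error" = "-1008" ]; then
                echo "timed out (PM_ERR_TIMEOUT)"
            else
                echo "did not complete (PCP error $error, see pmerr(1))"
            fi ;;
        4)  echo "not run (errno $error)" ;;
        *)  echo "unknown status $status" ;;
    esac
}

# The "ypserv" instance in the sample has status 1 and error 1.
describe_result 1 1
```

For the sample above, every instance except "ypserv" has status 0 and
error 0, so only the ypcat/grep pipeline is reporting a failure.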