pcp
[Top] [All Lists]

Re: nfs monitoring (also shping)

To: Alan Bailey <abailey@xxxxxxxxxxxxx>
Subject: Re: nfs monitoring (also shping)
From: kenmcd@xxxxxxxxxxxxxxxxx
Date: Sat, 6 Jan 2001 09:08:23 +1100 (EST)
Cc: pcp@xxxxxxxxxxx
In-reply-to: <Pine.LNX.4.10.10011141139230.26061-100000@xxxxxxxxxxxxxxxxxxx>
Reply-to: kenmcd@xxxxxxxxxxxxxxxxx
Sender: owner-pcp@xxxxxxxxxxx
On Tue, 14 Nov 2000, Alan Bailey wrote:

> What is the suggested way to monitor nfs mounts?  I want to make sure that
> nfs mounted directories are still mounted and up on the remote host.  I
> couldn't find anything by looking at the nfs.* or nfs3.* metrics.

The nfs*.* metrics will only show NFS client activity NFS server, from
which it is difficult to intuit if NFS mounts are still working.

I'd suggest the best way to monitor this is to use shping (when it
becomes available), and periodically

        - try to read a file that is known to exist on the NFS mounted FS
        - try to write to the NFS mounted FS (assuming you can arrange
          the protections appropriately)

When the NFS mount goes away, the read/write will either fail or be
timed out by shping ... in either case this is reflected in the value
of shping.status and/or shping.error.

Then use pmie to monitor the shping.status metric.

FYI, here is the help text for the shping metrics.

[root@kenj-pc shping]# pminfo -T shping

shping.status
Help:
As each command is executed, the success or failure is encoded in
shping.status, using the following values:

   -1   PMDA is initializing and command has not been run yet
    0   command completed and exit status was 0
    1   command completed and exit status was non-zero
    2   command was run but terminated by a signal
    3   command was run but did not complete (usually a timeout)
    4   command was not run due to some system error or resource
        availability

shping.error
Help:
As each command is executed, if there is a problem, the error
code or cause is stored in shping.error.

The interpretation of the value for shping.error depends on
shping.status as follows:

    If shping.status is 1 (the command was run but returned a non-zero
    exit status) then shping.error is the exit status.

    If shping.status is 2 (the command was run but was terminated by
    a signal) then shping.error is the signal number.

    If shping.status is 3 (the command did not complete) then
    shping.error is a PCP error codes: see pmerr(1).  Of particular
    relevance is -1008 (PM_ERR_TIMEOUT) when the command failed to
    complete in the time specified by shping.control.timeout.

    If shping.status is 4 (the commands was not run) then shping.error
    is the value of errno.

    Otherwise shping.error will be zero.

shping.cmd
Help:
The text of each sh(1) command run by the shping PMDA.

shping.time.real
Help:
This metric records the elapsed time in milliseconds for the most recent
execution of each command to be run by the shping PMDA.

Care should be used when interpreting the value if the corresponding
value for shping.status is non-zero, as the command may not have run to
completion.  If the command timed out, shping.time.real will be -1.

shping.time.cpu_usr
Help:
This metric records the user mode CPU time in milliseconds for the most
recent execution of each command to be run by the shping PMDA.

Care should be used when interpreting the value if the corresponding
value for shping.status is non-zero, as the command may not have run to
completion.  If the command timed out, shping.time.cpu_usr will be -1.

shping.time.cpu_sys
Help:
This metric records the system mode CPU time in milliseconds for the most
recent execution of each command to be run by the shping PMDA.

Care should be used when interpreting the value if the corresponding
value for shping.status is non-zero, as the command may not have run to
completion.  If the command timed out, shping.time.cpu_sys will be -1.

shping.control.numcmd
Help:
number of commands in the group to be run by the shping PMDA

shping.control.cycles
Help:
number of times the command group has been run by the shping PMDA

shping.control.cycletime
Help:
All commands are run by the shping PMDA are executed one after another
in a group, and the group is run once per "cycle" time.  This metric
reports the cycle time in seconds.

The cycle time may be changed dynamically by modifying this metric
with pmstore(1).

shping.control.timeout
Help:
The number of seconds the shping PMDA is willing to wait before
considering a single command to have timed out and killing it off.

The time out interval may be changed dynamically by modifying this
metric with pmstore(1).

shping.control.debug
Help:
The debug flag for the shping PMDA (see pmdbg(1)).  All trace and
diagnostic files are created in /var/adm/pcplog (unless $PCP_LOGDIR
is sent in the environment, see PMAPI(3)).

The debug flags DBG_TRACE_APPL0 (2048) and DBG_TRACE_APPL1 (4096)
may be used as follows:

DBG_TRACE_APPL0 - additional trace messages associated with the running
                  of each command appear in shping.log

DBG_TRACE_APPL1 - the standard output and standard error of each command
                  is appended to shping.out (instead of the default
                  /dev/null)

The debug flags may be changed dynamically by modifying this
metric with pmstore(1), e.g.
        $ pmstore shping.control.debug 6144
would enable both of the diagnostic traces associated with
DBG_TRACE_APPL0 and DBG_TRACE_APPL1.



<Prev in Thread] Current Thread [Next in Thread>
  • Re: nfs monitoring (also shping), kenmcd <=