Re: Suggested way of monitoring processes?

To: pcp@xxxxxxxxxxx
Subject: Re: Suggested way of monitoring processes?
From: Alan Bailey <abailey@xxxxxxxxxxxxx>
Date: Tue, 7 Nov 2000 12:30:59 -0600 (CST)
In-reply-to: <10011060917.ZM115267@xxxxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-pcp@xxxxxxxxxxx

I've been messing around with pmie now.  As a first try, I'm writing a
little rule to monitor an sshd process.  Here it is:

delta = 3 seconds;
sshd =
some_inst match_inst "sshd" (
  proc.psinfo.pid > 0
) -> shell 60 seconds "echo 'it exists' | mail -s 'it exists' abailey"

I've been running pmie from the command line, and the output sometimes
looks like this:

[root@lanner pmie]# pmie -v ./config.default 
sshd: true

sshd: ?

sshd: ?

sshd: true

sshd: true

sshd: true

sshd: true

sshd: true
                       <-  I killed the process here
sshd: ?

sshd: false

sshd: false

sshd: false

sshd: false
                       <- I started the process again here
sshd: ?

sshd: true

sshd: true

sshd: ?

sshd: true

So, there are ?'s appearing in places where I think they shouldn't.
First, do ?'s occur when the instance that I'm looking for does not exist?
Why is there always one during each transition, and why do they appear in
the middle of streams of 'true's?
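
To make the pattern in that output easier to see, here is a minimal Python
sketch (not part of pmie or PCP) that tags each '?' sample by its nearest
known neighbours.  The helper names parse_samples and classify_unknowns are
hypothetical, and the sketch assumes each evaluation prints one line of the
form "sshd: true", "sshd: false" or "sshd: ?", as in the run above.

```python
import re

# Assumed line format of `pmie -v` output: "<rule>: true|false|?"
SAMPLE_LINE = re.compile(r"^(\w+):\s*(true|false|\?)$")


def parse_samples(text, rule="sshd"):
    """Return the sequence of values printed for one rule, in order."""
    samples = []
    for line in text.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if match and match.group(1) == rule:
            samples.append(match.group(2))
    return samples


def classify_unknowns(samples):
    """Tag each '?' by comparing the nearest known values on either side.

    'transition' : the known values before and after differ
    'isolated'   : the known values before and after are the same
    'edge'       : there is no known value on one side
    """
    tagged = []
    for i, value in enumerate(samples):
        if value != "?":
            continue
        before = next((v for v in reversed(samples[:i]) if v != "?"), None)
        after = next((v for v in samples[i + 1:] if v != "?"), None)
        if before is None or after is None:
            kind = "edge"
        elif before != after:
            kind = "transition"
        else:
            kind = "isolated"
        tagged.append((i, kind))
    return tagged


if __name__ == "__main__":
    import sys

    # Usage (assumed): pmie -v ./config.default | python this_script.py
    for index, kind in classify_unknowns(parse_samples(sys.stdin.read())):
        print(f"sample {index}: {kind}")
```

Separating the "transition" cases from the "isolated" ones keeps the two
questions apart: the first kind lines up with the process being killed or
restarted, the second kind does not.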

Does anyone have any insight into this problem, and possibly how I could
get around it?

Alan

On Mon, 6 Nov 2000, Nathan Scott wrote:

> hi,
> 
> On Nov 3,  8:13am, The Lemming wrote:
> > Subject: Re: Suggested way of monitoring processes?
> > ...
> >   I must say that I don't use PCP for this. We have a web portal, so we
> > use PCP only for performance monitoring. For availability monitoring, we
> > use Spong. It not only checks for processes, but also for disk, CPU load,
> > ... Other part of it does remote monitoring that checks ping, http server
> > function (via trying GET), smtp server (checks for welcome message) and
> > many others.
> > 
> >   Spong allows you to define whom to page (send email) for which server
> > and/or service, even depending on the time of the event, allows you to
> > delay message for some time to prevent false alarms, it can send alarm
> > message repeatedly until problem is acknowledged via interface and so on.
> > (I didn't investigated pmie, so I don't know whether it has these
> > functions.)
> > ...
> 
> Yes, pmie has all of these functions.  Used in conjunction with
> the (not yet opensource, but maybe one day?) shping PMDA, or
> a more specific PMDA like httpd/cisco/..., it's very useful for
> remote service availability and response-time monitoring.  It
> has been used in base-IRIX to do exactly that for some time now.
> 
> cheers.
> 
> 
> NAME
>      pmie - inference engine for performance metrics
> 
> DESCRIPTION
>      pmie accepts a collection of arithmetic, logical, and rule expressions to
>      be evaluated at specified frequencies.  The base data for the expressions
>      consists of performance metrics values delivered in real-time from any
>      host running the Performance Metrics Collection Daemon (PMCD), or using
>      historical data from Performance Co-Pilot (PCP) archive logs.
> 
>      As well as computing arithmetic and logical values, pmie can execute
>      actions (popup alarms, write system log messages, and launch programs) in
>      response to specified conditions.  Such actions are extremely useful in
>      detecting, monitoring and correcting performance related problems.
> 
> 
> -- 
> Nathan
> 

-- 
 Alan Bailey

