Re: Suggested way of monitoring processes?

To: Alan Bailey <abailey@xxxxxxxxxxxxx>
Subject: Re: Suggested way of monitoring processes?
From: "Nathan Scott" <nathans@xxxxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 8 Nov 2000 10:02:45 -0400
Cc: pcp@xxxxxxxxxxx
In-reply-to: Alan Bailey <abailey@xxxxxxxxxxxxx> "Re: Suggested way of monitoring processes?" (Nov 7, 12:30pm)
References: <Pine.LNX.4.10.10011071218140.21391-100000@xxxxxxxxxxxxxxxxxxx>
Sender: owner-pcp@xxxxxxxxxxx

hi Alan,

On Nov 7, 12:30pm, Alan Bailey wrote:
> Subject: Re: Suggested way of monitoring processes?
> I've been messing around with pmie now.  As a first try, I'm writing a
> little rule to monitor an sshd process.  Here it is:
> 

very nice!

> delta = 3 seconds;
> sshd =
> some_inst match_inst "sshd" (
>   proc.psinfo.pid > 0
> ) -> shell 60 seconds "echo 'it exists' | mail -s 'it exists' abailey"
> 
> I've been running pmie from the command line, and the output sometimes
> looks like this:
> [snip]
> 
> So, there are ?'s appearing in places where I think they shouldn't.
> First, do ?'s occur when the instance that I'm looking for does not exist?
> Why is there always one during each transition, and why do they appear in
> the middle of streams of 'true's?
> 
> Does anyone have any insight in this problem, and possibly how I could get
> around it?
> 

pmie -v prints a '?' when it believes it doesn't have enough
information to completely evaluate the expression.  i've
usually come across it when evaluating counter metrics (rate
conversion requires two values), but that isn't the case here.
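
To illustrate the counter case: a rate needs two samples, so the first
evaluation has nothing to report.  This is a toy Python model, not pmie's
code; the function names and sample numbers are made up for illustration.

```python
# Toy model (NOT pmie source): a counter metric has no rate until two
# samples have been fetched, so the first evaluation is unknown ('?').

def rate(prev, curr):
    """Per-second rate from two (timestamp, value) samples, or None if unknown."""
    if prev is None:
        return None              # only one sample so far
    (t0, v0), (t1, v1) = prev, curr
    if t1 <= t0:
        return None              # no elapsed time, rate is undefined
    return (v1 - v0) / (t1 - t0)

def show(value):
    """Render an unknown result the way 'pmie -v' marks it."""
    return "?" if value is None else str(value)

# Hypothetical counter sampled every 3 seconds.
samples = [(0, 100), (3, 130), (6, 190)]

prev = None
results = []
for sample in samples:
    results.append(show(rate(prev, sample)))
    prev = sample

print(results)   # ['?', '10.0', '20.0']
```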

in this case, what i think is happening (from some experiments
using "sleep" in place of "sshd") is that whenever the set of
instances coming back from match_inst changes, pmie throws its
hands up in disgust, resets itself for the next metric fetch
and gives up on the current one.
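
A toy Python model of that guess (again, not pmie's code): any fetch where
the matched instance set differs from the previous fetch is reported as '?',
and the next fetch evaluates normally.  The pids, the helper name, and the
choice to report an empty match as 'false' are all assumptions of the sketch.

```python
import re

# Toy model (NOT pmie source) of the behaviour guessed at above: when the
# set of instances matching the pattern changes between fetches, that
# fetch is reported as unknown ('?'); the following fetch is evaluated.

def evaluate(fetches, pattern):
    """fetches: list of {pid: command name} dicts, one per metric fetch."""
    prev_matched = None
    out = []
    for procs in fetches:
        matched = frozenset(
            pid for pid, name in procs.items() if re.search(pattern, name)
        )
        if prev_matched is not None and matched != prev_matched:
            out.append("?")      # instance set changed: give up this round
        else:
            # Assumption: an empty match evaluates to false rather than '?'.
            out.append("true" if matched else "false")
        prev_matched = matched
    return out

# Hypothetical fetches: one sleep runs, exits, a new one starts,
# then a second one is started alongside it.
fetches = [
    {101: "sleep"},
    {101: "sleep"},
    {},
    {},
    {202: "sleep"},
    {202: "sleep"},
    {202: "sleep", 303: "sleep"},
    {202: "sleep", 303: "sleep"},
]

trace = evaluate(fetches, "sleep")
print(trace)
# ['true', 'true', '?', 'false', '?', 'true', '?', 'true']
```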

this (i believe, Ken knows this code better than i do ;) is why
we get one '?' after each state change (sshd stop/start) and then
good data.  i don't really agree this is the correct behavior for
this situation, but i'll defer to Ken - perhaps there's something
i've missed.

for the second case where you see a '?' in a string of 'true's -
the only way I could reproduce that one was to have one "sleep"
running and then to start another (which is the same problem as
above - the set of instances coming back from match_inst changes)
- is it possible you had one sshd running & then started another?

so, i don't think there's any situation where pmie is lying to you,
it's just a little indecisive at times :-)... it may be possible to
improve this.  for the purpose of tracking long-running processes
this shouldn't be too much of a problem (with relatively small
metric fetch deltas), but it's certainly annoying.

cheers.

-- 
Nathan
