
Re: Missing pcp.env.sh, and some questions.

To: lstepml@xxxxxxx, pcp@xxxxxxxxxxx
Subject: Re: Missing pcp.env.sh, and some questions.
From: kenmcd@xxxxxxxxxxxxxxxxxxxxxxx (Ken McDonell)
Date: Fri, 24 Dec 1999 11:18:11 +1100
In-reply-to: Luc Stepniewski <lstep@free.fr> "Missing pcp.env.sh, and some questions." (Dec 24, 10:10)
References: <38629D99.119AD90B@free.fr>
Sender: owner-pcp@xxxxxxxxxxx
On Dec 24, 10:10, Luc Stepniewski wrote:
> Subject: Missing pcp.env.sh, and some questions.
> Hello,
>
> I installed PCP on my machine. I find it is really wonderful. ...

That's nice to hear.

> ... I just got one problem: when I tried to launch pmie, it complained
> about not finding the file named /etc/pcp.conf.sh (tried on Debian and
> Redhat). I looked for it everywhere but I couldn't find it.
>
> So:
> I copied the pcp.conf file as pcp.conf.sh, added the '#!/bin/sh' at the
> beginning, and added to the /etc/init.d/pmie file the following
> statement (because IS_ON is nowhere :-):
>
> IS_ON=false
>
> Then I modified the first statement of /etc/init.d/pmie which seems to
> have a bug:
> I modified:
> if [ -fi ${PCP_CONF:-/etc/pcp.conf}.sh ] ; then
> to:
> if [ -f ${PCP_CONF:-/etc/pcp.conf}.sh ] ; then
>
> (removed the 'i').
>
> It now works really fine (even pmie :-).

Sorry about this.  We had tested pmie in interactive mode, but the *rc* support
for starting pmie instances automagically was clearly broken (you found 3
problems immediately).  We had similar issues with the pmcd *rc* script that
required a lot of re-writing to migrate from Irix to work in both Irix and
Linux.  Similar work remains to be done on the pmie *rc* script and we'll fix
it before the next release on oss.sgi.com.
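
For reference, the corrected test from the report above can be sketched as a
small guard. This is only an illustration, not the actual pmie *rc* script:
the function names pcp_conf_script and load_pcp_conf are hypothetical, and
sourcing the file after the test is an assumption about what the script does
next.

```shell
#!/bin/sh
# Path of the shell-format configuration file: $PCP_CONF with ".sh"
# appended, falling back to /etc/pcp.conf when PCP_CONF is unset or empty.
pcp_conf_script() {
    echo "${PCP_CONF:-/etc/pcp.conf}.sh"
}

# Source the file only if it exists as a regular file ("-f", not "-fi").
# Returns 0 when the file was loaded and 1 when it was not found.
load_pcp_conf() {
    conf_script=$(pcp_conf_script)
    if [ -f "$conf_script" ]; then
        . "$conf_script"
        return 0
    fi
    return 1
}
```

An *rc* script could then call load_pcp_conf near the top and report an error
when it returns non-zero, rather than failing later on an unset variable.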

> There is an important variable which is missing from the standard ones
> (coming from /proc/sys/fs/file-nr). It is the number of currently open
> files (and the maximum assigned file number, and the maximum allocatable
> file number).
>
> Their description is the following:
> "The three values in file-nr denote the number of allocated file
> handles, the number of used file handles, and the maximum number of file
> handles. When the allocated file handles come close to the maximum, but
> the number of actually used ones is far behind, you've encountered a
> peak in your usage of file handles and you don't need to increase the
> maximum."
> (From http://www.bb-zone.com/Proc/chapter2.html#section2.1)
>
> I think I can patch pcp to add it myself, but I don't know in which
> hierarchy I must put it. In filesys.*, there are already variables
> named filesys.maxfiles, filesys.usedfiles and filesys.freefiles, but
> they are totally unrelated to the /proc/sys/fs/file-nr variables.

Good observation.  If you have the fix, we'd be glad to incorporate it.  As to
the name of the metric, filesys.* may not be the right place, as those metrics
are all enumerated with one value per locally mounted filesystem.

I'd suggest
        - filesys.kernel.openfiles and filesys.kernel.maxfiles
or
        - kernel.openfiles and kernel.maxfiles
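
As a rough sketch of where the values for such metrics would come from, the
three fields of /proc/sys/fs/file-nr can be split like this. The function name
file_nr_summary is hypothetical, and the field meanings (allocated, used,
maximum) simply follow the description quoted above, so they are an assumption
for any particular kernel version.

```shell
#!/bin/sh
# Split a line in the format of /proc/sys/fs/file-nr into its three fields.
# Field order is assumed from the quoted description: allocated, used, max.
file_nr_summary() {
    # $1: the file contents, e.g. "1024 300 8192" (spaces or tabs)
    set -- $1
    # Reject input that does not have exactly three fields.
    [ $# -eq 3 ] || return 1
    echo "allocated=$1 used=$2 max=$3"
}

# Print the current values when the file is available (Linux only).
if [ -r /proc/sys/fs/file-nr ]; then
    file_nr_summary "$(cat /proc/sys/fs/file-nr)"
fi
```

The second and third fields would correspond to the proposed openfiles and
maxfiles metrics, each with a single value rather than one per filesystem.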

> About practical use of pmie, I'm looking for an example on how to
> monitor the presence of a daemon on a system. For example, I have an
> apache daemon, and I'd like to get an alert if it dies, or if there are
> too many instances of it.
> The only near useful variable that I see is proc.psinfo.pid. I used it
> like this:
>
> eurythro:/home/enlight# pmie -v
> val = proc.psinfo.pid #'/usr/sbin/apache';
> val: 254
>
> val: 254
>
> val: 254
>
> val: 254
>
> val: ?
>
> val: ?
>
> val: ?
>
> val: ?
>
>
> At the fifth iteration, I stopped the daemon. It then stops returning its
> pid, which is cool, but it never gets a pid again, even when I restart
> apache. I don't understand why... Another problem with this is that I
> don't have the number of running daemons :-(
> Any ideas?

This is a hard problem to solve with pmie.  pmie tries (but does not always
succeed) to re-evaluate instance domains that change underneath it ... I'll
need a bit longer to investigate this one, because it is unexpected that pmie
finds the process at all from the instance "/usr/sbin/apache" ...
the proper instance name would be "254 /usr/sbin/apache".

To count processes by name, this will work better ...

        count_inst match_inst "/usr/sbin/apache" proc.psinfo.pid > 0;

Note that proc.psinfo.pid > 0 is true for every process, so it produces a set
of results, one per process, to which the (regular expression) instance
matching and counting predicates are applied.
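
The match-then-count idea can be illustrated outside pmie with a few lines of
shell. This is only an analogy, not how pmie implements match_inst or
count_inst: the function name count_matching_instances is hypothetical, and
the instance names are assumed to have the "pid command" form mentioned above.

```shell
#!/bin/sh
# Count the instance names on stdin (one per line, e.g.
# "254 /usr/sbin/apache") that match an extended regular expression.
count_matching_instances() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so the exit status is deliberately ignored.
    grep -c -E -- "$1" || true
}

# Example: two of the four sample instances match.
printf '%s\n' \
    "1 init" \
    "254 /usr/sbin/apache" \
    "255 /usr/sbin/apache" \
    "300 /usr/sbin/sshd" |
    count_matching_instances "/usr/sbin/apache"
# prints 2
```

A count of 0 would correspond to the "daemon has died" case, and a count above
some chosen limit to the "too many instances" case.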
