
Re: pmie spawning more than 1 instance per host

To: Jan-Frode Myklebust <janfrode@xxxxxxxxx>
Subject: Re: pmie spawning more than 1 instance per host
From: Nathan Scott <nscott@xxxxxxxxxx>
Date: Thu, 31 May 2007 15:38:31 +1000
Cc: pcp@xxxxxxxxxxx
In-reply-to: <20070530082218.GA6332@lc4eb6380248654.ibm.com>
Organization: Aconex
References: <slrnf5nl5l.1lc.mykleb@lc4eb6380248654.ibm.com> <1180484426.6273.748.camel@edge> <20070530082218.GA6332@lc4eb6380248654.ibm.com>
Reply-to: nscott@xxxxxxxxxx
Sender: pcp-bounce@xxxxxxxxxxx
On Wed, 2007-05-30 at 10:22 +0200, Jan-Frode Myklebust wrote:
> 
> 
> The duplicates are on dhcp1tv, dhcp1voip, dhcp2voip, hermes, m1dhcp2,
> mobileprov, ns2, prov1, tvservices, ztc1 and ztc2. When I run a 
> /etc/init.d/pmie stop, these 11 will not stop and I have to kill them
> manually. After I start pmie from the initscript again, there's only
> one instance for each host, so I'm pretty confident it's the
> pmie_check that's mistakenly spawning these.

The /etc/init.d/pmie start script actually calls pmie_check to
do the work of stopping and starting the pmies, so I agree it's
very likely the problem lies in pmie_check.

From reviewing the pmie_check code, a few things stand out.
Firstly, this script hasn't been updated to use the platform
independent _get_pids_by_name like the pmlogger_check script
(and, in fact, like the pmie start script does too).  Nor does
it have the PWD fix from SGI PV #595416 that pmlogger_check
does, whatever that bug was (I dunno, it's s3krit SGI stuff :).

I've switched the script over to have these now, and also added
the additional "very verbose" (-V -V) diagnostics that the
pmlogger_check script has - could you try out the attached
script, in place of your current /usr/share/pcp/bin/pmie_check?

(Note that you can call this by hand, at any time, to stop/start
your pmie instances - and it also has a "show me" mode (-S) that
won't stop/start any, but will tell you if anything would have
been changed and what commands would have been run).
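
A minimal sketch of such a by-hand dry run, assuming the script lives
at the path mentioned above and accepts the -S and -V -V options as
described in this message (check the script's usage output if your
version differs).  The PMIE_CHECK variable, the function name and the
log file location are illustrative, not part of PCP:

```shell
#!/bin/sh
# Path taken from the message above; override PMIE_CHECK if yours differs.
PMIE_CHECK=${PMIE_CHECK:-/usr/share/pcp/bin/pmie_check}

# Dry run: -S is the "show me" mode described above (nothing is stopped
# or started), and -V -V asks for the very verbose diagnostics.
pmie_check_dry_run() {
    "$PMIE_CHECK" -S -V -V
}

if [ -x "$PMIE_CHECK" ]; then
    # Hypothetical log location; keep a copy of the output for the list.
    pmie_check_dry_run 2>&1 | tee /tmp/pmie_check-dryrun.log
else
    echo "pmie_check not found at $PMIE_CHECK" >&2
fi
```

Because nothing is changed in this mode, it should be safe to run
while the duplicate pmie processes are still around.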

If you still see the problem with this script, can you capture
the ps -ef output (ps -efw, ideally, since that's what
_get_pids_by_name uses), and also the contents of /var/tmp/pmie
(if you could make a tarball, that'd be great).  Any additional
diagnostics (from using the "-V -V" options) that give hints as
to why the pmie processes were started or not stopped would
help too.
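
Something along these lines would gather the process listing and the
tarball in one go.  This is only a sketch: the output directory, the
file names and the function name are made up for illustration, and
only the ps -efw call and the /var/tmp/pmie path come from this
message:

```shell
#!/bin/sh
# Hypothetical output directory; override OUT to put the files elsewhere.
OUT=${OUT:-/tmp/pmie-debug}
# Directory named in the message above.
PMIE_DIR=${PMIE_DIR:-/var/tmp/pmie}

collect_pmie_diagnostics() {
    mkdir -p "$OUT" || return 1

    # Wide process listing, so long pmie command lines are not truncated.
    ps -efw > "$OUT/ps-efw.txt" 2>&1 || echo "ps -efw failed" >&2

    # Tarball of the pmie directory, if it exists.
    if [ -d "$PMIE_DIR" ]; then
        tar -czf "$OUT/var-tmp-pmie.tar.gz" \
            -C "$(dirname "$PMIE_DIR")" "$(basename "$PMIE_DIR")"
    else
        echo "$PMIE_DIR not found; skipping tarball" >&2
    fi
}

collect_pmie_diagnostics
```

Run it while the duplicates are still alive, so the ps output shows
them.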

cheers.

-- 
Nathan

Attachment: pmie_check.sh
Description: application/shellscript
