On Thu, May 31, 2007 at 03:38:31PM +1000, Nathan Scott wrote:
>
> I've switched the script over to have these now, and also added
> the additional "very verbose" (-V -V) diagnostics that the
> pmlogger_check script has - could you try out the attached
> script, in place of your current /usr/share/pcp/bin/pmie_check?

I didn't replace /usr/share/pcp/bin/pmie_check; instead I put
your script in /etc/cron.hourly/pmie_check.sh. Unfortunately
it also starts new instances for pmie processes that are
already running. I killed all pmie processes and restarted them
using your script at 10:45AM, and everything worked fine until
8:00PM, when pmie_check.sh seems to have launched 5 duplicates:

  $ ps -ef | grep pmie | awk '{print $11}' | sort | uniq -c | grep -v " 1 "
        2 dhcp1isp.mydomain.com
        2 ldapm1.mydomain.com
        2 ldapm2.mydomain.com
        2 ns1.mydomain.com
        2 tvservices.mydomain.com

The duplicates were all started at 8:01-8:02PM.
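
In case it helps with reproducing this, here is a rough sketch of
the same duplicate check as a small shell function. It is only an
illustration: the name pmie_duplicates is made up, and it assumes,
like the pipeline above, that the host name is field 11 of each
ps line. It reads ps output on stdin and prints a count and host
for every host seen more than once.

```shell
# Hypothetical helper, not part of PCP.
# Reads `ps -ef`-style lines on stdin and prints "count host" for
# every host that has more than one pmie line.
# Assumption: the host name is field 11, as in the pipeline above.
pmie_duplicates() {
    awk '/[p]mie/ && NF >= 11 { count[$11]++ }
         END {
             for (host in count)
                 if (count[host] > 1)
                     print count[host], host
         }' | sort -k2
}
```

It would be used as "ps -efw | pmie_duplicates". Matching on the
count in awk avoids the grep -v " 1 " step, which would also hide
hosts with 11 or 21 instances.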
>
> If you still see the problem with this script, can you capture
> the ps -ef (ps -efw, ideally, cos thats what _get_pids_by_name
> does) output, and also the contents of /var/tmp/pmie (if you
> could make a tarball, that'd be great). And any additional
> diagnostics (from using "-V -V" options) that give hints as to
> why the pmie processes were started or not stopped.
I'll send a tarball privately.
Thanks for helping out!
-jf