
pmcd gets stuck with pmda kill

To: pcp@xxxxxxxxxxx
Subject: pmcd gets stuck with pmda kill
From: Martins Innus <minnus@xxxxxxxxxxx>
Date: Tue, 27 Jan 2015 16:13:10 -0500
Delivered-to: pcp@xxxxxxxxxxx
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
Hi,
I am trying to work around the "slow proc pmda gets killed by pmcd" issue by using pmie and have run into an issue with pmcd getting "stuck" in some way.

I have the following rule for pmie:

*************
delta = 2 min;
pmcd.agent.status #'proc' > 1
-> shell "/etc/pcp/pmie/procpmda_check.sh";
*************

For now, all the shell script does is log an event. This is kind of convoluted, but I have boiled the failure down to the following:


1. kill <pid_of_proc_pmda>

2. The following appears in the pmie.log when the pmie check is supposed to occur:
      [Tue Jan 27 15:30:12] pmie(16399) Error: pmFetch from d13n01 failed: IPC protocol failure
      [Tue Jan 27 15:30:12] pmie(16399) Info: Lost connection to pmcd on host d13n01
      [Tue Jan 27 15:30:17] pmie(16399) Info: Re-established connection to pmcd on host d13n01

3. procpmda_check.sh never gets called, even after waiting 10 minutes, by which time several 2-minute cycles should have occurred.



If instead I do the following:

1. kill <pid_of_proc_pmda>

2. pminfo
      Error: Broken pipe

3. pminfo
      <correct pminfo output>

4. procpmda_check.sh gets called properly to signal that the pmda has died at the appropriate time.


As another point, there is no pmlogger running when I do this. There is nothing interesting in pmcd.log or proc.log. The pmcd process and all other pmda processes are running the whole time. I know that killing the proc pmda by hand is not the same as what pmcd would do in practice, but it was the only way I could think of to simulate the behavior. Any thoughts on the best way to debug this?
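
For anyone trying to reproduce this, a logging-only script of the kind described above could look roughly like the sketch below. It is not the actual procpmda_check.sh; the log file location, the PROCPMDA_CHECK_LOG override, and the message text are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for /etc/pcp/pmie/procpmda_check.sh.
# It only records that the pmie rule fired; it takes no corrective action.
# LOGFILE and the PROCPMDA_CHECK_LOG override are assumed names, not from the original script.
LOGFILE="${PROCPMDA_CHECK_LOG:-/tmp/procpmda_check.log}"

log_event() {
    # Append one timestamped line per invocation so each firing is visible.
    printf '%s procpmda_check: pmie rule fired for proc pmda\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" >> "$LOGFILE"
}

log_event
```

With something like this in place, each firing of the rule should add one line to the log file, which makes it easy to tell whether pmie invoked the action at all.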

Thanks

Martins
