Re: [pcp] pmcd gets stuck with pmda kill

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>, pcp@xxxxxxxxxxx
Subject: Re: [pcp] pmcd gets stuck with pmda kill
From: Martins Innus <minnus@xxxxxxxxxxx>
Date: Wed, 28 Jan 2015 15:40:35 -0500
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <54C93DED.9020601@xxxxxxxxxxxxxxxx>
References: <54C7FF66.5090503@xxxxxxxxxxx> <54C80E1F.1010909@xxxxxxxxxxxxxxxx> <54C93BFD.5090803@xxxxxxxxxxx> <54C93DED.9020601@xxxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
Ken,

On 1/28/15 2:52 PM, Ken McDonell wrote:
On 29/01/15 06:43, Martins Innus wrote:
...
The main issue I'm trying to solve is that for us, when a system gets
heavily loaded (seems to correlate to high I/O) and we have pmlogger
grabbing metrics from the proc pmda at regular intervals, we get the
following in the pmcd.log:


[Thu Jan 15 10:29:25] pmcd(15873) Warning: pduread: timeout (after 5.000 sec) while attempting to read 12 bytes out of 12 in HDR on fd=11
[Thu Jan 15 10:29:25] pmcd(15873) Info: CleanupAgent ...
Cleanup "proc" agent (dom 3): protocol failure for fd=11


I'd like to not increase the timeout since then we are reporting
incorrect timestamps for collected data, so I was going to use pmie to
restart pmcd when the pmda dies.

OK, but you don't need to restart pmcd (that is expensive and disrupts the data stream for the other PMDAs that you might be logging). Sending pmcd a SIGHUP will restart the proc PMDA.

Thanks, I wasn't aware of that, but it doesn't seem to work in my case. Similar setup to my other email. No other pcp services running (pmlogger, pmie, pmwebd, pmmgr all stopped).

[vagrant@centos65 root]$ sudo /sbin/service pmcd restart
Waiting for pmcd to terminate ...
Starting pmcd ...

[vagrant@centos65 root]$ sudo killall -v pmdaproc
Killed pmdaproc(502) with signal 15

[vagrant@centos65 root]$ ps -ef |grep pcp
pcp        494     1  0 20:10 ?        00:00:00 /usr/libexec/pcp/bin/pmcd
root       507   494  0 20:10 ?        00:00:00 /var/lib/pcp/pmdas/xfs/pmdaxfs -d 11
pcp        509   494  0 20:10 ?        00:00:00 /var/lib/pcp/pmdas/sample/pmdasample -d 29
root       511   494  0 20:10 ?        00:00:00 /var/lib/pcp/pmdas/linux/pmdalinux
vagrant    515 25392  0 20:11 pts/1    00:00:00 grep pcp

[vagrant@centos65 root]$ sudo kill -HUP 494

[vagrant@centos65 root]$ ps -ef |grep pcp
pcp        494     1  0 20:10 ?        00:00:00 /usr/libexec/pcp/bin/pmcd
root       507   494  0 20:10 ?        00:00:00 /var/lib/pcp/pmdas/xfs/pmdaxfs -d 11
pcp        509   494  0 20:10 ?        00:00:00 /var/lib/pcp/pmdas/sample/pmdasample -d 29
root       511   494  0 20:10 ?        00:00:00 /var/lib/pcp/pmdas/linux/pmdalinux
pcp        518   494  0 20:11 ?        00:00:00 [pmdaproc] <defunct>
vagrant    520 25392  0 20:11 pts/1    00:00:00 grep pcp
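
The check above can be scripted. What follows is only a rough sketch, not anything shipped with pcp: defunct_children is a hypothetical helper name, and it assumes the usual "ps -ef" column order (UID PID PPID C STIME TTY TIME CMD) seen in the listings here. It prints the PID and command of every defunct child of a given parent PID.

```shell
# Hypothetical helper (not part of pcp): read "ps -ef" output on stdin and
# print "PID CMD" for each defunct child of the parent PID given as $1.
# Assumes column 2 is PID, column 3 is PPID and column 8 is the command.
defunct_children() {
    awk -v ppid="$1" '$3 == ppid && $NF == "<defunct>" { print $2, $8 }'
}
```

It would be used as "ps -ef | defunct_children 494", where 494 is the pmcd PID from the listing above.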



The following appears in pmcd.log:

[Wed Jan 28 20:11:19] pmcd(494) Info: CleanupAgent ...
Cleanup "proc" agent (dom 3): protocol failure for fd=9, signal(15)
Configuration file '/etc/pcp/pmcd/pmcd.conf' unchanged
Restarting any deceased agents:
    "proc" agent

pmcd: unexpected end-of-file at initial exchange with proc PMDA


I assume the CleanupAgent message comes from the "killall -v pmdaproc" and is expected?




Also, I just got your email on the gdb trick. I tried that, with the same results:

from pmcd.log:

[Wed Jan 28 20:35:42] pmcd(1407) Warning: pduread: timeout (after 5.000 sec) while attempting to read 12 bytes out of 12 in HDR on fd=9
[Wed Jan 28 20:35:42] pmcd(1407) Info: CleanupAgent ...
Cleanup "proc" agent (dom 3): protocol failure for fd=9
[Wed Jan 28 20:36:08] pmcd(1407) Info:

pmcd RESTARTED at Wed Jan 28 20:36:08 2015


Current PMCD clients ...
fd  client connection from                    ipc ver  operations denied
==  ========================================  =======  =================

Configuration file '/etc/pcp/pmcd/pmcd.conf' unchanged
Restarting any deceased agents:
    "proc" agent

pmcd: unexpected end-of-file at initial exchange with proc PMDA


The "RESTARTED" message corresponds to the time that I sent the kill -HUP.
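
To spot this failure without reading the log by hand, something like the following might do. It is a minimal sketch under assumptions: failed_restarts is a hypothetical helper name, and the pattern is matched only against the exact "unexpected end-of-file at initial exchange" wording shown in the log excerpts above, so other failure messages would be missed.

```shell
# Hypothetical helper (not part of pcp): given the path to a pmcd.log, print
# the name of each agent for which pmcd logged
# "unexpected end-of-file at initial exchange with <name> PMDA".
failed_restarts() {
    sed -n 's/^pmcd: unexpected end-of-file at initial exchange with \(.*\) PMDA$/\1/p' "$1" | sort -u
}
```

Called with the path to pmcd.log, it prints one agent name per line, or nothing if no restart failure of this kind was logged.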

Thanks

Martins
