Ken,
On 1/28/15 2:52 PM, Ken McDonell wrote:
> On 29/01/15 06:43, Martins Innus wrote:
>> ...
>> The main issue I'm trying to solve is that for us, when a system gets
>> heavily loaded (seems to correlate to high I/O) and we have pmlogger
>> grabbing metrics from the proc pmda at regular intervals, we get the
>> following in the pmcd.log:
>>
>> [Thu Jan 15 10:29:25] pmcd(15873) Warning: pduread: timeout (after 5.000 sec) while attempting to read 12 bytes out of 12 in HDR on fd=11
>> [Thu Jan 15 10:29:25] pmcd(15873) Info: CleanupAgent ...
>> Cleanup "proc" agent (dom 3): protocol failure for fd=11
>>
>> I'd like to not increase the timeout since then we are reporting
>> incorrect timestamps for collected data, so I was going to use pmie to
>> restart pmcd when the pmda dies.
>
> OK, but you don't need to restart pmcd (that is expensive and disrupts
> the data stream for the other PMDAs that you might be logging).
> Sending pmcd a SIGHUP will restart the proc PMDA.
Thanks, I wasn't aware of that, but it doesn't seem to work in my case.
The setup is similar to the one in my other email. No other PCP services
are running (pmlogger, pmie, pmwebd and pmmgr are all stopped).
[vagrant@centos65 root]$ sudo /sbin/service pmcd restart
Waiting for pmcd to terminate ...
Starting pmcd ...
[vagrant@centos65 root]$ sudo killall -v pmdaproc
Killed pmdaproc(502) with signal 15
[vagrant@centos65 root]$ ps -ef |grep pcp
pcp 494 1 0 20:10 ? 00:00:00 /usr/libexec/pcp/bin/pmcd
root 507 494 0 20:10 ? 00:00:00 /var/lib/pcp/pmdas/xfs/pmdaxfs -d 11
pcp 509 494 0 20:10 ? 00:00:00 /var/lib/pcp/pmdas/sample/pmdasample -d 29
root 511 494 0 20:10 ? 00:00:00 /var/lib/pcp/pmdas/linux/pmdalinux
vagrant 515 25392 0 20:11 pts/1 00:00:00 grep pcp
[vagrant@centos65 root]$ sudo kill -HUP 494
[vagrant@centos65 root]$ ps -ef |grep pcp
pcp 494 1 0 20:10 ? 00:00:00 /usr/libexec/pcp/bin/pmcd
root 507 494 0 20:10 ? 00:00:00 /var/lib/pcp/pmdas/xfs/pmdaxfs -d 11
pcp 509 494 0 20:10 ? 00:00:00 /var/lib/pcp/pmdas/sample/pmdasample -d 29
root 511 494 0 20:10 ? 00:00:00 /var/lib/pcp/pmdas/linux/pmdalinux
pcp 518 494 0 20:11 ? 00:00:00 [pmdaproc] <defunct>
vagrant 520 25392 0 20:11 pts/1 00:00:00 grep pcp
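The defunct pmdaproc entry in the second listing is the thing to look for after the SIGHUP. As a rough sketch only (the function name defunct_children is made up for illustration, and it assumes the `ps -ef` column order shown above: UID, PID, PPID, C, STIME, TTY, TIME, CMD), a check for zombie children of a given parent could look like this:

```shell
# Print "PID COMMAND" for every defunct (zombie) child of the given parent PID.
# Reads `ps -ef` style output on stdin; assumes PID is field 2, PPID is
# field 3 and the command name is field 8, as in the listings above.
defunct_children() {
    parent_pid="$1"
    awk -v ppid="$parent_pid" '
        $3 == ppid && /<defunct>/ {
            print $2, $8
        }
    '
}
```

It could be used as `ps -ef | defunct_children 494`, which for the second listing above would print the pmdaproc entry (PID 518). An empty result means no zombie children were found for that parent.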
The following then appears in pmcd.log:
[Wed Jan 28 20:11:19] pmcd(494) Info: CleanupAgent ...
Cleanup "proc" agent (dom 3): protocol failure for fd=9, signal(15)
Configuration file '/etc/pcp/pmcd/pmcd.conf' unchanged
Restarting any deceased agents:
"proc" agent
pmcd: unexpected end-of-file at initial exchange with proc PMDA
I assume the CleanupAgent message comes from the "killall -v pmdaproc" and
is expected?
Also, I just got your email on the gdb trick. I tried that, with the same
results. From pmcd.log:
[Wed Jan 28 20:35:42] pmcd(1407) Warning: pduread: timeout (after 5.000 sec) while attempting to read 12 bytes out of 12 in HDR on fd=9
[Wed Jan 28 20:35:42] pmcd(1407) Info: CleanupAgent ...
Cleanup "proc" agent (dom 3): protocol failure for fd=9
[Wed Jan 28 20:36:08] pmcd(1407) Info:
pmcd RESTARTED at Wed Jan 28 20:36:08 2015
Current PMCD clients ...
fd  client connection from                    ipc ver  operations denied
==  ========================================  =======  =================
Configuration file '/etc/pcp/pmcd/pmcd.conf' unchanged
Restarting any deceased agents:
"proc" agent
pmcd: unexpected end-of-file at initial exchange with proc PMDA
The timestamp on the "RESTARTED" message is the time at which I sent the
kill -HUP.
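In both runs the tell-tale line is the "unexpected end-of-file at initial exchange" message that follows "Restarting any deceased agents". A minimal sketch for pulling the affected agent names out of a log, assuming the message format is exactly as quoted above (the function name failed_agent_restarts is hypothetical, not part of PCP):

```shell
# Print the name of each PMDA whose restart failed at the initial exchange.
# Reads pmcd.log text on stdin and matches the message format quoted above,
# allowing for optional leading whitespace.
failed_agent_restarts() {
    sed -n 's/^[[:space:]]*pmcd: unexpected end-of-file at initial exchange with \(.*\) PMDA$/\1/p'
}
```

Fed the log excerpts above on stdin, it would print one line containing proc for each failed restart. The location of pmcd.log varies by installation, so the input path is left to the caller.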
Thanks
Martins