pcp

Re: pmcd gets stuck with pmda kill

To: Martins Innus <minnus@xxxxxxxxxxx>
Subject: Re: pmcd gets stuck with pmda kill
From: fche@xxxxxxxxxx (Frank Ch. Eigler)
Date: Tue, 23 Feb 2016 15:17:12 -0500
Cc: Nathan Scott <nathans@xxxxxxxxxx>, Ken McDonell <kenj@xxxxxxxxxxxxxxxx>, Lukas Berk <lberk@xxxxxxxxxx>, pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <56CB5A03.5030008@xxxxxxxxxxx> (Martins Innus's message of "Mon, 22 Feb 2016 13:57:07 -0500")
References: <54C7FF66.5090503@xxxxxxxxxxx> <54C80E1F.1010909@xxxxxxxxxxxxxxxx> <54C93BFD.5090803@xxxxxxxxxxx> <54C93DED.9020601@xxxxxxxxxxxxxxxx> <54C94943.4040108@xxxxxxxxxxx> <54C95BAB.9050806@xxxxxxxxxxxxxxxx> <168392226.3529756.1422567735855.JavaMail.zimbra@xxxxxxxxxx> <54CAB7A1.1030204@xxxxxxxxxxx> <349735125.21806467.1455774818823.JavaMail.zimbra@xxxxxxxxxx> <56CB5A03.5030008@xxxxxxxxxxx>
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)
Martins Innus <minnus@xxxxxxxxxxx> writes:

> [...]
> #################
> delta = 1 min;
>
> some_inst (
>     pmcd.agent.status != 0
> ) -> shell 10 min "pmsignal -s HUP -a pmcd"
>   & syslog 10 min "Restart unresponsive PMDAs" " pmda%i[%v]";
> #################

FWIW I'm not a fan of this approach, for a couple of reasons.

- it requires a separate process to be running & polling

- the polling implies a relatively slow response time (up to one
  delta interval before a failure is even noticed), and the
  10 min holdoff rate-limits the response further

- it cannot operate remotely (since pmsignal doesn't work across
  the network), thus can't be default-on in pmieconf

- if other pmcds happen to be running, for testing or whatever
  reason, "pmsignal -a pmcd" will signal them all; we already
  have similar problems with the testsuite's and the rc.d scripts'
  pmsignal calls killing unintended processes
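
To illustrate the last point, here is a minimal sketch of signalling one
specific daemon instead of every process with a matching name. It is not
taken from PCP: the function name hup_from_pidfile is hypothetical, and
it assumes the target daemon writes its PID to a file whose path the
caller knows. Like pmsignal, it only works on the local host.

```python
import os
import signal


def hup_from_pidfile(pidfile):
    """Send SIGHUP to the single process whose PID is stored in pidfile.

    Assumption: the daemon of interest maintains this PID file itself,
    so other instances of the same program are left alone.
    """
    with open(pidfile) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGHUP)  # raises ProcessLookupError if pid is gone
    return pid
```

A stale PID file could still point at an unrelated process, so this
narrows the problem rather than eliminating it.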

I wonder why this seems in any way preferable to teaching pmcd or
pmdaroot to auto-restart failing pmdas?  They're at the right
place at the right time.
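
As a rough illustration of what "at the right place at the right time"
can mean, the sketch below shows a parent process that learns of its
child's exit directly and restarts it, with no separate poller. This is
a generic supervisor pattern under stated assumptions, not pmcd or
pmdaroot code: the name supervise and the parameters max_restarts and
min_interval are hypothetical, and it only handles an agent that exits,
not one that hangs.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, min_interval=0.0):
    """Run cmd, restarting it each time it exits with a non-zero status.

    Returns the number of restarts performed.
    """
    restarts = 0
    while True:
        started = time.monotonic()
        child = subprocess.Popen(cmd)
        status = child.wait()  # blocks until the child exits; no polling
        if status == 0:
            return restarts  # clean exit: nothing to restart
        if restarts >= max_restarts:
            return restarts  # give up rather than restart forever
        # Simple rate limit: never restart sooner than min_interval
        # seconds after the previous start.
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)
        restarts += 1


if __name__ == "__main__":
    # Stand-in for a failing agent: a child that exits with status 1.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(failing, max_restarts=2))
```

Because the parent owns the child, it knows exactly which process failed
and reacts as soon as the exit is reported, which avoids both the polling
delay and the risk of signalling unrelated processes.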


- FChE
