Re: [pcp] pmcd gets stuck with pmda kill

To: Nathan Scott <nathans@xxxxxxxxxx>, Martins Innus <minnus@xxxxxxxxxxx>
Subject: Re: [pcp] pmcd gets stuck with pmda kill
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Thu, 29 Jan 2015 07:18:38 +1100
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1902595642.1770600.1422398645794.JavaMail.zimbra@xxxxxxxxxx>
References: <54C7FF66.5090503@xxxxxxxxxxx> <1902595642.1770600.1422398645794.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

On 28/01/15 09:44, Nathan Scott wrote:
> ....
> That makes sense, I think - pmcd is only noticing that the PMDA is
> gone once you request a metric from the failed PMDA.

But in the original scenario, pmcd KNOWS the PMDA is gone (it terminated the 
PMDA after the timeout) ... is it possible we have a permissions issue here?  
What are the effective uids of pmcd and the proc pmda process at the point of 
the timeout?  On my system, it looks like this ...

kenj@bozo:~$ ps -ef | egrep '/[p](mcd|mdaproc)'
pcp      26238  9047  0 Jan28 ?        00:00:02 /usr/lib/pcp/bin/pmcd -T 3
root     26253 26238  0 Jan28 ?        00:00:31 /var/lib/pcp/pmdas/proc/pmdaproc -d 3

which means pmcd cannot kill the proc PMDA ... but we're OK!  I checked the 
pmcd code and we don't kill the timed-out PMDA, we just close all the IPC 
channels (pipes in this case) to it, which will cause it to shut down of its 
own accord.
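
To illustrate the mechanism (this is not pmcd's code, just a minimal sketch in Python), the script below starts a child that services requests arriving on a pipe, then closes the parent's ends of the pipes without sending any signal. The child sees end-of-file on its input and exits by itself, so the parent needs no permission to signal it. The names CHILD_SOURCE and close_pipes_and_wait are made up for this example, and the child is only a stand-in for a PMDA.

```python
import subprocess
import sys

# Hypothetical stand-in for a PMDA: answer each request read from stdin
# and exit cleanly once stdin reaches end-of-file.
CHILD_SOURCE = """
import sys
for line in sys.stdin:
    sys.stdout.write("ack " + line)
    sys.stdout.flush()
sys.exit(0)
"""


def close_pipes_and_wait(timeout=10.0):
    """Start the child, exchange one message, then close the pipes.

    No signal is sent to the child; it is expected to notice EOF on its
    input pipe and terminate on its own.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )

    # One round trip to show the channel works while it is open.
    child.stdin.write("ping\n")
    child.stdin.flush()
    reply = child.stdout.readline().strip()

    # Close the parent's ends of both pipes; this is the only "shutdown"
    # action taken, no kill() is involved.
    child.stdin.close()
    child.stdout.close()

    # The child should exit by itself once it reads EOF.
    returncode = child.wait(timeout=timeout)
    return reply, returncode


if __name__ == "__main__":
    reply, returncode = close_pipes_and_wait()
    print("reply:", reply)
    print("child exit status:", returncode)
```

Note that this only works when the child is actually running and reading its input; a child that is stopped or stuck will not notice the closed pipe until it resumes.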

I tested this by pausing the proc PMDA with gdb, running pminfo -v proc and 
waiting for the timeout.  pmcd and the proc PMDA both behaved as expected, and 
after this:

kenj@bozo:~$ pminfo -f pmcd.agent.status

pmcd.agent.status
    inst [1 or "root"] value 0
    inst [2 or "pmcd"] value 0
    inst [3 or "proc"] value 8  <====== correct
    inst [11 or "xfs"] value 0
    inst [29 or "sample"] value 0
    inst [30 or "sampledso"] value 0
    inst [60 or "linux"] value 0
    inst [70 or "mmv"] value 0
    inst [122 or "jbd2"] value 0
    inst [253 or "simple"] value 0

So, I am having trouble understanding this "extra fetch" line of reasoning, 
except in the case where you kill (as opposed to suspend) the PMDA process, 
which is not the original scenario.