
Re: [pcp] PCP Updates: Allow Connection to PMCD via Unix Domain Sockets

To: Dave Brolley <brolley@xxxxxxxxxx>
Subject: Re: [pcp] PCP Updates: Allow Connection to PMCD via Unix Domain Sockets
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Tue, 16 Jul 2013 19:56:19 -0400 (EDT)
Cc: PCP <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <51E5C15C.8060909@xxxxxxxxxx>
References: <51D5E449.7010304@xxxxxxxxxx> <1032942944.13639792.1373005231784.JavaMail.root@xxxxxxxxxx> <51DAD4FE.30408@xxxxxxxxxx> <51E33482.4050801@xxxxxxxxxx> <2109826205.1342704.1373931754469.JavaMail.root@xxxxxxxxxx> <51E5C15C.8060909@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: djQ69YVO6eXJGfy3mDcQXCtHHifV4A==
Thread-topic: PCP Updates: Allow Connection to PMCD via Unix Domain Sockets
Hey Dave,

----- Original Message -----
> On 07/15/2013 07:42 PM, Nathan Scott wrote:
> > To prove this, we need to know what metric is being requested. The
> > failure diagnostic should be telling us this but it's not - I'll extend
> > the message to include that shortly, could you then re-run the test
> > and send through the new failure messages?
> New output attached. If you need more and can explain to me what you're
> looking for, it may be worth siccing systemtap on pmcd.

Heh, that previous long mail was meant to be the explanation.  :)

Anyway ... eureka! ... the new output is very revealing:

[DATE] pmcd(PID) Error: pmdaFetch: Fetch callback error from metric PMID 60.5.9[3]: Permission denied
[DATE] pmcd(PID) Error: pmdaFetch: Fetch callback error from metric PMID 60.5.7[3]: Permission denied
[DATE] pmcd(PID) Error: pmdaFetch: Fetch callback error from metric PMID 60.5.1[3]: Permission denied
[DATE] pmcd(PID) Error: pmdaFetch: Fetch callback error from metric PMID 60.5.8[3]: Permission denied


Domain number 60 there points the boney Finger Of Blame toward the Linux
kernel PMDA.  In particular something-external-to-the-test is requesting
values for the following metrics...

$ pminfo -m | grep -E '60\.5\.(1|7|8|9)$'
filesys.capacity PMID: 60.5.1
filesys.mountdir PMID: 60.5.7
filesys.full PMID: 60.5.8
filesys.blocksize PMID: 60.5.9
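
For anyone wanting to pull the domain out of these diagnostics mechanically, here is a minimal sketch. It assumes the message format is exactly as quoted above, with the PMID written as domain.cluster.item, and it assumes the bracketed number is an instance identifier (an interpretation on my part, not something the log itself states). The function name parse_fetch_error is made up for illustration.

```python
import re

# Matches the extended pmdaFetch diagnostic quoted in this mail.
# Assumed layout: "... metric PMID <domain>.<cluster>.<item>[<inst>]: <error>"
PMID_RE = re.compile(r"metric PMID\s+(\d+)\.(\d+)\.(\d+)\[(-?\d+)\]:\s*(.*)$")


def parse_fetch_error(line):
    """Return the PMID fields and error text from one log line, or None."""
    m = PMID_RE.search(line)
    if m is None:
        return None
    domain, cluster, item, inst = (int(g) for g in m.groups()[:4])
    return {
        "domain": domain,    # 60 is the Linux kernel PMDA in the log above
        "cluster": cluster,
        "item": item,
        "inst": inst,        # assumed to be the instance identifier
        "error": m.group(5),
    }


if __name__ == "__main__":
    sample = ("[DATE] pmcd(PID) Error: pmdaFetch: Fetch callback error "
              "from metric PMID 60.5.9[3]: Permission denied")
    print(parse_fetch_error(sample))
```

The `\s+` after "PMID" also tolerates the line wrap that mail clients tend to insert at that point.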

$ grep -A2 filesys /etc/pcp/pmie/config.default
// 1 filesys.filling
delta = 4 mins;
filesys.filling = 
some_host (
    some_inst (
        ( 100 * filesys.used  / filesys.capacity  ) > 95
        && filesys.used  + 
            20 min * ( rate filesys.used  ) >
                filesys.capacity 
    )
) -> syslog 10 min "File system is filling up" " %v%used[%i]@%h";

So ... yes, git bisect did not lie, it is indeed the fault of the change
to enable pmie and pmlogger!  But not in a bad way: they are actually
running and doing what they are supposed to be doing - all good.  We need
to handle this in the test - it needs to be deterministic - so even a remote
possibility of these requests arriving during the test needs to be removed.

For now, we can simply add pmie to the test 067 list of tools to signal
before running (done).  Longer term, perhaps this test could be directed
to make use of an alternate PMCD_PORT, which would be a better solution.
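
As a rough illustration of the alternate-port idea, a test harness could hand its client tools an environment with PMCD_PORT overridden, so they never talk to the system pmcd that pmie and pmlogger are using. This is only a sketch: the helper name env_with_pmcd_port and the port number 44399 are hypothetical, and it assumes the test also starts its own pmcd listening on that same port.

```python
import os


def env_with_pmcd_port(port, base=None):
    """Return a copy of the environment with PMCD_PORT set to `port`.

    `base` defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["PMCD_PORT"] = str(port)
    return env


if __name__ == "__main__":
    # Hypothetical private port for the test, chosen only for illustration.
    private_env = env_with_pmcd_port(44399)
    print(private_env["PMCD_PORT"])
    # A client would then be launched with this environment, for example:
    #   subprocess.run(["pminfo", "-f", "filesys.capacity"], env=private_env)
```

With the port isolated like this, requests from the regular pmie and pmlogger instances cannot reach the pmcd under test, which is the determinism we are after.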

cheers.

--
Nathan
