
Re: [pcp] PCP Updates: Allow Connection to PMCD via Unix Domain Sockets

To: Dave Brolley <brolley@xxxxxxxxxx>
Subject: Re: [pcp] PCP Updates: Allow Connection to PMCD via Unix Domain Sockets
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Mon, 15 Jul 2013 19:42:34 -0400 (EDT)
Cc: PCP <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <51E33482.4050801@xxxxxxxxxx>
References: <51D5E449.7010304@xxxxxxxxxx> <1032942944.13639792.1373005231784.JavaMail.root@xxxxxxxxxx> <51DAD4FE.30408@xxxxxxxxxx> <51E33482.4050801@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: MprBRJ+ylfjgfU6GLCA6LjO9s0z4Fg==
Thread-topic: PCP Updates: Allow Connection to PMCD via Unix Domain Sockets
Hi Dave,

----- Original Message -----
> On 07/08/2013 11:04 AM, Dave Brolley wrote:
> ...
> I spent quite a bit of time trying to track this down last week without
> much success. I finally resorted to 'git bisect' and the commit for
> which things go bad is:
> 
> commit 9cdfde093a6a2db48c049055267d2c92cdc62541
> Author: Nathan Scott <nathans@xxxxxxxxxx>
> Date:   Thu Jun 27 19:24:24 2013 +1000
> 
>      Generate the default pmlogger and pmie configuration files

That commit appears to be the root of all evil.  :|

> I'm not convinced that this commit introduced a bug. My feeling is that
> the change in configuration has exposed some existing problem. That's,
> unfortunately, all I have to report on this.

No problem.  So, the failure is these unexpected lines:

qa$ diff 067.out.4 /tmp/067.out.bad
30a31,34
> [DATE] pmcd(PID) Error: pmdaFetch: Fetch callback error: Permission denied
> [DATE] pmcd(PID) Error: pmdaFetch: Fetch callback error: Permission denied
> [DATE] pmcd(PID) Error: pmdaFetch: Fetch callback error: Permission denied
> [DATE] pmcd(PID) Error: pmdaFetch: Fetch callback error: Permission denied

Looking at test 067, it creates a pmcd.conf with 3 agents - the platform
kernel agent, the pmcd agent, and a socket test agent that listens on an
inet port.

The error is coming from libpcp_pmda, and it is telling us that one of the
PMDAs was asked to fetch a value for some metric/instance pair and, instead
of an actual value, it returned EACCES.
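To make the failure path concrete, here is a minimal mock of that mechanism. It is not the libpcp_pmda code. The callback type, `denying_callback` and `mock_fetch` are all hypothetical, and the sketch assumes the usual convention that a fetch callback reports failure as a negative errno. The point it shows is that the generic message carries only the error text, so nothing in the log identifies the PMDA or the metric.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a PMDA fetch callback: returns 0 on success
 * or a negative errno value on failure (assumed convention). */
typedef int (*fetch_cb)(unsigned int metric, unsigned int inst, long *value);

/* Example callback that refuses every request, as a failed permission
 * check might. */
static int denying_callback(unsigned int metric, unsigned int inst, long *value)
{
    (void)metric;
    (void)inst;
    (void)value;
    return -EACCES;
}

/* Mimics the generic diagnostic seen in pmcd.log: neither the metric nor
 * the instance appears in the message. Writes the message (or an empty
 * string on success) into buf and returns the callback status. */
static int mock_fetch(fetch_cb cb, unsigned int metric, unsigned int inst,
                      char *buf, size_t len)
{
    long value = 0;
    int sts;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    sts = cb(metric, inst, &value);
    if (sts < 0)
        snprintf(buf, len, "Error: pmdaFetch: Fetch callback error: %s",
                 strerror(-sts));
    return sts;
}
```

Calling `mock_fetch(denying_callback, 1, 0, msg, sizeof msg)` produces the same "Permission denied" text as the unexpected lines in the diff, whatever metric number is passed in.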

Unfortunately, we do not know which PMDA is giving the error, nor do we know
which metric.  Also, the test doesn't actually fetch any values, AFAICT!  So
I suspect we are getting some request from outside while the test runs (thus
an intermittent failure).  We can see this kind of thing has happened in the
past too, because the test starts out by killing any local
pmchart/pmgadgets/pmview processes.

We can immediately discount the pmcd metrics - because pmdapmcd.so does not
use libpcp_pmda.  I believe we can discount the test agent, for two reasons:
it is not a DSO (hence its log messages would not be in pmcd.log) and it also
has no fetch callback - it doesn't even enter the usual PDU-processing loop
(that's part of what it's testing).

So, unless there's something I'm missing in the socket-PMDA handling code in
pmcd, we can presume the metric being requested is a kernel metric.  This'd
strengthen our theory that the request is not coming from the test itself,
but rather some other client tool talking to pmcd.

To prove this, we need to know what metric is being requested.  The failure
diagnostic should be telling us this, but it's not.  I'll extend the message
to include that shortly; could you then re-run the test and send through the
new failure messages?
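One possible shape for the extended diagnostic is sketched below. This is only an illustration, not the change that actually went into libpcp_pmda. `struct metric_id` and `format_fetch_error` are hypothetical names, and the wording of the message is an assumption. It prints the metric identity in the conventional domain.cluster.item form together with the instance. Because the domain number identifies the PMDA, a single log line would then show both which agent failed and which metric was requested.

```c
#include <assert.h>
#include <errno.h>   /* for EACCES in callers and tests */
#include <stdio.h>
#include <string.h>

/* Hypothetical metric identity: the three PMID components are passed
 * separately rather than decoded from a packed PMID. */
struct metric_id {
    unsigned int domain;   /* identifies the PMDA */
    unsigned int cluster;
    unsigned int item;
};

/* Formats a fetch-error diagnostic that names the metric and instance
 * as well as the error. sts is a negative errno value. Returns the
 * snprintf result (number of characters that would have been written). */
static int format_fetch_error(char *buf, size_t len, struct metric_id id,
                              int inst, int sts)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "Error: pmdaFetch: Fetch callback error from PMID %u.%u.%u[%d]: %s",
                    id.domain, id.cluster, id.item, inst, strerror(-sts));
}
```

With an illustrative identity of `{60, 0, 1}` and instance `-1`, the message would read "... from PMID 60.0.1[-1]: Permission denied". That would be enough to confirm or rule out the kernel-agent theory above.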

> I hope your move went well.

Yes, very smoothly thanks - all settled in now.

cheers.

--
Nathan
