Re: [pcp] Dynamic metric rework

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] Dynamic metric rework
From: Martins Innus <minnus@xxxxxxxxxxx>
Date: Wed, 10 Dec 2014 13:59:01 -0500
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <428025601.13619915.1418163615228.JavaMail.zimbra@xxxxxxxxxx>
References: <5481E4D7.8050700@xxxxxxxxxxx> <991616924.12928901.1418084187235.JavaMail.zimbra@xxxxxxxxxx> <54876B8F.2050106@xxxxxxxxxxx> <428025601.13619915.1418163615228.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

Nathan,
    OK, you can grab this commit on top of the previous one.

https://github.com/ubccr/pcp/tree/martins_working

commit 6116a3ec76ca327a0fb7de6c9ba3498a8979e53c
Author: Martins Innus <minnus@xxxxxxxxxxx>
Date:   Wed Dec 10 10:33:28 2014 -0500

    Cleanups for dynamic metrics
    Fix 660 qa failure
    Update function names and exports file for newly exposed function

 src/include/pcp/pmda.h        |    2 +-
 src/libpcp_pmda/src/dynamic.c |   14 ++++++++------
 src/libpcp_pmda/src/exports   |    5 +++++
 src/libpcp_pmda/src/tree.c    |    8 ++++----
 4 files changed, 18 insertions(+), 11 deletions(-)


Explanation below.

On 12/9/14 5:20 PM, Nathan Scott wrote:
>> I assume you mean add a new version (PCP_PMDA_3.4) at the bottom for
>> this new export?
> Yep.
Done

>>> I'm seeing a QA failure in test 660 - pmwebd is not seeing all of the names
>>> for the interrupts metrics... (need to edit src/test_webapi.py to add debug
>>> statements back in - commented out - this test needs some love to make it a
>>> bit easier to diagnose these kinds of problems).
>>>
>>> Is that one failing for you?  If not, I can dig more - I guess it's not in
>>> either of those two test groups (linux/proc) above.
>> Yeah, I wasn't building pmwebd on this host.  Got that up and running.
>> Looks like the error is coming from:
>>
>> pmwebapi.cxx -> metric_list_traverse
>>
>> The call to pmLookupDesc is failing on dynamic metrics. But the PMNS has
>> already been traversed successfully for them. Ran out of time today, I
>> will dig deeper tomorrow.  My guess is I missed a call to populate the
>> metric table for some case.
>>
>> I assume there is some order that is different in terms of what the
>> web-api does from the standard command line tools.
> Yeah that'd be my guess too.  If you're up for it, 660 could be split into
> two - one that does the python part and another that does the rest - it'll
> help with narrowing this down (the failure is in the python part, but you
> get to wait the full ~30 seconds or so on each test iteration).
>
> The python code also needs to be modified to always create the diagnostic
> files needed to triage this class of failure (actually, I've got that fix
> locally already from my initial look into this - I'll merge that shortly).
>
> But I've got a note to return to this test to further split it up, if you
> don't get to it as part of this work then don't worry about it.

>> pminfo/pmval appear to work fine.
> Yes, pmwebd is quite unusual (& unfortunately quite inefficient) in terms
> of the pcp protocol requests it makes - e.g. see the pmLookupDesc/pmNameID
> calls within the fetch decoding loop - it's unique in at least this area.

OK, I fixed this. It took a while to figure out, since the failure was triggered not by pmwebd itself but by the order of the calls it makes. The short answer: I was not guarding against a "text" query being the first interaction a client has with a dynamic PMDA. I had assumed that an instance or name query would always come first for any dynamic metric.

The long answer is that this assumption does seem to hold as long as you look only at dynamic metrics. I had not understood that the pmdaText callback is called for all metrics, dynamic or not, whereas pmdaPMID(), pmdaName() and pmdaChildren() are dispatched only for dynamic metrics. The error occurred in 660 because a non-dynamic metric happened to ask for help text, which still led to pmdaDynamicLookupText being called. Because of my error, that call set up the PMNS but not the metric table, so subsequent calls for dynamic metrics ran with incomplete initialization.
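
To make the ordering problem concrete, here is a minimal standalone sketch of the kind of guard described above. It is not the code from the commit and does not use libpcp_pmda: every name in it (dyn_group, dyn_ensure_ready, dyn_lookup_text and so on) is hypothetical. The assumption it illustrates is that every entry point funnels through one idempotent initialiser, so it no longer matters whether a text, name or descriptor request arrives first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one dynamic metric group; NOT a libpcp_pmda type. */
typedef struct {
    bool pmns_ready;     /* dynamic namespace subtree has been built */
    bool metrics_ready;  /* metric table has been extended to match  */
    int  pmns_builds;    /* counters, only to show the work runs once */
    int  metric_builds;
} dyn_group;

static void build_pmns(dyn_group *g)
{
    g->pmns_ready = true;
    g->pmns_builds++;
}

static void build_metric_table(dyn_group *g)
{
    assert(g->pmns_ready);  /* the table is derived from the namespace */
    g->metrics_ready = true;
    g->metric_builds++;
}

/* Single idempotent initialiser: both halves are checked on every call,
 * so no entry point can leave the group half initialised. */
static void dyn_ensure_ready(dyn_group *g)
{
    if (!g->pmns_ready)
        build_pmns(g);
    if (!g->metrics_ready)
        build_metric_table(g);
}

/* Text lookup is reached for ALL metrics, dynamic or not (item 0 is the
 * only dynamic item in this toy model), so it must also run the guard. */
static const char *dyn_lookup_text(dyn_group *g, int item)
{
    dyn_ensure_ready(g);
    return item == 0 ? "help text for the dynamic metric" : NULL;
}

/* Name lookup is reached only for dynamic metrics. */
static const char *dyn_lookup_name(dyn_group *g, int item)
{
    dyn_ensure_ready(g);
    return item == 0 ? "example.dynamic.metric" : NULL;
}

/* Descriptor lookup succeeds only once the metric table exists. */
static bool dyn_lookup_desc(const dyn_group *g, int item)
{
    return g->metrics_ready && item == 0;
}
```

With this shape, a client whose first request is help text for a non-dynamic item still leaves both the namespace and the metric table set up, so a later descriptor lookup for the dynamic item succeeds.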

It is slightly curious that this only happened in the local context, but I suppose that in the host context everything had already been set up properly by previous tests against the linux pmda. I only traced the local code path, since that is where the error showed up. Maybe the host code path deals with this differently.

My guess is that there should be some sort of low-level QA test for this, but I'm not sure what it would do.

Thanks

Martins
