
Re: [pcp] Dynamic Proc PMDA metrics

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] Dynamic Proc PMDA metrics
From: Martins Innus <minnus@xxxxxxxxxxx>
Date: Wed, 12 Nov 2014 15:53:28 -0500
Cc: pcp <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <2023629481.12445911.1415770455874.JavaMail.zimbra@xxxxxxxxxx>
References: <545D2CFA.1050600@xxxxxxxxxxx> <1477043892.10880496.1415600819766.JavaMail.zimbra@xxxxxxxxxx> <54627B7D.6040906@xxxxxxxxxxx> <2023629481.12445911.1415770455874.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
Nathan,

Sorry for the long email below. I have become completely confused by the dynamic metrics support, and it seems like I'm trying to use the API in a way that was not intended. #2 below is the most troubling. You likely want to junk all my dynamic proc code until we get this figured out.

I'd be happy to jump on the phone if my explanations don't make sense. It took me a while to wind through this, so maybe I've got it all wrong.


On 11/12/14 12:34 AM, Nathan Scott wrote:
> Hi Martins,
>
> ----- Original Message -----
> > [...]
> > You can ignore all of this until I am done. I just wanted to get this
> > out there since I don't know where in your cgroup work you are.  I think
> > the only conflict might be in proc_pid.c with my addition of an extra
> > parameter into some of the existing methods as I mentioned before.
>
> Ah - OK - thanks for getting it out.  cgroups is progressing nicely, I'm
> hoping end-of-this-week-ish it should be all revised.  So far, the rework
> and refactoring is proving a neat improvement.

Just realized this is incomplete. Sorry, it will need some more work and you shouldn't use it.

> > One thing I noticed and am trying to track down:
> >
> > In the:
> >
> > static void
> > refresh_metrictable(pmdaMetric *source, pmdaMetric *dest, int id)
> >
> > callback that I provide, the domain doesn't appear to be set in the
> > "source" at the time this call is made.  I check, and it is "0". But
> > everything seems to work.  I assume this gets set at some later time?
>
> Yeah, hmm, can't remember exactly how this works - but the domain number
> usually gets stamped into the static metrictable at pmdaInit time.  ISTR
> some need to pass the domain number around, in order to stamp it into
> the dynamic metric PMIDs as well ... but it's a vague memory.  This may be
> an area where the dynamic metrics libpcp_pmda code could be improved too,
> BTW - not sure, but keep that in mind (might be easier to pull the domain
> number out of the pmdaExt structure and write it into the source, if it
> is missing at the time the callback is called?  I don't think that could
> harm any existing PMDA code)

Still no progress on this, except that with a DSO PMDA the domain number is correct at this point, as illustrated in the interrupts case below.
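To make the timing issue concrete, here is the shape of callback I'm talking about. This is a minimal sketch only, not my actual proc code, and it assumes the pmid_build()/pmid_domain()/pmid_cluster()/pmid_item() macros from <pcp/impl.h>. The point is that whatever domain happens to be sitting in "source" when the callback runs is what gets stamped into the dynamic PMIDs:

#include <string.h>
#include <pcp/pmapi.h>
#include <pcp/impl.h>
#include <pcp/pmda.h>

/* Sketch of a refresh_metrictable callback: clone the template metric
 * and bump the item number by "id".  Whatever domain is in the source
 * PMID at this point (0, in the daemon case I'm seeing) is what gets
 * copied into the dynamic metric's PMID. */
static void
refresh_metrictable(pmdaMetric *source, pmdaMetric *dest, int id)
{
    int domain  = pmid_domain(source->m_desc.pmid);
    int cluster = pmid_cluster(source->m_desc.pmid);
    int item    = pmid_item(source->m_desc.pmid);

    memcpy(dest, source, sizeof(pmdaMetric));
    dest->m_desc.pmid = pmid_build(domain, cluster, item + id);
}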



While researching this, I've run into a few issues that stem from calling pmdaDynamicPMNS multiple times:

1. In the "size_metrictable" callback there is no context as to which dynamic tree is being queried if we use the same callback for all trees. So this is going to significantly overestimate the storage needed, since a common callback has to return the size of the largest subtree. The alternative, as far as I can see, is one callback per subtree.
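The per-subtree alternative would look roughly like this; a hypothetical sketch only, where the metric names, counts and the subtree_count() helper are made up for illustration:

/* Hypothetical per-subtree counting callbacks: each dynamic tree reports
 * its own metric count, instead of one shared callback having to report
 * the size of the largest subtree for every tree. */
static void
proc_fd_size_metrictable(int *total, int *trees)
{
    *total = NUM_FD_METRICS;        /* metrics in this subtree only */
    *trees = subtree_count();       /* hypothetical helper */
}

static void
proc_status_size_metrictable(int *total, int *trees)
{
    *total = NUM_STATUS_METRICS;
    *trees = subtree_count();
}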



2. This is a larger problem. In dynamic.c, for the function:

static pmdaMetric *
dynamic_metric_table(int index, pmdaMetric *offset)

When running through the initial metric table, the only check done before calling "mtabupdate" is whether the cluster matches; no check is made against the item or name. This causes multiple calls for the same metric if we have clusters that span multiple trees (since pmdaDynamicMetricTable calls mtabupdate once for each pmdaDynamicPMNS call we made), as we have in proc with CLUSTER_PID_STATUS.

I don't see a solution to this problem. The initial metric table has no name to compare against. When we call pmdaDynamicPMNS we don't provide a list of item ids that are valid. So neither of those checks can be done.

The only solution I can think of would be to allow mtabupdate to return a success code when the metric is added, and only then would dynamic_metric_table update its internal state. The callback would know whether the cluster/item combo is a valid dynamic metric for its tree. This would break any code that uses the current interface.
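Roughly what I have in mind for the changed interface; this is a sketch of the proposal only, with CLUSTER_PID_STATUS from proc and hypothetical STATUS_FIRST_ITEM/STATUS_LAST_ITEM bounds standing in for however the callback would recognise its own metrics:

/* Proposed: the mtabupdate callback changes from returning void to
 * returning int, non-zero only when the template really is a dynamic
 * metric belonging to this tree, so dynamic_metric_table can skip
 * templates whose cluster is shared with other trees. */
static int
proc_status_mtabupdate(pmdaMetric *source, pmdaMetric *dest, int id)
{
    int item = pmid_item(source->m_desc.pmid);

    if (pmid_cluster(source->m_desc.pmid) != CLUSTER_PID_STATUS ||
        item < STATUS_FIRST_ITEM || item > STATUS_LAST_ITEM)
        return 0;       /* not ours - dest left untouched, no slot used */
    /* ... fill in dest from source as before ... */
    return 1;
}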

I think there is a real risk of buffer overflow here now. If the number of metrics in the extra clusters is greater than the extra space introduced by the padding from #1 above, I think there will be memory corruption.

Strangely, I haven't found any negative impacts from this. My guess is that somewhere there are just multiple definitions of the same metric taking up space? Or maybe this has something to do with the one QA failure I couldn't debug.




3. In general, it's not clear to me what the proper way is to generate the dynamic metrics from the initial "templates". For example, take the existing case in the Linux PMDA. For interrupts we have:

    /* kernel.percpu.interrupts.line[<N>] */
    { NULL, { PMDA_PMID(CLUSTER_INTERRUPT_LINES, 0), PM_TYPE_U32,
        CPU_INDOM, PM_SEM_COUNTER, PMDA_PMUNITS(0,0,1,0,0,PM_COUNT_ONE) }, },

    /* kernel.percpu.interrupts.[<other>] */
    { NULL, { PMDA_PMID(CLUSTER_INTERRUPT_OTHER, 0), PM_TYPE_U32,
        CPU_INDOM, PM_SEM_COUNTER, PMDA_PMUNITS(0,0,1,0,0,PM_COUNT_ONE) }, },

These are used as templates to create the dynamic metrics. But the "size_metrictable" callback in interrupts.c reports the total number of dynamic metrics, so we end up with at least 2 extra metrics which, as far as I can tell, are not used.
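My reading of the counting side is roughly the following; treat the shape as approximate (it's only what is consistent with the "2 total x 11 trees" line in the log below, and the counts come from the pminfo output further down - 10 line metrics and 11 "other" counters on this box):

/* Approximate shape of the interrupts counting callback: two template
 * metrics, each duplicated once per tree, where the tree count has to
 * cover the larger of the two groups - hence the unused slots. */
static void
interrupts_size_metrictable(int *total, int *trees)
{
    int lines = 10;     /* interrupt line counters on this test box */
    int other = 11;     /* "other" counters (LOC, SPU, ..., MCP) */

    *total = 2;                                 /* the two templates */
    *trees = lines > other ? lines : other;     /* 11 here */
}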

From pmcd.log:

interrupts size_metrictable: 2 total x 11 trees
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae5f0) metric ID dup: 60.49.0 -> 60.49.1
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae610) metric ID dup: 60.49.0 -> 60.49.2
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae630) metric ID dup: 60.49.0 -> 60.49.3
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae650) metric ID dup: 60.49.0 -> 60.49.4
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae670) metric ID dup: 60.49.0 -> 60.49.5
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae690) metric ID dup: 60.49.0 -> 60.49.6
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae6b0) metric ID dup: 60.49.0 -> 60.49.7
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae6d0) metric ID dup: 60.49.0 -> 60.49.8
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae6f0) metric ID dup: 60.49.0 -> 60.49.9
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae710) metric ID dup: 60.49.0 -> 60.49.10
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae730) metric ID dup: 60.49.0 -> 60.49.11
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae750) metric ID dup: 60.50.0 -> 60.50.1
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae770) metric ID dup: 60.50.0 -> 60.50.2
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae790) metric ID dup: 60.50.0 -> 60.50.3
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae7b0) metric ID dup: 60.50.0 -> 60.50.4
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae7d0) metric ID dup: 60.50.0 -> 60.50.5
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae7f0) metric ID dup: 60.50.0 -> 60.50.6
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae810) metric ID dup: 60.50.0 -> 60.50.7
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae830) metric ID dup: 60.50.0 -> 60.50.8
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae850) metric ID dup: 60.50.0 -> 60.50.9
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae870) metric ID dup: 60.50.0 -> 60.50.10
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae890) metric ID dup: 60.50.0 -> 60.50.11
[Wed Nov 12 16:08:24] pmcd(13810) Debug: pmdaRehash: PMDA linux DSO: successful rebuild

But:

[vagrant@pcptest testsuite]$ pminfo -M kernel.percpu.interrupts
kernel.percpu.interrupts.MCP PMID: 60.50.10 = 251709450 = 0xf00c80a
kernel.percpu.interrupts.MCE PMID: 60.50.9 = 251709449 = 0xf00c809
kernel.percpu.interrupts.THR PMID: 60.50.8 = 251709448 = 0xf00c808
kernel.percpu.interrupts.TRM PMID: 60.50.7 = 251709447 = 0xf00c807
kernel.percpu.interrupts.TLB PMID: 60.50.6 = 251709446 = 0xf00c806
kernel.percpu.interrupts.CAL PMID: 60.50.5 = 251709445 = 0xf00c805
kernel.percpu.interrupts.RES PMID: 60.50.4 = 251709444 = 0xf00c804
kernel.percpu.interrupts.IWI PMID: 60.50.3 = 251709443 = 0xf00c803
kernel.percpu.interrupts.PMI PMID: 60.50.2 = 251709442 = 0xf00c802
kernel.percpu.interrupts.SPU PMID: 60.50.1 = 251709441 = 0xf00c801
kernel.percpu.interrupts.LOC PMID: 60.50.0 = 251709440 = 0xf00c800
kernel.percpu.interrupts.line20 PMID: 60.49.9 = 251708425 = 0xf00c409
kernel.percpu.interrupts.line19 PMID: 60.49.8 = 251708424 = 0xf00c408
kernel.percpu.interrupts.line16 PMID: 60.49.7 = 251708423 = 0xf00c407
kernel.percpu.interrupts.line15 PMID: 60.49.6 = 251708422 = 0xf00c406
kernel.percpu.interrupts.line14 PMID: 60.49.5 = 251708421 = 0xf00c405
kernel.percpu.interrupts.line12 PMID: 60.49.4 = 251708420 = 0xf00c404
kernel.percpu.interrupts.line9 PMID: 60.49.3 = 251708419 = 0xf00c403
kernel.percpu.interrupts.line8 PMID: 60.49.2 = 251708418 = 0xf00c402
kernel.percpu.interrupts.line1 PMID: 60.49.1 = 251708417 = 0xf00c401
kernel.percpu.interrupts.line0 PMID: 60.49.0 = 251708416 = 0xf00c400

So both of the .11 metrics and one of the .10 metrics are orphaned? Does that cause any issue? Or, since the PMNS doesn't reference them, is there no harm beyond the memory used? If that's the case, for my issue #2 above I could just set dummy cluster or item values for the metrics that are sent through twice (or more), but that's pretty wasteful and I'm sure it will have other unintended consequences.

Thanks

Martins
