Nathan,
Sorry for the long email below. I have become completely confused
with dynamic metrics and it seems like I'm trying to use the API in a
way that was not intended. #2 below is the most troubling. You likely
want to junk all my dynamic proc code until we get this figured out.
I'd be happy to jump on the phone if my explanations don't make sense.
It took me a while to wind through this, so maybe I've got it all wrong.
On 11/12/14 12:34 AM, Nathan Scott wrote:
Hi Martins,
----- Original Message -----
[...]
You can ignore all of this until I am done. I just wanted to get this
out there since I don't know where you are in your cgroup work. I think
the only conflict might be in proc_pid.c with my addition of an extra
parameter to some of the existing methods, as I mentioned before.
Ah - OK - thanks for getting it out. cgroups is progressing nicely; I'm
hoping it should all be revised end-of-this-week-ish. So far, the rework
and refactoring are proving a neat improvement.
Just realized this is incomplete. Sorry, will need some more work and
you shouldn't use it.
One thing I noticed and am trying to track down:
In the:
static void
refresh_metrictable(pmdaMetric *source, pmdaMetric *dest, int id)
callback that I provide, the domain doesn't appear to be set in the
"source" at the time this call is made. I checked, and it is "0", but
everything seems to work. I assume this gets set at some later time?
Yeah, hmm, can't remember exactly how this works - but the domain number
usually gets stamped into the static metrictable at pmdaInit time. ISTR
some need to pass the domain number around, in order to stamp it into
the dynamic metric PMIDs as well ... but it's a vague memory. This may be
an area where the dynamic metrics libpcp_pmda code could be improved too,
BTW - not sure, but keep that in mind (it might be easier to pull the domain
number out of the pmdaExt structure and write it into the source, if it
is missing at the time the callback is called? I don't think that could
harm any existing PMDA code)
Still no progress on this, except that with a DSO PMDA the domain
number is correct at this point, as illustrated below in the interrupts case.
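For context, the shape of these refresh callbacks (mine, and as far as I
can tell the interrupts one too) is roughly the following - an untested
sketch, assuming the pmid_build()/pmid_cluster()/pmid_domain() helpers
from <pcp/impl.h> - the point being that whatever domain happens to be
sitting in "source" when the callback runs is what lands in every
duplicated PMID:

#include <string.h>
#include <pcp/pmapi.h>
#include <pcp/impl.h>
#include <pcp/pmda.h>

static void
refresh_metrictable(pmdaMetric *source, pmdaMetric *dest, int id)
{
    /* copy the template metric, then rewrite only the item number;
     * the domain and cluster come straight from the template */
    memcpy(dest, source, sizeof(pmdaMetric));
    dest->m_desc.pmid = pmid_build(pmid_domain(source->m_desc.pmid),
                                   pmid_cluster(source->m_desc.pmid), id);
}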
While researching this, I've run into a few issues that stem from
calling pmdaDynamicPMNS multiple times:
1. In the "size_metrictable" callback there is no context as to which
dynamic tree is being queried if we use the same callback for all trees.
So this is going to way overestimate the storage space needed, since we
would have to return the size of the largest subtree if we use a common
callback. The only alternative I can see is one callback per subtree
(a quick sketch of what I mean follows this item).
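To show what I mean by one callback per subtree, something like this
(untested sketch; the callback names and counts are placeholders I made
up, not anything that exists in proc today):

/* made-up counts, purely for illustration */
static int num_fd_metrics = 4;
static int num_cgroup_metrics = 12;
static int num_fd_trees;        /* would be set during refresh */
static int num_cgroup_trees;    /* would be set during refresh */

/* one sizing callback per dynamic subtree, so each reports its own
 * exact size instead of a shared callback returning the largest size */
static void
fd_size_metrictable(int *total, int *trees)
{
    *total = num_fd_metrics;
    *trees = num_fd_trees;
}

static void
cgroup_size_metrictable(int *total, int *trees)
{
    *total = num_cgroup_metrics;
    *trees = num_cgroup_trees;
}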
2. This is a larger problem. In dynamic.c, for the function:
static pmdaMetric *
dynamic_metric_table(int index, pmdaMetric *offset)
When running through the initial metric table, the only check done
while doing "mtabupdate" is whether the cluster matches; no check is done
against the item or name. This causes multiple calls for the same metric
if we have clusters that span multiple trees (since
pmdaDynamicMetricTable calls mtabupdate for each pmdaDynamicPMNS call we
made), like we have in proc with CLUSTER_PID_STATUS.
I don't see a solution to this problem. The initial metric table
has no name to compare against. When we call pmdaDynamicPMNS we don't
provide a list of item ids that are valid. So neither of those checks
can be done.
The only solution I can think of would be to allow mtabupdate to
return a success code when the metric was actually added, and only then
would dynamic_metric_table update its internal state. The callback would
know whether the cluster/item combo is a valid dynamic metric (rough
sketch below). This would break any existing code that uses this
interface, though.
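Roughly what I am imagining, against a hypothetical changed callback
signature (sketch only - nothing like this exists today, and
metric_in_this_tree() is a placeholder for whatever per-tree validity
check the PMDA would supply):

#include <string.h>
#include <pcp/pmapi.h>
#include <pcp/impl.h>
#include <pcp/pmda.h>

/* placeholder: the real check would consult this subtree's metric list */
static int
metric_in_this_tree(int cluster, int item)
{
    (void)cluster; (void)item;
    return 1;
}

/* hypothetical: the callback returns 1 only when the (cluster, item)
 * combination really belongs to this subtree, and dynamic_metric_table
 * would only advance its table offset when it gets that 1 back */
static int
proc_refresh_metrictable(pmdaMetric *source, pmdaMetric *dest, int id)
{
    int cluster = pmid_cluster(source->m_desc.pmid);
    int item = pmid_item(source->m_desc.pmid);

    if (!metric_in_this_tree(cluster, item))
        return 0;       /* not ours - skip, nothing written to dest */

    memcpy(dest, source, sizeof(pmdaMetric));
    dest->m_desc.pmid = pmid_build(pmid_domain(source->m_desc.pmid),
                                   cluster, id);
    return 1;           /* metric added */
}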
I think there is a real likelihood of a buffer overflow here now. If the
number of metrics in the extra clusters is greater than the extra space
introduced by the padding from #1 above, I think there will be memory
corruption.
Strangely, I haven't found any negative impacts from this. My guess is
somewhere there are just multiple definitions of the same metric taking
up space? Or maybe this has something to do with the one QA failure I
couldn't debug.
3. In general, it's not clear to me what the proper way is to generate the
dynamic metrics from the initial "templates". For example, take the
existing case in the linux pmda. For interrupts we have:
    /* kernel.percpu.interrupts.line[<N>] */
    { NULL, { PMDA_PMID(CLUSTER_INTERRUPT_LINES, 0), PM_TYPE_U32,
        CPU_INDOM, PM_SEM_COUNTER, PMDA_PMUNITS(0,0,1,0,0,PM_COUNT_ONE) }, },
    /* kernel.percpu.interrupts.[<other>] */
    { NULL, { PMDA_PMID(CLUSTER_INTERRUPT_OTHER, 0), PM_TYPE_U32,
        CPU_INDOM, PM_SEM_COUNTER, PMDA_PMUNITS(0,0,1,0,0,PM_COUNT_ONE) }, },
These are used as templates to create the dynamic metrics. But
interrupts.c reports the total number of dynamic metrics via
"size_metrictable", so we end up with at least two extra metrics which,
as far as I can tell, are never used (my reading of how that happens is
sketched just below).
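For what it's worth, my reading of what happens with these templates is
roughly the following (sketch only, not the real libpcp_pmda code - the
function name is mine): every template whose cluster is registered gets
duplicated once per "tree", regardless of how many metrics that cluster
actually contributes to the PMNS, which is where the extra entries that
pminfo never shows (compare the log and the pminfo output below) come from.

#include <pcp/pmapi.h>
#include <pcp/pmda.h>

static void
duplicate_templates(pmdaMetric *templates, int ntemplates, int trees,
                    pmdaMetric *table,
                    void (*refresh)(pmdaMetric *, pmdaMetric *, int))
{
    int t, tree, count = 0;

    /* every template is expanded "trees" times, independent of how
     * many metrics its cluster really contributes to the name space */
    for (t = 0; t < ntemplates; t++)
        for (tree = 1; tree <= trees; tree++)
            refresh(&templates[t], &table[count++], tree);
}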
From pmcd.log:
interrupts size_metrictable: 2 total x 11 trees
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae5f0)
metric ID dup: 60.49.0 -> 60.49.1
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae610)
metric ID dup: 60.49.0 -> 60.49.2
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae630)
metric ID dup: 60.49.0 -> 60.49.3
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae650)
metric ID dup: 60.49.0 -> 60.49.4
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae670)
metric ID dup: 60.49.0 -> 60.49.5
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae690)
metric ID dup: 60.49.0 -> 60.49.6
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae6b0)
metric ID dup: 60.49.0 -> 60.49.7
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae6d0)
metric ID dup: 60.49.0 -> 60.49.8
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae6f0)
metric ID dup: 60.49.0 -> 60.49.9
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae710)
metric ID dup: 60.49.0 -> 60.49.10
interrupts refresh_metrictable: (0x7f7c092010a0 -> 0x7f7c0d2ae730)
metric ID dup: 60.49.0 -> 60.49.11
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae750)
metric ID dup: 60.50.0 -> 60.50.1
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae770)
metric ID dup: 60.50.0 -> 60.50.2
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae790)
metric ID dup: 60.50.0 -> 60.50.3
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae7b0)
metric ID dup: 60.50.0 -> 60.50.4
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae7d0)
metric ID dup: 60.50.0 -> 60.50.5
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae7f0)
metric ID dup: 60.50.0 -> 60.50.6
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae810)
metric ID dup: 60.50.0 -> 60.50.7
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae830)
metric ID dup: 60.50.0 -> 60.50.8
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae850)
metric ID dup: 60.50.0 -> 60.50.9
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae870)
metric ID dup: 60.50.0 -> 60.50.10
interrupts refresh_metrictable: (0x7f7c092010c0 -> 0x7f7c0d2ae890)
metric ID dup: 60.50.0 -> 60.50.11
[Wed Nov 12 16:08:24] pmcd(13810) Debug: pmdaRehash: PMDA linux DSO:
successful rebuild
But:
[vagrant@pcptest testsuite]$ pminfo -M kernel.percpu.interrupts
kernel.percpu.interrupts.MCP PMID: 60.50.10 = 251709450 = 0xf00c80a
kernel.percpu.interrupts.MCE PMID: 60.50.9 = 251709449 = 0xf00c809
kernel.percpu.interrupts.THR PMID: 60.50.8 = 251709448 = 0xf00c808
kernel.percpu.interrupts.TRM PMID: 60.50.7 = 251709447 = 0xf00c807
kernel.percpu.interrupts.TLB PMID: 60.50.6 = 251709446 = 0xf00c806
kernel.percpu.interrupts.CAL PMID: 60.50.5 = 251709445 = 0xf00c805
kernel.percpu.interrupts.RES PMID: 60.50.4 = 251709444 = 0xf00c804
kernel.percpu.interrupts.IWI PMID: 60.50.3 = 251709443 = 0xf00c803
kernel.percpu.interrupts.PMI PMID: 60.50.2 = 251709442 = 0xf00c802
kernel.percpu.interrupts.SPU PMID: 60.50.1 = 251709441 = 0xf00c801
kernel.percpu.interrupts.LOC PMID: 60.50.0 = 251709440 = 0xf00c800
kernel.percpu.interrupts.line20 PMID: 60.49.9 = 251708425 = 0xf00c409
kernel.percpu.interrupts.line19 PMID: 60.49.8 = 251708424 = 0xf00c408
kernel.percpu.interrupts.line16 PMID: 60.49.7 = 251708423 = 0xf00c407
kernel.percpu.interrupts.line15 PMID: 60.49.6 = 251708422 = 0xf00c406
kernel.percpu.interrupts.line14 PMID: 60.49.5 = 251708421 = 0xf00c405
kernel.percpu.interrupts.line12 PMID: 60.49.4 = 251708420 = 0xf00c404
kernel.percpu.interrupts.line9 PMID: 60.49.3 = 251708419 = 0xf00c403
kernel.percpu.interrupts.line8 PMID: 60.49.2 = 251708418 = 0xf00c402
kernel.percpu.interrupts.line1 PMID: 60.49.1 = 251708417 = 0xf00c401
kernel.percpu.interrupts.line0 PMID: 60.49.0 = 251708416 = 0xf00c400
So both of the .11 metrics and one of the .10 metrics are orphaned? Does
that cause any issue? Or, since the PMNS doesn't reference them, is there
no harm except the memory used? If that's the case, then for my issue #2
above I can just set dummy cluster or item values for the metrics that are
sent through twice (or more), but that's pretty wasteful and I'm sure it
will have other unintended consequences.
Thanks
Martins