Hi Martins,
----- Original Message -----
> Nathan,
>
> Sorry for the long email below. I have become completely confused
> with dynamic metrics and it seems like I'm trying to use the API in a
> way that was not intended. #2 below is the most troubling. You likely
> want to junk all my dynamic proc code until we get this figured out.
No problem.
>
> Just realized this is incomplete. Sorry, it will need some more work
> and you shouldn't use it.
S'ok (I've been tied up with QA duties and cgroups; nearly done, and
then I'll be able to help you out more here).
> Still no progress on this, except that with a dso pmda, the domain
> number is correct at this point. Illustrated below in the interrupts case.
Looks like a bug - in the cgroups case, the domain# is explicitly
stamped into the PMID for the new metric table entry each time, which
is probably why we've never seen this before.
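i.e. roughly this idiom when the new entry is built from its template
(paraphrasing from memory, not the literal cgroups code, and assuming
the usual headers):

    /* assumes <pcp/pmapi.h>, <pcp/impl.h> and <pcp/pmda.h> */

    /* write the running PMDA's domain into a freshly-copied entry */
    static void
    stamp_domain(pmdaMetric *mp, pmdaExt *pmda)
    {
        __pmID_int *idp = (__pmID_int *)&mp->m_desc.pmid;

        idp->domain = pmda->e_domain;
    }

which would explain why daemon vs dso never made a difference there.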
> While researching this, I've run into a few issues that stem from
> calling pmdaDynamicPMNS multiple times:
>
> 1. In the "size_metrictable" callback there is no context as to which
> dynamic tree is being queried if we use the same callback for all trees.
(*nod* - use different callbacks?) That's the old cgroups model anyway.
The PMID cluster# was the point of division used there - each cluster is
managed separately IIRC.
> So this is going to way overestimate the storage space needed, since we
> need to return the size of the largest sub tree if we use a common
> callback. The alternative is one callback per sub tree as far as I can see.
Yep. And that's OK - there aren't many proc.* sub-trees (4/5?).
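Roughly what I'm picturing, with made-up sub-tree names and a guessed
(total, trees) signature for the sizing callback - a sketch only, not
compiled:

    /* assumes <pcp/pmapi.h>, <pcp/impl.h> and <pcp/pmda.h> */

    static int nfoo_instances;  /* refreshed elsewhere; names made up */
    static int nbar_instances;

    /*
     * One counting callback per sub-tree, each registered with its own
     * pmdaDynamicPMNS() call, so every tree reports just its own size
     * rather than a shared callback returning the largest sub-tree.
     */
    static void
    foo_size_metrictable(int *total, int *trees)
    {
        *total = 3;              /* entries in the proc.foo templates (made up) */
        *trees = nfoo_instances;
    }

    static void
    bar_size_metrictable(int *total, int *trees)
    {
        *total = 6;              /* entries in the proc.bar templates (made up) */
        *trees = nbar_instances;
    }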
> 2. This is a larger problem. In dynamic.c, for the function:
>
> static pmdaMetric *
> dynamic_metric_table(int index, pmdaMetric *offset)
>
> When running through the initial metric table, the only check done
> when doing "mtabupdate" is whether the cluster matches; no check is
> made against the item or name. This will cause multiple calls
> for the same metric if we have clusters that span multiple trees (since
If we take away "clusters spanning trees" this all gets simpler I think?
Am I oversimplifying things there? (apologies if so - I'll look at it
more closely as soon as I get this cgroup stuff finished).
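Just to check I follow #2, though - if we did keep clusters shared
across trees, the extra check you're describing would be something
along these lines in the mtabupdate walk? (sketch only, untested):

    /* assumes <pcp/pmapi.h>, <pcp/impl.h> and <pcp/pmda.h> */

    /*
     * Match a metric table entry against a template on both cluster
     * and item, not cluster alone, so a template whose cluster also
     * shows up in another tree isn't expanded more than once.
     */
    static int
    matches_template(pmdaMetric *metric, pmdaMetric *tmpl)
    {
        __pmID_int *m = (__pmID_int *)&metric->m_desc.pmid;
        __pmID_int *t = (__pmID_int *)&tmpl->m_desc.pmid;

        return m->cluster == t->cluster && m->item == t->item;
    }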
> 3. In general, it's not clear to me what the proper way is to generate
> the dynamic metrics from the initial "templates". For example, take the
Yeah, the code here is pretty freaky and non-obvious as I recall.
> /* kernel.percpu.interrupts.line[<N>] */
> { NULL, { PMDA_PMID(CLUSTER_INTERRUPT_LINES, 0), PM_TYPE_U32,
> CPU_INDOM, PM_SEM_COUNTER, PMDA_PMUNITS(0,0,1,0,0,PM_COUNT_ONE) }, },
>
> /* kernel.percpu.interrupts.[<other>] */
> { NULL, { PMDA_PMID(CLUSTER_INTERRUPT_OTHER, 0), PM_TYPE_U32,
> CPU_INDOM, PM_SEM_COUNTER, PMDA_PMUNITS(0,0,1,0,0,PM_COUNT_ONE) }, },
>
> These are used as templates to create the dynamic metrics. But
> interrupts.c returns the total number of dynamic metrics from
> "size_metrictable", so we end up with at least 2 extra metrics which,
> as far as I can tell, are never used.
That was not the original intention, and sounds like a bug - the intent
was for this to allocate exactly the right amount of space, but it
looks like there are a few extras. It'd be good to dig more deeply
into that and understand why.
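For reference, the sort of pairing I'd originally pictured for the
interrupts case is below - a counting callback that reports only what
the PMNS will reference, plus a refresh callback that stamps the item
number into each copied template, so the space allocated matches what
actually gets used. Counts, names and signatures here are illustrative
guesses, not what's in the tree today:

    /* assumes <string.h>, <pcp/pmapi.h>, <pcp/impl.h> and <pcp/pmda.h> */

    static int irq_line_count;  /* refreshed from /proc/interrupts elsewhere */

    /* report only the entries the PMNS will actually reference */
    static void
    interrupts_size_metrictable(int *total, int *trees)
    {
        *total = 2;              /* the two templates: lines, other */
        *trees = irq_line_count;
    }

    /* copy a template and stamp the per-line item number into the copy */
    static void
    interrupts_refresh_metrictable(pmdaMetric *source, pmdaMetric *dest, int line)
    {
        __pmID_int *idp;

        memcpy(dest, source, sizeof(pmdaMetric));
        idp = (__pmID_int *)&dest->m_desc.pmid;
        idp->item = line;
    }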
> So both of the .11 and one of the .10 metrics are orphaned? Does that
> cause any issue? Or, since the pmns doesn't reference them, is there
> no harm except the memory used? If that's the case, for my issue #2
> above I can
Yeah, I think that's the case.
> just set dummy cluster or item values for the metrics that are sent
> through twice (or more), but that's pretty wasteful and I'm sure it
> will have other unintended consequences.
Not that I've seen to date, but yep, we should certainly fix that up.
cheers.
--
Nathan