Comment #6 on bug 1133 from Ken McDonell
Looks like there is no need for API changes, no hacks, no performance
regressions, and no arguments.
The plan is to move the registration of the two anon metrics needed for event
record decoding from the event record unpack code paths into the derived metric
initialization routine __dminit(). This makes event.missed and event.flags
effectively visible in the PMNS for everyone, along with their pmDesc metadata.
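For illustration, this is the sort of client code that should just work once
the anon metrics are registered in __dminit(). A minimal sketch only, using
the current pmapi signatures (older releases declare pmLookupName() with
char *namelist[], and the "local:" host spec may need to be "localhost" on
older installations):

    #include <stdio.h>
    #include <pcp/pmapi.h>

    int
    main(void)
    {
        const char  *names[] = { "event.flags", "event.missed" };
        pmID        pmids[2];
        pmDesc      desc;
        int         i, sts;

        if ((sts = pmNewContext(PM_CONTEXT_HOST, "local:")) < 0) {
            fprintf(stderr, "pmNewContext: %s\n", pmErrStr(sts));
            return 1;
        }
        if ((sts = pmLookupName(2, names, pmids)) < 0) {
            fprintf(stderr, "pmLookupName: %s\n", pmErrStr(sts));
            return 1;
        }
        for (i = 0; i < 2; i++) {
            if ((sts = pmLookupDesc(pmids[i], &desc)) < 0)
                continue;
            /* the pmDesc metadata is now visible like any other metric */
            printf("%s: pmid %s type %s\n",
                   names[i], pmIDStr(pmids[i]), pmTypeStr(desc.type));
        }
        return 0;
    }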
First, the code analysis ...
pmLookupDesc(pmid, *desc)
- no impact; __dmdesc() is only called when the regular path fails to
find the pmid (see the sketch below)
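The same "regular path first, derived path on failure" shape applies to
pmLookupName(), pmNameID() and pmNameAll() below. A runnable sketch of what
that buys the client (the derived metric name and expression here are made
up for the example, and it assumes the sample PMDA is installed):

    #include <stdio.h>
    #include <pcp/pmapi.h>

    int
    main(void)
    {
        const char  *name = "my.sample.double";   /* hypothetical name */
        pmID        pmid;
        pmDesc      desc;
        int         sts;

        /* derived metrics must be registered before the context exists */
        if (pmRegisterDerived("my.sample.double",
                              "2 * sample.long.ten") != NULL) {
            fprintf(stderr, "pmRegisterDerived failed\n");
            return 1;
        }
        if ((sts = pmNewContext(PM_CONTEXT_HOST, "local:")) < 0) {
            fprintf(stderr, "pmNewContext: %s\n", pmErrStr(sts));
            return 1;
        }
        /* both calls succeed via the derived fallback, with no API change */
        if ((sts = pmLookupName(1, &name, &pmid)) < 0 ||
            (sts = pmLookupDesc(pmid, &desc)) < 0) {
            fprintf(stderr, "lookup: %s\n", pmErrStr(sts));
            return 1;
        }
        printf("%s: type %s\n", name, pmTypeStr(desc.type));
        return 0;
    }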
pmFetch(numpmid, pmidlist[], **result)
- __pmPrepareFetch() is called unconditionally
- __dmprefetch() is then called unconditionally
- loop over pmidlist[] with check for !IS_DERIVED()
- this is the part I was concerned about (a sketch of the guard
follows this block) ... but see the performance discussion below
- __pmFinishResult() not called if no derived metrics in pmidlist[]
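The guard itself is trivial. A sketch of the shape (IS_DERIVED() is internal
to libpcp, so this uses a hypothetical stand-in that tests the pmID domain;
domain 511 is my assumption for the reserved derived/dynamic domain, and
pmID_build()/pmID_domain() are the helpers from current pmapi.h):

    #include <stdio.h>
    #include <pcp/pmapi.h>

    /* assumption: derived pmIDs live in the reserved dynamic domain */
    #define MY_DERIVED_DOMAIN   511
    #define my_is_derived(p)    (pmID_domain(p) == MY_DERIVED_DOMAIN)

    /* returns 1 if any pmID in the fetch list needs derived handling */
    static int
    need_finish(int numpmid, pmID *pmidlist)
    {
        int i;
        for (i = 0; i < numpmid; i++) {
            if (my_is_derived(pmidlist[i]))
                return 1;
        }
        return 0;   /* fast path: __pmFinishResult() can be skipped */
    }

    int
    main(void)
    {
        pmID    list[2];
        list[0] = pmID_build(29, 0, 0);                 /* ordinary metric */
        list[1] = pmID_build(MY_DERIVED_DOMAIN, 0, 7);  /* derived-style */
        printf("need finish? %d\n", need_finish(2, list));
        return 0;
    }

So the per-fetch cost when no derived metrics are in the list is just this
loop over pmidlist[], which is what the timings below confirm.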
pmLookupName(numpmid, *namelist[], pmidlist[])
- no impact; __dmgetpmid() only called when regular path fails to
resolve the metric name(s)
pmGetChildrenStatus(*name, ***offspring, **statuslist)
- __dmchildren() is called unconditionally
- this will call strcmp() for _every_ derived metric x
every descendant metric
- this is potentially an issue, but (a) pmGetChildrenStatus() is only
likely to be called at client start up, if at all (see the usage
sketch below), and (b) the effect will be smaller than the "no
measurable difference" observed below for TraversePMNS()
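For reference, the typical one-shot use of pmGetChildrenStatus() at client
start up looks like this (the subtree name is arbitrary, and it assumes a
reachable pmcd):

    #include <stdio.h>
    #include <stdlib.h>
    #include <pcp/pmapi.h>

    int
    main(void)
    {
        char    **offspring;
        int     *status;
        int     i, sts;

        if ((sts = pmNewContext(PM_CONTEXT_HOST, "local:")) < 0) {
            fprintf(stderr, "pmNewContext: %s\n", pmErrStr(sts));
            return 1;
        }
        if ((sts = pmGetChildrenStatus("sample",
                                       &offspring, &status)) < 0) {
            fprintf(stderr, "pmGetChildrenStatus: %s\n", pmErrStr(sts));
            return 1;
        }
        for (i = 0; i < sts; i++)
            printf("sample.%s %s\n", offspring[i],
                   status[i] == PMNS_LEAF_STATUS ? "(leaf)" : "(non-leaf)");
        free(offspring);
        free(status);
        return 0;
    }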
pmNameID(pmid, **name)
- no impact; __dmgetname() only called when regular path fails to
find pmid
pmNameAll(pmid, ***namelist)
- no impact; __dmgetname() only called when regular path fails to
find pmid
TraversePMNS(*name, (*func), (*func_r), *closure)
- __dmtraverse() is called unconditionally for every metric returned
from a remote pmcd
- this will call strcmp() for _every_ derived metric x
every metric found in the recursion
- this is potentially an issue, but (a) TraversePMNS() is only
likely to be called once (if at all), and (b) in experiments
using the sample PMDA's PMNS with 160 metrics, there is no
measurable CPU time spent in TraversePMNS() for either the
old or the new code (a traversal sketch follows)
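The traversal in those experiments amounts to this sort of one-time walk (a
sketch; the subtree and the callback are arbitrary):

    #include <stdio.h>
    #include <pcp/pmapi.h>

    static int nmetrics;

    static void
    count_metric(const char *name)
    {
        (void)name;     /* name unused; just counting leaf metrics */
        nmetrics++;
    }

    int
    main(void)
    {
        int sts;

        if ((sts = pmNewContext(PM_CONTEXT_HOST, "local:")) < 0) {
            fprintf(stderr, "pmNewContext: %s\n", pmErrStr(sts));
            return 1;
        }
        if ((sts = pmTraversePMNS("sample", count_metric)) < 0) {
            fprintf(stderr, "pmTraversePMNS: %s\n", pmErrStr(sts));
            return 1;
        }
        printf("%d metrics under sample\n", nmetrics);
        return 0;
    }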
To explore the pmFetch() impact ...
- built a new QA app (fetchloop) that runs a hard loop of pmFetch() and
pmFreeResult() calls inside a timing loop and reports usec per
iteration (a sketch of the pattern follows the results below)
- each experiment is repeated 5 times and the mean and variance reported
- fetchloop -L -s 200000 sampledso
this one is fetching all instances of all the sampledso metrics using
a PM_CONTEXT_LOCAL context, so no PDUs and no pmcd on the code path
old: ave 84.8 var 0.44
new: ave 84.7 var 1.06
- fetchloop -s 40000 -c fetch.config.default
this one is fetching all the instances of all the kernel PMDA metrics
named in the default pmlogger configuration (some 285 metrics)
old: ave 155.7 var 1.51
new: ave 156.3 var 6.30
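fetchloop itself lives in QA and is not reproduced here; the measurement
pattern is essentially the following sketch (the metric name, iteration
count and context type are placeholders, and the real app repeats the whole
run 5 times to get the mean and variance):

    #include <stdio.h>
    #include <sys/time.h>
    #include <pcp/pmapi.h>

    #define ITERATIONS  200000

    int
    main(void)
    {
        const char      *names[] = { "sample.long.ten" };  /* placeholder */
        pmID            pmids[1];
        pmResult        *rp;
        struct timeval  t0, t1;
        double          usec;
        int             i, sts;

        if ((sts = pmNewContext(PM_CONTEXT_HOST, "local:")) < 0 ||
            (sts = pmLookupName(1, names, pmids)) < 0) {
            fprintf(stderr, "setup: %s\n", pmErrStr(sts));
            return 1;
        }
        gettimeofday(&t0, NULL);
        for (i = 0; i < ITERATIONS; i++) {
            if ((sts = pmFetch(1, pmids, &rp)) < 0) {
                fprintf(stderr, "pmFetch: %s\n", pmErrStr(sts));
                return 1;
            }
            pmFreeResult(rp);
        }
        gettimeofday(&t1, NULL);
        usec = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
        printf("%.1f usec per pmFetch\n", usec / ITERATIONS);
        return 0;
    }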
So there is no statistically significant performance degradation.
Provided there is no unexpected QA fallout, I plan to commit these changes.