On 29/07/14 05:47, Martin Spier wrote:
> Here it is:
> kernel.pct.cpu.user = 100 * kernel.all.cpu.user / hinv.ncpu
> kernel.pct.cpu.sys = 100 * kernel.all.cpu.sys / hinv.ncpu
> Same definition Amer posted before. Think it came from:
> http://www.performancecopilot.org/pcp.git/man/html/howto.cpuperf.html
As I suspected ...
Note that in that web page, the table is headed "PCP equivalent (assuming
rate conversion)" ... we don't have any rate conversion in play here, so
the expressions above are applied to the raw counter values and will
produce exactly the floating point precision problem Martin has observed.
The options are ...
1. Revisit the design specs for pmwebd and see if it makes sense for
this daemon to be performing per-client rate conversion (so taking on
some of the role of a PMAPI client, like pmie, pmval, pmchart,
pmdumptext, ... and detecting the counter semantics of metrics and rate
converting them). In this case the formulae and derived metrics above
would work.
2. Push the rate conversion arithmetic out to the pmwebd clients ...
this involves keeping the last observed value and the last timestamp,
then computing delta(value) / delta(timestamp), and you could do the
*100 and /hinv.ncpu at the same time. I am guessing this is not an
attractive option.
3. Extend the derived metrics support. We already have delta() which
can be applied to counter metrics and returns the difference in value
between one pmFetch and the next. This is closer to the semantics
Martin needs, but does not include the divide by delta(timestamp) part.
I could add rate() as a new intrinsic function for derived metrics
that does the rate conversion.
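
For reference, the client-side arithmetic in option 2 could look roughly like the Python sketch below. The names RateConverter, update() and cpu_percent() are made up for illustration and are not part of pmwebd or any PCP API, and the sample values are invented. It assumes the counter is in milliseconds of CPU time, timestamps are in seconds, and it ignores counter wrap.

```python
from typing import Optional


class RateConverter:
    """Hypothetical helper: remembers the last counter value and timestamp."""

    def __init__(self) -> None:
        self.last_value: Optional[float] = None
        self.last_time: Optional[float] = None

    def update(self, value: float, timestamp: float) -> Optional[float]:
        """Return delta(value) / delta(timestamp), or None if no rate is available yet."""
        rate = None
        if (self.last_value is not None and self.last_time is not None
                and timestamp > self.last_time):
            rate = (value - self.last_value) / (timestamp - self.last_time)
        # Remember this sample for the next call.
        self.last_value = value
        self.last_time = timestamp
        return rate


def cpu_percent(rate_ms_per_sec: float, ncpu: int) -> float:
    """Scale a ms/sec CPU-time rate to a percentage of all CPUs."""
    # Assumption: the counter is in milliseconds, so 1000 ms/sec is one busy CPU.
    return 100.0 * (rate_ms_per_sec / 1000.0) / ncpu


# Invented samples: (counter value in ms, timestamp in seconds)
samples = [(500000.0, 100.0), (502000.0, 101.0), (503000.0, 103.0)]
conv = RateConverter()
rates = [conv.update(v, t) for v, t in samples]
percents = [None if r is None else cpu_percent(r, ncpu=4) for r in rates]
print(percents)  # the first sample has no predecessor, so no rate for it
```

The first fetch yields no value because there is nothing to difference against, which is the same behaviour any rate-converting client has to deal with.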
With option 3. the derived metric definitions would be something like ...
kernel.pct.cpu.user = 100 * rate(kernel.all.cpu.user) / hinv.ncpu
Note that rate(kernel.all.cpu.user) would be a double precision number
but restricted to the (small) interval [0, hinv.ncpu].
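
To illustrate how the existing delta() differs from the proposed rate(), here is a small Python sketch of the arithmetic only. The function names mirror the derived metric functions, but this is not PCP code and the numbers are invented. It assumes counter values already expressed in CPU-seconds, so that the rate falls in [0, ncpu].

```python
def delta(prev_value: float, curr_value: float) -> float:
    """Difference between two consecutive counter observations."""
    return curr_value - prev_value


def rate(prev_value: float, curr_value: float,
         prev_time: float, curr_time: float) -> float:
    """delta(value) divided by delta(timestamp)."""
    return delta(prev_value, curr_value) / (curr_time - prev_time)


ncpu = 4  # stand-in for hinv.ncpu

# Same load (2 of 4 CPUs busy) sampled over a 1 second and a 5 second interval.
d1 = delta(100.0, 102.0)            # CPU-seconds consumed in 1 second
d5 = delta(100.0, 110.0)            # CPU-seconds consumed in 5 seconds
r1 = rate(100.0, 102.0, 0.0, 1.0)
r5 = rate(100.0, 110.0, 0.0, 5.0)

pct1 = 100 * r1 / ncpu
pct5 = 100 * r5 / ncpu

print(d1, d5)      # delta() depends on the sampling interval
print(pct1, pct5)  # rate() does not, so the percentages agree
```

The point is that delta() alone gives a number that scales with the fetch interval, while dividing by the elapsed time gives a small, interval-independent value.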
Before jumping into 3., I'd like to hear feedback on options 1. and 2.