Re: [pcp] Floating point problem

To: Martin Spier <mspier@xxxxxxxxxxx>
Subject: Re: [pcp] Floating point problem
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Tue, 29 Jul 2014 08:27:54 +1000
Cc: pcp@xxxxxxxxxxx, Amer Ather <aather@xxxxxxxxxxx>, Coburn Watson <cwatson@xxxxxxxxxxx>, Brendan Gregg <bgregg@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <CAEp4+dUH6fEQ2E=o5O2q8LKfR2xUypM-AeOwQhWy9sEntvO-AQ@xxxxxxxxxxxxxx>
References: <CAEp4+dU2kE9JJztBPc=N5oSyoEyBvN5Of19rohC3DxXGeomuRw@xxxxxxxxxxxxxx> <033501cfa8a4$fd091ed0$f71b5c70$@internode.on.net> <CAEp4+dUH6fEQ2E=o5O2q8LKfR2xUypM-AeOwQhWy9sEntvO-AQ@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0
On 29/07/14 05:47, Martin Spier wrote:
Here it is:

kernel.pct.cpu.user = 100 * kernel.all.cpu.user / hinv.ncpu
kernel.pct.cpu.sys  = 100 * kernel.all.cpu.sys / hinv.ncpu

Same definition Amer posted before. Think it came from:

http://www.performancecopilot.org/pcp.git/man/html/howto.cpuperf.html


As I suspected ...

Note that on that web page the table is headed "PCP equivalent (assuming rate conversion)" ... and we don't have any rate conversion in play here.

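The expressions above will produce exactly the floating point precision problem Martin has observed, because without rate conversion they are evaluated on the raw counter values of kernel.all.cpu.user and kernel.all.cpu.sys rather than on rates.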

The options are ...

1. Revisit the design specs for pmwebd and see if it makes sense for this daemon to be performing per-client rate conversion (so taking on some of the role of a PMAPI client, like pmie, pmval, pmchart, pmdumptext, ... and detecting the counter semantics of metrics and rate converting them). In this case the formulae and derived metrics above would work.

2. Push the rate conversion arithmetic out to the pmwebd clients ... this involves keeping the last observed value and the last timestamp, then computing delta(value) / delta(timestamp), and you could do the *100 and /hinv.ncpu at the same time. I am guessing this is not an attractive option.

3. Extend the derived metrics support. We already have delta() which can be applied to counter metrics and returns the difference in value between one pmFetch and the next. This is closer to the semantics Martin needs, but does not include the divide by delta(timestamp) part. I could add rate() as a new intrinsic function for derived metrics that does the rate conversion.

With option 3. the derived metric definitions would be something like ...

kernel.pct.cpu.user = 100 * rate(kernel.all.cpu.user) / hinv.ncpu

Note that rate(kernel.all.cpu.user) would be a double precision number but restricted to the (small) interval [0, hinv.ncpu].
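
For illustration only, the arithmetic described in option 2 (and what a rate() function would have to do for option 3) could be sketched as below. This is a minimal sketch under stated assumptions, not pmwebd or PMAPI code: the names Sample, RateConverter and pct_cpu are hypothetical, timestamps are assumed to be in seconds, and the counter is assumed to be cumulative CPU time in milliseconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    timestamp: float  # fetch time in seconds (assumed)
    value: float      # cumulative CPU time in milliseconds (assumed units)


class RateConverter:
    """Hypothetical client-side helper: remembers the last sample of a counter."""

    def __init__(self) -> None:
        self.last: Optional[Sample] = None

    def update(self, sample: Sample) -> Optional[float]:
        """Return delta(value) / delta(timestamp) as CPU-seconds per second.

        Returns None when no rate can be computed: on the first sample,
        when no time has elapsed, or when the counter went backwards.
        """
        prev, self.last = self.last, sample
        if prev is None:
            return None
        dt = sample.timestamp - prev.timestamp
        dv = sample.value - prev.value
        if dt <= 0 or dv < 0:
            return None
        return (dv / 1000.0) / dt  # milliseconds -> seconds, then per second


def pct_cpu(rate: float, ncpu: int) -> float:
    """Apply the *100 and /hinv.ncpu steps to a rate in [0, ncpu]."""
    return 100.0 * rate / ncpu


if __name__ == "__main__":
    conv = RateConverter()
    conv.update(Sample(timestamp=100.0, value=50_000.0))      # first fetch, no rate yet
    rate = conv.update(Sample(timestamp=110.0, value=60_000.0))
    if rate is not None:
        print(pct_cpu(rate, ncpu=4))
```

Each client would need one such converter per counter metric (and per instance, where a metric has instances), which is the bookkeeping that option 2 pushes out to the clients.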

Before jumping into 3., I'd like to hear feedback on options 1. and 2.
