Re: [pcp] Calculated/derived metrics?

To: pcp@xxxxxxxxxxx
Subject: Re: [pcp] Calculated/derived metrics?
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Thu, 07 May 2015 15:55:26 +1000
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <5549E4CD.5000408@xxxxxxxxxx>
References: <5534C680.2020709@xxxxxxxxxx> <493537984.3276058.1429528962326.JavaMail.zimbra@xxxxxxxxxx> <5534EBA8.4030509@xxxxxxxxxx> <1644393599.3651017.1429563442835.JavaMail.zimbra@xxxxxxxxxx> <55364606.1000503@xxxxxxxxxx> <55472B40.7050800@xxxxxxxxxx> <5547DE11.5050800@xxxxxxxxxxxxxxxx> <5549E4CD.5000408@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
G'day Marko.

On 06/05/15 19:54, Marko Myllynen wrote:
> ...
> right, I was blissfully unaware of this.

I suspect that might be a common occurrence ... not sure how/where we need to 
advertise the derived metrics capabilities so they are more visible and the 
degree of blissful unawareness is reduced for the wider audience.

Suggestions would be most welcome here.

> ..
> I see that pmRegisterDerived(3) describes both C API and the expressions
> used to construct derived metrics. There are also Python bindings for
> it. And pminfo(1) can read such a "dmfile" specified with -c,
> PCP_DERIVED_CONFIG should be used with tools like pmval(1).

Mostly this was intended to work _without_ client change which is why none of 
pmie, pmval, pmlogger, ... have any command line option to force derived 
metrics to be loaded ... they all rely on the environment variable mechanism so 
that derived metrics "just appear" like regular metrics.
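
As a rough illustration of that mechanism, the Python sketch below launches an unmodified client with PCP_DERIVED_CONFIG set in its environment. The helper name run_with_derived, the config path and the metric name are all hypothetical placeholders, and the pminfo call only runs if the tool is found on the PATH.

```python
import os
import shutil
import subprocess


def run_with_derived(config_path, argv):
    """Run argv with PCP_DERIVED_CONFIG pointing at config_path.

    Hypothetical helper, not part of PCP: the client command line is
    unchanged, only its environment differs.
    """
    env = dict(os.environ, PCP_DERIVED_CONFIG=config_path)
    return subprocess.run(argv, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Placeholder path and metric name; substitute your own.
    if shutil.which("pminfo"):
        result = run_with_derived("/path/to/derived.conf",
                                  ["pminfo", "-df", "some.derived.metric"])
        print(result.stdout or result.stderr)
```

The same wrapper would apply to pmval, pmie or pmlogger, since none of them needs a dedicated command line option.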

pminfo is the odd one out because it is the Swiss Army knife that provides 
access to all manner of PMAPI services that are not usually required by other, 
more useful (outside development and debugging) client applications.

> I don't see Perl bindings for this, not sure is that a biggie. ...

The Python bindings are probably there by accident (no Python code appears to 
be using them).  Perl bindings would be simple to add if the need arose.

> ... Usage
> seems otherwise pretty straightforward except when dealing with multiple
> instances. For example, I tried to derive the bytes-written/s for a
> process being monitored with hotproc. A process' lifetime can roughly be
> calculated with kernel uptime - the process' start_time / 100. But
> testing with something like:
> 
> write_per_sec = hotproc.io.write_bytes / (kernel.all.uptime -
> hotproc.psinfo.start_time/100)
> 
> leads to an (expected) error as there are several instances. ...

The error is not really expected, has nothing to do with instances, and is 
compounded by a close-to-indecipherable error message!

Instances are fully supported in derived metrics. Consider this example:

kenj@bozo:~/src/pcp/src$ cat /tmp/eek.derive 
mytest.bin = sample.bin + sample.bin
mytest.part_bin = sample.bin - sample.part_bin
kenj@bozo:~/src/pcp/src$ PCP_DERIVED_CONFIG=/tmp/eek.derive pminfo -df -h bozo-vm mytest

mytest.bin
    Data Type: 32-bit int  InDom: 29.2 0x7400002
    Semantics: instant  Units: none
    inst [100 or "bin-100"] value 200
    inst [200 or "bin-200"] value 400
    inst [300 or "bin-300"] value 600
    inst [400 or "bin-400"] value 800
    inst [500 or "bin-500"] value 1000
    inst [600 or "bin-600"] value 1200
    inst [700 or "bin-700"] value 1400
    inst [800 or "bin-800"] value 1600
    inst [900 or "bin-900"] value 1800

mytest.part_bin
    Data Type: 32-bit int  InDom: 29.2 0x7400002
    Semantics: instant  Units: none
    inst [100 or "bin-100"] value 0
    inst [300 or "bin-300"] value 0
    inst [500 or "bin-500"] value 0
    inst [700 or "bin-700"] value 0
    inst [900 or "bin-900"] value 0

For mytest.part_bin the values only appear for instances that are present in 
_both_ of the set-valued operands of the - operator.
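
The instance matching can be modelled with a small Python sketch. This is only a model of the behaviour visible in the output above, not the libpcp implementation; the derive helper is hypothetical, and the input values (each instance's value equal to its instance number) are inferred from the mytest results rather than read from sample.bin directly.

```python
import operator


def derive(op, left, right):
    """Apply op per instance, keeping only instances present in both operands."""
    return {inst: op(left[inst], right[inst]) for inst in left if inst in right}


# Assumed inputs, inferred from the pminfo output: nine instances for
# sample.bin, and only the odd hundreds for sample.part_bin.
sample_bin = {i: i for i in range(100, 1000, 100)}
sample_part_bin = {i: i for i in range(100, 1000, 200)}

mytest_bin = derive(operator.add, sample_bin, sample_bin)
mytest_part_bin = derive(operator.sub, sample_bin, sample_part_bin)

print(mytest_bin)        # nine instances, values 200 .. 1800
print(mytest_part_bin)   # five instances, all with value 0
```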

The problem in your example is that the expression involves the counter 
hotproc.io.write_bytes and this is being divided by something with the units of 
time that is not a counter.  This is not allowed because the semantics of the 
resulting expression are not well defined ... is it a counter or instantaneous, 
and if a counter what does the divisor in units of seconds really mean?

I think what you want is to have hotproc.io.write_bytes treated as an 
instantaneous value (the value now, not a counter).  Is that correct?

This would require a new intrinsic, something like value(v).  To see that this 
would work, check the example at the end of this email.

And I'm not sure why the /100 is required, could you please explain that part?

> ...
> Otherwise this certainly looks very much I was asking for, thanks for
> the pointer.

Good.

Now the example.

Here are my derived metric definitions:
mytest.rss = hotproc.psinfo.rss
mytest.start = hotproc.psinfo.start_time
mytest.uptime = kernel.all.uptime
mytest.divisor = hotproc.psinfo.start_time - kernel.all.uptime
mytest.marko = hotproc.psinfo.rss / (hotproc.psinfo.start_time - kernel.all.uptime)

I've replaced hotproc.io.write_bytes by hotproc.psinfo.rss, not because this 
makes semantic sense, but because hotproc.psinfo.rss has instantaneous 
semantics which would be the same as value(hotproc.io.write_bytes) if that were 
implemented.  I've also reversed the operands in the divisor so the value is 
positive ... this expression is unsigned so the subtraction as you had it 
produces an astronomically large positive number.
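
To see the wrap-around, here is a small Python sketch that assumes the subtraction is done in 32-bit unsigned arithmetic, as the "32-bit unsigned int" data type reported by pminfo suggests. The sub_u32 helper is hypothetical, the numbers are the mytest.uptime and mytest.start values for the pducheck process from the pminfo output later in this email, and the /100 from the original expression is left out to keep the point about operand order simple.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, like a C uint32_t


def sub_u32(a, b):
    """Subtract b from a with 32-bit unsigned wrap-around."""
    return (a - b) & MASK32


uptime = 114550        # mytest.uptime
start_time = 11452259  # mytest.start for the pducheck process

print(sub_u32(start_time, uptime))  # 11337709, matches mytest.divisor
print(sub_u32(uptime, start_time))  # 4283629587, the wrapped result
```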

Now set a low cpuburn threshold for hotproc:
kenj@bozo-vm:~$ sudo pmstore hotproc.control.config 'cpuburn > 0.1'

Wait a while for hotproc to notice, ... then

kenj@bozo-vm:/var/log/pcp/pmcd$ PCP_DERIVED_CONFIG=/tmp/eek.derive pminfo -df mytest

mytest.rss
    Data Type: 32-bit unsigned int  InDom: 3.39 0xc00027
    Semantics: instant  Units: Kbyte
    inst [7697 or "007697 pducheck -i 10000000"] value 913208
    inst [32730 or "032730 -bash"] value 6852

mytest.start
    Data Type: 32-bit unsigned int  InDom: 3.39 0xc00027
    Semantics: discrete  Units: sec
    inst [7697 or "007697 pducheck -i 10000000"] value 11452259
    inst [32730 or "032730 -bash"] value 11426930

mytest.uptime
    Data Type: 32-bit unsigned int  InDom: PM_INDOM_NULL 0xffffffff
    Semantics: instant  Units: sec
    value 114550

mytest.divisor
    Data Type: 32-bit unsigned int  InDom: 3.39 0xc00027
    Semantics: instant  Units: sec
    inst [7697 or "007697 pducheck -i 10000000"] value 11337709
    inst [32730 or "032730 -bash"] value 11312380

mytest.marko
    Data Type: double  InDom: 3.39 0xc00027
    Semantics: instant  Units: Kbyte / sec
    inst [7697 or "007697 pducheck -i 10000000"] value 0.08054607857725048
    inst [32730 or "032730 -bash"] value 0.0006057080826492745

Note we have 2 instances throughout (apart from the singular mytest.uptime) and 
the expressions parse correctly.
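
As a sanity check on the arithmetic, this Python sketch recomputes mytest.divisor and mytest.marko from the mytest.rss, mytest.start and mytest.uptime values printed above. It is plain arithmetic on the reported numbers, not a use of the PCP API, and the dictionary layout is just for illustration.

```python
uptime = 114550  # mytest.uptime (singular, applied to every instance)

# instance id -> (mytest.rss in Kbyte, mytest.start)
procs = {
    7697: (913208, 11452259),   # pducheck -i 10000000
    32730: (6852, 11426930),    # -bash
}

results = {}
for inst, (rss, start) in procs.items():
    divisor = start - uptime                    # mytest.divisor
    results[inst] = (divisor, rss / divisor)    # mytest.marko as a double
    print(inst, divisor, rss / divisor)
```

The divisors come out as 11337709 and 11312380, and the quotients agree with the mytest.marko values reported by pminfo.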

Let me know if I've guessed your semantics correctly and I should add value(v) 
to my RFE queue.

Cheers, Ken.
