G'day Marko.
On 06/05/15 19:54, Marko Myllynen wrote:
> ...
> right, I was blissfully unaware of this.
I suspect that might be a common occurrence ... not sure how/where we need to
advertise the derived metrics capabilities so they are more visible and the
degree of blissful unawareness is reduced for the wider audience.
Suggestions would be most welcome here.
> ..
> I see that pmRegisterDerived(3) describes both C API and the expressions
> used to construct derived metrics. There are also Python bindings for
> it. And pminfo(1) can read such a "dmfile" specified with -c,
> PCP_DERIVED_CONFIG should be used with tools like pmval(1).
Mostly this was intended to work _without_ client change which is why none of
pmie, pmval, pmlogger, ... have any command line option to force derived
metrics to be loaded ... they all rely on the environment variable mechanism so
that derived metrics "just appear" like regular metrics.
pminfo is the odd one out because it is the Swiss Army knife that is used to
provide access to all manner of PMAPI services that are not usually required by
other, more useful (outside development and debugging) client applications.
> I don't see Perl bindings for this, not sure is that a biggie. ...
The Python bindings are probably there by accident (no Python code appears to
be using them). Perl bindings would be simple to add if the need arose.
> ... Usage
> seems otherwise pretty straightforward except when dealing with multiple
> instances. For example, I tried to derive the bytes-written/s for a
> process being monitored with hotproc. A process' lifetime can roughly be
> calculated with kernel uptime - the process' start_time / 100. But
> testing with something like:
>
> write_per_sec = hotproc.io.write_bytes / (kernel.all.uptime -
> hotproc.psinfo.start_time/100)
>
> leads to an (expected) error as there are several instances. ...
The error is not really expected, has nothing to do with instances, and is
compounded by a close-to-indecipherable error message!
Instances are fully supported in derived metrics. Consider this example:
kenj@bozo:~/src/pcp/src$ cat /tmp/eek.derive
mytest.bin = sample.bin + sample.bin
mytest.part_bin = sample.bin - sample.part_bin
kenj@bozo:~/src/pcp/src$ PCP_DERIVED_CONFIG=/tmp/eek.derive pminfo -df -h bozo-vm mytest
mytest.bin
Data Type: 32-bit int InDom: 29.2 0x7400002
Semantics: instant Units: none
inst [100 or "bin-100"] value 200
inst [200 or "bin-200"] value 400
inst [300 or "bin-300"] value 600
inst [400 or "bin-400"] value 800
inst [500 or "bin-500"] value 1000
inst [600 or "bin-600"] value 1200
inst [700 or "bin-700"] value 1400
inst [800 or "bin-800"] value 1600
inst [900 or "bin-900"] value 1800
mytest.part_bin
Data Type: 32-bit int InDom: 29.2 0x7400002
Semantics: instant Units: none
inst [100 or "bin-100"] value 0
inst [300 or "bin-300"] value 0
inst [500 or "bin-500"] value 0
inst [700 or "bin-700"] value 0
inst [900 or "bin-900"] value 0
For mytest.part_bin, values appear only for instances that are present in
_both_ of the set-valued operands of the - operator.
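
To make the instance-matching rule concrete, here is a minimal Python sketch of
it. This is only an illustration of the behaviour visible in the pminfo output
above, not the libpcp implementation; derive_binary() is a hypothetical helper
name, and the sample.bin / sample.part_bin values are inferred from that output.

```python
import operator


def derive_binary(left, right, op):
    """Apply op per instance, keeping only instances present in both operands.

    Hypothetical helper for illustration; not part of any PCP API.
    """
    return {inst: op(left[inst], right[inst]) for inst in left if inst in right}


# Instance id -> value, inferred from the pminfo output above:
# sample.bin has instances 100, 200, ..., 900
sample_bin = {i: i for i in range(100, 1000, 100)}
# sample.part_bin has only instances 100, 300, 500, 700, 900
sample_part_bin = {i: i for i in range(100, 1000, 200)}

mytest_bin = derive_binary(sample_bin, sample_bin, operator.add)
mytest_part_bin = derive_binary(sample_bin, sample_part_bin, operator.sub)

for inst, value in sorted(mytest_part_bin.items()):
    print(f'inst [{inst} or "bin-{inst}"] value {value}')
```

The even-numbered bins drop out of mytest.part_bin simply because they have no
partner on the right-hand side of the subtraction.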
The problem in your example is that the expression involves the counter
hotproc.io.write_bytes and this is being divided by something with the units of
time that is not a counter. This is not allowed because the semantics of the
resulting expression are not well defined ... is it a counter or instantaneous,
and if a counter what does the divisor in units of seconds really mean?
I think what you want is to have hotproc.io.write_bytes treated as an
instantaneous value (the value now, not a counter). Is that correct?
This would require a new intrinsic, something like value(v). To see that this
would work, check the example at the end of this email.
And I'm not sure why the /100 is required, could you please explain that part?
> ...
> Otherwise this certainly looks very much I was asking for, thanks for
> the pointer.
Good.
Now the example.
Here are my derived metric definitions:
mytest.rss = hotproc.psinfo.rss
mytest.start = hotproc.psinfo.start_time
mytest.uptime = kernel.all.uptime
mytest.divisor = hotproc.psinfo.start_time - kernel.all.uptime
mytest.marko = hotproc.psinfo.rss / (hotproc.psinfo.start_time - kernel.all.uptime)
I've replaced hotproc.io.write_bytes by hotproc.psinfo.rss, not because this
makes semantic sense, but because hotproc.psinfo.rss has instantaneous
semantics which would be the same as value(hotproc.io.write_bytes) if that was
implemented. I've also reversed the operands in the divisor so the value is
positive ... this expression is unsigned so the subtraction as you had it
produces an astronomically large positive number.
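
The unsigned wrap-around can be checked with a few lines of Python. This is a
sketch that assumes the subtraction is carried out in 32-bit unsigned
arithmetic, as the "32-bit unsigned int" types in the pminfo output further
down suggest; sub_u32() is a hypothetical helper, and the operand values are
the pducheck start_time and the uptime from that same output.

```python
U32_MODULUS = 1 << 32  # 32-bit unsigned values wrap modulo 2**32


def sub_u32(a, b):
    """Subtract b from a with 32-bit unsigned wrap-around (illustrative helper)."""
    return (a - b) % U32_MODULUS


start_time = 11452259  # hotproc.psinfo.start_time for the pducheck instance
uptime = 114550        # kernel.all.uptime

as_posted = sub_u32(uptime, start_time)  # operand order from the original post
reversed_ = sub_u32(start_time, uptime)  # operand order used in mytest.divisor

print("uptime - start_time:", as_posted)
print("start_time - uptime:", reversed_)
```

With the operands in the original order the result wraps to a value just below
2^32, which is why the divisor was reversed for this demonstration.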
Now set a low cpuburn threshold for hotproc:
kenj@bozo-vm:~$ sudo pmstore hotproc.control.config 'cpuburn > 0.1'
Wait a while for hotproc to notice, ... then
kenj@bozo-vm:/var/log/pcp/pmcd$ PCP_DERIVED_CONFIG=/tmp/eek.derive pminfo -df mytest
mytest.rss
Data Type: 32-bit unsigned int InDom: 3.39 0xc00027
Semantics: instant Units: Kbyte
inst [7697 or "007697 pducheck -i 10000000"] value 913208
inst [32730 or "032730 -bash"] value 6852
mytest.start
Data Type: 32-bit unsigned int InDom: 3.39 0xc00027
Semantics: discrete Units: sec
inst [7697 or "007697 pducheck -i 10000000"] value 11452259
inst [32730 or "032730 -bash"] value 11426930
mytest.uptime
Data Type: 32-bit unsigned int InDom: PM_INDOM_NULL 0xffffffff
Semantics: instant Units: sec
value 114550
mytest.divisor
Data Type: 32-bit unsigned int InDom: 3.39 0xc00027
Semantics: instant Units: sec
inst [7697 or "007697 pducheck -i 10000000"] value 11337709
inst [32730 or "032730 -bash"] value 11312380
mytest.marko
Data Type: double InDom: 3.39 0xc00027
Semantics: instant Units: Kbyte / sec
inst [7697 or "007697 pducheck -i 10000000"] value 0.08054607857725048
inst [32730 or "032730 -bash"] value 0.0006057080826492745
Note we have 2 instances throughout and the expressions parse correctly.
Let me know if I've guessed your semantics correctly and I should add value(v)
to my RFE queue.
Cheers, Ken.