Hi Ken,
On 2015-05-07 08:55, Ken McDonell wrote:
> On 06/05/15 19:54, Marko Myllynen wrote:
>> ... right, I was blissfully unaware of this.
>
> I suspect that might be a common occurrence ... not sure how/where we
> need to advertise the derived metrics capabilities so they are more
> visible and the degree of blissful unawareness is reduced for the
> wider audience.
>
> Suggestions would be most welcome here.
Once I understand this a bit better myself, I'll send a patch to add a
note about derived metrics to the PCP Quick Guide. I did search the
books for "calculate" and "derive" but didn't find anything about
derived metrics there.
>> .. I see that pmRegisterDerived(3) describes both C API and the
>> expressions used to construct derived metrics. There are also
>> Python bindings for it. And pminfo(1) can read such a "dmfile"
>> specified with -c, PCP_DERIVED_CONFIG should be used with tools
>> like pmval(1).
>
> Mostly this was intended to work _without_ client change which is why
> none of pmie, pmval, pmlogger, ... have any command line option to
> force derived metrics to be loaded ... they all rely on the
> environment variable mechanism so that derived metrics "just appear"
> like regular metrics.
>
> pminfo is the odd one out because it is the swiss army knife that is
> used to provide access to all manner of PMAPI services that are not
> usually required by other, more useful (outside development and
> debugging) client applications.
Ok, makes sense.
>> write_per_sec = hotproc.io.write_bytes / (kernel.all.uptime -
>> hotproc.psinfo.start_time/100)
>>
>> leads to an (expected) error as there are several instances. ...
>
> The error is not really expected, has nothing to do with instances
> and is confounded by close-to indecipherable error message !!
>
> Instances are fully supported in derived metrics.
Aha, sounds good.
> The problem in your example is that the expression involves the
> counter hotproc.io.write_bytes and this is being divided by something
> with the units of time that is not a counter. This is not allowed
> because the semantics of the resulting expression are not well
> defined ... is it a counter or instantaneous, and if a counter what
> does the divisor in units of seconds really mean?
>
> I think what you want is to have hotproc.io.write_bytes treated as an
> instantaneous value (the value now, not a counter). Is that
> correct?
Yes, correct (see below also).
> And I'm not sure why the /100 is required, could you please explain
> that part?
kernel.all.uptime is in seconds but hotproc.psinfo.start_time is in
jiffies, and although jiffies-to-seconds is not necessarily a trivial
conversion [1], /100 seemed to provide "close enough" results during my
tests. So it'd be "bytes written by the process during its lifetime /
the lifetime of the process in seconds", which should then give an
average-bytes-written-per-second metric for the process.
[1]
http://unix.stackexchange.com/questions/7870/how-to-check-how-long-a-process-has-been-running/7871#comment9851_7873
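To make the arithmetic I'm after concrete, here is a minimal Python
sketch of the same calculation outside PCP. The function name and the
sample numbers are made up for illustration, and hz=100 is only the
"close enough" assumption from my tests (on Linux, os.sysconf with
"SC_CLK_TCK" is one way to query the tick rate instead of hard-coding
it):

```python
def avg_write_rate(write_bytes, uptime_sec, start_time_jiffies, hz=100):
    """Average bytes written per second over the process lifetime.

    hz is the assumed number of jiffies per second (hypothetical
    default of 100, matching the /100 in the expression above).
    """
    # Convert the process start time from jiffies to seconds since boot.
    start_sec = start_time_jiffies / hz
    # Lifetime of the process = time since boot - start time since boot.
    lifetime_sec = uptime_sec - start_sec
    if lifetime_sec <= 0:
        # Guard against division by zero for a process that just started.
        return 0.0
    return write_bytes / lifetime_sec


# Made-up sample values: a process started 4500 s after boot, sampled at
# 5000 s of uptime, having written 1,000,000 bytes in total.
rate = avg_write_rate(write_bytes=1_000_000, uptime_sec=5000,
                      start_time_jiffies=450_000)
print(rate)
```

The total bytes written is treated here as a plain "value now" number,
not as a counter, which is the semantics I was trying to express.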
> Now the example.
>
> I've replaced hotproc.io.write_bytes by hotproc.psinfo.rss, not
> because this makes semantic sense, but because hotproc.psinfo.rss has
> instantaneous semantics which would be the same as
> value(hotproc.io.write_bytes) if that was implemented.
>
> kenj@bozo-vm:/var/log/pcp/pmcd$ PCP_DERIVED_CONFIG=/tmp/eek.derive
> pminfo -df mytest
>
> Note we have 2 instances throughout and the expressions parse
> correctly.
>
> Let me know if I've guessed your semantics correctly and I should add
> value(v) to my RFE queue.
It certainly sounds like it; hopefully the additional explanation above
also provides more context on what I was after.
Now, if we're in the business of adding more stuff to your RFE queue...
;-) Would you consider it feasible to let derived metrics be defined in
terms of other derived metrics? For instance, your example had:
mytest.divisor = hotproc.psinfo.start_time - kernel.all.uptime
mytest.marko = hotproc.psinfo.rss / (hotproc.psinfo.start_time -
kernel.all.uptime)
It would be nice to be able to write this as:
mytest.divisor = hotproc.psinfo.start_time - kernel.all.uptime
mytest.marko = hotproc.psinfo.rss / mytest.divisor
In a simple case like this it's not hugely helpful, but it could quickly
become handy: I'd expect that even the few examples I mentioned in my
previous email would become much easier to both read and write.
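Just to illustrate the semantics I have in mind (not a suggestion for
how PCP should implement it), a small Python sketch that inlines
earlier definitions into later ones could look like this. The
expand_derived helper and the name pattern are hypothetical and not
taken from the actual derived metrics grammar:

```python
import re

# Rough pattern for metric names such as "hotproc.psinfo.rss". This is
# an assumption for illustration, not the real PCP grammar.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def expand_derived(lines):
    """Inline earlier derived-metric definitions into later expressions."""
    defined = {}  # metric name -> fully expanded expression
    out = []

    def substitute(match):
        token = match.group(0)
        # Wrap an inlined definition in parentheses to keep precedence.
        if token in defined:
            return "(" + defined[token] + ")"
        return token

    for line in lines:
        name, _, expr = line.partition("=")
        name, expr = name.strip(), expr.strip()
        expanded = NAME.sub(substitute, expr)
        defined[name] = expanded
        out.append(f"{name} = {expanded}")
    return out


config = [
    "mytest.divisor = hotproc.psinfo.start_time - kernel.all.uptime",
    "mytest.marko = hotproc.psinfo.rss / mytest.divisor",
]
for line in expand_derived(config):
    print(line)
```

With the two definitions above, the second line expands back to the
long form from your example, so a textual expansion like this could
serve as a stopgap preprocessor for a config file.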
Thanks a lot for your help, things are now certainly getting clearer.
Cheers,
--
Marko Myllynen