
Re: [pcp] Calculated/derived metrics?

To: pcp@xxxxxxxxxxx
Subject: Re: [pcp] Calculated/derived metrics?
From: Marko Myllynen <myllynen@xxxxxxxxxx>
Date: Thu, 07 May 2015 13:29:35 +0300
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <554AFE4E.80000@xxxxxxxxxxxxxxxx>
Organization: Red Hat
References: <5534C680.2020709@xxxxxxxxxx> <493537984.3276058.1429528962326.JavaMail.zimbra@xxxxxxxxxx> <5534EBA8.4030509@xxxxxxxxxx> <1644393599.3651017.1429563442835.JavaMail.zimbra@xxxxxxxxxx> <55364606.1000503@xxxxxxxxxx> <55472B40.7050800@xxxxxxxxxx> <5547DE11.5050800@xxxxxxxxxxxxxxxx> <5549E4CD.5000408@xxxxxxxxxx> <554AFE4E.80000@xxxxxxxxxxxxxxxx>
Reply-to: myllynen@xxxxxxxxxx
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0
Hi Ken,

On 2015-05-07 08:55, Ken McDonell wrote:
> On 06/05/15 19:54, Marko Myllynen wrote:
>> ... right, I was blissfully unaware of this.
> 
> I suspect that might be a common occurrence ... not sure how/where we
> need to advertise the derived metrics capabilities so they are more
> visible and the degree of blissful unawareness is reduced for the
> wider audience.
> 
> Suggestions would be most welcome here.

Once I understand this a bit better myself I'll send a patch to add a
note about derived metrics to the PCP Quick Guide. I did search the
books for "calculate" and "derive" but didn't find anything about
derived metrics there.

>> .. I see that pmRegisterDerived(3) describes both the C API and the
>> expressions used to construct derived metrics. There are also
>> Python bindings for it. And pminfo(1) can read such a "dmfile"
>> specified with -c, while PCP_DERIVED_CONFIG should be used with tools
>> like pmval(1).
> 
> Mostly this was intended to work _without_ client change which is why
> none of pmie, pmval, pmlogger, ... have any command line option to
> force derived metrics to be loaded ... they all rely on the
> environment variable mechanism so that derived metrics "just appear"
> like regular metrics.
> 
> pminfo is the odd one out because it is the swiss army knife that is
> used to provide access to all manner of PMAPI services that are not
> usually required by other, more useful (outside development and
> debugging) client applications.

Ok, makes sense.
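To illustrate the environment-variable mechanism Ken describes, here is a minimal Python sketch, assuming the "name = expression" per-line file format described in pmRegisterDerived(3). The helper names write_derived_config() and client_env() are hypothetical; the point is only that an unmodified client picks up the definitions through PCP_DERIVED_CONFIG in its environment, with no command-line option involved. The metric definition is the mytest.divisor one from Ken's example.

```python
import os
import tempfile


def write_derived_config(defs):
    """Write derived metric definitions, one "name = expression" per line,
    to a new temporary file and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".derive")
    with os.fdopen(fd, "w") as f:
        for name, expr in defs.items():
            f.write(f"{name} = {expr}\n")
    return path


def client_env(config_path):
    """Copy of the current environment with PCP_DERIVED_CONFIG set.

    Only a child process started with this env sees the variable;
    the caller's own environment is left untouched.
    """
    env = dict(os.environ)
    env["PCP_DERIVED_CONFIG"] = config_path
    return env


path = write_derived_config(
    {"mytest.divisor": "hotproc.psinfo.start_time - kernel.all.uptime"}
)
env = client_env(path)

# On a host with PCP installed, an unmodified client would then see the
# derived metric like any other metric, e.g. (not run here):
#   subprocess.run(["pmval", "mytest.divisor"], env=env)
with open(path) as f:
    print(f.read(), end="")
```

Scoping the variable to the child's environment keeps the definitions from leaking into other tools started from the same shell.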

>> write_per_sec = hotproc.io.write_bytes / (kernel.all.uptime - hotproc.psinfo.start_time/100)
>> 
>> leads to an (expected) error as there are several instances. ...
> 
> The error is not really expected, has nothing to do with instances
> and is confounded by a close-to-indecipherable error message!!
> 
> Instances are fully supported in derived metrics.

Aha, sounds good.

> The problem in your example is that the expression involves the
> counter hotproc.io.write_bytes and this is being divided by something
> with the units of time that is not a counter.  This is not allowed
> because the semantics of the resulting expression are not well
> defined ... is it a counter or instantaneous, and if a counter what
> does the divisor in units of seconds really mean?
> 
> I think what you want is to have hotproc.io.write_bytes treated as an
> instantaneous value (the value now, not a counter).  Is that
> correct?

Yes, correct (see below also).
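Ken's point about counters can be made concrete: a counter's value only yields a well-defined rate over the interval between two samples. The Python sketch below shows that delta-over-interval calculation, similar in spirit to the rate conversion PCP clients such as pmval apply to counter metrics. The function name and sample values are hypothetical. Dividing the raw cumulative counter by an unrelated time value, as in the write_per_sec expression above, is a different quantity, which is why the expression was rejected.

```python
def counter_rate(v1, t1, v2, t2):
    """Rate of change of a monotonic counter between two samples.

    v1, v2: counter values (e.g. cumulative bytes written)
    t1, t2: sample timestamps in seconds
    """
    if t2 <= t1:
        raise ValueError("samples must be in increasing time order")
    if v2 < v1:
        # A real client would have to handle counter wrap or reset here.
        raise ValueError("counter went backwards (wrap or reset)")
    return (v2 - v1) / (t2 - t1)


# Hypothetical samples 10 s apart, with 4096 bytes written in between.
print(counter_rate(1_000_000, 100.0, 1_004_096, 110.0))  # 409.6
```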

> And I'm not sure why the /100 is required, could you please explain
> that part?

kernel.all.uptime is in seconds but hotproc.psinfo.start_time is in
jiffies, and although converting jiffies to seconds is not necessarily
trivial [1], /100 seemed to give "close enough" results in my tests. So
it'd be "bytes written by the process during its lifetime / the lifetime
of the process in seconds", which should give an
average-bytes-written-per-second metric for the process.

[1] http://unix.stackexchange.com/questions/7870/how-to-check-how-long-a-process-has-been-running/7871#comment9851_7873
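The lifetime average described above can be checked with a few lines of arithmetic. This is a minimal Python sketch, assuming hotproc.psinfo.start_time mirrors the starttime field of /proc/[pid]/stat, which is counted in clock ticks since boot. 100 is the common Linux tick rate; on a given host, os.sysconf("SC_CLK_TCK") reports the actual value. The function name and the sample numbers are hypothetical.

```python
def lifetime_avg_per_sec(total_bytes, uptime_sec, start_time_ticks, hz=100):
    """Average bytes per second written since the process started.

    Assumes start_time_ticks is in clock ticks since boot and hz is the
    tick rate (100 by default, as in the /100 used above).
    """
    start_sec = start_time_ticks / hz        # ticks since boot -> seconds since boot
    lifetime_sec = uptime_sec - start_sec    # how long the process has been running
    if lifetime_sec <= 0:
        raise ValueError("process lifetime must be positive")
    return total_bytes / lifetime_sec


# Hypothetical values: host up 1500 s, process started 1000 s after boot
# (100000 ticks at 100 Hz), 1,000,000 bytes written in total.
print(lifetime_avg_per_sec(1_000_000, 1500.0, 100_000))  # 2000.0
```

Passing hz explicitly rather than hard-coding /100 keeps the calculation correct on hosts with a different tick rate.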

> Now the example.
> 
> I've replaced hotproc.io.write_bytes by hotproc.psinfo.rss, not
> because this makes semantic sense, but because hotproc.psinfo.rss has
> instantaneous semantics which would be the same as
> value(hotproc.io.write_bytes) if that was implemented.
> 
> kenj@bozo-vm:/var/log/pcp/pmcd$ PCP_DERIVED_CONFIG=/tmp/eek.derive pminfo -df mytest
> 
> Note we have 2 instances throughout and the expressions parse
> correctly.
> 
> Let me know if I've guessed your semantics correctly and I should add
> value(v) to my RFE queue.

It certainly sounds like it; hopefully the additional explanation above
also provides more context on what I was after.

Now, if we're in the business of adding more stuff to your RFE queue...
;-) Do you think it would be feasible to define derived metrics in terms
of other derived metrics? For example, your example had:

mytest.divisor = hotproc.psinfo.start_time - kernel.all.uptime
mytest.marko = hotproc.psinfo.rss / (hotproc.psinfo.start_time - kernel.all.uptime)

It would be nice to be able to write this as:

mytest.divisor = hotproc.psinfo.start_time - kernel.all.uptime
mytest.marko = hotproc.psinfo.rss / mytest.divisor

In a simple case like this it's not hugely helpful, but it could quickly
become handy: with the few examples I mentioned in my previous email, I'd
expect both reading and writing those derived metrics to become much
easier.
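One way to picture the requested feature is plain textual expansion: before evaluation, each reference to another derived metric is replaced by that metric's expression in parentheses. The Python sketch below does exactly that. It is a hypothetical client-side illustration, not how PCP implements or would implement the feature. The metric-name token pattern is an assumption, and the cycle check guards against definitions that refer to each other.

```python
import re

# Assumed shape of a metric name: a letter or underscore followed by
# letters, digits, underscores or dots (e.g. "hotproc.psinfo.rss").
NAME_RE = re.compile(r"[A-Za-z_][\w.]*")


def expand(name, defs, _stack=()):
    """Return the expression for `name`, with every reference to another
    derived metric in `defs` replaced by its parenthesised expansion."""
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise ValueError("cyclic derived metric definition: " + chain)
    expr = defs[name]

    def substitute(match):
        token = match.group(0)
        if token in defs:
            # Parentheses preserve the inner expression's operator precedence.
            return "(" + expand(token, defs, _stack + (name,)) + ")"
        return token  # a regular metric name: leave unchanged

    return NAME_RE.sub(substitute, expr)


defs = {
    "mytest.divisor": "hotproc.psinfo.start_time - kernel.all.uptime",
    "mytest.marko": "hotproc.psinfo.rss / mytest.divisor",
}
print(expand("mytest.marko", defs))
# hotproc.psinfo.rss / (hotproc.psinfo.start_time - kernel.all.uptime)
```

The expanded string matches the longhand mytest.marko definition from Ken's example, so the shorter form would describe the same metric.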

Thanks a lot for your help, things are now certainly getting clearer.

Cheers,

-- 
Marko Myllynen
