pcp
[Top] [All Lists]

Re: [pcp] per cpu utilisation from archive - pmval

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Subject: Re: [pcp] per cpu utilisation from archive - pmval
From: Allan McAleavy <allan.mcaleavy@xxxxxxxxx>
Date: Thu, 19 May 2016 12:04:55 +0100
Cc: pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=HIi/NfLR3/6YRh0hVSLIAB8xv4DZfUwChk+afFYdOBg=; b=iuDI/GlC3YkzpyD/5XLRIoD7ZbA/SnP97zEe4tJgfNt+fk9OHRY+TOUyfJdPkxnNKX LGXc0D2ELKWE+EreeVSTDu0XJ0R0CtT3FAMI8m5h4ZlIaEUIlJHOYy0H71vBQI5zgVPa Mv+Nw8AZp+HTem3mRg0kIYCaIKxk6WYYwT9BnlfeioQ24uqY2FzzQxOxaqenHj4Xusi9 rzmGUaEuQsWTypMGP27NxLYYc0d5HobKskej101T29aZZVtZMgOgv0SR7JKPZGZwGbiG 5PmRMMP+wHFx/aFegsKHX1bNZsnShFUHqNx1UbX8CNcaLnyO+8RRV+HJ1SoRXAFOiyv3 VU8g==
In-reply-to: <573D0011.5090505@xxxxxxxxxxxxxxxx>
References: <CAF6XsOcfikRQ-EJ9cM+iuHhpHEinvKaKDWm3oB27gMPwF2vcOw@xxxxxxxxxxxxxx> <573D0011.5090505@xxxxxxxxxxxxxxxx>
Hi All

Ok so I have this now working, my pmlogger was running as follows, so the data was in 1 min samples.

pmlogger -P -r -T24h10m -c config.default -m pmlogger_check 20160519.10.50

Changing to -t 1 gives the appropriate result. What is the best practice for this? or should I have still been able to get similar results with the default option? version running isÂpcp-3.10.9-6.el6.x86_64

Thanks for everyone's help on this.

Al


On Thu, May 19, 2016 at 12:51 AM, Ken McDonell <kenj@xxxxxxxxxxxxxxxx> wrote:
On 17/05/16 21:45, Allan McAleavy wrote:
> Hi
>
> I am looking to get per cpu utilisation from an archive, I can run pmval
> in realtime with a derived config of 100 * val which matches the mpstat
> -P output, however when I look in my archive for the same timestamp I
> don't get the same values. Any pointers?

Al,

I'm afraid there's not enough excerpts from your email discussion with Frank on the list email to debug this.

Your derived metric _expression_

kernel.pct.cpu.user = 100 * kernel.percpu.cpu.user

looks OK.

This should produce a N values (for an N cpu system) in units of milliseconds. They are counters, so in consecutive samples T seconds appart ...

Each kernel.percpu.cpu.user counter will increase by a value in the range 0 to 1000 * T, and each kernel.pct.cpu.user will increase by a value in the range 0 to 100 * 1000 * T.

As they are counters, most PCP tools will "rate convert", so report (current(value) - prior(value)) / T (units of milliseconds/sec).

If the counters are in units of time (like these ones), pmval (and some other tools) will convert the rate to a "time utilization" by normalizing the "delta" and the interval to the same scale, so for millseconds this would report 1000*(current(value) - prior(value)) / T (units of sec/sec or "utilization").

So I tried to reproduce your use case (just looking at the per CPU user time) ...

raw data for first 3 samples from my archive (cputime):

kenj@bozo:~/src/pcp/qa/archives$ pmdumplog -T 3sec cputime kernel.percpu.cpu.user

09:24:07.217Â 60.0.0 (kernel.percpu.cpu.user):
        inst [0 or "cpu0"] value 66370010
        inst [1 or "cpu1"] value 66920740
        inst [2 or "cpu2"] value 66596310
        inst [3 or "cpu3"] value 68942310
        inst [4 or "cpu4"] value 68810160
        inst [5 or "cpu5"] value 66902070

09:24:08.217Â 60.0.0 (kernel.percpu.cpu.user):
        inst [0 or "cpu0"] value 66370130
        inst [1 or "cpu1"] value 66920800
        inst [2 or "cpu2"] value 66596350
        inst [3 or "cpu3"] value 68942340
        inst [4 or "cpu4"] value 68810210
        inst [5 or "cpu5"] value 66902090

09:24:09.217Â 60.0.0 (kernel.percpu.cpu.user):
        inst [0 or "cpu0"] value 66370270
        inst [1 or "cpu1"] value 66920880
        inst [2 or "cpu2"] value 66596420
        inst [3 or "cpu3"] value 68942390
        inst [4 or "cpu4"] value 68810300
        inst [5 or "cpu5"] value 66902150

raw data from pmval (with -r):

kenj@bozo:~/src/pcp/qa/archives$ pmval -r -w10 -f2 -S '@09:24:07.217' -T 3sec -a cputime kernel.percpu.cpu.user

metric:Â Â kernel.percpu.cpu.user
archive:Â Âcputime
host:Â Â Â bozo
start:Â Â ÂThu May 19 09:24:07 2016
end:Â Â Â ÂThu May 19 09:24:10 2016
semantics: cumulative counter
units:Â Â Âmillisec
samples:Â Â4
interval:Â 1.00 sec
09:24:07.216Â No values available

         cpu0   Âcpu1   Âcpu2   Âcpu3   Âcpu4   Âcpu5
09:24:08.216Â 66370130Â Â66920800Â Â66596350Â Â68942340Â Â68810210Â Â66902090
09:24:09.216Â 66370270Â Â66920880Â Â66596420Â Â68942390Â Â68810300Â Â66902150
09:24:10.216Â 66370540Â Â66921010Â Â66596500Â Â68942450Â Â68810370Â Â66902520

I needed -S '@09:24:07.217' to get the results aligned with the first sample in the archive, then there is nothing to report at 9:24:07.217 because the metric is a counter, so the first value reported is one sample later at 09:24:08.216.

Values match as expected.

Now without the -r:

kenj@bozo:~/src/pcp/qa/archives$ pmval -w10 -f2 -S '@09:24:07.217' -T 3sec -a cputime kernel.percpu.cpu.user

metric:Â Â kernel.percpu.cpu.user
archive:Â Âcputime
host:Â Â Â bozo
start:Â Â ÂThu May 19 09:24:07 2016
end:Â Â Â ÂThu May 19 09:24:10 2016
semantics: cumulative counter (converting to rate)
units:Â Â Âmillisec (converting to time utilization)
samples:Â Â4
interval:Â 1.00 sec
09:24:07.216Â No values available

         cpu0   Âcpu1   Âcpu2   Âcpu3   Âcpu4   Âcpu5
09:24:08.216Â No values available
09:24:09.216Â Â Â 0.14Â Â Â Â0.08Â Â Â Â0.07Â Â Â Â0.05Â Â Â Â0.09Â Â Â Â0.06
09:24:10.216Â Â Â 0.27Â Â Â Â0.13Â Â Â Â0.08Â Â Â Â0.06Â Â Â Â0.07Â Â Â Â0.37

and randomly picking cpu1 for the 09:24:09 to 09:24:10 interval I see the counter delta is 66921010-66920880 = 130msec = 130msec in 1000msec = 0.13 time utilization

Now onto the derived metric using the definition: kernel.pct.percpu.user = 100 * kernel.percpu.cpu.user

kenj@bozo:~/src/pcp/qa/archives$ pmval -r -w10 -f2 -S '@09:24:07.217' -T 3sec -a cputime kernel.pct.percpu.user

metric:Â Â kernel.pct.percpu.user
archive:Â Âcputime
host:Â Â Â bozo
start:Â Â ÂThu May 19 09:24:07 2016
end:Â Â Â ÂThu May 19 09:24:10 2016
semantics: cumulative counter
units:Â Â Âmillisec
samples:Â Â4
interval:Â 1.00 sec
09:24:07.216Â No values available

         cpu0   Âcpu1   Âcpu2   Âcpu3   Âcpu4   Âcpu5
09:24:08.2166637013000 6692080000 6659635000 6894234000 6881021000 6690209000
09:24:09.2166637027000 6692088000 6659642000 6894239000 6881030000 6690215000
09:24:10.2166637054000 6692101000 6659650000 6894245000 6881037000 6690252000

As expected.

And now rate and time utilization converted:

kenj@bozo:~/src/pcp/qa/archives$ pmval -w10 -f2 -S '@09:24:07.217' -T 3sec -a cputime kernel.pct.percpu.user

metric:Â Â kernel.pct.percpu.user
archive:Â Âcputime
host:Â Â Â bozo
start:Â Â ÂThu May 19 09:24:07 2016
end:Â Â Â ÂThu May 19 09:24:10 2016
semantics: cumulative counter (converting to rate)
units:Â Â Âmillisec (converting to time utilization)
samples:Â Â4
interval:Â 1.00 sec
09:24:07.216Â No values available

         cpu0   Âcpu1   Âcpu2   Âcpu3   Âcpu4   Âcpu5
09:24:08.216Â No values available
09:24:09.216Â Â Â14.00Â Â Â Â8.00Â Â Â Â7.00Â Â Â Â5.00Â Â Â Â9.00Â Â Â Â6.00
09:24:10.216Â Â Â27.00Â Â Â 13.00Â Â Â Â8.00Â Â Â Â6.00Â Â Â Â7.00Â Â Â 37.00

and using cpu1 for the 09:24:09 to 09:24:10 interval again 0.13 (utilization) = 13%.

I ran mpstat and sar at (about) the same time that pmlogger was run, and their results for the randomly selected time interval are:

kenj@bozo:~/src/pcp/qa/archives$ grep 09:24:10 cputime.sar
09:24:10    all  Â15.11   0.00   3.57   3.90   0.00  Â77.42
09:24:10Â Â Â Â Â 0Â Â Â24.49Â Â Â 0.00Â Â Â 3.06Â Â Â15.31Â Â Â 0.00Â Â Â57.14
09:24:10Â Â Â Â Â 1Â Â Â11.11Â Â Â 0.00Â Â Â 4.04Â Â Â 0.00Â Â Â 0.00Â Â Â84.85
09:24:10Â Â Â Â Â 2Â Â Â 6.06Â Â Â 0.00Â Â Â 2.02Â Â Â 0.00Â Â Â 0.00Â Â Â91.92
09:24:10Â Â Â Â Â 3Â Â Â 8.16Â Â Â 0.00Â Â Â 3.06Â Â Â 0.00Â Â Â 0.00Â Â Â88.78
09:24:10Â Â Â Â Â 4Â Â Â 6.12Â Â Â 0.00Â Â Â 2.04Â Â Â 0.00Â Â Â 0.00Â Â Â91.84
09:24:10Â Â Â Â Â 5Â Â Â34.65Â Â Â 0.00Â Â Â 8.91Â Â Â 8.91Â Â Â 0.00Â Â Â47.52
09:24:10    CPU  Â%user  Â%nice Â%system Â%iowait  %steal  Â%idle

kenj@bozo:~/src/pcp/qa/archives$ grep 09:24:10 cputime.mpstat
09:24:10  Âall Â14.94  0.00  3.57  3.90  0.00  0.00  0.00  0.17  0.00 Â77.42
09:24:10Â Â Â Â0Â Â24.49Â Â 0.00Â Â 3.06Â Â15.31Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â57.14
09:24:10Â Â Â Â1Â Â11.11Â Â 0.00Â Â 4.04Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â84.85
09:24:10Â Â Â Â2Â Â 6.12Â Â 0.00Â Â 1.02Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â92.86
09:24:10Â Â Â Â3Â Â 8.08Â Â 0.00Â Â 3.03Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â88.89
09:24:10Â Â Â Â4Â Â 5.15Â Â 0.00Â Â 2.06Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â 1.03Â Â 0.00Â Â91.75
09:24:10Â Â Â Â5Â Â34.65Â Â 0.00Â Â 8.91Â Â 8.91Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â 0.00Â Â47.52
09:24:10  ÂCPU  %usr Â%nice  %sys %iowait  %irq Â%soft %steal %guest %gnice Â%idle


So within the accuracy of this crude experiment, 13% from PCP equals 11.11% from sar equals 11.11% from mpstat.

If this does not help, you'll probably need to send me your archive, your drived metrics config and the exact commands you're using.




<Prev in Thread] Current Thread [Next in Thread>