Re: [pcp] per cpu utilisation from archive - pmval

To: Allan McAleavy <allan.mcaleavy@xxxxxxxxx>, pcp@xxxxxxxxxxx
Subject: Re: [pcp] per cpu utilisation from archive - pmval
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Thu, 19 May 2016 09:51:45 +1000
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <CAF6XsOcfikRQ-EJ9cM+iuHhpHEinvKaKDWm3oB27gMPwF2vcOw@xxxxxxxxxxxxxx>
References: <CAF6XsOcfikRQ-EJ9cM+iuHhpHEinvKaKDWm3oB27gMPwF2vcOw@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2
On 17/05/16 21:45, Allan McAleavy wrote:
> Hi
> 
> I am looking to get per cpu utilisation from an archive, I can run pmval 
> in realtime with a derived config of 100 * val which matches the mpstat 
> -P output, however when I look in my archive for the same timestamp I 
> don't get the same values. Any pointers?

Al,

I'm afraid the excerpts from your email discussion with Frank that made it to 
the list are not enough to debug this.

Your derived metric expression

kernel.pct.cpu.user = 100 * kernel.percpu.cpu.user

looks OK.

This should produce N values (for an N cpu system) in units of milliseconds.  
They are counters, so in consecutive samples T seconds apart ...

Each kernel.percpu.cpu.user counter will increase by a value in the range 0 to 
1000 * T, and each kernel.pct.cpu.user will increase by a value in the range 0 
to 100 * 1000 * T.

As they are counters, most PCP tools will "rate convert", so report 
(current(value) - prior(value)) / T (units of milliseconds/sec).

If the counters are in units of time (like these ones), pmval (and some other 
tools) will convert the rate to a "time utilization" by normalizing the "delta" 
and the interval to the same scale, so for milliseconds this would report 
(current(value) - prior(value)) / (1000 * T) (units of sec/sec or "utilization").
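
As a rough illustration of the two conversions described above, here is a 
minimal Python sketch. It is not how pmval is implemented; the function names 
and the sample counter values are made up for the example, and it assumes a 
counter in milliseconds sampled T seconds apart.

```python
MSEC_PER_SEC = 1000.0

def rate_convert(prior, current, interval_sec):
    """Counter delta per second of elapsed time (msec/sec for a msec counter)."""
    return (current - prior) / interval_sec

def time_utilization(prior, current, interval_sec):
    """Put the delta and the interval on the same scale (seconds): sec/sec."""
    delta_sec = (current - prior) / MSEC_PER_SEC
    return delta_sec / interval_sec

# Hypothetical counter: 5000 msec at one sample, 5250 msec two seconds later.
print(rate_convert(5000, 5250, 2.0))      # 125.0 (msec/sec)
print(time_utilization(5000, 5250, 2.0))  # 0.125 (busy 12.5% of the interval)
```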

So I tried to reproduce your use case (just looking at the per CPU user time) 
...

raw data for first 3 samples from my archive (cputime):

kenj@bozo:~/src/pcp/qa/archives$ pmdumplog -T 3sec cputime kernel.percpu.cpu.user

09:24:07.217  60.0.0 (kernel.percpu.cpu.user):
                inst [0 or "cpu0"] value 66370010
                inst [1 or "cpu1"] value 66920740
                inst [2 or "cpu2"] value 66596310
                inst [3 or "cpu3"] value 68942310
                inst [4 or "cpu4"] value 68810160
                inst [5 or "cpu5"] value 66902070

09:24:08.217  60.0.0 (kernel.percpu.cpu.user):
                inst [0 or "cpu0"] value 66370130
                inst [1 or "cpu1"] value 66920800
                inst [2 or "cpu2"] value 66596350
                inst [3 or "cpu3"] value 68942340
                inst [4 or "cpu4"] value 68810210
                inst [5 or "cpu5"] value 66902090

09:24:09.217  60.0.0 (kernel.percpu.cpu.user):
                inst [0 or "cpu0"] value 66370270
                inst [1 or "cpu1"] value 66920880
                inst [2 or "cpu2"] value 66596420
                inst [3 or "cpu3"] value 68942390
                inst [4 or "cpu4"] value 68810300
                inst [5 or "cpu5"] value 66902150

raw data from pmval (with -r):

kenj@bozo:~/src/pcp/qa/archives$ pmval -r -w10 -f2 -S '@09:24:07.217' -T 3sec -a cputime kernel.percpu.cpu.user

metric:    kernel.percpu.cpu.user
archive:   cputime
host:      bozo
start:     Thu May 19 09:24:07 2016
end:       Thu May 19 09:24:10 2016
semantics: cumulative counter
units:     millisec
samples:   4
interval:  1.00 sec
09:24:07.216  No values available

                  cpu0       cpu1       cpu2       cpu3       cpu4       cpu5 
09:24:08.216  66370130   66920800   66596350   68942340   68810210   66902090 
09:24:09.216  66370270   66920880   66596420   68942390   68810300   66902150 
09:24:10.216  66370540   66921010   66596500   68942450   68810370   66902520 

I needed -S '@09:24:07.217' to get the results aligned with the first sample in 
the archive, then there is nothing to report at 9:24:07.217 because the metric 
is a counter, so the first value reported is one sample later at 09:24:08.216.

Values match as expected.

Now without the -r:

kenj@bozo:~/src/pcp/qa/archives$ pmval -w10 -f2 -S '@09:24:07.217' -T 3sec -a cputime kernel.percpu.cpu.user

metric:    kernel.percpu.cpu.user
archive:   cputime
host:      bozo
start:     Thu May 19 09:24:07 2016
end:       Thu May 19 09:24:10 2016
semantics: cumulative counter (converting to rate)
units:     millisec (converting to time utilization)
samples:   4
interval:  1.00 sec
09:24:07.216  No values available

                  cpu0       cpu1       cpu2       cpu3       cpu4       cpu5 
09:24:08.216  No values available
09:24:09.216      0.14       0.08       0.07       0.05       0.09       0.06 
09:24:10.216      0.27       0.13       0.08       0.06       0.07       0.37

and randomly picking cpu1 for the 09:24:09 to 09:24:10 interval, I see the 
counter delta is 66921010 - 66920880 = 130 msec, i.e. 130 msec in 1000 msec = 
0.13 time utilization.
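
The same arithmetic can be checked for every CPU. The following is only a 
sketch: the counter values are copied from the pmval -r output earlier in this 
mail, the 1 second interval is taken from the pmval header, and the helper 
name utilization_rows is invented for the example.

```python
# Raw counter values (msec) for cpu0..cpu5, copied from the pmval -r output.
RAW = {
    "09:24:08.216": [66370130, 66920800, 66596350, 68942340, 68810210, 66902090],
    "09:24:09.216": [66370270, 66920880, 66596420, 68942390, 68810300, 66902150],
    "09:24:10.216": [66370540, 66921010, 66596500, 68942450, 68810370, 66902520],
}
INTERVAL_SEC = 1.0  # sampling interval reported by pmval

def utilization_rows(raw, interval_sec):
    """Time utilization per CPU for each pair of consecutive samples."""
    stamps = sorted(raw)
    rows = {}
    for prev, curr in zip(stamps, stamps[1:]):
        rows[curr] = [
            (c - p) / 1000.0 / interval_sec  # msec delta -> sec, then per sec
            for p, c in zip(raw[prev], raw[curr])
        ]
    return rows

for stamp, values in utilization_rows(RAW, INTERVAL_SEC).items():
    print(stamp, "".join(f"{v:11.2f}" for v in values))
```

With these inputs the two printed rows should agree with the 09:24:09.216 and 
09:24:10.216 rows that pmval reports without -r.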

Now onto the derived metric, using the definition:

kernel.pct.percpu.user = 100 * kernel.percpu.cpu.user

kenj@bozo:~/src/pcp/qa/archives$ pmval -r -w10 -f2 -S '@09:24:07.217' -T 3sec -a cputime kernel.pct.percpu.user

metric:    kernel.pct.percpu.user
archive:   cputime
host:      bozo
start:     Thu May 19 09:24:07 2016
end:       Thu May 19 09:24:10 2016
semantics: cumulative counter
units:     millisec
samples:   4
interval:  1.00 sec
09:24:07.216  No values available

                  cpu0       cpu1       cpu2       cpu3       cpu4       cpu5 
09:24:08.2166637013000 6692080000 6659635000 6894234000 6881021000 6690209000 
09:24:09.2166637027000 6692088000 6659642000 6894239000 6881030000 6690215000 
09:24:10.2166637054000 6692101000 6659650000 6894245000 6881037000 6690252000 

As expected.

And now rate and time utilization converted:

kenj@bozo:~/src/pcp/qa/archives$ pmval -w10 -f2 -S '@09:24:07.217' -T 3sec -a cputime kernel.pct.percpu.user

metric:    kernel.pct.percpu.user
archive:   cputime
host:      bozo
start:     Thu May 19 09:24:07 2016
end:       Thu May 19 09:24:10 2016
semantics: cumulative counter (converting to rate)
units:     millisec (converting to time utilization)
samples:   4
interval:  1.00 sec
09:24:07.216  No values available

                  cpu0       cpu1       cpu2       cpu3       cpu4       cpu5 
09:24:08.216  No values available
09:24:09.216     14.00       8.00       7.00       5.00       9.00       6.00 
09:24:10.216     27.00      13.00       8.00       6.00       7.00      37.00 

and using cpu1 for the 09:24:09 to 09:24:10 interval again, 0.13 (utilization) 
= 13%.
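
To spell out why the factor of 100 carries straight through to a percentage, 
here is a small sketch for cpu1 over that interval. The helper 
time_utilization is a made-up name, not a PCP API; the counter values are the 
cpu1 values from the pmval -r output, and the arithmetic simply mirrors the 
conversion described earlier.

```python
def time_utilization(prior, current, interval_sec):
    """Delta of a msec counter, scaled to seconds, per second of interval."""
    return (current - prior) / 1000.0 / interval_sec

# cpu1 raw counter values (msec) at 09:24:09.216 and 09:24:10.216.
base_prior, base_current = 66920880, 66921010

# The derived metric multiplies every sample by 100 before any conversion.
derived_prior, derived_current = 100 * base_prior, 100 * base_current

base = time_utilization(base_prior, base_current, 1.0)
derived = time_utilization(derived_prior, derived_current, 1.0)

# Scaling the counter by 100 scales the delta, and so the result, by 100.
print(f"{base:.2f} {derived:.2f}")  # 0.13 13.00
```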

I ran mpstat and sar at (about) the same time that pmlogger was run, and their 
results for the randomly selected time interval are:

kenj@bozo:~/src/pcp/qa/archives$ grep 09:24:10 cputime.sar
09:24:10        all     15.11      0.00      3.57      3.90      0.00     77.42
09:24:10          0     24.49      0.00      3.06     15.31      0.00     57.14
09:24:10          1     11.11      0.00      4.04      0.00      0.00     84.85
09:24:10          2      6.06      0.00      2.02      0.00      0.00     91.92
09:24:10          3      8.16      0.00      3.06      0.00      0.00     88.78
09:24:10          4      6.12      0.00      2.04      0.00      0.00     91.84
09:24:10          5     34.65      0.00      8.91      8.91      0.00     47.52
09:24:10        CPU     %user     %nice   %system   %iowait    %steal     %idle

kenj@bozo:~/src/pcp/qa/archives$ grep 09:24:10 cputime.mpstat
09:24:10     all   14.94    0.00    3.57    3.90    0.00    0.00    0.00    0.17    0.00   77.42
09:24:10       0   24.49    0.00    3.06   15.31    0.00    0.00    0.00    0.00    0.00   57.14
09:24:10       1   11.11    0.00    4.04    0.00    0.00    0.00    0.00    0.00    0.00   84.85
09:24:10       2    6.12    0.00    1.02    0.00    0.00    0.00    0.00    0.00    0.00   92.86
09:24:10       3    8.08    0.00    3.03    0.00    0.00    0.00    0.00    0.00    0.00   88.89
09:24:10       4    5.15    0.00    2.06    0.00    0.00    0.00    0.00    1.03    0.00   91.75
09:24:10       5   34.65    0.00    8.91    8.91    0.00    0.00    0.00    0.00    0.00   47.52
09:24:10     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle


So within the accuracy of this crude experiment, 13% from PCP equals 11.11% 
from sar equals 11.11% from mpstat.
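
To put a number on "within the accuracy of this crude experiment", the 
per-CPU user percentages can be compared side by side. This is just a sketch 
over the figures quoted in this mail (the 09:24:10 rows from pmval, sar and 
mpstat); the helper name max_abs_gap is invented, and since the three tools 
were not sampling at exactly the same instants some gap is to be expected.

```python
# Per-CPU user time (%) for cpu0..cpu5 in the interval ending 09:24:10.
pcp    = [27.00, 13.00, 8.00, 6.00, 7.00, 37.00]   # pmval, derived metric
sar    = [24.49, 11.11, 6.06, 8.16, 6.12, 34.65]   # %user column
mpstat = [24.49, 11.11, 6.12, 8.08, 5.15, 34.65]   # %usr column

def max_abs_gap(a, b):
    """Largest absolute per-CPU difference, in percentage points."""
    return max(abs(x - y) for x, y in zip(a, b))

print(f"PCP vs sar:    {max_abs_gap(pcp, sar):.2f}")     # 2.51
print(f"PCP vs mpstat: {max_abs_gap(pcp, mpstat):.2f}")  # 2.51
```

For these figures the largest per-CPU gap is about 2.5 percentage points 
(cpu0), and the cpu1 gap is 13 - 11.11 = 1.89.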

If this does not help, you'll probably need to send me your archive, your 
derived metrics config and the exact commands you're using.


