On 17/05/16 21:45, Allan McAleavy wrote:
> Hi
>
> I am looking to get per cpu utilisation from an archive, I can run pmval
> in realtime with a derived config of 100 * val which matches the mpstat
> -P output, however when I look in my archive for the same timestamp I
> don't get the same values. Any pointers?
Al,
I'm afraid the excerpts from your email discussion with Frank on the list do
not give me enough detail to debug this.
Your derived metric expression
kernel.pct.cpu.user = 100 * kernel.percpu.cpu.user
looks OK.
This should produce N values (for an N cpu system) in units of milliseconds.
They are counters, so in consecutive samples T seconds apart ...
Each kernel.percpu.cpu.user counter will increase by a value in the range 0 to
1000 * T, and each kernel.pct.cpu.user will increase by a value in the range 0
to 100 * 1000 * T.
As they are counters, most PCP tools will "rate convert", so report
(current(value) - prior(value)) / T (units of milliseconds/sec).
If the counters are in units of time (like these ones), pmval (and some other
tools) will convert the rate to a "time utilization" by normalizing the "delta"
and the interval to the same scale, so for milliseconds this would report
(current(value) - prior(value)) / (1000 * T) (units of sec/sec or "utilization").
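To make the two conversions concrete, here is a minimal Python sketch. The function names and the sample numbers (a counter that advances 250 msec over a 2 second interval) are made up for illustration; this is not how pmval is implemented, only the arithmetic described above, and it is consistent with the cpu1 check later in this message (130 msec in 1000 msec = 0.13).

```python
def rate_ms_per_sec(prior, current, interval_sec):
    """Rate conversion: counter delta divided by the sample interval."""
    return (current - prior) / interval_sec


def time_utilization(prior, current, interval_sec, ms_per_sec=1000):
    """Time utilization: delta and interval on the same scale (msec/msec)."""
    return (current - prior) / (ms_per_sec * interval_sec)


# Hypothetical samples: a millisecond counter read twice, 2 seconds apart.
prior, current, interval_sec = 1000, 1250, 2.0

rate = rate_ms_per_sec(prior, current, interval_sec)    # milliseconds/sec
util = time_utilization(prior, current, interval_sec)   # sec/sec

print(f"rate = {rate} msec/sec, utilization = {util}")
```

The utilization is always the rate divided by 1000 for a millisecond counter, so a fully busy CPU reports 1.0.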
So I tried to reproduce your use case (just looking at the per CPU user time)
...
raw data for first 3 samples from my archive (cputime):
kenj@bozo:~/src/pcp/qa/archives$ pmdumplog -T 3sec cputime
kernel.percpu.cpu.user
09:24:07.217 60.0.0 (kernel.percpu.cpu.user):
inst [0 or "cpu0"] value 66370010
inst [1 or "cpu1"] value 66920740
inst [2 or "cpu2"] value 66596310
inst [3 or "cpu3"] value 68942310
inst [4 or "cpu4"] value 68810160
inst [5 or "cpu5"] value 66902070
09:24:08.217 60.0.0 (kernel.percpu.cpu.user):
inst [0 or "cpu0"] value 66370130
inst [1 or "cpu1"] value 66920800
inst [2 or "cpu2"] value 66596350
inst [3 or "cpu3"] value 68942340
inst [4 or "cpu4"] value 68810210
inst [5 or "cpu5"] value 66902090
09:24:09.217 60.0.0 (kernel.percpu.cpu.user):
inst [0 or "cpu0"] value 66370270
inst [1 or "cpu1"] value 66920880
inst [2 or "cpu2"] value 66596420
inst [3 or "cpu3"] value 68942390
inst [4 or "cpu4"] value 68810300
inst [5 or "cpu5"] value 66902150
raw data from pmval (with -r):
kenj@bozo:~/src/pcp/qa/archives$ pmval -r -w10 -f2 -S '@09:24:07.217' -T 3sec -a cputime kernel.percpu.cpu.user
metric: kernel.percpu.cpu.user
archive: cputime
host: bozo
start: Thu May 19 09:24:07 2016
end: Thu May 19 09:24:10 2016
semantics: cumulative counter
units: millisec
samples: 4
interval: 1.00 sec
09:24:07.216 No values available
cpu0 cpu1 cpu2 cpu3 cpu4 cpu5
09:24:08.216 66370130 66920800 66596350 68942340 68810210 66902090
09:24:09.216 66370270 66920880 66596420 68942390 68810300 66902150
09:24:10.216 66370540 66921010 66596500 68942450 68810370 66902520
I needed -S '@09:24:07.217' to get the results aligned with the first sample in
the archive, then there is nothing to report at 9:24:07.217 because the metric
is a counter, so the first value reported is one sample later at 09:24:08.216.
Values match as expected.
Now without the -r:
kenj@bozo:~/src/pcp/qa/archives$ pmval -w10 -f2 -S '@09:24:07.217' -T 3sec -a cputime kernel.percpu.cpu.user
metric: kernel.percpu.cpu.user
archive: cputime
host: bozo
start: Thu May 19 09:24:07 2016
end: Thu May 19 09:24:10 2016
semantics: cumulative counter (converting to rate)
units: millisec (converting to time utilization)
samples: 4
interval: 1.00 sec
09:24:07.216 No values available
cpu0 cpu1 cpu2 cpu3 cpu4 cpu5
09:24:08.216 No values available
09:24:09.216 0.14 0.08 0.07 0.05 0.09 0.06
09:24:10.216 0.27 0.13 0.08 0.06 0.07 0.37
and randomly picking cpu1 for the 09:24:09 to 09:24:10 interval, I see the
counter delta is 66921010 - 66920880 = 130 msec, and 130 msec in 1000 msec =
0.13 time utilization.
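The same check can be done for all six CPUs at once. This is only a sketch of the arithmetic in Python (the variable names are mine), using the raw counter values copied from the pmval -r output above for the 09:24:09 and 09:24:10 samples.

```python
# kernel.percpu.cpu.user counters in milliseconds, cpu0..cpu5,
# copied from the "pmval -r" output for two consecutive samples.
at_09_24_09 = [66370270, 66920880, 66596420, 68942390, 68810300, 66902150]
at_09_24_10 = [66370540, 66921010, 66596500, 68942450, 68810370, 66902520]
interval_sec = 1.0

# Delta in msec divided by the interval in msec gives time utilization.
utilization = [
    round((cur - prev) / (1000 * interval_sec), 2)
    for prev, cur in zip(at_09_24_09, at_09_24_10)
]

for cpu, value in enumerate(utilization):
    print(f"cpu{cpu} {value:.2f}")
```

The six results agree with the 09:24:10.216 row that pmval reports without -r.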
Now onto the derived metric using the definition: kernel.pct.percpu.user = 100
* kernel.percpu.cpu.user
kenj@bozo:~/src/pcp/qa/archives$ pmval -r -w10 -f2 -S '@09:24:07.217' -T 3sec -a cputime kernel.pct.percpu.user
metric: kernel.pct.percpu.user
archive: cputime
host: bozo
start: Thu May 19 09:24:07 2016
end: Thu May 19 09:24:10 2016
semantics: cumulative counter
units: millisec
samples: 4
interval: 1.00 sec
09:24:07.216 No values available
cpu0 cpu1 cpu2 cpu3 cpu4 cpu5
09:24:08.216 6637013000 6692080000 6659635000 6894234000 6881021000 6690209000
09:24:09.216 6637027000 6692088000 6659642000 6894239000 6881030000 6690215000
09:24:10.216 6637054000 6692101000 6659650000 6894245000 6881037000 6690252000
As expected.
And now rate and time utilization converted:
kenj@bozo:~/src/pcp/qa/archives$ pmval -w10 -f2 -S '@09:24:07.217' -T 3sec -a cputime kernel.pct.percpu.user
metric: kernel.pct.percpu.user
archive: cputime
host: bozo
start: Thu May 19 09:24:07 2016
end: Thu May 19 09:24:10 2016
semantics: cumulative counter (converting to rate)
units: millisec (converting to time utilization)
samples: 4
interval: 1.00 sec
09:24:07.216 No values available
cpu0 cpu1 cpu2 cpu3 cpu4 cpu5
09:24:08.216 No values available
09:24:09.216 14.00 8.00 7.00 5.00 9.00 6.00
09:24:10.216 27.00 13.00 8.00 6.00 7.00 37.00
and using cpu1 for the 09:24:09 to 09:24:10 interval again 0.13 (utilization) =
13%.
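For completeness, a small Python sketch of why the factor of 100 in the derived metric turns utilization into a percentage. It scales the raw counters the way the derived metric definition does and then applies the same time utilization arithmetic; the variable names are mine, and this is an illustration rather than what pmval does internally.

```python
# Raw kernel.percpu.cpu.user counters (msec), cpu0..cpu5, for the
# 09:24:09 and 09:24:10 samples shown earlier.
raw_prev = [66370270, 66920880, 66596420, 68942390, 68810300, 66902150]
raw_cur = [66370540, 66921010, 66596500, 68942450, 68810370, 66902520]
interval_sec = 1.0

# Derived metric: kernel.pct.percpu.user = 100 * kernel.percpu.cpu.user
derived_prev = [100 * v for v in raw_prev]
derived_cur = [100 * v for v in raw_cur]

# Time utilization of the scaled counter is 100 * utilization, i.e. percent.
percent = [
    round((cur - prev) / (1000 * interval_sec), 2)
    for prev, cur in zip(derived_prev, derived_cur)
]

for cpu, value in enumerate(percent):
    print(f"cpu{cpu} {value:.2f}")
```

The scaled counters match the pmval -r values for the derived metric, and the percentages match the 09:24:10.216 row above.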
I ran mpstat and sar at (about) the same time that pmlogger was run, and their
results for the randomly selected time interval are:
kenj@bozo:~/src/pcp/qa/archives$ grep 09:24:10 cputime.sar
09:24:10 all 15.11 0.00 3.57 3.90 0.00 77.42
09:24:10 0 24.49 0.00 3.06 15.31 0.00 57.14
09:24:10 1 11.11 0.00 4.04 0.00 0.00 84.85
09:24:10 2 6.06 0.00 2.02 0.00 0.00 91.92
09:24:10 3 8.16 0.00 3.06 0.00 0.00 88.78
09:24:10 4 6.12 0.00 2.04 0.00 0.00 91.84
09:24:10 5 34.65 0.00 8.91 8.91 0.00 47.52
09:24:10 CPU %user %nice %system %iowait %steal %idle
kenj@bozo:~/src/pcp/qa/archives$ grep 09:24:10 cputime.mpstat
09:24:10 all 14.94 0.00 3.57 3.90 0.00 0.00 0.00 0.17 0.00 77.42
09:24:10 0 24.49 0.00 3.06 15.31 0.00 0.00 0.00 0.00 0.00 57.14
09:24:10 1 11.11 0.00 4.04 0.00 0.00 0.00 0.00 0.00 0.00 84.85
09:24:10 2 6.12 0.00 1.02 0.00 0.00 0.00 0.00 0.00 0.00 92.86
09:24:10 3 8.08 0.00 3.03 0.00 0.00 0.00 0.00 0.00 0.00 88.89
09:24:10 4 5.15 0.00 2.06 0.00 0.00 0.00 0.00 1.03 0.00 91.75
09:24:10 5 34.65 0.00 8.91 8.91 0.00 0.00 0.00 0.00 0.00 47.52
09:24:10 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
So within the accuracy of this crude experiment, 13% from PCP equals 11.11%
from sar equals 11.11% from mpstat.
If this does not help, you'll probably need to send me your archive, your
derived metrics config and the exact commands you're using.