Hi Nathan,
On 07/17/2015 12:41 PM, Nathan Scott wrote:
Hi guys,
Sorry about the delay getting back to you.
Not an issue.
And thanks a lot for the review.
----- Original Message -----
Following is a patch series prototyping a new Performance Metrics Domain
Agent (PMDA) for memory bandwidth.
[...]
OK, interesting stuff - I'm following your thinking up to this point...
Why do we need a different agent to do this?
---------------------------------------------
We are trying to compute the memory bandwidth of a system. This can be
achieved by reading a set of PMU counters via perfevent, aggregating
those values, multiplying by the scale given in the corresponding sysfs
entry on the system, and doing some further arithmetic on top of that to
get a single metric value denoting the memory bandwidth of the system.
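Roughly, the computation is along these lines - a minimal sketch only,
with made-up counter values and a made-up scale factor (the real scale
comes from the per-event sysfs entry):

#include <stdio.h>

/*
 * Sketch: aggregate a set of PMU counter deltas and apply the per-event
 * scale (as published in sysfs) to get a bandwidth figure.  The numbers
 * used in main() are illustrative, not real readings.
 */
static double
memory_bandwidth(const unsigned long long deltas[], int ndeltas,
                 double scale, double interval_sec)
{
    unsigned long long total = 0;
    int i;

    for (i = 0; i < ndeltas; i++)
        total += deltas[i];                /* aggregate RD+WR counters */
    return (total * scale) / interval_sec; /* scaled units per second */
}

int
main(void)
{
    unsigned long long deltas[] = { 100000, 120000, 98000, 110000 };
    /* hypothetical sysfs scale converting raw counts to MiB */
    double mibps = memory_bandwidth(deltas, 4, 6.103515625e-5, 1.0);

    printf("%.2f MiB/s\n", mibps);
    return 0;
}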
OK. That doesn't preclude use of a modified perfevent though.
Also worth noting is that the PMU counter names will vary depending on
the architecture. Ideally we would want to consume this metric via
OpenStack, so OpenStack will be our client - which would mean all of the
counter reading and the math on top of it would have to live on the
client side.
OK.
From the OpenStack consumability side, it should connect to the pmcd
daemon and get the required single aggregated, post-processed metric in
a single call, irrespective of the underlying architecture.
*nod*
Given this requirement, would it be good to have all of the
architecture-dependent pieces, i.e. reading the PMUs and the related
math, in PCP, and just return the memory bandwidth metric to OpenStack?
Yep, I think that makes a lot of sense.
This would result in a cleaner design where all of the
architecture-dependent counter handling and computation is done in the
PCP backend, and just the value is returned to OpenStack.
*nod*
Why not extend the perfevent agent?
------------------------------------
Both the perfevent agent and the memory bandwidth agent end up reading
PMUs via the perf API. But currently, as per the design of the perfevent
agent, the post-processing of the perf events is done by pmval and other
clients.
Yes. But pmdaperfevent doesn't have to work *only* that way; it could be
extended to handle this new need for derivations.
In perfevent/pmns you'll see "hwcounters PERFEVENT:*:*". This allows
the PMDA to dynamically create configured event counter metrics below
perfevent.hwcounters.* (for this, and for the perfevent.conf(5) language
used to specify the counters, see setup_perfevents() and setup_pmns() in
the perfevent PMDA code).
Right.
A more flexible solution for your situation would be to extend this with
a perfevent.derivations.* set of metrics. For example, you might want
to extend perfevent.conf syntax to allow "[derivation: bandwidth]" (or
some better syntax) and then specify the averaging and/or scaling in terms of
perfevent.hwcounters.* names to produce a perfevent.derivations.bandwidth
metric.
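Just to make the idea concrete, here is a straw-man only - none of this
is existing perfevent.conf syntax, and the option names are invented:

[derivation: bandwidth]
snbep_unc_imc0::UNC_M_CAS_COUNT:RD
snbep_unc_imc0::UNC_M_CAS_COUNT:WR
snbep_unc_imc1::UNC_M_CAS_COUNT:RD
snbep_unc_imc1::UNC_M_CAS_COUNT:WR
combine = sum     # aggregate the listed hwcounters
scale = sysfs     # apply the per-event scale exported in sysfs

The PMDA would then export the result as a single
perfevent.derivations.bandwidth metric.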
Nice. Will look into it.
This will mean you could make use of the existing PMDA instead of using
hard-coded events (like in this snippet of your patch...)
OK, one question here: don't we need to hard-code a subset of counters
in some place or other even if we use the existing PMDA? The point is,
if we want to aggregate a certain set of counters, we have to hard-code
which counters to match against.
Or are you suggesting some kind of change to the perfevent.conf file
itself, marking the counters below in some way to tell the agent that
they will be used in a derived metric?
+#ifdef __x86_64__
+char *events[] = {
+ "snbep_unc_imc0::UNC_M_CAS_COUNT:RD",
+ "snbep_unc_imc0::UNC_M_CAS_COUNT:WR",
+ "snbep_unc_imc1::UNC_M_CAS_COUNT:RD",
+ "snbep_unc_imc1::UNC_M_CAS_COUNT:WR",
+ "snbep_unc_imc2::UNC_M_CAS_COUNT:RD",
+ "snbep_unc_imc2::UNC_M_CAS_COUNT:WR",
+ "snbep_unc_imc3::UNC_M_CAS_COUNT:RD",
+ "snbep_unc_imc3::UNC_M_CAS_COUNT:WR"
+};
+#define NR_EVENTS 8
+
+#elif defined(__PPC64__)
+char *events[] = {
+ "powerpc_nest_mcs_read::MCS_00",
+ "powerpc_nest_mcs_read::MCS_01",
+ "powerpc_nest_mcs_read::MCS_02",
+ "powerpc_nest_mcs_read::MCS_03",
+ "powerpc_nest_mcs_write::MCS_00",
+ "powerpc_nest_mcs_write::MCS_01",
+ "powerpc_nest_mcs_write::MCS_02",
+ "powerpc_nest_mcs_write::MCS_03"
+};
+#define NR_EVENTS 8
[...]
+#else
+/* For unsupported architectures */
+char *events = NULL;
+#define NR_EVENTS 0
+#endif
+
... which lacks flexibility. Other people might want to do derivations
using other hardware counters - we should try to make it easier for the
people following in your footsteps.
Great. So we can add generic support for derivation of metrics in the
perfevent agent itself.
So, flexible specification is one advantage. Another is that you would
not be creating a new PMDA that makes conflicting use of hardware
counters (they would be shared). A third is that when you come to test
the new code, you can leverage the work already done in qa/perfevent/*
and qa/{756,757} instead of starting from scratch there.
Agreed. Makes sense.
(Joe, any thoughts on the above? Any ideas about neater config syntax?)
On to the other metric added here - bandwidth.max - that one looks a bit
out of place (it doesn't use the perfevent APIs). It's reading a /proc
file and exporting a value, which is more the domain of the Linux kernel
PMDA. It would be a better fit somewhere below the hardware metrics that
pmdalinux exports, perhaps using a "hinv.mem.bandwidth" metric name -
could you add it there instead?
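The fetch side of that stays trivial either way - something like this
(sketch only; the placeholder /proc path below stands in for whichever
file the patch actually reads):

#include <stdio.h>

/* Read a single numeric value from a /proc file.  The path is a
 * placeholder - the real file isn't named in this excerpt. */
int
read_max_bandwidth(unsigned long long *value)
{
    FILE *fp = fopen("/proc/example/max_bandwidth", "r");
    int sts = -1;

    if (fp == NULL)
        return sts;
    if (fscanf(fp, "%llu", value) == 1)
        sts = 0;
    fclose(fp);
    return sts;
}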
The current code there runs uname(2) on every sample - it would be better
to move that out of the fetch code into a global variable (e.g. pmdalinux
already does this - see kernel_uname in src/pmdas/linux/pmda.c).
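i.e. something along these lines (the function names here are
illustrative; the pmdalinux equivalent is the kernel_uname variable
mentioned above):

#include <sys/utsname.h>

/* Cache the uname(2) result once at PMDA init time, the way pmdalinux
 * caches kernel_uname, rather than calling it on every fetch. */
static struct utsname cached_uname;

void
bandwidth_init(void)
{
    (void)uname(&cached_uname);    /* one syscall at startup */
}

/* ... the fetch path then just reads the cached fields ... */
const char *
bandwidth_machine(void)
{
    return cached_uname.machine;
}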
This new metric addition should be a separate patch in the series.
Sure, we can try that.
Going forward, we want this to report memory bandwidth as a rate over
certain time intervals, which can be queried by the clients, rather than
the raw aggregated counter values.
Ah - makes me think of the way the "hotproc" metrics in pmdaproc work,
which are also doing server-side calculations, on an interval, to reduce
the proc.* metrics. You might be able to find some ideas in that code.
See the pmdaproc(1) man page for details - the "HOTPROC OVERVIEW"
section and the following couple of "CONFIGURATION" sections.
cheers.
--
Nathan
--
Thanks,
Hemant Kumar