Hi guys,
Sorry about the delay getting back to you.
----- Original Message -----
>
> Following is a patch series prototyping a new Performance Monitoring
> Daemon agent -> Memory_bandwidth.
> [...]
OK, interesting stuff - I'm following your thinking up to this point...
>
> Why do we need a different agent to do this ?
> -------------------------------------------
> We are trying to compute memory bandwidth of a system. This can be
> achieved by reading a bunch of PMU counters via perfevent, Aggregating
> those values, Multiplying with a scale as mentioned in the sysfs entry
> on the system and some more math on top of it to get a single metric
> value denoting the memory bandwidth of the system.
OK. That doesn't preclude use of a modified perfevent though.
> Also, to note is that PMU counter names will vary depending on the
> architecture. Ideally we would want to consume this metric via
> OpenStack. Given this scenario, OpenStack will be our client. Ideally we
> need to have all the reading of counters and math on top of it there.
OK.
> From an OpenStack consume-ability side, it should connect to the pmcd
> daemon and get the required single aggregated post processed metric in a
> single call irrespective of underneath architecture.
*nod*
> Given this requirement, would it be good to have all
> the architecture dependent stuff i.e reading PMUs and related math in
> PcP and just return the memory bandwidth metric to OpenStack ?
Yep, I think that makes a lot of sense.
> This would result in a cleaner design where all the architecture
> dependent counters and computation is done in the backend PcP and just
> the value is returned to OpenStack.
*nod*
> Why not extend the perfevent agent ?
> ----------------------------------
>
> perfevent agent and memory bandwidth agent end up reading PMUs
> via the perf api. But currently as per design of perfevent agent, the
> post processing for perfevents is done by pmval and other clients.
Yes.  But pmdaperfevent doesn't have to *only* work that way - it could
be extended to handle this new need for derivations.
In perfevent/pmns you'll see "hwcounters PERFEVENT:*:*".  This allows
the PMDA to dynamically create metrics for the configured event counters
below perfevent.hwcounters.* (the mechanism for this, and the handling of
the perfevent.conf(5) language for specifying the counters, is in
setup_perfevents() and setup_pmns() in the perfevent PMDA code).
A more flexible solution for your situation would be to extend this with
a perfevent.derivations.* set of metrics.  For example, you might extend
the perfevent.conf syntax to allow "[derivation: bandwidth]" (or some
better syntax) and then specify the averaging and/or scaling in terms of
perfevent.hwcounters.* names, producing a perfevent.derivations.bandwidth
metric.
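To make that a bit more concrete, here's a strawman of what such a stanza
might look like - purely hypothetical syntax, nothing like this exists in
perfevent.conf today, and the "formula"/"scale" keywords are just
placeholders for illustration (the event names are the x86 ones from your
patch):

    # hypothetical perfevent.conf extension - not implemented anywhere yet
    [derivation: bandwidth]
    event = snbep_unc_imc0::UNC_M_CAS_COUNT:RD
    event = snbep_unc_imc0::UNC_M_CAS_COUNT:WR
    event = snbep_unc_imc1::UNC_M_CAS_COUNT:RD
    event = snbep_unc_imc1::UNC_M_CAS_COUNT:WR
    formula = sum * scale    # aggregate the listed counters
    scale = 64               # e.g. bytes per CAS transaction

I'm not wedded to any of that - it's only meant to show the shape of the
thing.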
This would mean you could make use of the existing PMDA instead of
hard-coding events (like in this snippet of your patch...)
> +#ifdef __x86_64__
> +char *events[] = {
> + "snbep_unc_imc0::UNC_M_CAS_COUNT:RD",
> + "snbep_unc_imc0::UNC_M_CAS_COUNT:WR",
> + "snbep_unc_imc1::UNC_M_CAS_COUNT:RD",
> + "snbep_unc_imc1::UNC_M_CAS_COUNT:WR",
> + "snbep_unc_imc2::UNC_M_CAS_COUNT:RD",
> + "snbep_unc_imc2::UNC_M_CAS_COUNT:WR",
> + "snbep_unc_imc3::UNC_M_CAS_COUNT:RD",
> + "snbep_unc_imc3::UNC_M_CAS_COUNT:WR"
> +};
> +#define NR_EVENTS 8
> +
> +#elif defined(__PPC64__)
> +char *events[] = {
> + "powerpc_nest_mcs_read::MCS_00",
> + "powerpc_nest_mcs_read::MCS_01",
> + "powerpc_nest_mcs_read::MCS_02",
> + "powerpc_nest_mcs_read::MCS_03",
> + "powerpc_nest_mcs_write::MCS_00",
> + "powerpc_nest_mcs_write::MCS_01",
> + "powerpc_nest_mcs_write::MCS_02",
> + "powerpc_nest_mcs_write::MCS_03"
> +};
> +#define NR_EVENTS 8
> [...]
> +#else
> +/* For unsupported architectures */
> +char *events = NULL;
> +#define NR_EVENTS 0
> +#endif
> +
... which lacks flexibility. Other people might want to do derivations
using other hardware counters - we should try to make it easier for the
people following in your footsteps.
So, flexible specification is one advantage.  Another is that you would
not be creating a new PMDA that makes conflicting use of the hardware
counters (they would be shared).  A third is that when you come to test
the new code, you can leverage the work already done in qa/perfevent/*
and qa/{756,757} instead of starting from scratch there.
(Joe, any thoughts on the above? Any ideas about neater config syntax?)
On to the other metric added here - bandwidth.max - which looks a bit out
of place (it doesn't use the perfevent APIs).  It's reading a /proc file
and exporting a value - that is more the domain of the Linux kernel PMDA.
It would be a better fit for somewhere below the hardware metrics that
pmdalinux exports, perhaps using a "hinv.mem.bandwidth" metric name -
could you add it there instead?
The current code there runs uname(2) on every sample - it would be better
to move that out of the fetch path and cache the result in a global
variable (e.g. pmdalinux already does this - see kernel_uname in
src/pmdas/linux/pmda.c).
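Something along these lines (just a sketch - cache_uname() is an
illustrative name, not code from pmdalinux or your patch):

    #include <string.h>
    #include <sys/utsname.h>

    static struct utsname kernel_uname;   /* filled in once at startup */

    /* call this once from the PMDA init path, never from fetch */
    static void
    cache_uname(void)
    {
        if (uname(&kernel_uname) < 0)
            memset(&kernel_uname, 0, sizeof(kernel_uname));
    }

Then the fetch callback just reads kernel_uname directly.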
This new metric addition should be a separate patch in the series.
> Going forward we want to have this reporting memory bandwidth as a rate
> for certain time intervals which can be queried by the clients rather
> than the aggregated counter values.
Ah - that makes me think of the way the "hotproc" metrics in pmdaproc
work, which are also doing server-side calculations, on an interval, to
reduce the proc.* metrics.  You might be able to find some ideas in that
code.  See the pmdaproc(1) man page for details - the "HOTPROC OVERVIEW"
section and the couple of "CONFIGURATION" sections that follow it.
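On the rate side, the arithmetic itself is just a counter delta over a
time delta, computed server-side on whatever interval you settle on -
roughly like this sketch (struct and function names are made up for
illustration only):

    /* illustrative only: rate from two samples of the aggregated counter */
    struct bw_sample {
        double bytes;     /* aggregated, scaled counter value */
        double seconds;   /* timestamp of the sample */
    };

    static double
    bandwidth_rate(const struct bw_sample *prev, const struct bw_sample *cur)
    {
        double delta_t = cur->seconds - prev->seconds;

        if (delta_t <= 0.0)
            return 0.0;
        return (cur->bytes - prev->bytes) / delta_t;   /* bytes/second */
    }

The hotproc code shows one way of driving that kind of calculation on a
configurable interval inside a PMDA.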
cheers.
--
Nathan