Hello,
On 07/17/2015 12:41 PM, Nathan Scott wrote:
> Hi guys,
>
> Sorry about the delay getting back to you.
>
Thanks a lot, Nathan, for the detailed review.
> ----- Original Message -----
>>
>> Following is a patch series prototyping a new Performance Monitoring
>> Daemon agent: Memory_bandwidth.
>> [...]
>
> OK, interesting stuff - I'm following your thinking up to this point...
>
>>
>> Why do we need a different agent to do this?
>> --------------------------------------------
>> We are trying to compute the memory bandwidth of a system. This can be
>> achieved by reading a set of PMU counters via perfevent, aggregating
>> those values, multiplying by the scale given in the corresponding sysfs
>> entry on the system, and doing some more math on top of that to get a
>> single metric value denoting the memory bandwidth of the system.
>
> OK. That doesn't preclude use of a modified perfevent though.
>
Agree!
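For concreteness, the computation we had in mind is roughly the
following (just a sketch, not the patch code; counts[] holds the values
read via the perf API, and the scale comes from the event's sysfs
.scale entry, so the exact factor is architecture-specific):

    #include <stdint.h>

    /* Sketch: aggregate the per-IMC read/write counts and convert to
     * a bandwidth value using the sysfs-provided scale. */
    static double
    compute_bandwidth(uint64_t *counts, int nr_events, double scale,
                      double interval_secs)
    {
        uint64_t total = 0;
        int i;

        for (i = 0; i < nr_events; i++)
            total += counts[i];                 /* sum RD + WR counters */

        return (total * scale) / interval_secs; /* e.g. bytes/sec */
    }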
>> Also note that the PMU counter names will vary depending on the
>> architecture. Ideally we would want to consume this metric via
>> OpenStack; in this scenario, OpenStack will be our client, and we
>> would need to have all the reading of counters and the math on top of
>> it there.
>
> OK.
>
>> From an OpenStack consumability perspective, it should connect to the
>> pmcd daemon and get the required single aggregated, post-processed
>> metric in a single call, irrespective of the underlying architecture.
>
> *nod*
>
>> Given this requirement, would it be good to have all the
>> architecture-dependent stuff, i.e. reading PMUs and the related math,
>> in PCP and just return the memory bandwidth metric to OpenStack?
>
> Yep, I think that makes a lot of sense.
>
>> This would result in a cleaner design where all the
>> architecture-dependent counters and computation are handled in the
>> PCP backend and just the value is returned to OpenStack.
>
> *nod*
>
>> Why not extend the perfevent agent?
>> -----------------------------------
>>
>> Both the perfevent agent and the memory bandwidth agent end up reading
>> PMUs via the perf API. But currently, as per the design of the
>> perfevent agent, the post-processing of perf events is done by pmval
>> and other clients.
>
> Yes. But pmdaperfevent doesn't have to *only* work that way though, it
> could be extended to handle this new need for derivations.
>
Good to know that.
> In perfevent/pmns you'll see "hwcounters PERFEVENT:*:*". This allows
> the PMDA to dynamically create configured event counter metrics below
> perfevent.hwcounters.* (this, and the perfevent.conf(5) language for
> specifying the counters - see setup_perfevents() and setup_pmns() in
> the perfevent PMDA code).
>
> A more flexible solution for your situation would be to extend this with
> a perfevent.derivations.* set of metrics. For example, you might want
> to extend perfevent.conf syntax to allow "[derivation: bandwidth]" (or
> some better syntax) and then specify the averaging and/or scaling in terms of
> perfevent.hwcounters.* names to produce a perfevent.derivations.bandwidth
> metric.
>
> This will mean you could make use of the existing PMDA instead of using
> hard-coded events (like in this snippet of your patch...)
>
Sure, we can give this approach a try, and we will definitely get back
to you if we have queries around it; we haven't explored much of this
yet. We agree that it would be a better design at the end of the day if
we can avoid a new PMDA and instead extend the existing one, since both
are programming the same hardware counters.
>> +#ifdef __x86_64__
>> +char *events[] = {
>> + "snbep_unc_imc0::UNC_M_CAS_COUNT:RD",
>> + "snbep_unc_imc0::UNC_M_CAS_COUNT:WR",
>> + "snbep_unc_imc1::UNC_M_CAS_COUNT:RD",
>> + "snbep_unc_imc1::UNC_M_CAS_COUNT:WR",
>> + "snbep_unc_imc2::UNC_M_CAS_COUNT:RD",
>> + "snbep_unc_imc2::UNC_M_CAS_COUNT:WR",
>> + "snbep_unc_imc3::UNC_M_CAS_COUNT:RD",
>> + "snbep_unc_imc3::UNC_M_CAS_COUNT:WR"
>> +};
>> +#define NR_EVENTS 8
>> +
>> +#elif defined(__PPC64__)
>> +char *events[] = {
>> + "powerpc_nest_mcs_read::MCS_00",
>> + "powerpc_nest_mcs_read::MCS_01",
>> + "powerpc_nest_mcs_read::MCS_02",
>> + "powerpc_nest_mcs_read::MCS_03",
>> + "powerpc_nest_mcs_write::MCS_00",
>> + "powerpc_nest_mcs_write::MCS_01",
>> + "powerpc_nest_mcs_write::MCS_02",
>> + "powerpc_nest_mcs_write::MCS_03"
>> +};
>> +#define NR_EVENTS 8
>> [...]
>> +#else
>> +/* For unsupported architectures */
>> +char **events = NULL;
>> +#define NR_EVENTS 0
>> +#endif
>> +
>
> ... which lacks flexibility. Other people might want to do derivations
> using other hardware counters - we should try to make it easier for the
> people following in your footsteps.
Agree!
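Just to check we have understood the suggestion correctly, we were
picturing a perfevent.conf section along these lines (syntax entirely
made up here, building on your "[derivation: bandwidth]" idea; the
event names are only placeholders):

    [derivation: bandwidth]
    # hypothetical syntax: combine already-configured hwcounters
    # into a single derived metric
    expression = scale * sum(perfevent.hwcounters.<imc_read_event>,
                             perfevent.hwcounters.<imc_write_event>)

which would then surface as perfevent.derivations.bandwidth. Does that
match what you had in mind?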
> So, flexible specification is one advantage. Another is you would not be
> creating a new PMDA that makes conflicting use of hardware counters (they
> would be shared). A third is when you come to test the new code, you can
> leverage the work already done in qa/perfevent/* and qa/{756,757} instead
> of starting from scratch there.
>
> (Joe, any thoughts on the above? Any ideas about neater config syntax?)
>
>
> Onto the other metric added here - bandwidth.max - that looks a bit out
> of place here (it doesn't use the perfevent APIs). It's reading a /proc file
> and exporting a value - this is more the domain of the Linux kernel PMDA.
> It would be a better fit for somewhere below the hardware metrics that
> pmdalinux exports, perhaps using a "hinv.mem.bandwidth" metric name -
> could you add it there instead?
Yes, it is a better fit under the Linux kernel PMDA as it is not reading
any PMU counters. We will work on this.
>
> The current code there runs uname(2) on every sample - it would be better
> to move that out of the fetch code into a global variable (e.g. pmdalinux
> already does this - see kernel_uname in src/pmdas/linux/pmda.c).
>
> This new metric addition should be a separate patch in the series.
>
Agree.
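Something along these lines for the uname caching, presumably (a sketch
of the pattern only, not the actual pmdalinux code):

    #include <sys/utsname.h>

    static struct utsname kernel_uname;    /* filled once at startup */

    /* call once from the PMDA init path ... */
    static void
    setup_uname(void)
    {
        uname(&kernel_uname);
    }

    /* ... and the fetch callback then just reads kernel_uname.machine
     * (etc.) instead of calling uname(2) on every sample */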
>> Going forward we want this to report memory bandwidth as a rate over
>> certain time intervals, which can be queried by the clients, rather
>> than as aggregated counter values.
>
> Ah - makes me think of the way the "hotproc" metrics in pmdaproc work,
> which are also doing server-side calculations, on an interval, to reduce
> the proc.* metrics. You might be able to find some ideas in that code.
> See pmdaproc(1) man page for details, "HOTPROC OVERVIEW", and following
> couple of "CONFIGURATION" sections.
>
Thanks a lot for the pointers and design inputs. We will look to rework
this.
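For the interval-based rate, what we have in mind is essentially the
usual counter-delta calculation; roughly (again only a sketch, and the
names here are hypothetical):

    #include <stdint.h>
    #include <sys/time.h>

    struct bw_state {
        uint64_t        prev_count;   /* aggregated count at last sample */
        struct timeval  prev_ts;      /* time of last sample */
    };

    /* Sketch: rate over the elapsed interval, in counts/sec; multiply
     * by the event scale to get bytes/sec. */
    static double
    bandwidth_rate(struct bw_state *s, uint64_t count, struct timeval *now)
    {
        double secs = (now->tv_sec - s->prev_ts.tv_sec) +
                      (now->tv_usec - s->prev_ts.tv_usec) / 1e6;
        double rate = (secs > 0) ? (double)(count - s->prev_count) / secs : 0;

        s->prev_count = count;
        s->prev_ts = *now;
        return rate;
    }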
Cheers,
Deepthi
> cheers.
>
> --
> Nathan
>