
Re: [pcp] [RFC PATCH 1/1] Adding a PMDA to collect memory bandwidth

To: Nathan Scott <nathans@xxxxxxxxxx>, Hemant Kumar <hemant@xxxxxxxxxxxxxxxxxx>, Joseph White <jpwhite4@xxxxxxxxxxx>
Subject: Re: [pcp] [RFC PATCH 1/1] Adding a PMDA to collect memory bandwidth
From: Deepthi Dharwar <deepthi@xxxxxxxxxxxxxxxxxx>
Date: Mon, 20 Jul 2015 07:16:06 +0530
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <513905742.40875370.1437117101643.JavaMail.zimbra@xxxxxxxxxx>
References: <1436678109-30878-1-git-send-email-hemant@xxxxxxxxxxxxxxxxxx> <1436678109-30878-2-git-send-email-hemant@xxxxxxxxxxxxxxxxxx> <513905742.40875370.1437117101643.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0
Hello,

On 07/17/2015 12:41 PM, Nathan Scott wrote:
> Hi guys,
> 
> Sorry about the delay getting back to you.
> 

Thanks a lot Nathan for the detailed review.


> ----- Original Message -----
>>
>> Following is a patch series prototyping a new Performance Metrics
>> Domain Agent (PMDA) -> Memory_bandwidth.
>> [...]
> 
> OK, interesting stuff - I'm following your thinking up to this point...
> 
>>
>> Why do we need a different agent to do this ?
>> -------------------------------------------
>> We are trying to compute the memory bandwidth of a system. This can be
>> achieved by reading a set of PMU counters via perfevent, aggregating
>> those values, multiplying by the scale given in the corresponding
>> sysfs entry on the system, and doing some further math on top of that
>> to get a single metric value denoting the memory bandwidth of the
>> system.
> 
> OK.  That doesn't preclude use of a modified perfevent though.
> 

Agree!

>> Also note that the PMU counter names will vary depending on the
>> architecture. Ideally we would want to consume this metric via
>> OpenStack, so OpenStack will be our client; in that case all the
>> reading of counters and the math on top of them would have to happen
>> there.
> 
> OK.
> 
>> From the OpenStack consumability side, it should connect to the pmcd
>> daemon and get the required single aggregated, post-processed metric
>> in a single call, irrespective of the underlying architecture.
> 
> *nod*
> 
>> Given this requirement, would it be good to have all the
>> architecture-dependent stuff, i.e. reading the PMUs and the related
>> math, in PCP and just return the memory bandwidth metric to OpenStack?
> 
> Yep, I think that makes a lot of sense.
> 
>> This would result in a cleaner design where all the
>> architecture-dependent counters and computation are handled in the PCP
>> backend and just the value is returned to OpenStack.
> 
> *nod*
> 
>> Why not extend the perfevent agent ?
>> ----------------------------------
>>
>> The perfevent agent and the memory bandwidth agent both end up reading
>> PMUs via the perf API. But currently, as per the design of the
>> perfevent agent, the post-processing of perf events is done by pmval
>> and other clients.
> 
> Yes.  But pmdaperfevent doesn't have to *only* work that way though, it
> could be extended to handle this new need for derivations.
> 

Good to know that.

> In perfevent/pmns you'll see "hwcounters PERFEVENT:*:*".  This allows
> the PMDA to dynamically create configured event counter metrics below
> perfevent.hwcounters.* (this, and the perfevent.conf(5) language for
> specifying the counters - see setup_perfevents() and setup_pmns() in
> the perfevent PMDA code).
> 
> A more flexible solution for your situation would be to extend this with
> a perfevent.derivations.* set of metrics.  For example, you might want
> to extend perfevent.conf syntax to allow "[derivation: bandwidth]" (or
> some better syntax) and then specify the averaging &| scaling in terms of
> perfevent.hwcounters.* names to produce a perfevent.derivations.bandwidth
> metric.
> 
> This will mean you could make use of the existing PMDA instead of using
> hard-coded events (like in this snippet of your patch...)
> 

Sure, we can give this approach a try. We haven't explored it much yet,
but we will definitely get back to you if we have queries around it.

We agree that it would be a better design in the end if we can avoid a
new PMDA and instead extend the existing one, since both are programming
the same hardware counters.

>> +#ifdef __x86_64__
>> +char *events[] = {
>> +    "snbep_unc_imc0::UNC_M_CAS_COUNT:RD",
>> +    "snbep_unc_imc0::UNC_M_CAS_COUNT:WR",
>> +    "snbep_unc_imc1::UNC_M_CAS_COUNT:RD",
>> +    "snbep_unc_imc1::UNC_M_CAS_COUNT:WR",
>> +    "snbep_unc_imc2::UNC_M_CAS_COUNT:RD",
>> +    "snbep_unc_imc2::UNC_M_CAS_COUNT:WR",
>> +    "snbep_unc_imc3::UNC_M_CAS_COUNT:RD",
>> +    "snbep_unc_imc3::UNC_M_CAS_COUNT:WR"
>> +};
>> +#define NR_EVENTS 8
>> +
>> +#elif defined(__PPC64__)
>> +char *events[] = {
>> +    "powerpc_nest_mcs_read::MCS_00",
>> +    "powerpc_nest_mcs_read::MCS_01",
>> +    "powerpc_nest_mcs_read::MCS_02",
>> +    "powerpc_nest_mcs_read::MCS_03",
>> +    "powerpc_nest_mcs_write::MCS_00",
>> +    "powerpc_nest_mcs_write::MCS_01",
>> +    "powerpc_nest_mcs_write::MCS_02",
>> +    "powerpc_nest_mcs_write::MCS_03"
>> +};
>> +#define NR_EVENTS 8
>> [...]
>> +#else
>> +/* For unsupported architectures */
>> +char **events = NULL;
>> +#define NR_EVENTS 0
>> +#endif
>> +
> 
> ... which lacks flexibility.  Other people might want to do derivations
> using other hardware counters - we should try to make it easier for the
> people following in your footsteps.

Agree!
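
To make sure we are reading the suggestion right, we were imagining
something roughly along these lines in perfevent.conf (made-up syntax
purely to illustrate the idea; the derivation/counters/operation/scale
keywords below are invented, not a concrete proposal):

    # hypothetical perfevent.conf fragment
    [derivation: bandwidth]
    # combine the read/write CAS counters already configured above as
    # perfevent.hwcounters.* entries
    counters = snbep_unc_imc*::UNC_M_CAS_COUNT:RD \
               snbep_unc_imc*::UNC_M_CAS_COUNT:WR
    operation = sum
    scale = sysfs    # apply the per-event scale advertised in sysfs
    # exported as perfevent.derivations.bandwidth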

> So, flexible specification is one advantage.  Another is you would not be
> creating a new PMDA that makes conflicting use of hardware counters (they
> would be shared).  A third is when you come to test the new code, you can
> leverage the work already done in qa/perfevent/* and qa/{756,757} instead
> of starting from scratch there.
> 
> (Joe, any thoughts on the above?  Any ideas about neater config syntax?)
> 
> 
> Onto the other metric added here - bandwidth.max - that looks a bit out
> of place here (doesn't use perfevent APIs).  It's reading a /proc file
> and exporting a value - this is more the domain of the Linux kernel PMDA.
> It would be a better fit for somewhere below the hardware metrics that
> pmdalinux exports, perhaps using a "hinv.mem.bandwidth" metric name -
> could you add it there instead?

Yes, it is a better fit under the Linux kernel PMDA, as it is not
reading any PMU counters. We will work on this.
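
On the pmdalinux side, we were thinking of something like the metrictab
entry below (only a sketch: the cluster name, item number and units are
placeholders until we look at the real tables):

    /* hinv.mem.bandwidth -- CLUSTER_HINV_BANDWIDTH and item 0 are
     * placeholders, not the real pmdalinux numbers */
    { NULL,
      { PMDA_PMID(CLUSTER_HINV_BANDWIDTH, 0), PM_TYPE_U64, PM_INDOM_NULL,
        PM_SEM_DISCRETE,
        PMDA_PMUNITS(1, -1, 0, PM_SPACE_MBYTE, PM_TIME_SEC, 0) } },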

> 
> The current code there runs uname(2) on every sample - it would be better
> to move that out of the fetch code into a global variable (e.g. pmdalinux
> already does this - see kernel_uname in src/pmdas/linux/pmda.c).
> 
> This new metric addition should be a separate patch in the series.
> 

Agree.
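
For reference, we understand the uname(2) caching to look roughly like
this (a minimal sketch following the kernel_uname example; the function
name here is ours, not pmdalinux code):

    #include <string.h>
    #include <sys/utsname.h>

    /* cached once at PMDA init, like kernel_uname in
     * src/pmdas/linux/pmda.c */
    static struct utsname kernel_uname;

    static void
    cache_uname(void)
    {
        if (uname(&kernel_uname) < 0)
            memset(&kernel_uname, 0, sizeof(kernel_uname));
    }

    /* fetch callbacks then use kernel_uname.machine etc. directly,
     * with no uname(2) call on every sample */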

>> Going forward we want this to report memory bandwidth as a rate over
>> given time intervals, which can be queried by the clients, rather
>> than as the aggregated counter values.
> 
> Ah - makes me think of the way the "hotproc" metrics in pmdaproc work,
> which are also doing server-side calculations, on an interval, to reduce
> the proc.* metrics.  You might be able to find some ideas in that code.
> See pmdaproc(1) man page for details, "HOTPROC OVERVIEW", and following
> couple of "CONFIGURATION" sections.
>

Thanks a lot for the pointers and design inputs. We will look into
reworking this.
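
As a rough illustration of the rate idea (all names below are ours,
just to show the shape of the calculation we have in mind):

    #include <stdint.h>
    #include <time.h>

    static uint64_t prev_bytes;
    static struct timespec prev_ts;

    /* report (delta bytes) / (delta time) over the last sampling
     * interval; the first call only primes prev_bytes/prev_ts */
    static double
    bandwidth_rate(uint64_t curr_bytes)
    {
        struct timespec now;
        double delta_t, rate = 0.0;

        clock_gettime(CLOCK_MONOTONIC, &now);
        if (prev_ts.tv_sec != 0 || prev_ts.tv_nsec != 0) {
            delta_t = (now.tv_sec - prev_ts.tv_sec) +
                      (now.tv_nsec - prev_ts.tv_nsec) / 1e9;
            if (delta_t > 0.0)
                rate = (double)(curr_bytes - prev_bytes) / delta_t;
        }
        prev_bytes = curr_bytes;
        prev_ts = now;
        return rate;    /* bytes per second over the last interval */
    }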

Cheers,
Deepthi


> cheers.
> 
> --
> Nathan
> 
