
To: Hemant Kumar <hemant@xxxxxxxxxxxxxxxxxx>, deepthi@xxxxxxxxxxxxxxxxxx, Joseph White <jpwhite4@xxxxxxxxxxx>
Subject: Re: [pcp] [RFC PATCH 1/1] Adding a PMDA to collect memory bandwidth
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Fri, 17 Jul 2015 03:11:41 -0400 (EDT)
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1436678109-30878-2-git-send-email-hemant@xxxxxxxxxxxxxxxxxx>
References: <1436678109-30878-1-git-send-email-hemant@xxxxxxxxxxxxxxxxxx> <1436678109-30878-2-git-send-email-hemant@xxxxxxxxxxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: zjkaWaqP4foRUn8DebkXtNqlPSjSgg==
Thread-topic: Adding a PMDA to collect memory bandwidth
Hi guys,

Sorry about the delay getting back to you.

----- Original Message -----
> 
> Following is a patch series prototyping a new Performance Metrics
> Domain Agent (PMDA) -> Memory_bandwidth.
> [...]

OK, interesting stuff - I'm following your thinking up to this point...

> 
> Why do we need a different agent to do this ?
> -------------------------------------------
> We are trying to compute the memory bandwidth of a system. This can be
> achieved by reading a set of PMU counters via perfevent, aggregating
> those values, multiplying by the scale factor given in the sysfs entry
> on the system, and applying some further math on top to produce a
> single metric value denoting the memory bandwidth of the system.

OK.  That doesn't preclude use of a modified perfevent though.
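For reference, the computation described above boils down to something
like the following - a minimal sketch, where the function name, the
scale factor and the sample values are purely illustrative, not taken
from the patch:

```c
/* Aggregate raw read/write counter values sampled over an interval,
 * apply a per-event scale factor (bytes per event, e.g. from sysfs),
 * and report the result in MB/s.  All names and values illustrative. */
static double bandwidth_mbps(const unsigned long counts[], int ncounts,
                             double scale, double interval_sec)
{
    unsigned long total = 0;

    for (int i = 0; i < ncounts; i++)
        total += counts[i];
    return (total * scale) / (interval_sec * 1024.0 * 1024.0);
}
```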

> Also note that PMU counter names will vary depending on the
> architecture. Ideally we would want to consume this metric via
> OpenStack; in that scenario OpenStack will be our client, so ideally
> all the reading of counters, and the math on top of it, happens there.

OK.

> From an OpenStack consumability side, it should connect to the pmcd
> daemon and get the required single aggregated, post-processed metric
> in a single call, irrespective of the underlying architecture.

*nod*

> Given this requirement, would it be good to have all the
> architecture-dependent stuff, i.e. reading PMUs and the related math,
> in PCP, and just return the memory bandwidth metric to OpenStack?

Yep, I think that makes a lot of sense.

> This would result in a cleaner design where all the
> architecture-dependent counters and computation are handled in the
> PCP backend and just the value is returned to OpenStack.

*nod*

> Why not extend the perfevent agent ?
> ----------------------------------
> 
> The perfevent agent and the memory bandwidth agent both end up
> reading PMUs via the perf API. But as the perfevent agent is
> currently designed, the post-processing of perf events is done by
> pmval and other clients.

Yes.  But pmdaperfevent doesn't have to work *only* that way - it
could be extended to handle this new need for derivations.

In perfevent/pmns you'll see "hwcounters PERFEVENT:*:*".  This allows
the PMDA to dynamically create configured event counter metrics below
perfevent.hwcounters.*, driven by the perfevent.conf(5) language for
specifying the counters - see setup_perfevents() and setup_pmns() in
the perfevent PMDA code.

A more flexible solution for your situation would be to extend this with
a perfevent.derivations.* set of metrics.  For example, you might want
to extend perfevent.conf syntax to allow "[derivation: bandwidth]" (or
some better syntax) and then specify the averaging and/or scaling in terms of
perfevent.hwcounters.* names to produce a perfevent.derivations.bandwidth
metric.
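For example (this syntax is entirely hypothetical - just to illustrate
the idea, not an existing perfevent.conf feature):

```
[derivation: bandwidth]
# hypothetical syntax: sum these configured counters, then
# apply a byte scale to produce perfevent.derivations.bandwidth
counter = snbep_unc_imc0::UNC_M_CAS_COUNT:RD
counter = snbep_unc_imc0::UNC_M_CAS_COUNT:WR
operation = sum
scale = 64
```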

This would mean you could make use of the existing PMDA instead of using
hard-coded events (like those in this snippet of your patch...)

> +#ifdef __x86_64__
> +char *events[] = {
> +    "snbep_unc_imc0::UNC_M_CAS_COUNT:RD",
> +    "snbep_unc_imc0::UNC_M_CAS_COUNT:WR",
> +    "snbep_unc_imc1::UNC_M_CAS_COUNT:RD",
> +    "snbep_unc_imc1::UNC_M_CAS_COUNT:WR",
> +    "snbep_unc_imc2::UNC_M_CAS_COUNT:RD",
> +    "snbep_unc_imc2::UNC_M_CAS_COUNT:WR",
> +    "snbep_unc_imc3::UNC_M_CAS_COUNT:RD",
> +    "snbep_unc_imc3::UNC_M_CAS_COUNT:WR"
> +};
> +#define NR_EVENTS 8
> +
> +#elif defined(__PPC64__)
> +char *events[] = {
> +    "powerpc_nest_mcs_read::MCS_00",
> +    "powerpc_nest_mcs_read::MCS_01",
> +    "powerpc_nest_mcs_read::MCS_02",
> +    "powerpc_nest_mcs_read::MCS_03",
> +    "powerpc_nest_mcs_write::MCS_00",
> +    "powerpc_nest_mcs_write::MCS_01",
> +    "powerpc_nest_mcs_write::MCS_02",
> +    "powerpc_nest_mcs_write::MCS_03"
> +};
> +#define NR_EVENTS 8
> [...]
> +#else
> +/* For unsupported architectures */
> +char *events = NULL;
> +#define NR_EVENTS 0
> +#endif
> +

... which lacks flexibility.  Other people might want to do derivations
using other hardware counters - we should try to make it easier for the
people following in your footsteps.

So, flexible specification is one advantage.  Another is that you would
not be creating a new PMDA that makes conflicting use of hardware
counters (they would be shared).  A third is that when you come to test
the new code, you can leverage the work already done in qa/perfevent/*
and qa/{756,757} instead of starting from scratch there.

(Joe, any thoughts on the above?  Any ideas about neater config syntax?)


Onto the other metric added here - bandwidth.max - that looks a bit out
of place (it doesn't use the perfevent APIs).  It's reading a /proc file
and exporting a value, which is more the domain of the Linux kernel PMDA.
It would be a better fit somewhere below the hardware metrics that
pmdalinux exports, perhaps using a "hinv.mem.bandwidth" metric name -
could you add it there instead?
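For what it's worth, the fetch side of such a metric is just a small
file read - a sketch, with a placeholder path and function name since
I'm not assuming the actual file your patch reads:

```c
#include <stdio.h>

/* Read a single unsigned value from a /proc- or sysfs-style file.
 * The caller supplies the path; nothing here is specific to the
 * actual file the patch uses. */
static int read_ulong_file(const char *path, unsigned long *value)
{
    FILE *fp = fopen(path, "r");
    int sts;

    if (fp == NULL)
        return -1;
    sts = fscanf(fp, "%lu", value);
    fclose(fp);
    return (sts == 1) ? 0 : -1;
}
```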

The current code there calls uname(2) on every sample - it would be
better to move that out of the fetch path and into a global variable
(pmdalinux already does this - see kernel_uname in
src/pmdas/linux/pmda.c).
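A minimal sketch of that pattern (names here are illustrative, not the
actual pmdalinux code):

```c
#include <stddef.h>
#include <sys/utsname.h>

/* Cache the uname(2) result the first time it is needed, rather
 * than calling it on every fetch.  Illustrative names only. */
static struct utsname cached_uname;
static int have_uname;

static const char *machine_name(void)
{
    if (!have_uname) {
        if (uname(&cached_uname) < 0)
            return NULL;
        have_uname = 1;
    }
    return cached_uname.machine;
}
```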

This new metric addition should be a separate patch in the series.

> Going forward we want this to report memory bandwidth as a rate over
> certain time intervals, which can be queried by the clients, rather
> than as aggregated counter values.

Ah - that makes me think of the way the "hotproc" metrics in pmdaproc
work, which also do server-side calculations, on an interval, to reduce
the proc.* metrics.  You might be able to find some ideas in that code.
See the pmdaproc(1) man page for details - the "HOTPROC OVERVIEW"
section and the following couple of "CONFIGURATION" sections.

cheers.

--
Nathan
