Hi Martins,
----- Original Message -----
> Hi,
> Here is a new pmda to support monitoring intel mic/phi cards.
>
> https://github.com/ubccr/pcp/tree/micpmda
>
> I only have a single card to test with and this is developed against
> mpss 3.3 on a host running Centos 6.6. If anyone has issues with more
> cards or other mpss versions, please let me know. The pmda uses the
> libmicmgmt python bindings, which as far as I can tell live in /usr/src,
> so I have to mess with the python path in the pmda and QA. If there is
> a better way to do this, let me know.
>
> The pmda runs on the host, not the mic card itself. libmicmgmt takes
> care of grabbing the data from the card.
>
> This is my first python pmda, so let me know if I've messed anything
> up. I will likely be adding more metrics in the future.
>
Looking good! Here's some review notes:
- some new files have (cut&paste) copyright annotations from other folks
- I think there's some unused imports at the script head? (os, time,
ctypes - the latter has explicit "import foo, bar from ctypes" later
on)
- sys.path.append('/usr/src')
As you suggest above, this hard-coded setting could be made a bit more
flexible, and we can then use that for extending QA coverage (see below)
as well as allowing other potential locations for this python module.
Several other PMDAs support passing settings in via a <pmda>.conf file -
we could follow that convention here. The ./Install script then checks
for the python module in this (and potentially other) location(s), and
either errors out or creates the config file as appropriate.
- QA - we could do what we did with the Nvidia PMDA to ensure we exercise
the fetch/instance paths in this PMDA (in addition to those code paths
you've already covered). A very basic python-code module with classes
for MicDevice, MicDevInfo & stubbing out the used functions to return a
fixed hardware setup and some emulated values. The QA test could then
write a micmgmt.conf that points to the qa $here/micmgmt module.
- the PMDA contains...
# So far all the stats seem to be available to any user
#self.set_user(PCP.pmGetConfig('PCP_USER'))
These two lines are a bit contradictory - the default is to run the PMDA
as root, if thats not needed its a good idea to use the PCP_USER account
as the commented-out code line does. If this is the case, you can also
remove forced_restart=true from the PMDA ./Install script.
- regarding metric naming; consider instead of:
mic.mgmt.cores.core%d.sys with instances like "cardX"
using metric names more like:
mic.mgmt.percore.sys with instances like "cardX/coreY"
like we ended up with for cgroups, after going the first route. In
practice this often works out more simply. In the former case, e.g.
if core count becomes large, things become unwieldy using the metric
name style of encoding.
If there's a need to restrict fetching to one card, or one core, then
adding storable control metrics to enable those filters can be a good
option for getting fine-grained control. pmdagluster is a reference
you could use for that.
cheers.
--
Nathan
|