pcp
[Top] [All Lists]

Re: [pcp] new pmdamic

To: Martins Innus <minnus@xxxxxxxxxxx>
Subject: Re: [pcp] new pmdamic
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Tue, 10 Mar 2015 00:59:48 -0400 (EDT)
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <54FDC740.4040508@xxxxxxxxxxx>
References: <54FDC740.4040508@xxxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: +8Y002MyUgUmj4U5qcZy/qfDIp1HJw==
Thread-topic: new pmdamic
Hi Martins,

----- Original Message -----
> Hi,
>      Here is a new pmda to support monitoring intel mic/phi cards.
> 
> https://github.com/ubccr/pcp/tree/micpmda
> 
> I only have a single card to test with and this is developed against
> mpss 3.3 on a host running Centos 6.6.  If anyone has issues with more
> cards or other mpss versions, please let me know.  The pmda uses the
> libmicmgmt python bindings, which as far as I can tell live in /usr/src,
> so I have to mess with the python path in the pmda and QA.  If there is
> a better way to do this, let me know.
> 
> The pmda runs on the host, not the mic card itself.  libmicmgmt takes
> care of grabbing the data from the card.
> 
> This is my first python pmda, so let me know if I've messed anything
> up.  I will likely be adding more metrics in the future.
> 

Looking good!  Here's some review notes:

- some new files have (cut&paste) copyright annotations from other folks

- I think there's some unused imports at the script head?  (os, time,
  ctypes - the latter has explicit "import foo, bar from ctypes" later
  on)

- sys.path.append('/usr/src')
  As you suggest above, this hard-coded setting could be made a bit more
  flexible, and we can then use that for extending QA coverage (see below)
  as well as allowing other potential locations for this python module.
  Several other PMDAs support passing settings in via a <pmda>.conf file -
  we could follow that convention here.  The ./Install script then checks
  for the python module in this (and potentially other) location(s), and
  either errors out or creates the config file as appropriate.

- QA - we could do what we did with the Nvidia PMDA to ensure we exercise
  the fetch/instance paths in this PMDA (in addition to those code paths
  you've already covered).  A very basic python-code module with classes
  for MicDevice, MicDevInfo & stubbing out the used functions to return a
  fixed hardware setup and some emulated values.  The QA test could then
  write a micmgmt.conf that points to the qa $here/micmgmt module.

- the PMDA contains...
# So far all the stats seem to be available to any user
#self.set_user(PCP.pmGetConfig('PCP_USER'))
  These two lines are a bit contradictory - the default is to run the PMDA
  as root, if thats not needed its a good idea to use the PCP_USER account
  as the commented-out code line does.  If this is the case, you can also
  remove forced_restart=true from the PMDA ./Install script.

- regarding metric naming; consider instead of:
  mic.mgmt.cores.core%d.sys with instances like "cardX"
  using metric names more like:
  mic.mgmt.percore.sys with instances like "cardX/coreY"

  like we ended up with for cgroups, after going the first route. In
  practice this often works out more simply.  In the former case, e.g.
  if core count becomes large, things become unwieldy using the metric
  name style of encoding.

  If there's a need to restrict fetching to one card, or one core, then
  adding storable control metrics to enable those filters can be a good
  option for getting fine-grained control.  pmdagluster is a reference
  you could use for that.


cheers.

--
Nathan

<Prev in Thread] Current Thread [Next in Thread>