pcp updates: nvidia gpu pmda

To: PCP <pcp@xxxxxxxxxxx>
Subject: pcp updates: nvidia gpu pmda
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Tue, 1 Jul 2014 03:02:45 -0400 (EDT)
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1899317607.1034544.1404197955820.JavaMail.zimbra@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: sGon69BGsFIK7SaV4Saxg6z1PTQCDQ==
Thread-topic: pcp updates: nvidia gpu pmda
Changes committed to git://git.performancecopilot.org/pcp.git dev

 debian/changelog                      |    3 
 debian/control                        |    2 
 qa/744                                |   46 ++
 qa/744.out                            |  199 +++++++++
 qa/745                                |   51 ++
 qa/745.out                            |  217 +++++++++
 qa/common.filter                      |    3 
 qa/group                              |    3 
 qa/src/.gitignore                     |    3 
 qa/src/GNUlocaldefs                   |   14 
 qa/src/GNUmakefile                    |    4 
 qa/src/nvidia-ml.c                    |  216 +++++++++
 src/pmdas/nvidia/.gitignore           |    5 
 src/pmdas/nvidia/GNUmakefile          |   57 ++
 src/pmdas/nvidia/Install              |   28 +
 src/pmdas/nvidia/README               |    7 
 src/pmdas/nvidia/Remove               |   38 +
 src/pmdas/nvidia/help                 |  128 ++++-
 src/pmdas/nvidia/localnvml.c          |  276 ++++++++++++
 src/pmdas/nvidia/localnvml.h          |   89 ++++
 src/pmdas/nvidia/nvidia.c             |  739 ++++++++++++++++++++++++++--------
 src/pmdas/nvidia/pcp-pmda-nvidia.spec |  158 +++++++
 src/pmdas/nvidia/pmns                 |   30 +
 src/pmdas/nvidia/root                 |   10 
 24 files changed, 2124 insertions(+), 202 deletions(-)

commit b09e5de8e453e09ad3a7c5092c3fba8633094f2a
Author: Nathan Scott <nathans@xxxxxxxxxx>
Date:   Tue Jul 1 16:57:47 2014 +1000

    Error handling and QA testing work for the nvidia PMDA
    
    Updates to the PMDA itself include installation of the DSO
    form, several bug fixes from my earlier round of updates and
    the addition of error handling around all NVML calls.
    
    Tests qa/744 and qa/745 have been added, as well as a simple
    little shared library implementing the basic NVML interfaces
    with a static configuration for 2 GPUs (for testing).  These
    tests bail out if a system nvidia-ml.so is found, so that'd
    be an interesting third case (which I cannot test here, so
    I've not written the test).
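
For illustration only, a stand-in library with a fixed two-GPU configuration might look roughly like the sketch below. This is not the contents of qa/src/nvidia-ml.c. Every name and value here (stub_gpus, stub_device_count, stub_device_name, stub_device_memory, the memory sizes) is hypothetical, and a real stub would have to export the actual NVML symbol names so that the PMDA can resolve them at runtime.

```c
#include <stdio.h>
#include <stddef.h>

#define STUB_NUM_GPUS 2   /* static configuration: two pretend GPUs */

/* Hypothetical per-device record; fields chosen for illustration only */
struct stub_gpu {
    const char *name;
    unsigned long long mem_total;   /* bytes */
    unsigned long long mem_used;    /* bytes */
};

static const struct stub_gpu stub_gpus[STUB_NUM_GPUS] = {
    { "Stub GPU 0", 1024ULL << 20, 256ULL << 20 },
    { "Stub GPU 1", 2048ULL << 20, 512ULL << 20 },
};

/* Report the fixed device count; returns 0 on success, -1 on bad argument */
int stub_device_count(unsigned int *count)
{
    if (count == NULL)
        return -1;
    *count = STUB_NUM_GPUS;
    return 0;
}

/* Copy the device name into buf; fails for an out-of-range index */
int stub_device_name(unsigned int index, char *buf, size_t len)
{
    if (index >= STUB_NUM_GPUS || buf == NULL || len == 0)
        return -1;
    snprintf(buf, len, "%s", stub_gpus[index].name);
    return 0;
}

/* Return the canned memory figures for one device */
int stub_device_memory(unsigned int index,
                       unsigned long long *total, unsigned long long *used)
{
    if (index >= STUB_NUM_GPUS || total == NULL || used == NULL)
        return -1;
    *total = stub_gpus[index].mem_total;
    *used = stub_gpus[index].mem_used;
    return 0;
}
```

Because the values never change, a test can compare the PMDA's output against a fixed expected file, in the way the qa/*.out files are used.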

commit a0b7b421d70242de4aff1ada437c338d3c6db821
Author: Nathan Scott <nathans@xxxxxxxxxx>
Date:   Tue Jul 1 11:21:42 2014 +1000

    Update pmdanvidia help text a bit, based on the API docs

commit 320fa34e00127380f3bc5d3792bf64c0071266d4
Author: Nathan Scott <nathans@xxxxxxxxxx>
Date:   Tue Jul 1 11:07:26 2014 +1000

    pmdanvidia: support for long form command line options

commit 9244538dc0f9a17f0ff8d44ab15b570667abbfc6
Author: Martins Innus <minnus@xxxxxxxxxxx>
Date:   Tue Jul 1 11:02:41 2014 +1000

    Add Nvidia PMDA files/scripts missing from initial commit

commit 2efac89c01a7e975c6cda25ef1716dbee7d9d8a6
Author: Nathan Scott <nathans@xxxxxxxxxx>
Date:   Tue Jul 1 10:50:42 2014 +1000

    Followup work on Martins' initial nvidia PMDA
    
    This commit explores a mechanism for allowing PCP to provide
    an nvidia PMDA with runtime link support only (IOW, without a
    direct compile-time binding to the Nvidia code).
    
    Firstly, an audit of the interfaces needed was done, any/all
    needed interfaces and data structures identified, and a header
    built using http://docs.nvidia.com/deploy/nvml-api/index.html
    as a guide - I don't have the SDK, nor even any Nvidia hardware
    - fortunately, the API is super-simple.
    
    Next, wrappers around the NVML interfaces were added back to
    the PMDA code, inserting our own little code layer ahead of
    the nvidia library calls.  This also provides the mechanism
    by which testing can happen (followup commit) when no nvidia
    hardware/libraries are present.
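
The wrapper idea described above can be sketched as follows. This is a minimal illustration of runtime binding with dlopen(3) and dlsym(3), not the code in localnvml.c. The names sketch_nvml_init, nvml_dso and the SKETCH_* status codes are hypothetical, and the library name is left as a caller-supplied parameter. nvmlInit is the only NVML symbol assumed here.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch; values are illustrative only */
enum {
    SKETCH_SUCCESS = 0,
    SKETCH_LIBRARY_NOT_FOUND = 1,
    SKETCH_FUNCTION_NOT_FOUND = 2
};

typedef int (*sketch_init_fn)(void);

static void *nvml_dso;   /* library handle, cached after the first dlopen */

/*
 * Local wrapper: load the library on first use, look up the real
 * entry point, and only then call through to it.  No NVML header or
 * link-time dependency is needed to build this.
 */
int sketch_nvml_init(const char *libname)
{
    sketch_init_fn init;

    if (nvml_dso == NULL) {
        nvml_dso = dlopen(libname, RTLD_NOW);
        if (nvml_dso == NULL)
            return SKETCH_LIBRARY_NOT_FOUND;
    }

    /* POSIX idiom for converting the dlsym result to a function pointer */
    *(void **)(&init) = dlsym(nvml_dso, "nvmlInit");
    if (init == NULL)
        return SKETCH_FUNCTION_NOT_FOUND;

    return init();
}
```

Since the library name is resolved at runtime, the same wrapper can be pointed at a test stand-in when no Nvidia library is installed. On older glibc versions this needs linking with -ldl.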
    
    Martins identified a need for improved error handling, and as
    the API doesn't appear to have a strerror-alike interface for
    its error codes, one has been added into the wrapper for our
    own use.  Happily (I guess others have walked this path too!)
    the NVML API provides error codes for "library unavailable" &
    "function unavailable" already.  So, we make use of those for
    the cases where the PMDA is installed but no library has been
    configured yet, or the library doesn't have the symbols needed.
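
A strerror-alike lookup of the kind described above could be sketched as a small table search. The enum names, numeric values and message strings below are hypothetical placeholders; the real codes would be taken from the NVML API documentation.

```c
#include <stddef.h>

/* Hypothetical codes for illustration; not the real NVML values */
typedef enum {
    SK_SUCCESS = 0,
    SK_ERROR_UNINITIALIZED,
    SK_ERROR_LIBRARY_NOT_FOUND,    /* PMDA installed, no library configured */
    SK_ERROR_FUNCTION_NOT_FOUND    /* library present, symbol missing */
} sk_return_t;

static const struct {
    int code;
    const char *msg;
} sk_errtab[] = {
    { SK_SUCCESS,                  "Success" },
    { SK_ERROR_UNINITIALIZED,      "NVML not initialized" },
    { SK_ERROR_LIBRARY_NOT_FOUND,  "NVML library not found" },
    { SK_ERROR_FUNCTION_NOT_FOUND, "NVML function not found" },
};

/* Map a status code to a static message; never returns NULL */
const char *sk_strerror(int code)
{
    size_t i;

    for (i = 0; i < sizeof(sk_errtab) / sizeof(sk_errtab[0]); i++) {
        if (sk_errtab[i].code == code)
            return sk_errtab[i].msg;
    }
    return "Unknown NVML error";
}
```

Returning a fixed fallback string for unrecognised codes lets callers pass the result straight to a logging call without a NULL check.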
    
    In my travels I noticed a memfree metric, which the API docs
    say can differ from "total - used", so I added that metric too.

commit c9a7c7680b361dd6064bc3b8c81ef7d53a9a5a72
Author: Martins Innus <minnus@xxxxxxxxxxx>
Date:   Mon Jun 30 13:28:00 2014 +1000

    Initial version of the nvidia/nvml pmda and spec

commit 7c59bf709d51bfc0ec6ac85053c1cbfe484183f5
Author: Nathan Scott <nathans@xxxxxxxxxx>
Date:   Mon Jun 30 12:11:04 2014 +1000

    Add note about debian build fix, fallout from last release

commit ccdf05003f6f7bbd3cc8d5748f6af847d2ea9917
Author: Xilin Sun <s.sn.giraffe@xxxxxxxxx>
Date:   Mon Jun 30 12:09:35 2014 +1000

    Resolve debian build regression, add autoconf build dependency
