
Re: PCP libvirt PMDA

To: pcp developers <pcp@xxxxxxxxxxx>
Subject: Re: PCP libvirt PMDA
From: Marko Myllynen <myllynen@xxxxxxxxxx>
Date: Mon, 18 Jul 2016 14:21:44 +0300
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1fa58d82-ac73-7747-c58d-acf880bc2155@xxxxxxxxxx>
Organization: Red Hat
References: <1fa58d82-ac73-7747-c58d-acf880bc2155@xxxxxxxxxx>
Reply-to: Marko Myllynen <myllynen@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2
Hi,

On 2016-07-12 12:01, Marko Myllynen wrote:
> 
> In addition to domain information from XML description, we now also
> provide some basic metrics for the hypervisor and, more importantly, all
> metrics available for domains (VMs). Some of the metrics are of course
> dependent on the platform support so for example the libvirt perf
> metrics will be available only if supported by the CPU and then enabled
> in libvirt.
> 
> Few notes:
> 
> - the types/semantics/units descriptions for metrics are based on
> libvirt docs and sources but if you spot anything there (or can suggest
> better naming or description), let me know

These have been reviewed and adjusted and should be correct now.
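
For anyone wanting to double-check the metadata, the descriptors and
help texts are easy to inspect once the PMDA is installed, e.g. with
something like:

  $ pminfo -dtT libvirt.domstats.vcpu.current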

> - domain information is refreshed when any of the libvirt.dominfo.*
> metrics are requested, if it turns out that it should be done more often
> we can revisit this but so far this seems to be an ok approach (since
> I'd expect that something like libvirt.dominfo.name would be requested
> as well when requesting domain metrics)

We'll continue to monitor how this approach works out.
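
A quick way to exercise this is to fetch a dominfo metric together
with a domstats metric, for example:

  $ pminfo -f libvirt.dominfo.name libvirt.domstats.vcpu.current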

> - the Install script creates persistent indom files (like the Oracle
> one) but I'm not sure is that actually working, looks like the files
> are being rewritten occasionally when restarting pmcd

AFAICS this is working; perhaps I had some issues during the early
development phase, so I think we can assume this is all OK now.
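
(The persistent instance domain caches should be the files under
$PCP_VAR_DIR/config/pmda/, named by the PMDA domain and indom serial
numbers, so checking that they stay stable across pmcd restarts,
e.g. with "ls -l $PCP_VAR_DIR/config/pmda/", is an easy sanity test.)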

> - libvirt API provides VM metrics for individual vCPUs/NICs/block
> devices but since instance IDs are the domain UUIDs in the PMDA, it's
> unclear what would be the optimal approach to provide those (dynamic)
> device metrics as PCP metrics. Thus the PMDA combines these together.

I investigated the Python PMDA API a bit and this turned out to be
pretty straightforward. The patch below implements per-device metrics
for vCPU/block/net. It's easy to see that these could be very
interesting, e.g., in the case of a database VM or a file server VM.
There's a fair amount of new code but most of it is executed only
when adding new metrics (for example, if libvirt.domstats.net.1.rx.bytes
does not exist and a VM with two NICs (0, 1) is present, then that
metric is added while libvirt.domstats.net.0.rx.bytes and the other
already-registered metrics are left untouched). And when the VM with
two NICs goes away, there will just be the typical "not available"
return code sent to the clients.
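
To make the approach a bit more concrete, here is a minimal
standalone sketch of the lazy registration idea (plain Python for
illustration only; the helper name and the submetric list are made
up, while the actual code in the patch below uses self.add_metric()
and self.pmns_refresh()):

def register_needed(names, base, count, submetrics):
    """Register base.<nr>.<sub> for nr in [0, count) unless present."""
    added = []
    for nr in range(count):
        for sub in submetrics:
            name = '%s%d.%s' % (base, nr, sub)
            if name not in names:
                names.add(name)
                added.append(name)
    return added

names = set()
# A VM with two NICs (0, 1) shows up: the per-NIC names get registered
print(register_needed(names, 'libvirt.domstats.net.', 2,
                      ['rx.bytes', 'tx.bytes']))
# Later refreshes with the same NIC count add nothing new
print(register_needed(names, 'libvirt.domstats.net.', 2,
                      ['rx.bytes', 'tx.bytes']))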

---
 src/pmdas/libvirt/pmdalibvirt.python | 162 ++++++++++++++++++++++++++++-------
 1 file changed, 129 insertions(+), 33 deletions(-)

diff --git a/src/pmdas/libvirt/pmdalibvirt.python b/src/pmdas/libvirt/pmdalibvirt.python
index 137138b..fa30b53 100755
--- a/src/pmdas/libvirt/pmdalibvirt.python
+++ b/src/pmdas/libvirt/pmdalibvirt.python
@@ -132,9 +132,9 @@ class LibvirtPMDA(PMDA):
             # See libvirt.git/src/libvirt-domain.c
             [ 'domstats.vcpu.current',     None,                        PM_TYPE_U32,    PM_SEM_INSTANT,  units_count, 'VM vCPUs, current'                 ],
             [ 'domstats.vcpu.maximum',     None,                        PM_TYPE_U32,    PM_SEM_INSTANT,  units_count, 'VM vCPUs, maximum'                 ],
-#           [ 'domstats.vcpu.state',       None,                        PM_TYPE_U32,    PM_SEM_INSTANT,  units_none,  'VM vCPUs, state'                   ],
-            [ 'domstats.vcpu.time',        None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_nsecs, 'VM vCPUs, time'                    ],
-            [ 'domstats.vcpu.wait',        None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_nsecs, 'VM vCPUs, wait'                    ],
+            [ 'domstats.vcpu.all.state',   None,                        PM_TYPE_U32,    PM_SEM_INSTANT,  units_none,  'VM vCPUs, total state'             ],
+            [ 'domstats.vcpu.all.time',    None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_nsecs, 'VM vCPUs, total time'              ],
+            [ 'domstats.vcpu.all.wait',    None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_nsecs, 'VM vCPUs, total wait'              ],
         ]
 
         self.vm_memstats_res = {}
@@ -166,18 +166,20 @@ class LibvirtPMDA(PMDA):
         self.vm_blockstats = [
             # Name - empty - type - semantics - units - help
             # See libvirt.git/src/libvirt-domain.c
-            [ 'domstats.block.count',      None,                        PM_TYPE_U32,    PM_SEM_INSTANT,  units_count, 'VM block devs, count'              ],
-            [ 'domstats.block.rd.reqs',    None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM block devs, rd reqs'            ],
-            [ 'domstats.block.rd.bytes',   None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_bytes, 'VM block devs, rd bytes'           ],
-            [ 'domstats.block.rd.times',   None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_nsecs, 'VM block devs, rd times'           ],
-            [ 'domstats.block.wr.reqs',    None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM block devs, wr reqs'            ],
-            [ 'domstats.block.wr.bytes',   None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_bytes, 'VM block devs, wr bytes'           ],
-            [ 'domstats.block.wr.times',   None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_nsecs, 'VM block devs, wr times'           ],
-            [ 'domstats.block.fl.reqs',    None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM block devs, fl reqs'            ],
-            [ 'domstats.block.fl.times',   None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_nsecs, 'VM block devs, fl times'           ],
-            [ 'domstats.block.allocation', None,                        PM_TYPE_U64,    PM_SEM_INSTANT,  units_bytes, 'VM backing imgs, allocation'       ],
-            [ 'domstats.block.capacity',   None,                        PM_TYPE_U64,    PM_SEM_INSTANT,  units_bytes, 'VM backing imgs, capacity'         ],
-            [ 'domstats.block.physical',   None,                        PM_TYPE_U64,    PM_SEM_INSTANT,  units_bytes, 'VM backing imgs, physical'         ],
+            [ 'domstats.block.count',          None,                        PM_TYPE_U32,    PM_SEM_INSTANT,  units_count, 'VM block devs, count'              ],
+            [ 'domstats.block.all.rd.reqs',    None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM block devs, total rd reqs'      ],
+            [ 'domstats.block.all.rd.bytes',   None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_bytes, 'VM block devs, total rd bytes'     ],
+            [ 'domstats.block.all.rd.times',   None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_nsecs, 'VM block devs, total rd times'     ],
+            [ 'domstats.block.all.wr.reqs',    None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM block devs, total wr reqs'      ],
+            [ 'domstats.block.all.wr.bytes',   None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_bytes, 'VM block devs, total wr bytes'     ],
+            [ 'domstats.block.all.wr.times',   None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_nsecs, 'VM block devs, total wr times'     ],
+            [ 'domstats.block.all.fl.reqs',    None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM block devs, total fl reqs'      ],
+            [ 'domstats.block.all.fl.times',   None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_nsecs, 'VM block devs, total fl times'     ],
+            [ 'domstats.block.all.name',       None,                        PM_TYPE_STRING, PM_SEM_INSTANT,  units_none,  'VM block devs, all names'          ],
+            [ 'domstats.block.all.allocation', None,                        PM_TYPE_U64,    PM_SEM_INSTANT,  units_bytes, 'VM backing imgs, total allocation' ],
+            [ 'domstats.block.all.capacity',   None,                        PM_TYPE_U64,    PM_SEM_INSTANT,  units_bytes, 'VM backing imgs, total capacity'   ],
+            [ 'domstats.block.all.physical',   None,                        PM_TYPE_U64,    PM_SEM_INSTANT,  units_bytes, 'VM backing imgs, total physical'   ],
+            [ 'domstats.block.all.path',       None,                        PM_TYPE_STRING, PM_SEM_INSTANT,  units_none,  'VM backing imgs, all paths'        ],
         ]
 
         self.vm_netstats_res = []
@@ -185,15 +187,16 @@ class LibvirtPMDA(PMDA):
         self.vm_netstats = [
             # Name - empty - type - semantics - units - help
             # See libvirt.git/src/libvirt-domain.c
-            [ 'domstats.net.count',        None,                        PM_TYPE_U32,    PM_SEM_INSTANT,  units_count, 'VM NICs, count'                    ],
-            [ 'domstats.net.rx.bytes',     None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_bytes, 'VM NICs, rx bytes'                 ],
-            [ 'domstats.net.rx.pkts',      None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM NICs, rx pkts'                  ],
-            [ 'domstats.net.rx.errs',      None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM NICs, rx errs'                  ],
-            [ 'domstats.net.rx.drop',      None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM NICs, rx drop'                  ],
-            [ 'domstats.net.tx.bytes',     None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_bytes, 'VM NICs, tx bytes'                 ],
-            [ 'domstats.net.tx.pkts',      None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM NICs, tx pkts'                  ],
-            [ 'domstats.net.tx.errs',      None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM NICs, tx errs'                  ],
-            [ 'domstats.net.tx.drop',      None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM NICs, tx drop'                  ],
+            [ 'domstats.net.count',            None,                        PM_TYPE_U32,    PM_SEM_INSTANT,  units_count, 'VM NICs, count'                    ],
+            [ 'domstats.net.all.name',         None,                        PM_TYPE_STRING, PM_SEM_INSTANT,  units_none,  'VM NICs, all names'                ],
+            [ 'domstats.net.all.rx.bytes',     None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_bytes, 'VM NICs, total rx bytes'           ],
+            [ 'domstats.net.all.rx.pkts',      None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM NICs, total rx pkts'            ],
+            [ 'domstats.net.all.rx.errs',      None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM NICs, total rx errs'            ],
+            [ 'domstats.net.all.rx.drop',      None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM NICs, total rx drop'            ],
+            [ 'domstats.net.all.tx.bytes',     None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_bytes, 'VM NICs, total tx bytes'           ],
+            [ 'domstats.net.all.tx.pkts',      None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM NICs, total tx pkts'            ],
+            [ 'domstats.net.all.tx.errs',      None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM NICs, total tx errs'            ],
+            [ 'domstats.net.all.tx.drop',      None,                        PM_TYPE_U64,    PM_SEM_COUNTER,  units_count, 'VM NICs, total tx drop'            ],
         ]
 
         self.vm_perfstats_res = []
@@ -408,6 +411,30 @@ class LibvirtPMDA(PMDA):
                                 elif i == 2:
                                     res['vcpu.' + nrstr + '.time'] = stats[nr][i]
                         self.vm_vcpustats_res.append([dom, res])
+
+                if self.vm_vcpustats_res:
+                    high = 0
+                    for r in self.vm_vcpustats_res:
+                         if r[1]['vcpu.current'] > high:
+                             high = r[1]['vcpu.current']
+                    if not high:
+                        return
+                    base = self.read_name() + '.domstats.vcpu.'
+                    if base + str(high-1) + '.time' not in self._metric_names.values():
+                        # New high for vCPUs, add needed per-vCPU metrics
+                        metrics = ['state', 'time', 'wait']
+                        for nr in range(high):
+                            nrstr = str(nr)
+                            if base + nrstr + '.time' not in self._metric_names.values():
+                                for j, m in enumerate(metrics):
+                                    metric = base + nrstr + '.' + m
+                                    help = 'VM vCPU' + nrstr + ', ' + m
+                                    # 5 - nr of static items, 2 - nr of items before total metrics
+                                    self.add_metric(metric, pmdaMetric(self.pmid(self.vm_vcpustats_cluster, 5+nr*len(metrics)+j),
+                                        self.vm_vcpustats[2+j][2], self.vm_indom, self.vm_vcpustats[2+j][3],
+                                        self.vm_vcpustats[2+j][4]), help, help)
+                                    self.vm_vcpustats.append([metric.replace(self.read_name() + '.', ''), None, self.vm_vcpustats[2+j][2]])
+                        self.pmns_refresh()
             except libvirt.libvirtError as error:
                 self.log("Failed to get domain vcpu stats: %s" % error)
             return
@@ -469,6 +496,34 @@ class LibvirtPMDA(PMDA):
                                 elif i == 3:
                                     res['block.' + nrstr + '.wr.bytes'] = stats[i]
                         self.vm_blockstats_res.append([dom, res])
+
+                if self.vm_blockstats_res:
+                    high = 0
+                    for r in self.vm_blockstats_res:
+                         if r[1]['block.count'] > high:
+                             high = r[1]['block.count']
+                    if not high:
+                        return
+                    base = self.read_name() + '.domstats.block.'
+                    if base + str(high-1) + '.rd.reqs' not in self._metric_names.values():
+                        # New high for block devices, add needed per-block device metrics
+                        metrics = ['rd.reqs', 'rd.bytes', 'rd.times', 'wr.reqs', 'wr.bytes', 'wr.times', 'fl.reqs', 'fl.times', 'name', 'allocation', 'capacity', 'physical', 'path']
+                        backing = ['allocation', 'capacity', 'physical', 'path']
+                        for nr in range(high):
+                            nrstr = str(nr)
+                            if base + nrstr + '.rd.reqs' not in self._metric_names.values():
+                                for j, m in enumerate(metrics):
+                                    metric = base + nrstr + '.' + m
+                                    if m not in backing:
+                                        help = 'VM block dev ' + nrstr + ', ' + m.replace('.', ' ')
+                                    else:
+                                        help = 'VM backing img ' + nrstr + ', ' + m
+                                    # 14 - nr of static items, 1 - nr of items before total metrics
+                                    self.add_metric(metric, pmdaMetric(self.pmid(self.vm_blockstats_cluster, 14+nr*len(metrics)+j),
+                                        self.vm_blockstats[1+j][2], self.vm_indom, self.vm_blockstats[1+j][3],
+                                        self.vm_blockstats[1+j][4]), help, help)
+                                    self.vm_blockstats.append([metric.replace(self.read_name() + '.', ''), None, self.vm_blockstats[1+j][2]])
+                        self.pmns_refresh()
             except libvirt.libvirtError as error:
                 self.log("Failed to get domain block stats: %s" % error)
             return
@@ -510,6 +565,33 @@ class LibvirtPMDA(PMDA):
                                 elif i == 7:
                                     res['net.' + nrstr + '.tx.drop'] = stats[i]
                         self.vm_netstats_res.append([dom, res])
+
+                if self.vm_netstats_res:
+                    high = 0
+                    for r in self.vm_netstats_res:
+                         if r[1]['net.count'] > high:
+                             high = r[1]['net.count']
+                    if not high:
+                        return
+                    base = self.read_name() + '.domstats.net.'
+                    if base + str(high-1) + '.rx.bytes' not in self._metric_names.values():
+                        # New high for NICs, add needed per-NIC metrics
+                        metrics = ['name', 'rx.bytes', 'rx.pkts', 'rx.errs', 'rx.drop', 'tx.bytes', 'tx.pkts', 'tx.errs', 'tx.drop']
+                        for nr in range(high):
+                            nrstr = str(nr)
+                            if base + nrstr + '.rx.bytes' not in self._metric_names.values():
+                                for j, m in enumerate(metrics):
+                                    metric = base + nrstr + '.' + m
+                                    if m == 'name':
+                                        help = 'VM NIC ' + nrstr + ', name'
+                                    else:
+                                        help = 'VM NIC ' + nrstr + ', ' + m.replace('.', ' ')
+                                    # 10 - nr of static items, 1 - nr of items before total metrics
+                                    self.add_metric(metric, pmdaMetric(self.pmid(self.vm_netstats_cluster, 10+nr*len(metrics)+j),
+                                        self.vm_netstats[1+j][2], self.vm_indom, self.vm_netstats[1+j][3],
+                                        self.vm_netstats[1+j][4]), help, help)
+                                    self.vm_netstats.append([metric.replace(self.read_name() + '.', ''), None, self.vm_netstats[1+j][2]])
+                        self.pmns_refresh()
             except libvirt.libvirtError as error:
                 self.log("Failed to get domain net stats: %s" % error)
             return
@@ -586,7 +668,7 @@ class LibvirtPMDA(PMDA):
 
         if cluster == self.vm_memstats_cluster:
             try:
-                key = self.vm_memstats[item][0].rsplit('.')[2]
+                key = self.vm_memstats[item][0].rpartition('.')[2]
                 return [self.vm_memstats_res[self.vm_insts.inst_name_lookup(inst)][key], 1]
             except:
                 return [PM_ERR_VALUE, 0]
@@ -627,21 +709,25 @@ class LibvirtPMDA(PMDA):
                 if pos < 0:
                     return [PM_ERR_INST, 0]
 
+                key = mtx[item][0].partition('.')[2]
+
                 # All done for non-dynamic clusters
                 if cluster != self.vm_vcpustats_cluster and \
                    cluster != self.vm_blockstats_cluster and \
                    cluster != self.vm_netstats_cluster:
-                    key = '.'.join(mtx[item][0].split('.')[1:])
                     if key in res[pos][1]:
                         return [res[pos][1][key], 1]
                     else:
                         return [PM_ERR_AGAIN, 0]
 
                 # Non-combined values in dynamic clusters
-                key = '.'.join(mtx[item][0].split('.')[1:])
                 if key == 'vcpu.current' or key == 'vcpu.maximum' or \
-                   key == 'net.count' or key == 'block.count':
-                    return [res[pos][1][key], 1]
+                   key == 'net.count' or key == 'block.count' or \
+                   '.all.' not in key:
+                    if key in res[pos][1]:
+                        return [res[pos][1][key], 1]
+                    else:
+                        return [PM_ERR_AGAIN, 0]
 
                 # Combine N values for dynamic metrics
                 if 'vcpu' in mtx[item][0]:
@@ -653,12 +739,22 @@ class LibvirtPMDA(PMDA):
                 else:
                     return [PM_ERR_VALUE, 0]
 
-                # Calculate the combined value
-                value = 0
+                # Construct the combined total value
+                mtype = mtx[item][2]
+                if mtype == PM_TYPE_STRING:
+                    value = ""
+                else:
+                    value = 0
                 for i in range(count):
-                    k = key.split('.')[0] + '.' + str(i) + '.' + '.'.join(key.split('.')[1:])
+                    parts = key.partition('.all.')
+                    k = parts[0] + '.' + str(i) + '.' + parts[2]
                     if k in res[pos][1]:
-                        value += res[pos][1][k]
+                        if mtype == PM_TYPE_STRING:
+                            value = value + ' ' + res[pos][1][k]
+                        else:
+                            value += res[pos][1][k]
+                if mtype == PM_TYPE_STRING and value.startswith(' '):
+                    value = value[1:]
                 return [value, 1]
             except:
                 return [PM_ERR_VALUE, 0]
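
With the patch applied and a guest with at least one NIC running, the
per-device values and the combined totals could then be compared with
something like:

  $ pminfo -f libvirt.domstats.net.0.rx.bytes \
        libvirt.domstats.net.all.rx.bytes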

Thanks,

-- 
Marko Myllynen
