
Re: [pcp] Verification and validation of performance metric values

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>, PCP <pcp@xxxxxxxxxxx>
Subject: Re: [pcp] Verification and validation of performance metric values
From: William Cohen <wcohen@xxxxxxxxxx>
Date: Fri, 10 Jun 2016 12:10:06 -0400
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <575A0A54.2050009@xxxxxxxxxxxxxxxx>
References: <cdb0d65c-8e7b-bd92-aa0e-3e4def8b8dec@xxxxxxxxxx> <575A0A54.2050009@xxxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0
On 06/09/2016 08:31 PM, Ken McDonell wrote:
> On 10/06/16 00:24, William Cohen wrote:
>> Hi,
>>
>> .... When I started using some of the
>> performance metrics provided by PCP, I had the same question: how are
>> the PCP metrics checked to see whether they are accurate/reasonable?
>> Parts of the PAPI (http://icl.cs.utk.edu/papi/) testsuite include
>> cross checks to verify that reported values agree with expected values.
> 
> I don't think there is much we can do on the fly, but the PCP QA testsuite 
> contains many examples where we try to validate the values, either:
> (a) by comparing with an independent tool, e.g. sar or netstat, or
> (b) by generating synthetic loads and comparing the reported results against 
> the expected.
>
> But the coverage is weak.  Not many independent tools exist for (a) and 
> experiments of the style of (b) are notoriously (and conceptually) hard to 
> engineer in ways that make them statistically robust across the many 
> different platforms where PCP QA is run.
>
 
Yes, it is difficult to validate every metric.  As you mentioned, perhaps as 
separate testing some benchmark/stress tests could be monitored by PCP to see 
whether the values reported agree with the expected behavior.  It might still 
be hard to validate some metrics under unusual conditions (for example, counts 
of various errors).  Also, validation tests passing in one hardware/software 
environment don't ensure the metrics are correct in others.
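Sketching that idea: a synthetic load of a known size is driven through the 
system while PCP samples the relevant counter, and the observed delta is 
checked for plausibility. A minimal sketch in Python (the helper name, the 25% 
slack for background activity, and the sample numbers are all assumptions, not 
an actual PCP QA test):

```python
def delta_plausible(before, after, expected, slack=0.25):
    """Synthetic-load check: a counter sampled before and after a known
    workload must advance by at least the amount of work we generated,
    and not wildly more.  The slack term allows for unrelated background
    activity, which is exactly what makes this style of test hard to
    keep statistically robust across platforms."""
    observed = after - before
    return expected <= observed <= expected * (1 + slack)
```

For example, after driving 1000 units of traffic, a counter that moved from 
100 to 1150 would pass, while one that did not move at all would fail.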

> 
>> For estimating network utilization using PCP the
>> network.interface.baudrate and network.interface.speed metrics should
>> provide some indication of the interface speed.  However, looking at
>> the numbers produced by pmval below they don't seem to be reasonable
>> for 1Gbps ethernet connection of em1.
> 
> Agreed.
> 
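For the interface-speed case specifically, a cross-check of style (a) could 
compare the PCP reading against the kernel's own report in sysfs, which is an 
independent channel for the same quantity. A minimal sketch, assuming Linux 
sysfs semantics (the helper names, the 1% tolerance, and em1 are illustrative):

```python
def agree(pcp_value, reference_value, rel_tol=0.01):
    """True when two independently obtained readings of the same
    quantity match within a relative tolerance."""
    if reference_value == 0:
        return pcp_value == 0
    return abs(pcp_value - reference_value) / abs(reference_value) <= rel_tol

def sysfs_speed_bits(iface):
    """Read the kernel's view of link speed; on Linux,
    /sys/class/net/<iface>/speed is reported in Mbit/s."""
    with open("/sys/class/net/%s/speed" % iface) as f:
        return int(f.read()) * 1_000_000
```

On a healthy 1 Gb/s em1, agree(baudrate_from_pmval, sysfs_speed_bits("em1")) 
should hold; the pmval output in question clearly would not pass such a check.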

That is a helpful list below; I will keep it in mind.

> These sorts of errors arise from one or more of the following factors:
> 
> 1. the channels we use to get the raw data are typically not versioned and 
> outside any ABI guarantees, so even if the data was once correct, it may no 
> longer be so
>
> 2. algorithmic errors in the PMDA, e.g. picking the wrong field from a struct 
> or parsing ASCII incorrectly
> 
> 3. metadata errors, e.g. the numbers are really bytes, but the PMDA reports 
> them as Mbytes
> 
> 4. arithmetic overflow, especially when converting from the raw units to the 
> PCP units to match the PCP metadata

Searching through the git logs shows a number of commits fixing arithmetic 
overflow issues.
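As a concrete illustration of factor 4, here is a sketch of how a units 
conversion can silently wrap when the result lands in a 32-bit field (the 
4295 Mbit/s figure is chosen purely because it crosses the 2^32 bit/s 
boundary; this is not code from any PMDA):

```python
MBIT = 1_000_000          # bits per Mbit
U32_MASK = 0xFFFFFFFF     # simulate storage in an unsigned 32-bit field

def mbps_to_baud_u32(speed_mbps):
    """Convert a link speed in Mbit/s to bit/s, truncated to 32 bits,
    mimicking an arithmetic-overflow bug in a units conversion."""
    return (speed_mbps * MBIT) & U32_MASK
```

A 1000 Mbit/s link converts cleanly (10^9 fits in 32 bits), but 4295 Mbit/s 
wraps around to a few tens of thousands of bit/s, exactly the kind of 
obviously bogus value QA should be able to catch.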

> 
> I don't think we can make these factors go away, so catching bogus data in QA 
> is the best plan.
> 
> So, any assistance in improving the QA coverage of any metrics to improve the 
> validation would be most appreciated.



-Will
