On 06/09/2016 08:31 PM, Ken McDonell wrote:
> On 10/06/16 00:24, William Cohen wrote:
>> Hi,
>>
>> .... When I started using some of the
>> performance metrics provided by PCP I had the same question. How are
>> the PCP metrics checked to see if they are accurate/reasonable? In
>> parts of the PAPI (http://icl.cs.utk.edu/papi/) testsuite there are
>> cross checks to verify that the values agree with the expected values.
>
> I don't think there is much we can do on the fly, but the PCP QA testsuite
> contains many examples where we try to validate the values, either:
> (a) by comparing with an independent tool, e.g. sar or netstat, or
> (b) by generating synthetic loads and comparing the reported results against
> the expected.
>
> But the coverage is weak. Not many independent tools exist for (a) and
> experiments of the style of (b) are notoriously (and conceptually) hard to
> engineer in ways that make them statistically robust across the many
> different platforms where PCP QA is run.
>
Yes, it is difficult to validate every metric. As you mentioned, perhaps as
separate testing we could have some benchmark/stress tests monitored by PCP to
see whether the reported values agree with the expected behavior. It still might
be hard to validate some metrics under unusual conditions (for example, counts of
various errors). Plus, validation tests passing on one hardware/software
environment don't ensure the metrics work on others.
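
As a rough illustration of a style (a) cross-check, something like the
following could compare PCP's view of an interface's speed against the
kernel's own /sys/class/net/<iface>/speed (which reports Mbits/sec). This is
only a sketch, not code from the PCP QA suite; it assumes pminfo -f prints
lines of the form `inst [N or "em1"] value V` and that
network.interface.speed is reported in Mbytes/sec, so a 1 Gbps link should
show up as 125.

/*
 * Sketch of an (a)-style cross-check: compare PCP's
 * network.interface.speed against the kernel's /sys/class/net/<iface>/speed
 * (reported in Mbits/sec).  Assumptions (not verified against any
 * particular PCP version): pminfo -f prints lines like
 *     inst [1 or "em1"] value 125
 * and network.interface.speed is in Mbytes/sec.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char *iface = (argc > 1) ? argv[1] : "em1";
    char path[256], line[512], name[64];
    double pcp_mbytes = -1.0;
    long sysfs_mbits = -1;
    FILE *fp;

    /* 1. What does the kernel say? (Mbits/sec) */
    snprintf(path, sizeof(path), "/sys/class/net/%s/speed", iface);
    if ((fp = fopen(path, "r")) != NULL) {
        if (fscanf(fp, "%ld", &sysfs_mbits) != 1)
            sysfs_mbits = -1;
        fclose(fp);
    }

    /* 2. What does PCP say?  Parse pminfo -f output for this interface. */
    if ((fp = popen("pminfo -f network.interface.speed", "r")) != NULL) {
        while (fgets(line, sizeof(line), fp) != NULL) {
            double value;
            if (sscanf(line, " inst [%*d or \"%63[^\"]\"] value %lf",
                       name, &value) == 2 && strcmp(name, iface) == 0)
                pcp_mbytes = value;
        }
        pclose(fp);
    }

    if (sysfs_mbits < 0 || pcp_mbytes < 0) {
        fprintf(stderr, "could not read both values for %s\n", iface);
        return 1;
    }

    /* 3. Compare: Mbits/sec divided by 8 should match Mbytes/sec. */
    double expect = sysfs_mbits / 8.0;
    printf("%s: sysfs %ld Mbit/s -> expect %.1f, PCP reports %.1f\n",
           iface, sysfs_mbits, expect, pcp_mbytes);
    return (pcp_mbytes > 0.9 * expect && pcp_mbytes < 1.1 * expect) ? 0 : 1;
}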
>
>> For estimating network utilization using PCP the
>> network.interface.baudrate and network.interface.speed metrics should
>> provide some indication of the interface speed. However, looking at
>> the numbers produced by pmval below, they don't seem reasonable
>> for the 1 Gbps ethernet connection on em1.
>
> Agreed.
>
That is a nice list below. I will keep it in mind.
> These sort of errors arise from one or more of the following factors:
>
> 1. the channels we use to get the raw data are typically not versioned and
> outside any ABI guarantees, so even if the data was once correct, it may no
> longer be so
>
> 2. algorithmic errors in the PMDA, e.g. picking the wrong field from a struct
> or parsing ascii incorrectly
>
> 3. metadata errors, e.g. the numbers are really bytes, but the PMDA reports
> them as Mbytes
>
> 4. arithmetic overflow, especially when converting from the raw units to the
> PCP units to match the PCP metadata
Searching through the git logs shows a number of commits fixing arithmetic
overflow issues.
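
To illustrate that class of bug, here is a hypothetical sketch (not code from
any actual PMDA) of how a raw-to-PCP unit conversion can overflow when the
arithmetic is done in 32 bits before the result is stored in a 64-bit field:

/*
 * Illustrative only: factor 4 above.  On platforms where int is 32 bits,
 * the multiply overflows before the value is widened to 64 bits.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    unsigned int speed_mbps = 40000;   /* e.g. a 40 Gbps interface */
    uint64_t wrong, right;

    /* 32-bit multiply wraps around first, THEN widens to 64 bits */
    wrong = speed_mbps * 1000000 / 8;

    /* widen before multiplying to get the intended bytes/sec */
    right = (uint64_t)speed_mbps * 1000000 / 8;

    printf("wrong: %llu\nright: %llu\n",
           (unsigned long long)wrong, (unsigned long long)right);
    return 0;
}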
>
> I don't think we can make these factors go away, so catching bogus data in QA
> is the best plan.
>
> So, any assistance in improving the QA coverage of any metrics to improve the
> validation would be most appreciated.
-Will