
Re: [pcp] Verification and validation of performance metric values

To: William Cohen <wcohen@xxxxxxxxxx>, PCP <pcp@xxxxxxxxxxx>
Subject: Re: [pcp] Verification and validation of performance metric values
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Fri, 10 Jun 2016 10:31:16 +1000
On 10/06/16 00:24, William Cohen wrote:
Hi,

.... When I started using some of the
performance metrics provided by PCP I had the same question.  How are
the PCP metrics checked to see if they are accurate/reasonable?  For
some of the PAPI (http://icl.cs.utk.edu/papi/) testsuite there are
cross checks to see that values agree with the expected values.

I don't think there is much we can do on the fly, but the PCP QA testsuite contains many examples where we try to validate the values, either:
(a) by comparing with an independent tool, e.g. sar or netstat, or
(b) by generating synthetic loads and comparing the reported results against the expected values.

But the coverage is weak. Not many independent tools exist for (a), and experiments in the style of (b) are notoriously (and conceptually) hard to engineer in ways that make them statistically robust across the many different platforms where PCP QA is run.
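
As a rough illustration of approach (a), the sketch below compares paired samples (a PCP value and the value from an independent tool) and passes if most pairs agree within a relative tolerance. The function names, the 5% tolerance, and the 90% pass fraction are all assumptions chosen for illustration; none of this is taken from the actual PCP QA suite.

```python
def within_tolerance(pcp_value, reference_value, rel_tol=0.05, abs_tol=0.0):
    """Return True if the PCP value agrees with the reference value.

    Agreement means the absolute difference is no larger than
    rel_tol * |reference_value|, or abs_tol, whichever is greater.
    Both tolerances are illustrative defaults, not PCP QA settings.
    """
    diff = abs(pcp_value - reference_value)
    return diff <= max(rel_tol * abs(reference_value), abs_tol)


def check_samples(pairs, rel_tol=0.05, min_pass_fraction=0.9):
    """Return True if enough (pcp_value, reference_value) pairs agree.

    Requiring only a fraction of the pairs to agree leaves room for
    sampling skew between the two tools.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no samples to compare")
    passed = sum(1 for p, r in pairs if within_tolerance(p, r, rel_tol))
    return passed / len(pairs) >= min_pass_fraction
```

Accepting a fraction of passing samples, rather than demanding that every pair match, is one simple way to keep such a check from failing spuriously on a noisy platform, at the cost of possibly hiding intermittent errors.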

For estimating network utilization using PCP the
network.interface.baudrate and network.interface.speed metrics should
provide some indication of the interface speed.  However, looking at
the numbers produced by pmval below, they don't seem to be reasonable
for the 1 Gbps Ethernet connection of em1.

Agreed.
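
One way to make "reasonable" concrete is to convert a reported speed to bits per second under the units its metadata claims, then compare it with the nominal link rate. The sketch below assumes the 1 Gbps figure from the message above. The function names are hypothetical, and the unit size is passed in as a parameter instead of being asserted for any particular PCP metric.

```python
BITS_PER_BYTE = 8


def to_bits_per_second(value, unit_bytes):
    """Convert a rate in (unit_bytes bytes)/sec to bits/sec.

    unit_bytes is 1 for byte/sec, 10**6 or 2**20 for Mbyte/sec, and so
    on, depending on what the metric's metadata claims.
    """
    return value * unit_bytes * BITS_PER_BYTE


def plausible_link_speed(value, unit_bytes, nominal_bps, rel_tol=0.1):
    """Return True if the reported speed is within rel_tol of nominal."""
    bps = to_bits_per_second(value, unit_bytes)
    return abs(bps - nominal_bps) <= rel_tol * nominal_bps


def utilization(bytes_per_second, nominal_bps):
    """Fraction of the nominal link rate used by the observed traffic."""
    return bytes_per_second * BITS_PER_BYTE / nominal_bps


# A value of 125 is plausible for a 1 Gbps link only if the unit really
# is Mbyte/sec; read as byte/sec it is off by six orders of magnitude.
print(plausible_link_speed(125, 10**6, 10**9))  # True
print(plausible_link_speed(125, 1, 10**9))      # False
```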

These sorts of errors arise from one or more of the following factors:

1. the channels we use to get the raw data are typically not versioned and outside any ABI guarantees, so even if the data was once correct, it may no longer be so

2. algorithmic errors in the PMDA, e.g. picking the wrong field from a struct or parsing ASCII incorrectly

3. metadata errors, e.g. the numbers are really bytes, but the PMDA reports them as Mbytes

4. arithmetic overflow, especially when converting from the raw units to the PCP units to match the PCP metadata

I don't think we can make these factors go away, so catching bogus data in QA is the best plan.
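
Factor 4 is easy to reproduce in isolation. The sketch below simulates converting a link speed from Mbit/sec to bit/sec in a 32-bit unsigned integer. Python integers do not overflow, so the wraparound is imitated with a mask. The function names are hypothetical and the example is not taken from any real PMDA.

```python
UINT32_MASK = 0xFFFFFFFF


def mbits_to_bits_u32(mbits_per_sec):
    """Convert Mbit/sec to bit/sec, truncated as a 32-bit unsigned would be."""
    return (mbits_per_sec * 1_000_000) & UINT32_MASK


def fits_u32(value):
    """Return True if a non-negative value is representable in 32 bits."""
    return 0 <= value <= UINT32_MASK


# 1 Gbit/sec still fits in 32 bits, so the conversion is exact.
print(mbits_to_bits_u32(1_000))    # 1000000000

# 10 Gbit/sec does not fit, and silently wraps to a bogus value.
print(mbits_to_bits_u32(10_000))   # 1410065408
print(fits_u32(10_000 * 1_000_000))  # False
```

A QA test that only ever runs against a 1 Gbit/sec interface would never see this failure, which is part of why coverage across platforms matters.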

So any assistance in improving the QA coverage, and hence the validation, of any metrics would be most appreciated.
