pcp
[Top] [All Lists]

Re: [pcp] QA: pmval showing one less sample

To: Martin Hicks <mort@xxxxxxxx>, kenj <kenj@xxxxxxxxxxxxxxxx>
Subject: Re: [pcp] QA: pmval showing one less sample
From: Nathan Scott <nscott@xxxxxxxxxx>
Date: Thu, 3 Sep 2009 11:03:08 +1000 (EST)
Cc: pcp@xxxxxxxxxxx
In-reply-to: <931067231.228961251939716234.JavaMail.root@xxxxxxxxxxxxxxxxxx>
----- "Martin Hicks" <mort@xxxxxxxx> wrote:

> On Wed, Sep 02, 2009 at 09:53:51AM +1000, Nathan Scott wrote:
> > ----- "Martin Hicks" <mort@xxxxxxx> wrote:
> > > Tests 166 and 176 are showing one less sample than the expected
> > > output with pcp 2.9.0.  Not sure what's going on here.
> > > 
> > > Is this just a bug that's been fixed?  If I pmdumptext the original
> > > archive I see 54 samples, but the first one is a "?".  Maybe pmval
> > > is now reporting how many intervals (samples-1) ?
> > 
> > I can't think of any changes that could have caused this, don't think
> > there have been any pmval changes, nor libpcp fetch related changes.
> > When was the last successful run of this test for you?  I'll try to
> > get to doing a full QA run today/tomorrow, see if it happens here too.
> 
> This is the first time I've tried to use the QA stuff.  I'll see if the
> older PCP that we're still shipping has the bug.  (2.7.8 right around
> the 2.8.0 release time.  March-ish)

Hoo-boy, this was a good one.  AFAICT, this is a day-1 pmval bug.  Ken
& Martin, could you guys review the attached patch carefully?

What I've found, is the problem either exists/doesn't exist depending on
several factors:
- running on a 32 bit machine, Debian unstable, the problem never occurs
no matter how pmval is built.
- running on a 64 bit machine, Debian unstable, the problem only happens
with the packaged binaries.  The key factor appears to be the introduction
of -O2, which looks like it reliably changes certain values on the stack,
that would otherwise be zero (used uninitialised, in either case).

There's two problems here, I believe:
- in pmval.c the getargs() call initialises the value of "smpls" (number
of samples to take) based on cntxt->desc.sem.  However, at that point,
we have not established a context yet, and so that field is uninitialised.
- this test doesn't match the comment above it, and doesn't make sense:

        /* counters require 2 samples to produce reported sample */
        if (*smpls > 0 && cntxt->desc.sem != PM_SEM_COUNTER)

that would seem to me that it should be "cntxt->desc.sem == PM_SEM_COUNTER"
should it not?

Anyway, attached patch fixes test case 125, which is a simple test case for
this.  Some of the other QA tests now fail, like 261 - the sample.drift bit
fails though, which is an instantaneous metric, showing we used to go down
the wrong pmval code path in that case.

cheers.

-- 
Nathan

Attachment: patch
Description: Binary data

<Prev in Thread] Current Thread [Next in Thread>