pcp
[Top] [All Lists]

Re: [pcp] QA: pmval showing one less sample

To: Nathan Scott <nscott@xxxxxxxxxx>
Subject: Re: [pcp] QA: pmval showing one less sample
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Sat, 05 Sep 2009 07:02:59 +1000
Cc: Martin Hicks <mort@xxxxxxxx>, pcp@xxxxxxxxxxx
In-reply-to: <256314019.229001251939788289.JavaMail.root@xxxxxxxxxxxxxxxxxx>
References: <256314019.229001251939788289.JavaMail.root@xxxxxxxxxxxxxxxxxx>
Reply-to: kenj@xxxxxxxxxxxxxxxx
On closer inspection, Nathan's suggested patch is not correct.

Specifically this chunk ..

@@ -1129,7 +1132,7 @@ getargs(int               argc,           /* in -
command line argument count */
        /* if end is before start, no samples thanks */
        if (*smpls < 0) *smpls = 0;
        /* counters require 2 samples to produce reported sample */
-       if (*smpls > 0 && cntxt->desc.sem != PM_SEM_COUNTER)
+       if (*smpls > 0 && cntxt->desc.sem == PM_SEM_COUNTER)
            (*smpls)++;

And so the QA changes are not correct either, for example qa/080.out
_is_ correct and should not be changed.

This is another case where the comments are misleading (I rest my
case) ... the increment is needed for all cases _except_ PM_SEM_COUNTER
metrics.

My alternative patch is probably closer to correct.

On Thu, 2009-09-03 at 11:03 +1000, Nathan Scott wrote:
> ----- "Martin Hicks" <mort@xxxxxxxx> wrote:
> 
> > On Wed, Sep 02, 2009 at 09:53:51AM +1000, Nathan Scott wrote:
> > > ----- "Martin Hicks" <mort@xxxxxxx> wrote:
> > > > Tests 166 and 176 are showing one less sample than the expected
> > > > output with pcp 2.9.0.  Not sure what's going on here.
> > > > 
> > > > Is this just a bug that's been fixed?  If I pmdumptext the original
> > > > archive I see 54 samples, but the first one is a "?".  Maybe pmval
> > > > is now reporting how many intervals (samples-1) ?
> > > 
> > > I can't think of any changes that could have caused this, don't think
> > > there have been any pmval changes, nor libpcp fetch related changes.
> > > When was the last successful run of this test for you?  I'll try to
> > > get to doing a full QA run today/tomorrow, see if it happens here too.
> > 
> > This is the first time I've tried to use the QA stuff.  I'll see if the
> > older PCP that we're still shipping has the bug.  (2.7.8 right around
> > the 2.8.0 release time.  March-ish)
> 
> Hoo-boy, this was a good one.  AFAICT, this is a day-1 pmval bug.  Ken
> & Martin, could you guys review the attached patch carefully?
> 
> What I've found, is the problem either exists/doesn't exist depending on
> several factors:
> - running on a 32 bit machine, Debian unstable, the problem never occurs
> no matter how pmval is built.
> - running on a 64 bit machine, Debian unstable, the problem only happens
> with the packaged binaries.  The key factor appears to be the introduction
> of -O2, which looks like it reliably changes certain values on the stack,
> that would otherwise be zero (used uninitialised, in either case).
> 
> There's two problems here, I believe:
> - in pmval.c the getargs() call initialises the value of "smpls" (number
> of samples to take) based on cntxt->desc.sem.  However, at that point,
> we have not established a context yet, and so that field is uninitialised.
> - this test doesn't match the comment above it, and doesn't make sense:
> 
>         /* counters require 2 samples to produce reported sample */
>         if (*smpls > 0 && cntxt->desc.sem != PM_SEM_COUNTER)
> 
> that would seem to me that it should be "cntxt->desc.sem == PM_SEM_COUNTER"
> should it not?
> 
> Anyway, attached patch fixes test case 125, which is a simple test case for
> this.  Some of the other QA tests now fail, like 261 - the sample.drift bit
> fails though, which is an instantaneous metric, showing we used to go down
> the wrong pmval code path in that case.
> 
> cheers.
> 

<Prev in Thread] Current Thread [Next in Thread>