On closer inspection, Nathan's suggested patch is not correct.
Specifically this chunk ..
@@ -1129,7 +1132,7 @@ getargs(int argc, /* in -
command line argument count */
/* if end is before start, no samples thanks */
if (*smpls < 0) *smpls = 0;
/* counters require 2 samples to produce reported sample */
- if (*smpls > 0 && cntxt->desc.sem != PM_SEM_COUNTER)
+ if (*smpls > 0 && cntxt->desc.sem == PM_SEM_COUNTER)
(*smpls)++;
And so the QA changes are not correct either, for example qa/080.out
_is_ correct and should not be changed.
This is another case where the comments are misleading (I rest my
case) ... the increment is needed for all cases _except_ PM_SEM_COUNTER
metrics.
My alternative patch is probably closer to correct.
On Thu, 2009-09-03 at 11:03 +1000, Nathan Scott wrote:
> ----- "Martin Hicks" <mort@xxxxxxxx> wrote:
>
> > On Wed, Sep 02, 2009 at 09:53:51AM +1000, Nathan Scott wrote:
> > > ----- "Martin Hicks" <mort@xxxxxxx> wrote:
> > > > Tests 166 and 176 are showing one less sample than the expected
> > > > output with pcp 2.9.0. Not sure what's going on here.
> > > >
> > > > Is this just a bug that's been fixed? If I pmdumptext the original
> > > > archive I see 54 samples, but the first one is a "?". Maybe pmval
> > > > is now reporting how many intervals (samples-1) ?
> > >
> > > I can't think of any changes that could have caused this, don't think
> > > there have been any pmval changes, nor libpcp fetch related changes.
> > > When was the last successful run of this test for you? I'll try to
> > > get to doing a full QA run today/tomorrow, see if it happens here too.
> >
> > This is the first time I've tried to use the QA stuff. I'll see if the
> > older PCP that we're still shipping has the bug. (2.7.8 right around
> > the 2.8.0 release time. March-ish)
>
> Hoo-boy, this was a good one. AFAICT, this is a day-1 pmval bug. Ken
> & Martin, could you guys review the attached patch carefully?
>
> What I've found, is the problem either exists/doesn't exist depending on
> several factors:
> - running on a 32 bit machine, Debian unstable, the problem never occurs
> no matter how pmval is built.
> - running on a 64 bit machine, Debian unstable, the problem only happens
> with the packaged binaries. The key factor appears to be the introduction
> of -O2, which looks like it reliably changes certain values on the stack,
> that would otherwise be zero (used uninitialised, in either case).
>
> There's two problems here, I believe:
> - in pmval.c the getargs() call initialises the value of "smpls" (number
> of samples to take) based on cntxt->desc.sem. However, at that point,
> we have not established a context yet, and so that field is uninitialised.
> - this test doesn't match the comment above it, and doesn't make sense:
>
> /* counters require 2 samples to produce reported sample */
> if (*smpls > 0 && cntxt->desc.sem != PM_SEM_COUNTER)
>
> that would seem to me that it should be "cntxt->desc.sem == PM_SEM_COUNTER"
> should it not?
>
> Anyway, attached patch fixes test case 125, which is a simple test case for
> this. Some of the other QA tests now fail, like 261 - the sample.drift bit
> fails though, which is an instantaneous metric, showing we used to go down
> the wrong pmval code path in that case.
>
> cheers.
>
|