----- "Martin Hicks" <mort@xxxxxxxx> wrote:
> On Wed, Sep 02, 2009 at 09:53:51AM +1000, Nathan Scott wrote:
> > ----- "Martin Hicks" <mort@xxxxxxx> wrote:
> > > Tests 166 and 176 are showing one less sample than the expected
> > > output with pcp 2.9.0. Not sure what's going on here.
> > >
> > > Is this just a bug that's been fixed? If I pmdumptext the original
> > > archive I see 54 samples, but the first one is a "?". Maybe pmval
> > > is now reporting how many intervals (samples-1) ?
> >
> > I can't think of any changes that could have caused this, don't think
> > there have been any pmval changes, nor libpcp fetch related changes.
> > When was the last successful run of this test for you? I'll try to
> > get to doing a full QA run today/tomorrow, see if it happens here too.
>
> This is the first time I've tried to use the QA stuff. I'll see if the
> older PCP that we're still shipping has the bug. (2.7.8 right around
> the 2.8.0 release time. March-ish)
Hoo-boy, this was a good one. AFAICT, this is a day-1 pmval bug. Ken
& Martin, could you guys review the attached patch carefully?
What I've found, is the problem either exists/doesn't exist depending on
several factors:
- running on a 32 bit machine, Debian unstable, the problem never occurs
no matter how pmval is built.
- running on a 64 bit machine, Debian unstable, the problem only happens
with the packaged binaries. The key factor appears to be the introduction
of -O2, which looks like it reliably changes certain values on the stack,
that would otherwise be zero (used uninitialised, in either case).
There's two problems here, I believe:
- in pmval.c the getargs() call initialises the value of "smpls" (number
of samples to take) based on cntxt->desc.sem. However, at that point,
we have not established a context yet, and so that field is uninitialised.
- this test doesn't match the comment above it, and doesn't make sense:
/* counters require 2 samples to produce reported sample */
if (*smpls > 0 && cntxt->desc.sem != PM_SEM_COUNTER)
that would seem to me that it should be "cntxt->desc.sem == PM_SEM_COUNTER"
should it not?
Anyway, attached patch fixes test case 125, which is a simple test case for
this. Some of the other QA tests now fail, like 261 - the sample.drift bit
fails though, which is an instantaneous metric, showing we used to go down
the wrong pmval code path in that case.
cheers.
--
Nathan
patch
Description: Binary data
|