
Re: [pcp] collectl vs pmcollectl and qa/709

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Subject: Re: [pcp] collectl vs pmcollectl and qa/709
From: Lukas Berk <lberk@xxxxxxxxxx>
Date: Wed, 15 Jul 2015 09:55:13 -0400
Cc: PCP <pcp@xxxxxxxxxxx>, mamarc@xxxxxxxxxx, mgoodwin@xxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <559AE072.5050901@xxxxxxxxxxxxxxxx> (Ken McDonell's message of "Tue, 07 Jul 2015 06:09:22 +1000")
References: <559AE072.5050901@xxxxxxxxxxxxxxxx>
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
Hi,

Ken McDonell <kenj@xxxxxxxxxxxxxxxx> writes:
> I've been chasing almost universal failure of qa/709 and finally tracked it 
> down to output fields being overflowed and "numbers" running together, e.g.
>
> kenj@vm01:~$ pmcollectl -c 2 -i 0.1
> #<--------CPU--------><----------Disks-----------><----------Network---------->
> #cpu sys inter  ctxsw KBRead  Reads KBWrit Writes KBIn  PktIn  KBOut  PktOut
>    2   0 95706115 357392013103731009 2223225 11872228 1036747864638 4579391 630128 2717987
>    2   0 95706133 357392034103731009 2223225 11872228 1036747864638 4579393 630129 2717989
>
> But more worryingly, the output from pmcollectl and collectl is not
> even close to the same ...

I've been trying to track this down for the past couple of days.  It looks
like commit 8748740943e7e93b4cafc56dc7304250411d4961 is where this was
introduced.  The Subsystem._timestamp value in pmsubsys.py seems to be
getting reset to zero after the pmFreeResult line was added.  This causes
pmcollectl to print results as if every fetch were the very first one (raw
values), instead of the diff against the previous values.  So
there's a memory corruption issue happening.  Unfortunately my python-fu
isn't good enough to know the tooling or the approach needed to identify
the full effects of this patch.
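
To illustrate the symptom, here is a toy sketch.  It is not the actual
pmsubsys.py code: the CounterSampler class, its field names and the numbers
are all made up, and I'm assuming that a saved timestamp of zero is what
marks "no previous sample".  It only shows why a clobbered timestamp would
make raw counter values show up where deltas are expected.

```python
class CounterSampler:
    """Hypothetical stand-in for a subsystem that reports counter deltas."""

    def __init__(self):
        self.prev_timestamp = 0.0   # assumption: 0 means "no previous sample"
        self.prev_values = {}

    def report(self, timestamp, values):
        """Return the values that would be printed for this sample."""
        if self.prev_timestamp == 0.0:
            # Nothing to subtract from, so the raw counters are reported.
            out = dict(values)
        else:
            # Normal case: report the change since the previous sample.
            out = {k: values[k] - self.prev_values[k] for k in values}
        self.prev_timestamp = timestamp
        self.prev_values = dict(values)
        return out


sampler = CounterSampler()
first = sampler.report(1.0, {"ctxsw": 1000})    # raw value, first fetch
second = sampler.report(1.1, {"ctxsw": 1021})   # delta against first fetch

# Simulate the suspected bug: the saved timestamp is zeroed between fetches.
sampler.prev_timestamp = 0.0
third = sampler.report(1.2, {"ctxsw": 1060})    # raw value again, not a delta

print(first, second, third)
```

With the timestamp intact the second sample is a small delta; once it is
zeroed, the third sample is a full raw counter again, which would overflow
the column widths the way Ken's output does.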

Marc, could you perhaps elaborate a little more on the memory leak
you were seeing?  How did you identify it?  Was there a testcase
submitted with the patch at all? (Mark, maybe you have a pointer to
this? I haven't seen any in the tree or git logs).

I'd be happy to create an archive-based testcase for this moving
forward (once we have the fix), to verify that the values we're seeing out
of pmcollectl make sense.  In the meantime, would it make
sense to revert this patch until we have a fix?

Cheers,

Lukas
