Re: [pcp] python pmExtractValue segfault

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] python pmExtractValue segfault
From: Michele Baldessari <michele@xxxxxxxxxx>
Date: Wed, 28 May 2014 15:44:33 +0100
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1135934547.16110444.1401239312938.JavaMail.zimbra@xxxxxxxxxx>
References: <20140527223044.GC4384@xxxxxxxxxxxxxxx> <1135934547.16110444.1401239312938.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mutt/1.5.21 (2012-12-30)
Hi Nathan,

On Tue, May 27, 2014 at 09:08:32PM -0400, Nathan Scott wrote:
> > wondering if there is a better way in general to achieve my goal here
> > (retrieve all values/indoms for all metrics for all timestamps).
> 
> This class of problem sounds suited to solving using the same model
> that pmlogsummary uses.  It performs sequential result scanning via
> pmFetchArchive(3), with a single pmSetMode at the start to set the
> initial archive offset.
> 
> As it passes through the pmResult structures, it constructs a data
> structure a lot like the one you describe above (written in C though).
> It uses a hash of all PMIDs (key == PMID, value == "struct aveData")
> wherein each PMID hash value contains a list of all instances that
> grows dynamically as the archive is scanned and new instances found.
> 
> Then at the end of scanning the archive, the now in-memory PMID hash
> is walked, final calculations are done, and a report printed out.  In
> the end, it doesn't use pmGetInDom[Archive] at all, but instead uses
> pmNameInDom(3).
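
The scan-then-report model described in the quoted text can be sketched in
Python roughly as follows. This is only an illustration under stated
assumptions: plain tuples stand in for the records returned by repeated
pmFetchArchive calls, and the names accumulate, summarize and the record
layout are hypothetical, not part of PCP or pmlogsummary.

```python
from collections import defaultdict


def accumulate(records):
    """Build {pmid: {inst: [(timestamp, value), ...]}} in one sequential pass.

    `records` is a hypothetical stand-in for archive results: an iterable
    of (timestamp, pmid, inst, value) tuples in archive order.
    """
    table = defaultdict(lambda: defaultdict(list))
    for timestamp, pmid, inst, value in records:
        # Unknown PMIDs and instances are created on first sight, so each
        # per-PMID instance table grows as the scan proceeds.
        table[pmid][inst].append((timestamp, value))
    return table


def summarize(table):
    """Walk the finished table once and compute a mean per (pmid, inst)."""
    return {
        (pmid, inst): sum(v for _, v in samples) / len(samples)
        for pmid, insts in table.items()
        for inst, samples in insts.items()
    }


if __name__ == "__main__":
    fake = [(0, 1, 0, 2.0), (1, 1, 0, 4.0), (1, 1, 1, 10.0)]
    print(summarize(accumulate(fake)))
```

The point of the layout is that nothing needs to be known about the
instance domain up front; instances are discovered while scanning.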

Thanks for the hints. I've now switched to using pmFetchArchive (how I did
not notice this function before is beyond me) and pmNameInDomArchive. So now
the pseudocode is something like the following:

while True:
  result = context.pmFetchArchive()
  for i in range(result.contents.numpmid):
    pmid = result.contents.get_pmid(i)
    desc = context.pmLookupDesc(pmid)
    count = result.contents.get_numval(i)
    if count <= 1: # No indoms are present
      ...extract value...
    else:
      for j in range(count):
        inst = result.contents.get_inst(i, j)
        indom_name = context.pmNameInDomArchive(desc, inst)
        ...extract data...

Since pmNameInDomArchive was quite high up in my profiling, I cached its
results in a dictionary so that indom_cache[(i, j)] = indom_name.
This way I only look a name up the first time the metric appears, and
I shave off 40% of the time needed to parse this (the rest is dominated
by Python casts and by pmExtractValue calls, for which there are fewer
obvious ways to improve). Is this a safe thing to do? Am I guaranteed
that the mapping (i, j)->indom_name will stay the same in an archive?

Somehow I assume that is not the case (pmcd restart with new PMDA, etc.),
but maybe I'll get lucky ;)
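
One possible variation is to key the cache on the instance domain and the
instance identifier instead of on the positions (i, j), since positions
within a result may differ from one archive record to the next. A minimal
sketch, assuming the real lookup is passed in as a callable: InstNameCache
and lookup are hypothetical names, and the wiring to pmNameInDomArchive is
left out. It does not settle whether an instance identifier can be given a
different name later in the same archive.

```python
class InstNameCache:
    """Memoize instance-name lookups keyed on (indom, inst)."""

    def __init__(self, lookup):
        # `lookup` is a hypothetical callable (indom, inst) -> name that
        # would wrap the expensive call in real code.
        self._lookup = lookup
        self._names = {}
        self.misses = 0

    def name(self, indom, inst):
        key = (indom, inst)
        if key not in self._names:
            # Only the first request for a given key pays for the lookup.
            self.misses += 1
            self._names[key] = self._lookup(indom, inst)
        return self._names[key]


if __name__ == "__main__":
    cache = InstNameCache(lambda indom, inst: "inst-%d-%d" % (indom, inst))
    for _ in range(3):
        cache.name(60, 0)
    print(cache.misses)
```

Because the key no longer depends on where a metric happens to sit in a
given result, the cache keeps working when records contain different sets
of metrics.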

cheers,
Michele
-- 
Michele Baldessari            <michele@xxxxxxxxxx>
C2A5 9DA3 9961 4FFB E01B  D0BC DDD4 DCCB 7515 5C6D
