
Performance of parsing an archive in python

To: pcp@xxxxxxxxxxx
Subject: Performance of parsing an archive in python
From: Michele Baldessari <michele@xxxxxxxxxx>
Date: Wed, 29 Oct 2014 21:06:43 +0100
User-agent: Mutt/1.5.21 (2012-12-30)
Hi all,

while tinkering with pcp2pdf, one of my goals is to be able
to render a fairly big archive which also includes per-process metrics.
On one of my servers such a daily archive file is around 800MB.

Currently my archive parsing looks more or less like this ~250-liner:
https://gist.github.com/mbaldessari/30dc7ae2fe46d9b804f2

I basically use a function that returns a big dictionary in the following
form:

{ metric1: {'indom1': [(ts0, ts1, ..., tsN), (v0, v1, ..., vN)],
            ...
            'indomN': [(ts0, ts1, ..., tsN), (v0, v1, ..., vN)]},
  metric2: {'indom1': [(ts0, ts1, ..., tsX), (v0, v1, ..., vX)],
            ...
            'indomN': [(ts0, ts1, ..., tsX), (v0, v1, ..., vX)]},
  ... }
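
For context, the heart of that function is a plain pmFetch() /
pmExtractValue() loop over the archive. A minimal sketch of it, assuming
the standard pcp.pmapi bindings (build_dict() and its arguments are my
names here; the real ~250 lines are in the gist):

"""
from pcp import pmapi
import cpmapi as c_api

def build_dict(archive, names):
    # Open an archive context and resolve the metrics of interest
    ctx = pmapi.pmContext(c_api.PM_CONTEXT_ARCHIVE, archive)
    pmids = ctx.pmLookupName(names)
    # pmLookupDesc() returns a POINTER(pmDesc), hence .contents below
    descs = [ctx.pmLookupDesc(pmid) for pmid in pmids]
    data = {}
    while True:
        try:
            result = ctx.pmFetch(pmids)
        except pmapi.pmErr:   # PM_ERR_EOL once the archive is exhausted
            break
        ts = result.contents.timestamp.tv_sec
        for i in range(result.contents.numpmid):
            for j in range(result.contents.get_numval(i)):
                # The hot path in the profile below: one pmExtractValue()
                # plus several ctypes casts per metric/instance/sample
                atom = ctx.pmExtractValue(result.contents.get_valfmt(i),
                                          result.contents.get_vlist(i, j),
                                          descs[i].contents.type,
                                          descs[i].contents.type)
                inst = result.contents.get_inst(i, j)
                tss, vals = data.setdefault(names[i], {}).setdefault(
                    inst, ([], []))
                tss.append(ts)
                vals.append(atom.dref(descs[i].contents.type))
        ctx.pmFreeResult(result)
    return data
"""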

Then I use this dictionary to create the images with matplotlib,
parallelizing that step across all the available CPUs.

Now when profiling the above script against a fairly large archive (~800MB),
I get the following:
"""
Parsing files: 20140908.0 - 764.300262451 MB
Before parsing: usertime=0.066546 systime=0.011918 mem=12.34375 MB
After parsing: usertime=1161.825736 systime=2.364544 mem=1792.53125 MB

Profiling of parse()
         725026682 function calls in 1169.003 seconds

   Ordered by: cumulative time
   List reduced from 72 to 15 due to restriction <15>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1  124.550  124.550 1169.003 1169.003 ./fetch.py:140(parse)
 29028435  111.320    0.000  693.559    0.000 ./fetch.py:113(_extract_value)
 57876970  134.777    0.000  539.856    0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:379(get_vlist)
146339015  367.384    0.000  367.384    0.000 /usr/lib64/python2.7/ctypes/__init__.py:496(cast)
 28848535   34.114    0.000  312.698    0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:384(get_inst)
 57876970   89.000    0.000  254.070    0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:374(get_vset)
 29028435  168.361    0.000  179.190    0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:1724(pmExtractValue)
 29028435   61.598    0.000  141.777    0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:364(get_valfmt)
233809633   33.780    0.000   33.780    0.000 {_ctypes.POINTER}
 58013698    9.081    0.000    9.081    0.000 {method 'append' of 'list' objects}
   519120    4.551    0.000    8.556    0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:1243(pmLookupDesc)
 36257575    8.167    0.000    8.167    0.000 {_ctypes.byref}
     1442    6.801    0.005    6.803    0.005 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:1578(pmFetch)
   518760    4.574    0.000    4.884    0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:1172(pmNameID)
   518760    1.280    0.000    2.924    0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:369(get_numval)

real    19m31.860s
user    19m24.391s
sys     0m2.566s
"""

While 20 minutes to parse such a big archive might be relatively ok, I
was wondering what options I have to improve this. The ones I can
currently think of are:

1) Split the time interval parsing over multiple CPUs. I can divide the
archive into subintervals (one per CPU) and have each CPU parse its own
subinterval, then stitch everything together at the end (see the sketch
after this list). This is the approach I currently use to create the
graph images that go in the pdf (as matplotlib+reportlab aren't the
fastest things on the planet).

2) Implement a function in the C python bindings which returns a
python dictionary as described above. This would save me all the
ctypes/__init__ costs, and I would probably shave some time off as
there would be fewer Python->C function calls. Maybe we can find a
generic enough API for this to be usable by other clients?

3) See if I can use Cython tricks to speed things up.

4) Anything else I have not thought of?
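
To make option 1 concrete, here is a rough sketch of what I have in
mind. parse_interval() is a hypothetical helper wrapping the fetch loop
above for a [start, end) slice (it would pmSetMode() to its start time
and stop fetching at its end time); each worker has to open its own
pmContext, since contexts can't be shared across processes:

"""
import multiprocessing

from pcp import pmapi
import cpmapi as c_api

def archive_bounds(archive):
    # Time window covered by the archive (attribute names as per the
    # pmapi bindings' pmLogLabel/timeval structures)
    ctx = pmapi.pmContext(c_api.PM_CONTEXT_ARCHIVE, archive)
    start = ctx.pmGetArchiveLabel().start.tv_sec
    end = ctx.pmGetArchiveEnd().tv_sec
    return start, end

def parse_interval_star(args):
    # Pool.map() passes a single argument, hence this unpacking shim
    return parse_interval(*args)

def parse_parallel(archive, ncpus=multiprocessing.cpu_count()):
    start, end = archive_bounds(archive)
    step = (end - start) // ncpus + 1
    chunks = [(archive, t, min(t + step, end))
              for t in range(start, end, step)]
    pool = multiprocessing.Pool(ncpus)
    partials = pool.map(parse_interval_star, chunks)
    pool.close()
    pool.join()
    # Stitch the per-chunk dicts together, concatenating the
    # per-instance timestamp/value lists in time order
    merged = {}
    for part in partials:
        for metric, indoms in part.items():
            for inst, (tss, vals) in indoms.items():
                m = merged.setdefault(metric, {}).setdefault(inst, ([], []))
                m[0].extend(tss)
                m[1].extend(vals)
    return merged
"""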

Grab your cluebats and feel free to point me in the right direction ;)

Thanks,
Michele

NB: I've tried using pmFetchArchive(), but a) there was no substantial
difference and b) pmFetchArchive() does not support interpolation, so
users could not specify a custom sampling interval.
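
For completeness, the interpolated mode I'm relying on is just
pmSetMode() with PM_MODE_INTERP and a delta in milliseconds; a rough
sketch, reusing ctx and pmids from the loop above:

"""
import cpmapi as c_api

interval_msec = 60 * 1000                # e.g. one sample per minute
origin = ctx.pmGetArchiveLabel().start   # start of the archive window
ctx.pmSetMode(c_api.PM_MODE_INTERP, origin, interval_msec)
result = ctx.pmFetch(pmids)   # interpolated values at 'origin'; each
                              # further pmFetch() advances by the delta
"""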
-- 
Michele Baldessari            <michele@xxxxxxxxxx>
C2A5 9DA3 9961 4FFB E01B  D0BC DDD4 DCCB 7515 5C6D
