Hi all,
while tinkering with pcp2pdf, one of my goals is to be able
to render a fairly big archive which also includes per-process metrics.
On one of my servers such a daily archive file is around 800MB.
Currently my archive parsing looks more or less like this ~250-line script:
https://gist.github.com/mbaldessari/30dc7ae2fe46d9b804f2
I basically use a function that returns a big dictionary in the following
form:
{ metric1: {'indom1': [(ts0, ts1, .., tsN), (v0, v1, .., vN)],
            ...
            'indomN': [(ts0, ts1, .., tsN), (v0, v1, .., vN)]},
  metric2: {'indom1': [(ts0, ts1, .., tsX), (v0, v1, .., vX)],
            ...
            'indomN': [(ts0, ts1, .., tsX), (v0, v1, .., vX)]} }
Then I use this dictionary to create the images with matplotlib,
parallelizing that step across all the available CPUs.
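For context, the per-sample extraction boils down to something like the
minimal sketch below (the archive name and metric list are placeholders,
and the exact accessor spelling may differ between binding versions, but
the pmapi calls are the same ones that dominate the profile that follows):

import cpmapi as c_api
from pcp import pmapi

# Sketch of the hot loop: one pmExtractValue() call (plus several
# ctypes casts inside get_vlist()/get_inst()/get_valfmt()) per
# metric/instance/sample.
ctx = pmapi.pmContext(c_api.PM_CONTEXT_ARCHIVE, "20140908.0")
pmids = ctx.pmLookupName(("kernel.all.load",))
descs = [ctx.pmLookupDesc(pmid) for pmid in pmids]

while True:
    try:
        result = ctx.pmFetch(pmids)
    except pmapi.pmErr:
        break                       # PM_ERR_EOL: end of the archive
    ts = result.contents.timestamp.tv_sec
    for i in range(result.contents.numpmid):
        for j in range(result.contents.get_numval(i)):
            atom = ctx.pmExtractValue(result.contents.get_valfmt(i),
                                      result.contents.get_vlist(i, j),
                                      descs[i].contents.type,
                                      c_api.PM_TYPE_DOUBLE)
            inst = result.contents.get_inst(i, j)
            # append (ts, atom.d) under metric i / instance inst here
    ctx.pmFreeResult(result)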
Now when profiling the above script against a fairly large archive (~800MB),
I get the following:
"""
Parsing files: 20140908.0 - 764.300262451 MB
Before parsing: usertime=0.066546 systime=0.011918 mem=12.34375 MB
After parsing: usertime=1161.825736 systime=2.364544 mem=1792.53125 MB
Profiling of parse()
725026682 function calls in 1169.003 seconds
Ordered by: cumulative time
List reduced from 72 to 15 due to restriction <15>
ncalls tottime percall cumtime percall filename:lineno(function)
1 124.550 124.550 1169.003 1169.003 ./fetch.py:140(parse)
29028435 111.320 0.000 693.559 0.000 ./fetch.py:113(_extract_value)
57876970 134.777 0.000 539.856 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:379(get_vlist)
146339015 367.384 0.000 367.384 0.000 /usr/lib64/python2.7/ctypes/__init__.py:496(cast)
28848535 34.114 0.000 312.698 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:384(get_inst)
57876970 89.000 0.000 254.070 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:374(get_vset)
29028435 168.361 0.000 179.190 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:1724(pmExtractValue)
29028435 61.598 0.000 141.777 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:364(get_valfmt)
233809633 33.780 0.000 33.780 0.000 {_ctypes.POINTER}
58013698 9.081 0.000 9.081 0.000 {method 'append' of 'list' objects}
519120 4.551 0.000 8.556 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:1243(pmLookupDesc)
36257575 8.167 0.000 8.167 0.000 {_ctypes.byref}
1442 6.801 0.005 6.803 0.005 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:1578(pmFetch)
518760 4.574 0.000 4.884 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:1172(pmNameID)
518760 1.280 0.000 2.924 0.000 /usr/lib64/python2.7/site-packages/pcp/pmapi.py:369(get_numval)
real 19m31.860s
user 19m24.391s
sys 0m2.566s
"""
While 20 minutes to parse such a big archive might be relatively ok, I
was wondering what options I have to improve this. The ones I can
currently think of are:
1) Split the time interval parsing over multiple CPUs. I can divide the
archive into subintervals (one per CPU), have each CPU parse its own
subinterval, and then stitch everything together at the end (see the
sketch after this list). This is the approach I currently use to create
the graph images that go in the PDF (as matplotlib+reportlab aren't the
fastest things on the planet).
2) Implement a function in the C python bindings which returns a python
dictionary as described above. This would save me all the
ctypes/__init__ costs, and I would probably shave some time off as
there would be fewer Python->C function calls. Maybe we can find a
generic enough API for this to be usable by other clients?
3) See if I can use Cython tricks to speed things up.
4) Anything else I have not thought of?
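Here is a rough sketch of what I have in mind for option 1;
parse_interval() is a hypothetical worker (each process would open its
own archive context, since contexts aren't picklable, and parse only its
slice of the archive's time window):

from multiprocessing import Pool

def parse_interval(bounds):
    # Hypothetical worker: open a fresh archive context, position it at
    # bounds[0] with pmSetMode(), fetch until bounds[1], and return a
    # partial {metric: {indom: [(ts, ...), (v, ...)]}} dictionary.
    return {}

def parallel_parse(start, end, ncpus):
    # Carve the archive's time window into one subinterval per CPU.
    step = (end - start) / ncpus
    bounds = [(start + i * step, start + (i + 1) * step)
              for i in range(ncpus)]
    pool = Pool(processes=ncpus)
    partials = pool.map(parse_interval, bounds)
    pool.close()
    pool.join()
    # Stitch the per-interval results back together; the subintervals
    # are already in time order, so a plain extend keeps ts sorted.
    merged = {}
    for part in partials:
        for metric, indoms in part.items():
            for indom, (tss, vals) in indoms.items():
                dst = merged.setdefault(metric, {}).setdefault(indom,
                                                               ([], []))
                dst[0].extend(tss)
                dst[1].extend(vals)
    return merged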
Grab your cluebats and feel free to point me in the right direction ;)
Thanks,
Michele
NB: I've tried using pmFetchArchive(), but a) there was no substantial
difference, and b) pmFetchArchive() does not allow interpolation, so
users could not specify a custom time interval.
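For reference, the interpolated mode I want to keep looks roughly like
this (pmSetMode() with PM_MODE_INTERP and a sampling delta in
milliseconds; the archive name and metric are placeholders):

import cpmapi as c_api
from pcp import pmapi

ctx = pmapi.pmContext(c_api.PM_CONTEXT_ARCHIVE, "20140908.0")
start = ctx.pmGetArchiveLabel().ll_start     # timeval of first record
ctx.pmSetMode(c_api.PM_MODE_INTERP, start, 60 * 1000)  # 1 sample/minute
pmids = ctx.pmLookupName(("kernel.all.load",))
while True:
    try:
        result = ctx.pmFetch(pmids)   # each fetch advances by the delta
    except pmapi.pmErr:
        break
    # ... extract values as usual ...
    ctx.pmFreeResult(result)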
--
Michele Baldessari <michele@xxxxxxxxxx>
C2A5 9DA3 9961 4FFB E01B D0BC DDD4 DCCB 7515 5C6D