
Re: [pcp] Performance of parsing an archive in python

To: Michele Baldessari <michele@xxxxxxxxxx>
Subject: Re: [pcp] Performance of parsing an archive in python
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Wed, 29 Oct 2014 18:52:51 -0400 (EDT)
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <20141029200642.GA19804@xxxxxxxxxxxxxxx>
References: <20141029200642.GA19804@xxxxxxxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: dj3FQ50RiVrbCVuezOPA9/XBDOTEWw==
Thread-topic: Performance of parsing an archive in python

Hi Michele,

----- Original Message -----
> [...]
> real    19m31.860s
> user    19m24.391s
> sys     0m2.566s
> """

Out of curiosity, can you time a pmlogsummary run on this archive?

> While 20 minutes to parse such a big archive might be relatively ok, I
> was wondering what options I have to improve this. The ones I can
> currently think of are:
> 
> 1) Split the time interval parsing over multiple CPUs. I can divide the
> archive in subintervals (one per cpu) and have each CPU do its own
> subinterval parsing and then stitch everything together at the end.
> This is the approach I currently use to create the graph images that go
> in the pdf (as matplotlib+reportlab aren't the fastest thing on the
> planet)

Should definitely help, since it appears to be CPU bound currently.
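
A rough sketch of that split-and-stitch approach, in case it is useful.
Everything here is an assumption for illustration: parse_interval is a
hypothetical stand-in for your existing per-interval parsing code (it is
not a PCP API), and I'm assuming it returns a dictionary mapping each
metric name to a list of (timestamp, value) samples.

```python
from concurrent.futures import ProcessPoolExecutor


def split_interval(start, end, parts):
    """Split [start, end) into `parts` contiguous subintervals."""
    step = (end - start) / parts
    bounds = [start + i * step for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def merge_results(chunks):
    """Stitch per-subinterval dicts {metric: [(timestamp, value), ...]}."""
    merged = {}
    for chunk in chunks:  # chunks are expected in time order
        for metric, samples in chunk.items():
            merged.setdefault(metric, []).extend(samples)
    return merged


def parse_parallel(parse_interval, archive, start, end, workers):
    """Call the hypothetical parse_interval(archive, start, end) per worker."""
    intervals = split_interval(start, end, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(parse_interval, archive, s, e)
                   for s, e in intervals]
        # Collect in submission order so the samples stay time-ordered.
        return merge_results(f.result() for f in futures)
```

Each worker process would need to open its own archive context, and
parse_interval has to be a module-level function so it can be pickled.
Whether a sample that falls exactly on a subinterval boundary ends up in
one chunk or two depends on how parse_interval treats its end points, so
that is worth checking before trusting the merged result.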

> 2) Implement a function in the C python bindings which returns a
> python dictionary as described above.  This would save me all the
> ctypes/__init__ costs and probably I would shave some time off as there
> would be less python->C function calls. Maybe we can find a generic
> enough API for this to be usable by other clients?

Yep, sounds good.

> 3) See if I can use Cython tricks to speed up things
> 
> 4) Anything else I have not thought of?

pmlogsummary uses that raw archive fetching interface we talked about
a while back, which isn't always ideal for your needs. I'm still
interested in seeing the time difference though, if you could run that
locally.

cheers.

--
Nathan
