Hi Nathan,
On Wed, Oct 29, 2014 at 06:52:51PM -0400, Nathan Scott wrote:
> ----- Original Message -----
> > [...]
> > real 19m31.860s
> > user 19m24.391s
> > sys 0m2.566s
> > """
>
> OOC, can you time a pmlogsummary run on this archive?
Sure:
time pmlogsummary 20141029.00.10 &> /dev/null
real 0m8.651s
user 0m2.913s
sys 0m0.229s
It's mostly a python-side issue, due to all the type conversions.
> > While 20 minutes to parse such a big archive might be relatively ok, I
> > was wondering what options I have to improve this. The ones I can
> > currently think of are:
> >
> > 1) Split the time interval parsing over multiple CPUs. I can divide the
> > archive in subintervals (one per cpu) and have each CPU do its own
> > subinterval parsing and then stitch everything together at the end.
> > This is the approach I currently use to create the graph images that go
> > in the pdf (as matplotlib+reportlab aren't the fastest thing on the
> > planet)
>
> Should definitely help, since it appears to be CPU bound currently.
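Right, that matches what I'm seeing. A rough sketch of the split I have
in mind (untested; the conversion step is elided and the metric names
are just examples):

    import multiprocessing
    from pcp import pmapi
    import cpmapi as c_api

    def parse_subinterval(args):
        # each worker opens its own archive context and walks
        # only its own [start, end) slice of the time axis
        archive, start, end, interval, names = args
        ctx = pmapi.pmContext(c_api.PM_CONTEXT_ARCHIVE, archive)
        pmids = ctx.pmLookupName(names)
        origin = pmapi.timeval(int(start), 0)
        ctx.pmSetMode(c_api.PM_MODE_INTERP, origin,
                      int(interval * 1000))   # delta in msec
        values = []
        t = start
        while t < end:
            result = ctx.pmFetch(pmids)
            # ... the ctypes -> python conversions happen here ...
            ctx.pmFreeResult(result)
            t += interval
        return values

    def parallel_parse(archive, start, end, interval, names, ncpu):
        step = (end - start) / ncpu
        chunks = [(archive, start + i * step, start + (i + 1) * step,
                   interval, names) for i in range(ncpu)]
        pool = multiprocessing.Pool(ncpu)
        pieces = pool.map(parse_subinterval, chunks)
        pool.close()
        return pieces

Each worker pays the context setup cost once, and the pieces get
stitched back together in timestamp order at the end.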
>
> > 2) Implement a function in the C python bindings which returns a
> > python dictionary as described above. This would save me all the
> > ctypes/__init__ costs and probably I would shave some time off as there
> > would be less python->C function calls. Maybe we can find a generic
> > enough API for this to be usable by other clients?
>
> Yep, sounds good.
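For the record, the call shape I have in mind is roughly this (entirely
hypothetical - the name and the dict layout are placeholders until we
settle on an API):

    # one C-level call per fetch, returning native python types,
    # so the ctypes conversion layer is bypassed completely
    values = ctx.pmFetchDict(pmids)
    # => { 'kernel.all.cpu.user': {'cpu0': 1234, 'cpu1': 5678},
    #      'mem.util.free':       {None: 1830492} }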
>
> > 3) See if I can use Cython tricks to speed up things
> >
> > 4) Anything else I have not thought of?
>
> pmlogsummary uses that raw archive fetching interface we talked about
> a while back, which isn't always ideal for your needs - I'm interested
> in seeing the time difference though, if you could run that locally?
You mean pmFetchArchive(), I assume? I did not notice any speed
improvement when using that one from python (plus it does not support
INTERP mode).
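For reference, this is roughly the loop I timed (a sketch; I'm assuming
the bindings expose pmFetchArchive() the same way the C API does - it
walks the raw records, so pmSetMode()/INTERP has no effect on it):

    from pcp import pmapi
    import cpmapi as c_api

    ctx = pmapi.pmContext(c_api.PM_CONTEXT_ARCHIVE, '20141029.00.10')
    while True:
        try:
            # returns every metric in the next raw archive record
            result = ctx.pmFetchArchive()
        except pmapi.pmErr:
            break   # end of log
        ctx.pmFreeResult(result)

The per-value conversion cost is the same as on the pmFetch() path,
which is presumably why the wall-clock time did not move.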
I'll give 2) a try and then we can see if it is generic enough for other
python users.
Thanks,
Michele
--
Michele Baldessari <michele@xxxxxxxxxx>
C2A5 9DA3 9961 4FFB E01B D0BC DDD4 DCCB 7515 5C6D