Hi Michele,
----- Original Message -----
> [...]
> real 19m31.860s
> user 19m24.391s
> sys 0m2.566s
> """
OOC, can you time a pmlogsummary run on this archive?
> While 20 minutes to parse such a big archive might be relatively ok, I
> was wondering what options I have to improve this. The ones I can
> currently think of are:
>
> 1) Split the time interval parsing over multiple CPUs. I can divide the
> archive into subintervals (one per CPU), have each CPU parse its own
> subinterval, and then stitch everything together at the end. This is
> the approach I currently use to create the graph images that go in the
> PDF (as matplotlib+reportlab aren't the fastest things on the planet).
Should definitely help, since it appears to be CPU bound currently.
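A rough sketch of that split-and-stitch approach, using multiprocessing with a hypothetical parse_interval() worker standing in for the real per-subinterval archive parsing (names are illustrative, not a PCP API):

```python
# Sketch: divide a time window [start, end) into one subinterval per
# CPU, parse each in a separate process, then merge the results.
# parse_interval() is a hypothetical placeholder for the real work.
from multiprocessing import Pool, cpu_count

def parse_interval(bounds):
    start, end = bounds
    # placeholder: the real code would parse the archive over [start, end)
    return {start: "parsed %s-%s" % (start, end)}

def split(start, end, n):
    # divide [start, end) into n contiguous subintervals
    step = (end - start) / n
    return [(start + i * step, start + (i + 1) * step) for i in range(n)]

def parallel_parse(start, end):
    n = cpu_count()
    with Pool(n) as pool:
        parts = pool.map(parse_interval, split(start, end, n))
    # stitch the per-CPU results back together
    merged = {}
    for part in parts:
        merged.update(part)
    return merged
```

Since the subintervals are independent, the merge step is a plain dict update; results keyed by timestamp never collide across workers.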
> 2) Implement a function in the C Python bindings which returns a
> Python dictionary as described above. This would save me all the
> ctypes/__init__ costs, and I would probably shave some time off as
> there would be fewer Python->C function calls. Maybe we can find a
> generic enough API for this to be usable by other clients?
Yep, sounds good.
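For the generic API shape, something like a single call that returns the whole window as nested dicts might suit other clients too. A pure-Python stand-in (fetch_window() and its signature are hypothetical, not an existing binding):

```python
# Hypothetical shape for a dict-returning binding: one call into C
# that hands back every value for a time window, instead of one
# Python->C round trip per metric/instance/timestamp.
def fetch_window(archive, metrics, start, end, step):
    """Return {timestamp: {metric: value}} for the whole window."""
    result = {}
    t = start
    while t < end:
        # the real binding would read these values from the archive in C
        result[t] = dict((m, 0.0) for m in metrics)
        t += step
    return result

window = fetch_window("big.archive", ["kernel.all.load"], 0, 30, 10)
```

The caller then does one function call per report instead of one per sample, which is where the ctypes overhead would be saved.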
> 3) See if I can use Cython tricks to speed up things
>
> 4) Anything else I have not thought of?
pmlogsummary uses that raw archive fetching interface we talked about
a while back, which isn't always ideal for your needs - I'm interested
in seeing the time difference, though, if you could run that locally?
cheers.
--
Nathan