Hi,
For a while we have been seeing very slow performance from
pmlogextract when processing archives with lots of "proc" information,
i.e. a large number of changing instances. From profiling, >90% of the
time was spent in pmGetInDomArchive.
From my analysis, this code path is hit when the config lists metrics
but no instances to filter on. In that case:
- pmGetInDomArchive is called by gram.y -> dometric
- a list of all instances is generated
- metriclist.c -> searchmlist compares each instance against this list
  to decide whether it should be passed through
Since the same archive that generated the list is being compared against
that list, the test will always pass. I assume this was done in order
to use the same code regardless of whether or not instance filtering was
desired.
Here is a proposed optimization to short-circuit this step:
https://github.com/ubccr/pcp/tree/pmlogextract
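The idea, as a minimal sketch only (this is not the code from the branch
above; metric_filter_t, want_all and instance_selected are hypothetical
names made up for illustration, and the real pmlogextract structures
differ): when the config names no instances for a metric, record that
fact with a flag and skip both building and searching the full list.

```c
#include <stddef.h>

/* Hypothetical per-metric filter; NOT the actual pmlogextract structures. */
typedef struct {
    int        want_all;   /* 1 if the config named no instances for this metric */
    size_t     numinst;    /* number of entries in instlist (ignored if want_all) */
    const int *instlist;   /* instance ids explicitly requested in the config */
} metric_filter_t;

/* Return 1 if instance `inst` should be copied to the output archive. */
static int
instance_selected(const metric_filter_t *f, int inst)
{
    size_t i;

    if (f->want_all)
        return 1;   /* short-circuit: no list to build or search */

    /* Explicit filtering: linear search of the requested instances. */
    for (i = 0; i < f->numinst; i++) {
        if (f->instlist[i] == inst)
            return 1;
    }
    return 0;
}
```

With want_all set, the per-instance check is constant time and no call
is needed to enumerate every instance in the archive up front.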
We noticed a speedup of 10x-100x depending on the archives processed.
Processing time went from minutes to seconds for many archives.
All QA in the pmlogextract group passes with this change.
Martins
commit 7ddf5dbbd8f1ea9eb682a49d991a3b274bad2d95
Author: Martins Innus <minnus@xxxxxxxxxxx>
Date:   Wed Sep 23 19:30:33 2015 +0000

    pmlogextract optimization

    If we want all instances, don't build a list of all instances to
    compare against. Just pass through all instances.

 src/pmlogextract/gram.y       | 23 ++++-------------------
 src/pmlogextract/metriclist.c | 21 +++++++++++++--------
 2 files changed, 17 insertions(+), 27 deletions(-)