
Use of PMIE to implement system analysis

To: "pcp@xxxxxxxxxxx" <pcp@xxxxxxxxxxx>
Subject: Use of PMIE to implement system analysis
From: William Cohen <wcohen@xxxxxxxxxx>
Date: Thu, 20 Aug 2015 12:15:01 -0400
Hi All,

I have been examining how to extend the toplev CPU performance
analysis approach described in Yasin's papers
(https://sites.google.com/site/analysismethods/yasin-pubs) and
implemented in Andi Kleen's toplev tool
(http://halobates.de/blog/p/262) to entire systems.  The metrics that
PCP collects allow similar analysis.  However, it would be nicer to
automate this with PMIE.  I have read through the PMIE documentation
and there seem to be some aspects of PMIE that make this harder than
it could be.


Nesting of predicates

The toplev approach uses trees where the roots are very broad causes
and the subtrees below them are more specific causes that contribute
to the issue.  A node several levels down may have a high value, but
it is only examined if the nodes above it also have high values.
Thus, the upper levels predicate the lower levels.  PMIE seems to be
oriented toward a flat analysis of the machine's metrics.  Maybe
macros could be used to avoid replicating the conditional checks in
the pmie code, but it seems like pmie could still be doing a lot of
work reevaluating predicates.
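
To make the nesting concrete, here is a minimal sketch in Python (not
pmie syntax) of the evaluation order I have in mind: a child is only
evaluated when its parent is over threshold.  The node names, metric
keys, and threshold values are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Node:
    name: str
    value: Callable[[Metrics], float]  # how much this cause contributes
    threshold: float
    children: List["Node"] = field(default_factory=list)


def flagged(node: Node, metrics: Metrics) -> List[str]:
    """Return names of nodes over threshold, top-down.

    Children are visited only when the parent itself is over its
    threshold, so a quiet parent prunes its whole subtree.
    """
    if node.value(metrics) <= node.threshold:
        return []
    found = [node.name]
    for child in node.children:
        found.extend(flagged(child, metrics))
    return found


# Hypothetical tree; names and thresholds are placeholders.
tree = Node("backend_bound", lambda m: m["backend"], 0.2, [
    Node("memory_bound", lambda m: m["memory"], 0.1, [
        Node("dram_bound", lambda m: m["dram"], 0.05),
    ]),
    Node("core_bound", lambda m: m["core"], 0.1),
])

sample = {"backend": 0.5, "memory": 0.3, "dram": 0.02, "core": 0.05}
print(flagged(tree, sample))  # ['backend_bound', 'memory_bound']
```

With a flat rule set every predicate is evaluated on every pass; in
the sketch a low "backend" value means none of the lower nodes are
looked at.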


Better parametrization of thresholds

In the PMIE documentation magic numbers are used as thresholds (for
example, 2,000 context switches per second).  These numbers may be
empirically determined for the environment.  However, what happens
when the environment changes, such as when traditional spinning hard
drives are replaced with SSDs that are capable of many more accesses
per second?  A PMIE script would falsely warn about the high number
of IOPS on the SSDs.  Or worse yet, what happens in a heterogeneous
environment where some machines have spinning drives and others have
SSDs?  Would there need to be multiple versions of the pmie scripts
for the different machines?
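
As a rough illustration of what I mean by parametrization, here is a
Python sketch (again, not pmie syntax) in which the rule stays the
same and only a table of limits varies by device class.  The class
names and IOPS numbers are placeholders, not recommendations.

```python
# Hypothetical per-class IOPS limits; the values are placeholders.
IOPS_LIMIT = {"rotational": 200.0, "ssd": 20000.0}


def iops_warnings(devices, limits=IOPS_LIMIT):
    """Return names of devices whose IOPS exceed their class limit.

    devices: iterable of (name, device_class, observed_iops) tuples.
    """
    return [name for name, cls, iops in devices if iops > limits[cls]]


observed = [("sda", "rotational", 350.0), ("nvme0n1", "ssd", 5000.0)]
print(iops_warnings(observed))  # ['sda']
```

One rule then covers a mixed set of machines, and moving a machine to
SSDs only changes which row of the table applies.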


Analysis over arbitrary intervals

The analysis in PCP and PMIE is mainly time based.  It would be really
nice to have the analysis work on units that are more closely tied to
application code events.  For example, allow analysis over the
lifetime of a cron job or between the start and end of some phase of
a program.
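
A small Python sketch of the idea: sample the counters when a phase
starts and again when it ends, and analyze the difference instead of
a fixed-interval rate.  The read_counters callable is a stand-in for
whatever would actually fetch the metric values; it is not a PCP API,
and the "ctxsw" key is invented.

```python
from contextlib import contextmanager


@contextmanager
def measured(read_counters, result):
    """Store counter deltas across the enclosed block in `result`.

    read_counters: hypothetical callable returning {name: value}.
    result: dict that receives {name: end - start} on exit.
    """
    start = read_counters()
    try:
        yield
    finally:
        end = read_counters()
        result.update({name: end[name] - start[name] for name in start})


# Fake reader returning two canned samples, for illustration only.
samples = iter([{"ctxsw": 1000}, {"ctxsw": 1600}])
deltas = {}
with measured(lambda: next(samples), deltas):
    pass  # the cron job or program phase would run here
print(deltas)  # {'ctxsw': 600}
```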


-Will
