On 28/08/2010 1:39 AM, Frank Ch. Eigler wrote:
> Hi -
> We're investigating to what extent the PCP suite may be suitable for
> more general low-level event tracing. Just from docs / source gazing
> (so please excuse my terminology errors), a few challenges would seem
> to be:
G'day Frank and others.
Apologies for the length of this reply, but there are a number of
non-trivial issues at play here.
Nathan has already answered some of your questions. I'd like to
start by providing some historical and design center context. PCP
was not designed for event tracing; from the outset it was
designed for a specific class of performance monitoring and
management scenarios.
The table below outlines some of the differences ... these help to
explain why PCP is a priori not necessarily suitable for
event tracing. This does not mean PCP could not evolve to support
event tracing in the ways Nathan has suggested; we just need to
understand that the needs are different and make sure we do not end
up morphing PCP into something that no longer works for the original
design center and may not work all that well for event tracing.
| Feature | PCP Design Center | Event Tracing |
|---|---|---|
| Locality of data processing | Monitored system is typically not the same system that the collection and/or analysis is performed on. | Data collection happens on the system being monitored, analysis may happen later on another system. |
| Real time analysis | Central to the design requirements. | Often not required, other than edge-triggers to start and stop collection. |
| Retrospective analysis | Central to the design requirements. | Central to the design requirements. |
| Time scales | We are typically concerned with large and complex systems where average levels of activity over periods of the order of tens of seconds are representative. | Short-term and transients are often important, and inter-arrival time for events may be on the order of milliseconds. |
| Data rates | Moderate. Monitoring is often long-term, requiring broad and shallow data collection, with a small number of narrow and deep collections aligned to known or suspected problem areas. | Very high. Monitoring is most often narrow, deep and short-lived. |
| Data spread | Very broad ... interesting data may come from a number of places, e.g. hardware instrumentation, operating system stats, service layers and libraries, applications and distributed applications. | Very narrow ... one source and one host. |
| Data semantics | A very broad range, but the most common are activity levels and event counters (with little or no event parameter information). | Very specific, being the record of an event and its parameters with a high resolution time stamp. |
| Data source extensibility | Critical. | Rare. |
So with this background, let's look at Frank's specific questions.
> * poll-based data gathering
> It seems as though PMDAs are used exclusively in 'polling' mode,
> meaning that underlying system statistics are periodically queried
> and summary results stored. In our context, it would be useful if
> PMDAs could push event data into the stream as they occur - perhaps
> hundreds of times a second.
Yep, this would be a big change. There is not really a data stream
in PCP ... there is a source of performance metrics (a host or an
archive) and clients connect to that source and pull data at a
sample interval defined by the client.
At the host source, the co-ordinating daemon (pmcd) maintains no
cache nor stream of recent data ... a client asks for a specific
subset of the available information, this is instantiated and
returned to the client. There is no requirement for the subsets of
the requested information to be the same for consecutive requests
from a single client, and pmcd is receiving requests from a number
of clients that are handled completely independently.
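The pull model described above can be illustrated with a small Python sketch. This is a toy model only, not PCP code: ToySource, its fetch() method and the metric names are all hypothetical, and the real pmcd protocol is considerably richer. The point is simply that nothing is cached or streamed, and each request may name a different subset.

```python
import itertools


class ToySource:
    """Toy stand-in for a metrics source; keeps no cache and no history."""

    def __init__(self):
        self._clock = itertools.count(1)  # fake, ever-increasing activity level
        # Hypothetical metric names mapped to functions that compute a
        # value on demand from the current activity level.
        self._metrics = {
            "disk.reads": lambda tick: 100 * tick,
            "net.packets": lambda tick: 7 * tick,
            "cpu.busy": lambda tick: tick % 100,
        }

    def fetch(self, names):
        """Instantiate only the requested metrics, at the time of the request."""
        tick = next(self._clock)
        return {name: self._metrics[name](tick) for name in names}


source = ToySource()
first = source.fetch(["disk.reads"])                # client A, first sample
second = source.fetch(["net.packets", "cpu.busy"])  # client B, another subset
third = source.fetch(["disk.reads"])                # client A again, later
```

Anything that happened between two fetches by the same client is visible only as a change in the counter value, which is why individual events are lost in this model.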
As Nathan has suggested, if event traces are intended for
retrospective analysis (as opposed to event counters being suited
for either real time or retrospective analysis), then there is an
alternative approach, namely to create a PCP archive directly from a
source of data without involving pmcd or a pmda or pmlogger. We've
recently reworked the "pmimport" services to expose better APIs to
support just this style of use ... see LOGIMPORT(3) and sar2pcp(1)
for an example. I think this approach is possibly a better semantic
match between PCP and a stream of event records.
> * relatively static pmns
> It would be desirable if PMNS metrics were parametrizable with
> strings/numbers, so that a PMDA engine could use it to synthesize
> metrics on demand from a large space. (Example: have a
> "kernel-probe" PMNS namespace, parametrized by function name, which
> returns statistics of that function's execution. There are too many
> kernel functions, and they vary from host to host enough, so that
> enumerating them as a static PMNS table would be impractical.)
This is not so much of a problem. We've relaxed the PMNS services
to allow PMDAs to dynamically define new metrics on the fly. And as
Nathan has pointed out, the instance domain provides a dynamic
dimension for the available metric values that may also be useful,
e.g. this is how all of procfs is instantiated.
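As a rough illustration of the instance domain idea, the Python sketch below keeps one metric and lets its set of instances grow as new names turn up. It is a toy under stated assumptions: ToyInstanceDomain, record_call() and the function names are hypothetical and do not correspond to any PCP API.

```python
class ToyInstanceDomain:
    """Toy instance domain: the set of instances can grow while running."""

    def __init__(self):
        self._ids = {}  # instance name -> stable internal id

    def lookup(self, name):
        # Assign the next free id the first time a name is seen,
        # otherwise return the id that was assigned earlier.
        return self._ids.setdefault(name, len(self._ids))

    def names(self):
        # Instance names in the order their ids were assigned.
        return sorted(self._ids, key=self._ids.get)


indom = ToyInstanceDomain()
call_counts = {}  # internal id -> value of the one hypothetical metric


def record_call(function_name):
    """Count a call, creating the instance on first sight of the name."""
    inst = indom.lookup(function_name)
    call_counts[inst] = call_counts.get(inst, 0) + 1


for fn in ["vfs_read", "vfs_write", "vfs_read"]:
    record_call(fn)
```

The metric name space stays fixed at a single entry, while the list of instances is discovered at run time rather than enumerated in advance.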
> * scalar payloads
> It seems as though each metric value provided by PMDAs is
> necessarily a scalar value, as opposed to some structured type. For
> event tracing, it would be useful to have tuples. Front-ends could
> choose the interesting fields to render. (Example: tracing NFS
> calls, complete with decoded payloads.)
We've tried really hard to make the PCP metadata rich enough (in the
data model and the API services) to enable clients to be
data-driven, based on what performance data happens to be available
today from a host or archive. This is why the data aggregate (or
blob) data type that Nathan has mentioned is rarely used (although
it is fully supported).
If there were a tight coupling between the source of the event data
and the client that interprets the event data, then the PCP data
aggregate could be used to provide a transport and storage
encapsulation that is consistent with the PCP APIs and protocols.
Of course, such a client would be exposed to all of the word-size,
endian and version issues that plague other binary formats for
performance data, e.g. the sar variants based on AT&T UNIX.
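The endian and word-size exposure is easy to demonstrate with Python's standard struct module. The record layout here (a 32-bit event id followed by a 64-bit byte count) is a hypothetical example chosen for illustration, not a PCP format.

```python
import struct

# Hypothetical event record: 32-bit event id, then a 64-bit byte count.
event_id, nbytes = 42, 4096

big = struct.pack(">IQ", event_id, nbytes)     # as written on a big-endian host
little = struct.pack("<IQ", event_id, nbytes)  # as written on a little-endian host

# A consumer that assumes the wrong byte order gets wrong values, silently.
wrong_id, wrong_nbytes = struct.unpack("<IQ", big)

# Stating the byte order explicitly decodes the same bytes correctly.
right_id, right_nbytes = struct.unpack(">IQ", big)

# Word size and alignment matter too: the native layout ("@") may insert
# padding, so its size is platform dependent, unlike the standard layout.
native_size = struct.calcsize("@IQ")
standard_size = struct.calcsize("<IQ")
```

A client of an opaque aggregate has to carry this knowledge of the layout itself, because the metadata describes the value only as a blob.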
> * filtering
> It would be desirable for the apps fetching metric values to
> communicate a filtering predicate associated with them, perhaps as
> per pmie rules. This is to allow the data server daemon to reduce
> the amount of data sent to the gui frontends. Perhaps also it could
> use them to inform PMDAs as a form of subscription, and in turn they
> could reduce the amount of data flow.
PMDAs are free to do as much or as little work as they choose. Some
are totally demand-driven, instantiating only the information they
are asked for when they are asked for it. Others use caching
strategies to refresh some or all of the information at each
request. Others maintain timestamped caches and only refresh when
the information is deemed "stale". Another class runs a refresh
thread that is continually updating a data cache, and requests are
serviced from the cache.
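Of the strategies above, the timestamped cache is the simplest to sketch. The Python below is a minimal illustration, not code from any real PMDA: StaleCache, gather() and the 5 second limit are hypothetical, and a fake clock is injected so the behaviour is deterministic.

```python
import time


class StaleCache:
    """Refresh the cached value only when it is older than max_age seconds."""

    def __init__(self, refresh, max_age, clock=time.monotonic):
        self._refresh = refresh  # expensive function that gathers fresh data
        self._max_age = max_age
        self._clock = clock
        self._value = None
        self._stamp = None       # time of the last refresh; None means never

    def get(self):
        now = self._clock()
        if self._stamp is None or now - self._stamp > self._max_age:
            self._value = self._refresh()
            self._stamp = now
        return self._value


# Usage with a fake clock that the caller advances by hand.
fake_now = [0.0]
calls = []


def gather():
    """Stand-in for an expensive collection; records when it was run."""
    calls.append(fake_now[0])
    return len(calls)


cache = StaleCache(gather, max_age=5.0, clock=lambda: fake_now[0])
a = cache.get()    # first request: nothing cached yet, so refresh
fake_now[0] = 3.0
b = cache.get()    # 3 seconds old: still fresh, served from the cache
fake_now[0] = 9.0
c = cache.get()    # 9 seconds old: stale, so refresh again
```

The refresh-thread variant moves the call to the refresh function into a background loop, so that get() never has to wait for collection.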
The PMDA behaviour can be modal ... based on client requests, or
more interestingly as Nathan has suggested using the pmStore(3) API
to allow one or more clients to enable/disable collection (think
about expensive, detailed information that you don't want to collect
unless some client really wants it). The values passed into
the PMDA via pmStore(3) are associated with PCP metrics, so they
have the full richness of the PCP data model to encode switches,
text strings, blobs, etc.
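The enable/disable pattern can be sketched as a toy in Python. None of this is the real pmStore(3) interface: ToyAgent, its store() and fetch() methods and the metric names are hypothetical, standing in for a client storing a value into a control metric and the agent changing mode in response.

```python
class ToyAgent:
    """Toy agent whose expensive collection is switched by a control metric."""

    def __init__(self):
        self.enabled = False
        self.samples_taken = 0

    def store(self, metric, value):
        """Analogue of a client storing a value into a control metric."""
        if metric != "toy.control.enabled":
            raise KeyError(metric)
        self.enabled = bool(value)

    def fetch(self):
        """Return detailed data only while a client has enabled collection."""
        if not self.enabled:
            return None
        self.samples_taken += 1  # the expensive work happens only here
        return {"toy.detail.sample": self.samples_taken}


agent = ToyAgent()
before = agent.fetch()                 # collection is off by default
agent.store("toy.control.enabled", 1)  # a client switches it on
during = agent.fetch()
agent.store("toy.control.enabled", 0)  # and off again
after = agent.fetch()
```

Because the switch is itself a metric, a stored value could equally be a string or an aggregate, for example a filter expression, rather than a simple flag.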
> * no web-based frontends
> In our usage, it would be desirable to have some mini pcp-gui that
> is based on web technologies rather than QT.
There are several examples of web interfaces driven by PCP data ...
but each of these has been developed as a proprietary and specific
application and hence is not included in the PCP open source
distribution. The PCP APIs provide all the services needed to build
something like this.
> To what extent could/should PCP be used/extended to cover this space?
I think this suggestion is worth further discussion, but we probably
need some more concrete examples of the sorts of event trace data
that are being considered, and the most likely use cases and patterns
for that data.
Cheers, Ken.