
Re: [pcp] suitability of PCP for event tracing

To: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: [pcp] suitability of PCP for event tracing
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Mon, 30 Aug 2010 01:34:22 +1000
Cc: pcp@xxxxxxxxxxx, systemtap@xxxxxxxxxxxxxxxxxx
In-reply-to: <20100827153906.GD3185@xxxxxxxxxx>
References: <20100827153906.GD3185@xxxxxxxxxx>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.8) Gecko/20100802 Thunderbird/3.1.2
On 28/08/2010 1:39 AM, Frank Ch. Eigler wrote:
Hi -

We're investigating to what extent the PCP suite may be suitable for
more general low-level event tracing.  Just from docs / source gazing
(so please excuse my terminology errors), a few challenges would seem
to be:

G'day Frank and others.

Apologies for the length of this reply, but there are a number of non-trivial issues at play here.

Nathan has already answered some of your questions.  I'd like to start by providing some historical and design center context.  From the outset PCP was designed not for event tracing, but for a specific class of performance monitoring and management scenarios.

The table below outlines some of the differences ... these help to explain why PCP is a priori not necessarily suitable for event tracing.  This does not mean PCP could not evolve to support event tracing in the ways Nathan has suggested; we just need to understand that the needs are different, and make sure we do not end up morphing PCP into something that no longer works for the original design center and may not work all that well for event tracing either.

Feature | PCP Design Center | Event Tracing
------- | ----------------- | -------------
Locality of data processing | Monitored system is typically not the same system that the collection and/or analysis is performed on. | Data collection happens on the system being monitored, analysis may happen later on another system.
Real time analysis | Central to the design requirements. | Often not required, other than edge-triggers to start and stop collection.
Retrospective analysis | Central to the design requirements. | Central to the design requirements.
Time scales | We are typically concerned with large and complex systems where average levels of activity over periods of the order of tens of seconds are representative. | Short-term and transients are often important, and inter-arrival time for events may be on the order of milliseconds.
Data rates | Moderate. Monitoring is often long-term, requiring broad and shallow data collection, with a small number of narrow and deep collections aligned to known or suspected problem areas. | Very high.  Monitoring is most often narrow, deep and short-lived.
Data spread | Very broad ... interesting data may come from a number of places, e.g. hardware instrumentation, operating system stats, service layers and libraries, applications and distributed applications. | Very narrow ... one source and one host.
Data semantics | A very broad range, but the most common are activity levels and event counters (with little or no event parameter information). | Very specific, being the record of an event and its parameters with a high resolution time stamp.
Data source extensibility | Critical. | Rare.

So with this background, let's look at Frank's specific questions.
* poll-based data gathering

  It seems as though PMDAs are used exclusively in 'polling' mode,
  meaning that underlying system statistics are periodically queried
  and summary results stored.  In our context, it would be useful if
  PMDAs could push event data into the stream as they occur - perhaps
  hundreds of times a second.

Yep, this would be a big change.  There is not really a data stream in PCP ... there is a source of performance metrics (a host or an archive) and clients connect to that source and pull data at a sample interval defined by the client.

At the host source, the co-ordinating daemon (pmcd) maintains neither a cache nor a stream of recent data ... a client asks for a specific subset of the available information, and this is instantiated and returned to the client.  There is no requirement for the subsets of the requested information to be the same for consecutive requests from a single client, and pmcd is receiving requests from a number of clients that are handled completely independently.
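
The pull model described above can be illustrated with a toy sketch.  This is not the PCP API: the MetricSource class, its fetch method and the metric names are hypothetical, chosen only to show a source that holds no cache and instantiates just the subset each request asks for.

```python
import itertools


class MetricSource:
    """Toy stand-in for a host source: no cache, values computed on request."""

    def __init__(self, collectors):
        self.collectors = collectors  # metric name -> zero-argument callable

    def fetch(self, names):
        # Instantiate only the requested subset, at the moment of the request.
        return {name: self.collectors[name]() for name in names}


# Hypothetical metrics backed by simple counters.
disk_reads = itertools.count(start=100, step=7)
ctx_switches = itertools.count(start=0, step=1000)
source = MetricSource({
    "disk.reads": lambda: next(disk_reads),
    "kernel.ctxsw": lambda: next(ctx_switches),
})

# Requests are independent, and the subset may change between requests.
first = source.fetch(["disk.reads"])
second = source.fetch(["kernel.ctxsw"])
third = source.fetch(["disk.reads", "kernel.ctxsw"])
```

Nothing is retained between calls to fetch, so an event that occurs and disappears between two requests is only visible if it left a trace in a counter.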

As Nathan has suggested, if event traces are intended for retrospective analysis (as opposed to event counters being suited for either real time or retrospective analysis), then there is an alternative approach, namely to create a PCP archive directly from a source of data without involving pmcd or a pmda or pmlogger.  We've recently reworked the "pmimport" services to expose better APIs to support just this style of use ... see LOGIMPORT(3) and sar2pcp(1) for an example.  I think this approach is possibly a better semantic match between PCP and a stream of event records.

* relatively static pmns

  It would be desirable if PMNS metrics were parametrizable with
  strings/numbers, so that a PMDA engine could use it to synthesize
  metrics on demand from a large space.  (Example: have a
  "kernel-probe" PMNS namespace, parametrized by function name, which
  returns statistics of that function's execution.  There are too many
  kernel functions, and they vary from host to host enough, so that
  enumerating them as a static PMNS table would be impractical.)

This is not so much of a problem.  We've relaxed the PMNS services to allow PMDAs to dynamically define new metrics on the fly.  And as Nathan has pointed out, the instance domain provides a dynamic dimension for the available metric values that may also be useful, e.g. this is how all of procfs is instantiated.
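
As a rough illustration of the instance domain idea, the sketch below models one metric whose set of instances grows as new function names are observed.  The ProbeMetric class and the function names are made up for the example and do not correspond to any real PMDA.

```python
from collections import Counter


class ProbeMetric:
    """Toy metric whose instance domain grows as new functions are observed."""

    def __init__(self):
        self.calls = Counter()  # instance name -> call count

    def record(self, function_name):
        self.calls[function_name] += 1

    def instances(self):
        # The instance domain is whatever has been seen so far on this host.
        return sorted(self.calls)

    def fetch(self, wanted=None):
        # Return values for the requested instances, or for all of them.
        names = self.instances() if wanted is None else wanted
        return {name: self.calls[name] for name in names}


metric = ProbeMetric()
for fn in ["vfs_read", "vfs_write", "vfs_read"]:
    metric.record(fn)
```

One metric name covers every function, and a client can ask for all instances or just the ones it cares about, so nothing has to be enumerated ahead of time.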

* scalar payloads

  It seems as though each metric value provided by PMDAs is
  necessarily a scalar value, as opposed to some structured type.  For
  event tracing, it would be useful to have tuples.  Front-ends could
  choose the interesting fields to render.  (Example: tracing NFS
  calls, complete with decoded payloads.)


We've tried really hard to make the PCP metadata rich enough (in the data model and the API services) to enable clients to be data-driven, based on what performance data happens to be available today from a host or archive.  This is why the data aggregate (or blob) data type that Nathan has mentioned is rarely used (although it is fully supported).

If there was a tight coupling between the source of the event data and the client that interprets the event data, then the PCP data aggregate could be used to provide a transport and storage encapsulation that is consistent with the PCP APIs and protocols.  Of course, such a client would be exposed to all of the word-size, endian and version issues that plague other binary formats for performance data, e.g. the sar variants based on AT&T UNIX.
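
The word-size and endian hazard is easy to demonstrate in isolation.  The sketch below uses Python's standard struct module on a hypothetical two-field event record (a 32-bit event id and a 64-bit byte count); the record layout is an assumption for illustration and is not a PCP format.

```python
import struct

# Hypothetical event record: 32-bit event id followed by a 64-bit byte count.
event_id, nbytes = 7, 4096

little = struct.pack("<IQ", event_id, nbytes)  # little-endian producer
big = struct.pack(">IQ", event_id, nbytes)     # big-endian producer

# A client assuming the wrong byte order decodes garbage without any error.
wrong_id, wrong_nbytes = struct.unpack(">IQ", little)

# Word size matters too: a 32-bit byte count gives a shorter record.
narrow = struct.pack("<II", event_id, nbytes)
```

The same logical record has three different byte layouts here, and nothing in the bytes themselves says which one was used, so producer and client have to agree out of band.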

* filtering

  It would be desirable for the apps fetching metric values to
  communicate a filtering predicate associated with them, perhaps as
  per pmie rules.  This is to allow the data server daemon to reduce
  the amount of data sent to the gui frontends.  Perhaps also it could
  use them to inform PMDAs as a form of subscription, and in turn they
  could reduce the amount of data flow.

PMDAs are free to do as much or as little work as they choose.  Some are totally demand-driven, instantiating only the information they are asked for when they are asked for it.  Others use caching strategies to refresh some or all of the information at each request.  Others maintain timestamped caches and only refresh when the information is deemed "stale".  Another class runs a refresh thread that is continually updating a data cache, and requests are serviced from the cache.
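
The timestamped-cache strategy can be sketched as follows.  This is a toy model, not PMDA code: the StaleCache class, its max_age parameter and the fake clock are all assumptions made so that the behaviour is deterministic.

```python
class StaleCache:
    """Toy timestamped cache: refresh only when the stored value is too old."""

    def __init__(self, collect, max_age, clock):
        self.collect = collect  # callable that gathers fresh data
        self.max_age = max_age  # seconds before a value is considered stale
        self.clock = clock      # callable returning the current time in seconds
        self.value = None
        self.stamp = None

    def get(self):
        now = self.clock()
        if self.stamp is None or now - self.stamp >= self.max_age:
            self.value = self.collect()
            self.stamp = now
        return self.value


# Fake clock and collector so the example does not depend on wall-clock time.
now = [0.0]
refreshes = []


def collect():
    refreshes.append(now[0])
    return len(refreshes)


cache = StaleCache(collect, max_age=5.0, clock=lambda: now[0])
a = cache.get()  # first request: collects
now[0] = 2.0
b = cache.get()  # still fresh: served from the cache
now[0] = 6.0
c = cache.get()  # stale: collects again
```

The demand-driven and refresh-thread strategies differ only in when collect is called: on every request in the first case, and on a timer independent of requests in the second.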

The PMDA behaviour can be modal ... based on client requests, or more interestingly as Nathan has suggested using the pmStore(3) API to allow one or more clients to enable/disable collection (think about expensive, detailed information that you don't want to collect unless some client really wants it).  The values passed into the PMDA via pmStore(3) are associated with PCP metrics, so they have the full richness of the PCP data model to encode switches, text strings, blobs, etc.

* no web-based frontends

  In our usage, it would be desirable to have some mini pcp-gui that
  is based on web technologies rather than QT.

There are several examples of web interfaces driven by PCP data ... but each of these has been developed as a proprietary and specific application and hence is not included in the PCP open source distribution.  The PCP APIs provide all the services needed to build something like this.


To what extent could/should PCP be used/extended to cover this space?

I think this suggestion is worth further discussion, but we probably need some more concrete examples of the sorts of event trace data that are being considered, and the most likely use cases and patterns for that data.

Cheers, Ken.