
Re: [pcp] PCP question ... leading to pmimport discussion

To: nathans@xxxxxxxxxx
Subject: Re: [pcp] PCP question ... leading to pmimport discussion
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Fri, 02 Jul 2010 08:49:34 +1000
Cc: pmatousu@xxxxxxxxxxxxx, pcp@xxxxxxxxxxx
In-reply-to: <1619637932.410411277773130222.JavaMail.root@xxxxxxxxxxxxxxxxxx>
References: <1619637932.410411277773130222.JavaMail.root@xxxxxxxxxxxxxxxxxx>
Reply-to: kenj@xxxxxxxxxxxxxxxx
OK, I have a proposal for your consideration and feedback.

This is a bit long, apologies for that.

New pmimport proposal

Assumptions

- there is no need to directly support the old pmimport API, as the only
  plugins were for various versions of sar on IRIX and UNICOS

- some pmimport API users may have little knowledge of PCP internal
  data models

- there is a case for both C/C++ and Perl APIs

- it makes more sense for each customization piece to include a
  main loop fetching data and calling the pmimport API to produce
  a PCP archive (as opposed to the old way of the main loop in
  pmimport and the customization providing callback implementations)

Proposal

Create a new libpcp_import library in C.

Unless stated otherwise, these functions return 0 on success, or a
negative PCP error code on failure.

int pmiSetOption(int option, char *value)
    Call this early on to set archive name (option == PMI_ARCHIVE),
    archive's hostname (option == PMI_HOSTNAME), archive's timezone
    (option == PMI_TIMEZONE), etc.

int pmiAddMetric(char *name, pmID pmid, int type, pmInDom indom,
             int sem, pmUnits units)
    Define a metric.

    The arguments after name are the fields of a pmDesc, just linearized
    to make it easier for the Perl API (like add_metric in PCP::PMDA)
    and all of pmID, pmInDom and pmUnits are in effect ints as well.

    If pmid is PM_ID_NULL (-1), then the library will assign the pmid
    (but this means the user must use pmiPutValue or pmiPutValueHdl and
    cannot use pmiPutResult, see below).

    indom must be PM_INDOM_NULL (-1) or a distinct value that is the
    same for all metrics with the same instance domain.

int pmiAddInstance(pmInDom indom, char *instance, int inst)
    Define an instance for a specific instance domain (indom).

    instance must follow the "unique up to the first space" rule for
    all external instance names defined for a specific instance domain.

    If the internal instance identifier (inst) is PM_IN_NULL (-1) then
    the library will assign the value (but this means the user must use
    pmiPutValue or pmiPutValueHdl and cannot use pmiPutResult, see below).
    Otherwise, inst must be unique for all instances for a specific
    instance domain.

    A companion routine to _remove_ an instance from an instance domain
    is not provided as the likely uses for pmimport are cases where
    the instances are fixed for the duration of the archive.  Even if
    this is not the case, as new instances are discovered they can be
    added with pmiAddInstance() and retired instances could simply _not_
    appear in the archive pmResults.
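
    To make the "unique up to the first space" rule concrete, here is
    a minimal sketch of the check, assuming the rule means that only
    the text before the first space in an external instance name has
    to be distinct.  The helper names inst_key_len() and
    inst_names_clash() are made up for illustration and are not part
    of the proposed API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Length of the part of an external instance name that has to be
 * unique: everything before the first space, or the whole name if
 * there is no space.  (Hypothetical helper, not part of the proposal.)
 */
static size_t inst_key_len(const char *name)
{
    const char *sp = strchr(name, ' ');
    return sp ? (size_t)(sp - name) : strlen(name);
}

/*
 * Return 1 if two external instance names would clash under the
 * "unique up to the first space" rule, else 0.
 */
static int inst_names_clash(const char *a, const char *b)
{
    size_t la = inst_key_len(a);
    size_t lb = inst_key_len(b);
    return la == lb && strncmp(a, b, la) == 0;
}
```

    Under this reading "1 minute" and "1 hour" clash (both reduce to
    "1"), while "eat" and "eating" do not.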

int pmiPutValue(char *name, char *instance, pmAtomValue *atom)
    Add a single value to the current result for a given metric and
    instance.

    The type of the value within the union (atom) should match the type
    of the metric defined with pmiAddMetric.

int pmiGetHdl(char *name, char *instance)
int pmiPutValueHdl(int handle, pmAtomValue *atom)
    These two routines provide a performance optimization in that
    a metric instance pair may be defined with pmiGetHdl() and the
    returned value is a unique "handle" that can be used in subsequent
    calls to pmiPutValueHdl().  This avoids the double string lookup
    per call that is associated with pmiPutValue().
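
    As a rough illustration of the handle idea (not the proposed
    implementation), a handle could simply be an index into a table of
    remembered metric-instance pairs.  Everything here (toy_get_hdl(),
    hdl_table, MAX_HANDLES) is a hypothetical stand-in; a real library
    would also validate the names against the defined metadata and
    would most likely copy the strings.

```c
#include <stddef.h>
#include <string.h>

#define MAX_HANDLES 64          /* arbitrary limit for this sketch */

/* One remembered metric-instance pair. */
struct hdl_entry {
    const char *name;           /* metric name, e.g. "myown.time" */
    const char *instance;       /* external instance name, NULL if none */
};

static struct hdl_entry hdl_table[MAX_HANDLES];
static int hdl_count;

/* Compare two strings, treating NULL as equal only to NULL. */
static int same_str(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/*
 * Toy stand-in for pmiGetHdl(): return the existing handle for this
 * pair, or allocate the next one.  Returns -1 if the table is full.
 * The strings are not copied, so they must outlive the table.
 */
static int toy_get_hdl(const char *name, const char *instance)
{
    int i;

    for (i = 0; i < hdl_count; i++) {
        if (same_str(hdl_table[i].name, name) &&
            same_str(hdl_table[i].instance, instance))
            return i;
    }
    if (hdl_count == MAX_HANDLES)
        return -1;
    hdl_table[hdl_count].name = name;
    hdl_table[hdl_count].instance = instance;
    return hdl_count++;
}
```

    The two string comparisons happen once, when the handle is
    obtained; after that a put-by-handle call only needs an array
    index.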

int pmiWrite(struct timeval *stamp)
    Output the values previously gathered from calls to pmiPutValue()
    and/or pmiPutValueHdl() as a single pmResult with a specific
    timestamp (stamp).

    Clears all state associated with previous calls to pmiPutValue()
    and/or pmiPutValueHdl().

int pmiPutResult(pmResult *result)
    For the programmer comfortable with the PCP data structures, this
    routine provides a direct way to construct an archive record.

    It is equivalent to one call to pmiPutValue() or pmiPutValueHdl()
    for each metric-instance pair in the result, and then calling
    pmiWrite().

For each of the routines above there would be a Perl wrapper
(pmiFooBar() is wrapped by foo_bar()) and some constant functions and
helper routines, so a Perl pmimport program might look like:

    use PCP::IMPORT;

    use vars qw( $import $myindom $inst $ctr $now $time );
    $import = PCP::IMPORT->new('');
    $myindom = 1;

    $import->add_metric('myown.counter', PM_ID_NULL, PM_TYPE_U32,
            PM_INDOM_NULL, PM_SEM_COUNTER, units(0,0,1,0,0,PM_COUNT_ONE));
    $import->add_metric('myown.time', PM_ID_NULL, PM_TYPE_FLOAT,
            $myindom, PM_SEM_INSTANT, units(0,1,0,0,PM_TIME_SEC,0));
    ...
    $import->add_instance($myindom, "sleep", PM_IN_NULL);
    $import->add_instance($myindom, "eat", PM_IN_NULL);
    $import->add_instance($myindom, "play", PM_IN_NULL);
    ...

    # main loop once per output record
    ...
        # loop per metric
            $import->put_value('myown.counter', PM_IN_NULL, $ctr);
            ...
            # loop over instances
                $import->put_value('myown.time', $inst, $time);
                ...
        $import->write($now);


Some other notes/considerations

Every metric and instance appearing in the output archive must have been
defined in a call to pmiAddMetric() and pmiAddInstance() before calling
pmiPutValue() or pmiPutValueHdl() or pmiPutResult().

End of archive processing is needed to write the log trailer and is
handled by an atexit handler.

The library will handle all of the metadata operations as metrics and
instance domains appear in the output archive records.

The traditional pmimport(1) application becomes a front-end that could
execute "standard", shipped binaries built with libpcp_import to
support standard and perhaps even generic data sources.  By collecting
these binaries in a standard place, add-ons can extend the set of
available converters (which provides the same functionality as the
plug-in architecture of the original implementation).

I've toyed with a pmiConfig() routine as a generic metadata parser to
handle standard data formats (like CSV or spreadsheets) in a data-driven
way.  The problem I've come up against is that while a standard syntax
(Nathan's keyword one or even XML) is very possible, and the associated
calls to pmiSetOption(), pmiAddMetric() and pmiAddInstance() would all
work just fine, I cannot see how to marry up the resultant metadata with
the data stream.

Consider a 5 column spreadsheet and the config (in Nathan's syntax):

instance name="some" id=0 indom=7
instance name="other" id=1 indom=7
metric name=db.cache.hits id=128.4.23 indom=7 type=uint32 sem=counter units=0,0,1,0,0,count
metric name=db.transactions id=128.4.24 type=uint64 sem=counter units=0,0,1,0,0,count
metric name=db.cache.misses id=128.4.25 indom=7 type=uint32 sem=counter units=0,0,1,0,0,count

The problem is we have defined 5 possible metric-instance pairs
        db.cache.hits[some]
        db.cache.hits[other]
        db.transactions
        db.cache.misses[some]
        db.cache.misses[other]
but there is no way to
(a) tell which metric-instance pair is associated with each column of
    the spreadsheet (and the problem only gets worse when not all
    possible metric-instance pairs are actually in the spreadsheet)
(b) reach back inside the metadata in the library to know how to parse
    each cell of the spreadsheet and prepare the pmResult or the
    parameters for calls to pmiPutValue() or pmiPutValueHdl()
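
To show where the 5 comes from, here is a small sketch that expands the
config above into its possible metric-instance pairs.  The structs and
list_pairs() are invented for this illustration only and are not part
of the proposal; indom -1 is used here just to mark a metric with no
instance domain.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative mirror of the config above; not a proposed data model. */
struct inst   { int indom; const char *name; };
struct metric { const char *name; int indom; };  /* indom < 0: none */

static const struct inst insts[] = {
    { 7, "some" }, { 7, "other" }
};
static const struct metric metrics[] = {
    { "db.cache.hits", 7 },
    { "db.transactions", -1 },
    { "db.cache.misses", 7 }
};

/*
 * Write every possible metric-instance pair into out[] (at most max
 * entries) and return how many were written.
 */
static int list_pairs(char out[][64], int max)
{
    int n = 0;
    size_t m, i;

    for (m = 0; m < sizeof(metrics) / sizeof(metrics[0]); m++) {
        if (metrics[m].indom < 0) {
            /* singular metric: one pair, no instance name */
            if (n < max)
                snprintf(out[n++], 64, "%s", metrics[m].name);
            continue;
        }
        /* one pair per instance defined in the metric's indom */
        for (i = 0; i < sizeof(insts) / sizeof(insts[0]); i++) {
            if (insts[i].indom == metrics[m].indom && n < max)
                snprintf(out[n++], 64, "%s[%s]",
                         metrics[m].name, insts[i].name);
        }
    }
    return n;
}
```

The expansion yields the five pairs listed above, in config order, but
nothing in the config says that this is also the column order in the
spreadsheet, which is exactly problem (a).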

