OK, I have a proposal for your consideration and feedback.
This is a bit long, apologies for that.
New pmimport proposal
Assumptions
- there is no need to directly support the old pmimport API, as the only
plugins were for various versions of sar on IRIX and UNICOS
- some pmimport API users may have little knowledge of PCP internal
data models
- there is a case for both C/C++ and Perl APIs
- it makes more sense for each customization piece to include a
main loop that fetches data and calls the pmimport API to produce
a PCP archive (as opposed to the old way, with the main loop in
pmimport and the customization providing callback implementations)
Proposal
Create a new libpcp_import library in C.
Unless stated otherwise, these functions return 0 on success, otherwise
a negative PCP error code.
int pmiSetOption(int option, char *value)
Call this early on to set archive name (option == PMI_ARCHIVE),
archive's hostname (option == PMI_HOSTNAME), archive's timezone
(option == PMI_TIMEZONE), etc.
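A minimal sketch of the option bookkeeping, assuming pmiSetOption() simply records each setting for use when the archive is created. The numeric values of PMI_ARCHIVE, PMI_HOSTNAME and PMI_TIMEZONE, and the names toy_options, toy_set_option() and toy_strdup(), are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option codes: the proposal names PMI_ARCHIVE, PMI_HOSTNAME
 * and PMI_TIMEZONE but does not fix their values. */
enum { PMI_ARCHIVE = 1, PMI_HOSTNAME, PMI_TIMEZONE };

/* Hypothetical per-archive settings, filled in before any data is written. */
struct toy_options {
    char *archive;
    char *hostname;
    char *tz;
};

static struct toy_options toy_opts;

/* Copy a string into freshly allocated memory (NULL on failure). */
static char *toy_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Models pmiSetOption(): 0 on success, negative on error.
 * -1 stands in for a real PCP error code. */
int toy_set_option(int option, const char *value)
{
    char **slot;
    char *copy;

    switch (option) {
    case PMI_ARCHIVE:  slot = &toy_opts.archive;  break;
    case PMI_HOSTNAME: slot = &toy_opts.hostname; break;
    case PMI_TIMEZONE: slot = &toy_opts.tz;       break;
    default:           return -1;   /* unknown option */
    }
    if (value == NULL || (copy = toy_strdup(value)) == NULL)
        return -1;
    free(*slot);        /* a later call replaces an earlier setting */
    *slot = copy;
    return 0;
}
```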
int pmiAddMetric(char *name, pmID pmid, int type, pmInDom indom,
int sem, pmUnits units)
Define a metric.
The arguments after name are the fields of a pmDesc, just linearized
to make it easier for the Perl API (like add_metric in PCP::PMDA),
and all of pmID, pmInDom and pmUnits are in effect ints as well.
If pmid is PM_ID_NULL (-1), then the library will assign the pmid
(but this means the user must use pmiPutValue() or pmiPutValueHdl()
and cannot use pmiPutResult(), see below).
indom must be PM_INDOM_NULL (-1) or a distinct value that is the
same for all metrics with the same instance domain.
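To make the PM_ID_NULL rule concrete, here is a sketch of the metric table the library might keep. The table size, the starting value for assigned pmids and the toy_* names are assumptions. The point it illustrates is that an assigned pmid never reaches the caller, which is why pmiPutResult() (which needs real pmids) is ruled out in that case.

```c
#include <assert.h>
#include <string.h>

#define PM_ID_NULL      (-1)   /* value given in the proposal */
#define PM_INDOM_NULL   (-1)
#define TOY_MAX_METRICS 64     /* hypothetical fixed table size */

struct toy_metric {
    const char *name;   /* caller keeps the string alive */
    int pmid;
    int indom;
};

static struct toy_metric toy_metrics[TOY_MAX_METRICS];
static int toy_nmetrics;

static int toy_pmid_in_use(int pmid)
{
    for (int i = 0; i < toy_nmetrics; i++)
        if (toy_metrics[i].pmid == pmid)
            return 1;
    return 0;
}

/* Models the bookkeeping in pmiAddMetric(): reject duplicate names and
 * pmids, and pick a free pmid when the caller passes PM_ID_NULL. */
int toy_add_metric(const char *name, int pmid, int indom)
{
    static int next_auto = 1;   /* hypothetical start for assigned pmids */

    if (toy_nmetrics == TOY_MAX_METRICS)
        return -1;
    for (int i = 0; i < toy_nmetrics; i++)
        if (strcmp(toy_metrics[i].name, name) == 0)
            return -1;          /* metric already defined */
    if (pmid == PM_ID_NULL) {
        while (toy_pmid_in_use(next_auto))
            next_auto++;
        pmid = next_auto++;     /* caller never learns this value */
    } else if (toy_pmid_in_use(pmid)) {
        return -1;              /* pmid clash */
    }
    toy_metrics[toy_nmetrics].name = name;
    toy_metrics[toy_nmetrics].pmid = pmid;
    toy_metrics[toy_nmetrics].indom = indom;
    toy_nmetrics++;
    return 0;
}
```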
int pmiAddInstance(pmInDom indom, char *instance, int inst)
Define an instance for a specific instance domain (indom).
instance must follow the "unique up to the first space" rule for
all external instance names defined for a specific instance domain.
If the internal instance identifier (inst) is PM_IN_NULL (-1) then
the library will assign the value (but this means the user must
use pmiPutValue() or pmiPutValueHdl() and cannot use pmiPutResult(),
see below).
Otherwise, inst must be unique for all instances for a specific
instance domain.
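Both instance rules can be checked mechanically. The sketch below, with hypothetical toy_* names and an assumed "largest id in the domain plus one" policy for PM_IN_NULL, rejects a name that matches an existing one up to the first space (so "eat" and "eat lunch" clash) and rejects a reused internal id within the same domain.

```c
#include <assert.h>
#include <string.h>

#define PM_IN_NULL    (-1)   /* value given in the proposal */
#define TOY_MAX_INST  64     /* hypothetical fixed table size */

struct toy_instance {
    int indom;
    const char *name;   /* caller keeps the string alive */
    int inst;
};

static struct toy_instance toy_insts[TOY_MAX_INST];
static int toy_ninsts;

/* Length of s up to, but not including, its first space. */
static size_t toy_prefix_len(const char *s)
{
    const char *sp = strchr(s, ' ');
    return sp != NULL ? (size_t)(sp - s) : strlen(s);
}

/* Nonzero if a and b are the same up to the first space. */
static int toy_same_prefix(const char *a, const char *b)
{
    size_t la = toy_prefix_len(a);
    return la == toy_prefix_len(b) && strncmp(a, b, la) == 0;
}

/* Models the checks in pmiAddInstance(); with PM_IN_NULL the sketch
 * assigns one more than the largest id already in the domain. */
int toy_add_instance(int indom, const char *name, int inst)
{
    int max_inst = -1;

    if (toy_ninsts == TOY_MAX_INST)
        return -1;
    for (int i = 0; i < toy_ninsts; i++) {
        if (toy_insts[i].indom != indom)
            continue;                   /* rules apply per domain */
        if (toy_same_prefix(toy_insts[i].name, name))
            return -1;                  /* e.g. "eat" vs "eat lunch" */
        if (inst != PM_IN_NULL && toy_insts[i].inst == inst)
            return -1;                  /* duplicate internal id */
        if (toy_insts[i].inst > max_inst)
            max_inst = toy_insts[i].inst;
    }
    if (inst == PM_IN_NULL)
        inst = max_inst + 1;
    toy_insts[toy_ninsts].indom = indom;
    toy_insts[toy_ninsts].name = name;
    toy_insts[toy_ninsts].inst = inst;
    toy_ninsts++;
    return 0;
}
```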
A companion routine to _remove_ an instance from an instance domain
is not provided as the likely uses for pmimport are cases where
the instances are fixed for the duration of the archive. Even if
this is not the case, as new instances are discovered they can be
added with pmiAddInstance() and retired instances could simply _not_
appear in the archive pmResults.
int pmiPutValue(char *name, char *instance, pmAtomValue *atom)
Add a single value to the current result for a given metric and
instance.
The type of the value within the union (atom) should match the type
of the metric defined with pmiAddMetric.
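Since pmAtomValue is a plain union, the library can only know which member to read from the type given to pmiAddMetric(). The sketch mirrors the numeric members of pmAtomValue in a local union; the TOY_TYPE_* codes and toy_atom_as_double() are hypothetical stand-ins, not the PCP definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for PCP's type codes (PM_TYPE_32, PM_TYPE_U32, ...);
 * the numeric values here are only illustrative. */
enum { TOY_TYPE_32, TOY_TYPE_U32, TOY_TYPE_64, TOY_TYPE_U64,
       TOY_TYPE_FLOAT, TOY_TYPE_DOUBLE };

/* The numeric members of pmAtomValue, mirrored locally. */
typedef union {
    int32_t  l;
    uint32_t ul;
    int64_t  ll;
    uint64_t ull;
    float    f;
    double   d;
} toy_atom;

/* Read the union member selected by the type the metric was defined
 * with. Nothing in the union records which member the caller filled in,
 * so a mismatch is silently misread rather than detected. */
int toy_atom_as_double(int type, const toy_atom *a, double *out)
{
    switch (type) {
    case TOY_TYPE_32:     *out = (double)a->l;   break;
    case TOY_TYPE_U32:    *out = (double)a->ul;  break;
    case TOY_TYPE_64:     *out = (double)a->ll;  break;
    case TOY_TYPE_U64:    *out = (double)a->ull; break;
    case TOY_TYPE_FLOAT:  *out = (double)a->f;   break;
    case TOY_TYPE_DOUBLE: *out = a->d;           break;
    default:              return -1;   /* unknown or unsupported type */
    }
    return 0;
}
```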
int pmiGetHdl(char *name, char *instance)
int pmiPutValueHdl(int handle, pmAtomValue *atom)
These two routines provide a performance optimization: a
metric-instance pair may be defined once with pmiGetHdl(), which
returns a unique "handle" that can be used in subsequent calls to
pmiPutValueHdl(). This avoids the two string lookups (metric name
and instance name) per call that pmiPutValue() requires.
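A sketch of how the handle optimization might work, assuming a handle is simply an index into a table of metric-instance pairs. The toy_* names, the fixed table size and the use of "" for a metric without an instance domain are assumptions.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_HDL 128   /* hypothetical fixed table size */

struct toy_pair {
    const char *metric;     /* caller keeps the strings alive */
    const char *instance;   /* "" for a metric with no instance domain */
};

static struct toy_pair toy_pairs[TOY_MAX_HDL];
static int    toy_npairs;
static double toy_slot[TOY_MAX_HDL];    /* pending value per handle */
static int    toy_filled[TOY_MAX_HDL];  /* nonzero once a value is put */

/* Models pmiGetHdl(): pays for the string comparisons once and returns
 * a small integer (here an array index), or -1 if the table is full.
 * A real library would also check the pair against the pmiAddMetric()
 * and pmiAddInstance() definitions. */
int toy_get_hdl(const char *metric, const char *instance)
{
    for (int i = 0; i < toy_npairs; i++)
        if (strcmp(toy_pairs[i].metric, metric) == 0 &&
            strcmp(toy_pairs[i].instance, instance) == 0)
            return i;                   /* same pair, same handle */
    if (toy_npairs == TOY_MAX_HDL)
        return -1;
    toy_pairs[toy_npairs].metric = metric;
    toy_pairs[toy_npairs].instance = instance;
    return toy_npairs++;
}

/* Models pmiPutValueHdl(): a bounds check and an array store, with no
 * string lookups. */
int toy_put_value_hdl(int handle, double value)
{
    if (handle < 0 || handle >= toy_npairs)
        return -1;
    toy_slot[handle] = value;
    toy_filled[handle] = 1;
    return 0;
}
```

In a converter's main loop the handles would typically be fetched once, after the metric and instance definitions, and reused for every output record.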
int pmiWrite(struct timeval *stamp)
Output the values previously gathered from calls to pmiPutValue()
and/or pmiPutValueHdl() as a single pmResult with a specific
timestamp (stamp).
Clears all state associated with previous calls to pmiPutValue()
and/or pmiPutValueHdl().
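The gather-then-write cycle can be sketched as a pending list that pmiWrite() drains. Here toy_put() stands in for pmiPutValue()/pmiPutValueHdl(), toy_write() prints the record rather than writing an archive, and all toy_* names and limits are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>   /* struct timeval */

#define TOY_MAX_PENDING 64   /* hypothetical limit per record */

struct toy_value {
    int handle;      /* which metric-instance pair */
    double value;
};

static struct toy_value toy_pending[TOY_MAX_PENDING];
static int toy_npending;
static int toy_nrecords;

/* Gather one value for the current record, as pmiPutValue() or
 * pmiPutValueHdl() would. */
int toy_put(int handle, double value)
{
    if (toy_npending == TOY_MAX_PENDING)
        return -1;
    toy_pending[toy_npending].handle = handle;
    toy_pending[toy_npending].value = value;
    toy_npending++;
    return 0;
}

/* Models pmiWrite(): emit everything gathered so far as one record with
 * the given timestamp, then clear the gathered state. This sketch prints
 * the record to out instead of writing an archive. */
int toy_write(const struct timeval *stamp, FILE *out)
{
    fprintf(out, "%ld.%06ld", (long)stamp->tv_sec, (long)stamp->tv_usec);
    for (int i = 0; i < toy_npending; i++)
        fprintf(out, " [%d]=%g", toy_pending[i].handle, toy_pending[i].value);
    fputc('\n', out);
    toy_npending = 0;    /* next record starts empty */
    toy_nrecords++;
    return 0;
}
```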
int pmiPutResult(pmResult *result)
For the programmer comfortable with the PCP data structures, this
routine provides a direct way to construct an archive record.
It is equivalent to one call to pmiPutValue() or pmiPutValueHdl()
for each metric-instance pair in the result, and then calling
pmiWrite().
For each of the routines above there would be a Perl wrapper
(pmiFooBar() is wrapped by foo_bar()) and some constant functions and
helper routines, so a Perl pmimport program might look like:
    use PCP::IMPORT;
    use vars qw( $import $myindom $inst $ctr $now $time );

    $import = PCP::IMPORT->new('');
    $myindom = 1;
    $import->add_metric('myown.counter', PM_ID_NULL, PM_TYPE_U32,
        PM_INDOM_NULL, PM_SEM_COUNTER, units(0,0,1,0,0,PM_COUNT_ONE));
    $import->add_metric('myown.time', PM_ID_NULL, PM_TYPE_FLOAT,
        $myindom, PM_SEM_INSTANT, units(0,1,0,0,PM_TIME_SEC,0));
    ...
    $import->add_instance($myindom, "sleep", PM_IN_NULL);
    $import->add_instance($myindom, "eat", PM_IN_NULL);
    $import->add_instance($myindom, "play", PM_IN_NULL);
    ...
    # main loop once per output record
    ...
        # loop per metric
        $import->put_value('myown.counter', '', $ctr);  # singular metric: no instance name
        ...
        # loop over instances
        $import->put_value('myown.time', $inst, $time);
        ...
        $import->write($now);
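The units() helper in the example packs six small fields into what is "in effect an int". The sketch assumes the argument order dimSpace, dimTime, dimCount, scaleSpace, scaleTime, scaleCount (consistent with the two calls above) and uses an illustrative bit layout of its own; the real pmUnits is a bit-field struct whose layout is defined in pmapi.h. toy_units() and toy_units_field() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout only: six 4-bit fields in the low 24 bits, with
 * dimSpace in bits 20-23 down to scaleCount in bits 0-3. */
uint32_t toy_units(int dim_space, int dim_time, int dim_count,
                   int scale_space, int scale_time, int scale_count)
{
    int f[6] = { dim_space, dim_time, dim_count,
                 scale_space, scale_time, scale_count };
    uint32_t u = 0;

    for (int i = 0; i < 6; i++)
        u = (u << 4) | ((uint32_t)f[i] & 0xF);  /* keep low 4 bits; dims may be negative */
    return u;
}

/* Recover field i (0 = dim_space ... 5 = scale_count). The three
 * dimension fields are signed 4-bit values and are sign-extended. */
int toy_units_field(uint32_t u, int i)
{
    int v = (int)((u >> (4 * (5 - i))) & 0xF);

    if (i < 3 && v > 7)
        v -= 16;
    return v;
}
```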
Some other notes/considerations
Every metric and instance appearing in the output archive must have been
defined in a call to pmiAddMetric() and pmiAddInstance() before calling
pmiPutValue() or pmiPutValueHdl() or pmiPutResult().
End of archive processing is needed to write the log trailer and is
handled by an atexit handler.
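One way the atexit approach might look: register the hook when the archive is first opened and make it idempotent. toy_start_archive(), toy_end_archive() and the two flags are hypothetical; the real trailer-writing code is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int toy_archive_open;     /* set when the archive is started */
static int toy_trailer_written;  /* guards against a second trailer */

/* Registered with atexit(): would write the log trailer and close the
 * archive files; here it only prints a message and records that it ran. */
static void toy_end_archive(void)
{
    if (!toy_archive_open || toy_trailer_written)
        return;
    toy_trailer_written = 1;
    fputs("log trailer written\n", stderr);
}

/* Called before the first record is written: register the end-of-archive
 * hook exactly once. Returns 0 on success, -1 if atexit() fails. */
int toy_start_archive(void)
{
    if (toy_archive_open)
        return 0;
    if (atexit(toy_end_archive) != 0)
        return -1;
    toy_archive_open = 1;
    return 0;
}
```

atexit() handlers run on return from main() or a call to exit(), but not after _exit() or death by an unhandled signal, so a converter terminated that way would leave the archive without its trailer.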
The library will handle all of the metadata operations as metrics and
instance domains appear in the output archive records.
The traditional pmimport(1) application becomes a front-end that could
execute "standard" and shipped binaries built with libpcp_import to
support standard and perhaps even generic data sources. By collecting
these binaries in a standard place, add-ons can extend the set of
available converters (which provides the same functionality as the
plug-in architecture of the original implementation).
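The front-end could be little more than a lookup in the standard directory followed by execv(). The directory /usr/libexec/pcp/import and the toy_* names are hypothetical; access() and execv() are the standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical directory where converters built with libpcp_import live. */
#define TOY_CONVERTER_DIR "/usr/libexec/pcp/import"

/* Build the path of the converter for a data format, e.g. "sar".
 * Returns 0 on success, -1 if the name is unsafe or does not fit. */
int toy_converter_path(const char *format, char *buf, size_t len)
{
    int n;

    if (format[0] == '\0' || strchr(format, '/') != NULL)
        return -1;   /* refuse empty names and path components */
    n = snprintf(buf, len, "%s/%s", TOY_CONVERTER_DIR, format);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Replace this process with the converter; returns only on failure. */
int toy_run_converter(const char *format, char *const argv[])
{
    char path[4096];

    if (toy_converter_path(format, path, sizeof path) < 0)
        return -1;
    if (access(path, X_OK) != 0)
        return -1;   /* no such converter installed */
    execv(path, argv);
    return -1;       /* execv() failed */
}
```

Because the front-end only looks up names in a directory, an add-on extends it by installing one more executable there, with no changes to pmimport itself.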
I've toyed with a pmiConfig() routine as a generic metadata parser to
handle standard data formats (like CSV or spreadsheets) in a data-driven
way. The problem I've come up against is that while a standard syntax
(Nathan's keyword one or even XML) is quite feasible, and the associated
calls to pmiSetOption(), pmiAddMetric() and pmiAddInstance() would all
work just fine, I cannot see how to marry up the resultant metadata with
the data stream.
Consider a 5 column spreadsheet and the config (in Nathan's syntax):
    instance name="some" id=0 indom=7
    instance name="other" id=1 indom=7
    metric name=db.cache.hits id=128.4.23 indom=7 type=uint32 sem=counter units=0,0,1,0,0,count
    metric name=db.transactions id=128.4.24 type=uint64 sem=counter units=0,0,1,0,0,count
    metric name=db.cache.misses id=128.4.25 indom=7 type=uint32 sem=counter units=0,0,1,0,0,count
The problem is we have defined 5 possible metric-instance pairs:
    db.cache.hits[some]
    db.cache.hits[other]
    db.transactions
    db.cache.misses[some]
    db.cache.misses[other]
but there is no way to
(a) tell which metric-instance pair is associated with each column of
    the spreadsheet (and the problem only gets worse when not all
    possible metric-instance pairs are actually in the spreadsheet), or
(b) reach back inside the metadata in the library to know how to parse
    each cell of the spreadsheet and prepare the pmResult or the
    parameters for calls to pmiPutValue() or pmiPutValueHdl().
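The count of five pairs follows mechanically from the config: each metric with an indom expands to one pair per instance in that domain, and a metric without one contributes a single pair. The sketch below (hypothetical toy_* names; ids, types and units omitted) performs that expansion, and the resulting list carries nothing that says which spreadsheet column holds which pair.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_INDOM_NULL (-1)
#define TOY_NAME_LEN   64

struct toy_cfg_inst   { const char *name; int id; int indom; };
struct toy_cfg_metric { const char *name; int indom; };

/* The instance and metric lines from the config above, as data. */
static const struct toy_cfg_inst toy_cfg_insts[] = {
    { "some", 0, 7 },
    { "other", 1, 7 },
};
static const struct toy_cfg_metric toy_cfg_metrics[] = {
    { "db.cache.hits", 7 },
    { "db.transactions", TOY_INDOM_NULL },
    { "db.cache.misses", 7 },
};

/* Write each metric-instance pair the config permits into names[],
 * storing at most max of them; returns the total number of pairs. */
int toy_enumerate_pairs(char names[][TOY_NAME_LEN], int max)
{
    size_t nm = sizeof toy_cfg_metrics / sizeof toy_cfg_metrics[0];
    size_t ni = sizeof toy_cfg_insts / sizeof toy_cfg_insts[0];
    int n = 0;

    for (size_t m = 0; m < nm; m++) {
        const struct toy_cfg_metric *mp = &toy_cfg_metrics[m];
        if (mp->indom == TOY_INDOM_NULL) {     /* singular metric */
            if (n < max)
                snprintf(names[n], TOY_NAME_LEN, "%s", mp->name);
            n++;
            continue;
        }
        for (size_t i = 0; i < ni; i++) {      /* one pair per instance */
            if (toy_cfg_insts[i].indom != mp->indom)
                continue;
            if (n < max)
                snprintf(names[n], TOY_NAME_LEN, "%s[%s]",
                         mp->name, toy_cfg_insts[i].name);
            n++;
        }
    }
    return n;
}
```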