Hi -
With the recent python-pmda pmns-flexibility improvements, we can
revisit the old thread [1] about the metadata/schema format for the
json-pmda. When we left off, the prototype was based on a slightly
enlarged json-schema [2] format. It's kind of attractive because it
defines a simple mapping from json payload to metadata, by mirroring its
nesting structure to some extent.
Since then, I've come across a widget called jsonpointers [3]. This is a
standardized notation for identifying parts of a json payload via a
string syntax; python libraries for it are easily available. The nice
thing about this is that it would allow our pcp-json metadata to be
focused on just what's needed to extract pcp metrics from an arbitrary
json document: no fluff.
Here's a hypothetical rewriting of the metadata [4] of dsmith's original
prototype. (The payload [5] could be unmodified, but let's imagine
that the "generation":1 and "data": { } wrappers are removed, and
write metadata for that variant.) It should give the same pminfo
output (except for extra units included here).
% cat stap_json.json
{ "pcp-metrics":[
{"pmns": "json.xstring", # metric name
"pointer": "/xstring", # jsonpointer into json payload
"type": "string"}, # pmDesc; default units/semantics
{"pmns": "json.read_count",
"pointer": "/read_count",
"type": "int64",
"units": "bytes/sec"}, # (extra: feed to pmParseUnitsStr)
{"pmns": "json.dummy2",
"pointer": "/dummy2",
"type": "string"},
{"pmns": "json.dummy_array.dummy2",
"indom-str": "/dummy_array/-/__id", # identify indom field
"pointer": "/dummy_array/-/dummy2", # use - as array-index jsonpointer
"type": "string"},
{"pmns": "json.dummy_array.dummy1",
"indom-str": "/dummy_array/-/__id",
"pointer": "/dummy_array/-/dummy1",
"type": "int64",
"semantics": "counter"},
{"pmns": "json.net_xmit_data.xmit_latency",
"indom-str": "/net_xmit_data/-/__id",
"pointer": "/net_xmit_data/-/xmit_latency",
"description": "sum of latency for xmit device",
"units": "ms", # (extra)
"type": "int64"},
{"pmns": "json.net_xmit_data.xmit_count",
"indom-str": "/net_xmit_data/-/__id",
"pointer": "/net_xmit_data/-/xmit_count",
"description": "number of packets for xmit device",
"type": "int64",
"semantics": "counter"}
],
"pmns-prefix":"stap_json" # in absence, default to the metadata file basename
}
This JSON-formatted metadata can be easily consumed by a python script.
It would construct the PMNS from the obvious fields. (The metadata file
may nominate a prefix, to make it possible for stap-generated metadata
files to hide their clumsy stap_XXXXX names.)
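For what it's worth, the loading step might look roughly like the
following in python. This is only a sketch: parse_metadata is a made-up
name, and requiring "pmns" and "pointer" on every entry is an
assumption. The "#" annotations in the listing would have to be dropped
from a real file, since python's json module rejects comments. How the
prefix gets combined with each "pmns" name is not settled here, so the
sketch just returns both.

```python
import json
import os

def parse_metadata(text, filename):
    """Parse metadata text; return (pmns_prefix, list_of_metric_dicts)."""
    meta = json.loads(text)
    # Default prefix: the metadata file basename, minus its extension.
    default_prefix = os.path.splitext(os.path.basename(filename))[0]
    prefix = meta.get("pmns-prefix", default_prefix)
    metrics = meta["pcp-metrics"]
    for metric in metrics:
        # Assumption: every entry needs at least a name and a pointer.
        for key in ("pmns", "pointer"):
            if key not in metric:
                raise ValueError("metric entry lacks %r: %r" % (key, metric))
    return prefix, metrics

# Hypothetical one-metric metadata, with no "pmns-prefix" given.
example = '''
{ "pcp-metrics": [
    {"pmns": "json.read_count", "pointer": "/read_count",
     "type": "int64", "units": "bytes/sec"}
  ]
}
'''
prefix, metrics = parse_metadata(example, "/some/dir/stap_json.json")
print(prefix, [m["pmns"] for m in metrics])
```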
Each metric value would be found by jsonpointer-dereferencing the
"pointer" field against the json payload file. The only tricky aspect
is the indoms/arrays. The above proposal uses the "-" jsonpointer
syntax to identify the (sole) indexing dimension that is to be turned
into a pcp instance-domain; the python script would iterate 0... along
that array index to enumerate the actual indom & values. (Note that
in this model, the __id parameter is not hard-coded in python, and
pure-numeric indoms fit easily.)
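A rough python sketch of that iteration follows. One caveat: RFC 6901
itself defines "-" as the (nonexistent) element past the end of an
array, so treating it as "every index" is an extension, and as far as I
know a stock jsonpointer library will not iterate it for us. The sketch
therefore hand-rolls a tiny resolver. The function names and the sample
payload are made up for illustration.

```python
def resolve(doc, pointer):
    """Dereference an RFC 6901 json pointer against parsed json."""
    if pointer == "":
        return doc
    if not pointer.startswith("/"):
        raise ValueError("bad json pointer: %r" % pointer)
    for token in pointer[1:].split("/"):
        # Undo the RFC 6901 escapes: ~1 is "/", ~0 is "~".
        token = token.replace("~1", "/").replace("~0", "~")
        doc = doc[int(token)] if isinstance(doc, list) else doc[token]
    return doc

def split_at_dash(pointer):
    """Split a pointer at its first '-' token into (array_ptr, rest_ptr)."""
    tokens = pointer.split("/")[1:]
    i = tokens.index("-")  # raises ValueError if there is no '-' token
    join = lambda toks: "".join("/" + t for t in toks)
    return join(tokens[:i]), join(tokens[i + 1:])

def instances(payload, pointer, indom_pointer):
    """Return {instance_name: value} for one array-valued metric."""
    array_ptr, value_ptr = split_at_dash(pointer)
    id_array_ptr, id_ptr = split_at_dash(indom_pointer)
    if array_ptr != id_array_ptr:
        raise ValueError("pointer and indom-str must index the same array")
    result = {}
    for element in resolve(payload, array_ptr):  # indexes 0, 1, 2, ...
        result[str(resolve(element, id_ptr))] = resolve(element, value_ptr)
    return result

# Made-up payload shaped like the pointers in the metadata.
payload = {
    "read_count": 42,
    "dummy_array": [
        {"__id": "a", "dummy1": 1, "dummy2": "x"},
        {"__id": "b", "dummy1": 2, "dummy2": "y"},
    ],
}
print(resolve(payload, "/read_count"))
print(instances(payload, "/dummy_array/-/dummy1", "/dummy_array/-/__id"))
```

Because the instance name comes from whatever "indom-str" points at,
nothing here depends on the field being called __id.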
As a further step, it could simplify the json-pmda configuration if
the metadata file contained within it instructions as to where to
fetch the json payload:
"payload-file":"foo.json"
or even "payload-exec":"ceph perf metric"
or even "payload-url":"http://localhost:1235/foo-metrics"
then the json-pmda would need to be configured only with a list of
metadata files/directories, and it can find the payload files by
itself.
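As a sketch of how the pmda might act on those three keys (again
hypothetical: fetch_payload is a made-up name, and the precedence of
file over exec over url, the 10-second timeout, and the utf-8 decoding
are all assumptions, not part of the proposal):

```python
import json
import shlex
import subprocess
import urllib.request

def fetch_payload(meta):
    """Fetch and parse the json payload named by a metadata dict."""
    if "payload-file" in meta:
        with open(meta["payload-file"]) as f:
            return json.load(f)
    if "payload-exec" in meta:
        # Run the command without a shell; its stdout is the payload.
        argv = shlex.split(meta["payload-exec"])
        out = subprocess.run(argv, stdout=subprocess.PIPE, check=True).stdout
        return json.loads(out.decode("utf-8"))
    if "payload-url" in meta:
        with urllib.request.urlopen(meta["payload-url"], timeout=10) as resp:
            return json.loads(resp.read().decode("utf-8"))
    raise ValueError("metadata names no payload source")
```

A real pmda would presumably also want to decide whether a relative
"payload-file" is resolved against the metadata file's directory; the
sketch leaves the path exactly as given.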
[1] https://sourceware.org/ml/systemtap/2014-q3/msg00302.html
[2] http://json-schema.org/
[3] https://python-json-pointer.readthedocs.org/en/latest/mod-jsonpointer.html
[4] https://sourceware.org/ml/systemtap/2014-q3/txtRzjotldCEn.txt
[5] https://sourceware.org/ml/systemtap/2014-q3/txtfRBQXZpuUt.txt