pcp
[Top] [All Lists]

Re: systemtap/pcp integration pmda 0.1

To: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: systemtap/pcp integration pmda 0.1
From: David Smith <dsmith@xxxxxxxxxx>
Date: Tue, 23 Sep 2014 09:39:23 -0500
Cc: Systemtap List <systemtap@xxxxxxxxxxxxxx>, pcp <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <y0mh9zztipi.fsf@xxxxxxxx>
References: <54133D71.6040208@xxxxxxxxxx> <y0mh9zztipi.fsf@xxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.0
On 09/22/2014 05:04 PM, Frank Ch. Eigler wrote:
> 
> dsmith wrote:
> 
>> Here's version 0.1 (up from 0.01!) of my systemtap/pcp integration work
> 
> Thanks a lot!
> 
>> that uses systemtap (https://sourceware.org/systemtap/) to export JSON
>> data and a pcp (http://www.performancecopilot.org/) python pmda that
>> reads and processes the JSON data. [...]
> 
> (Sorry I missed this when it went by -- please call the next
> version 3.14 or something else very different from 0.1! :-)

The 1st one was called '0.01'. See, I'm making progress...

>> # pminfo -df stap_json
>>
>> stap_json.json.dummy2
>>     Data Type: string  InDom: PM_INDOM_NULL 0xffffffff
>>     Semantics: instant  Units: none
>>     value "dummy2"
>>
>> stap_json.json.dummy_array.dummy2
>>     Data Type: string  InDom: 130.1 0x20800001
>>     Semantics: instant  Units: none
>>     inst [0 or "1"] value "def"
>>     inst [1 or "0"] value "abc"
>>     inst [2 or "2"] value "ghi"
>> [...]
> 
> Looking good!
> 
> 
>> // ===========================
>> // ==== Tapset file start ====
>> // ===========================
>>
> 
> Looks long but generally good.  (But see below re. suggestions about
> the schema/data representations.)
> 
> 
>> global net_devices
>> global read_count
>>
>> probe json_data
>> {
>>   @json_output_data_start
>>   @json_output_string_value("xstring", "testing, 1, 2, 3")
>>   @json_output_numeric_value("read_count", read_count)
>>   read_count++
>>
>>   foreach (dev in net_devices) {
>>     if (@count(skb_queue_t[dev])) {
>>       @json_output_array_numeric_value("net_xmit_data", dev, "xmit_count",
>>                                     @sum(skb_queue_t[dev]))
>>       @json_output_array_numeric_value("net_xmit_data", dev, "xmit_latency",
>>                                     @count(skb_queue_t[dev]))
>> [...]
>>   @json_output_data_end
>> }
>>
>> // Set up the metrics
>> probe begin
>> {
>> [...]
>>   json_add_string_metric("xstring", "Test string")
>>   json_add_numeric_metric("read_count", "Times values read")
>> [...]
>> }
> 
> 
> Have you considered merging together these two bits of code, so that a
> single stap probe alias that generates json data is also used to
> populate metadata globals, so a subsequent (!) schema json query would
> be possible?  Something like
> 
> probe json_data {
>    @json_output_string_value("xstring", "testing 1, 2, 3", "Test String")
>    @json_output_array_numeric_value("net_xmit_data", dev, "xmit_count", 3,
>                          "sum of latency for xmit device", $UNITS/SCALE)
> }
> 
> so that the metadata is attached at the end of the data-supplying calls?

Hmm, I hadn't considered that. It might be possible to do, but seems
quite tricky in getting the macros right to support both schema output
and data output.

>> #!/usr/bin/python
>> import json
>> import jsonschema
>> import collections
>> [...]
> 
> This PMDA code looks quite manageable; in particular the cumulative
> nature of imdom changes looks right on.  Please don't shoot me as I'll
> suggest a somewhat different schema/data encoding below; I'm pretty
> sure they're not too hard to express in the python code.  The main
> reason for proposing the changes is so that this pmda has a fighting
> chance at consuming json data from sources other than just the above
> systemtap tapset.
> 
> 
> 
>> {
>>   "generation": 1,
>>   "data": {
> 
> (IMHO we shouldn't mandate such wrappers.)

Here's the deal here. I stole the "generation" idea from the mmv code.
If I want to support being able to add/remove fields on the fly, I have
to let the pmda know something has changed. Hence the "generation"
field. In theory a change in the generation value would trigger the pmda
to re-read the JSON schema. (I say "in theory" because I haven't
implemented that yet in the systemtap or pcp side.)

There may be other ways to notice added/removed fields.

The only reason for the "data" wrapper is that I didn't want to disallow
a user's "generation" data field.

>>     "xstring": "testing, 1, 2, 3",
>>     "read_count": 9,
>>     "net_xmit_data": [
>>       {
>>         "__id": "eth0",
>>         "xmit_count": 7699136,
>>         "xmit_latency": 1109
>>       },
> 
> (We already mentioned in passing how the "__id" string might be
> desirable to be schema-configured.)

Yep, it could be possible to have that configurable.

> Anyway, onto the schema.  
> 
> I see how you chose the json-schema.org to piggyback-ride on.  One
> thing we should keep in mind though that json-schema does a slightly
> different thing than what we need.  It's more like an XML DTD, and
> just describes what's a "grammatically correct" document.  We do not
> really need this exact kind of checking, but it's not a big hindrance
> either - it's not grossly wordy.  (The "additionalProperties=false"
> might be an example of unhelpful wordiness though.)

The "additionalProperties=false" is optional, and just a bit more
correct. We can remove it.

> What we really need is the interpretation, for purposes of extracting
> data and relaying to PCP.  And for that, we can be a little more
> aggressive in the sense of adding our own schema elements, rather than
> riding on top of json-schema.org patterns.  For example:
> 
> 
>> {
>>   "type": "object",
>>   "title": "root",
>> [...]
>>   "properties": {
>> [...]
>>     "data": {
>> [...]
>>       "properties": {
>>         "xstring": {
>>           "type": "string",
>>           "description": "Test string",
>>           "additionalProperties": false
>>         },
>> [...]
> 
> The prototype PMDA turns this into metric "json.xstring" of pcp
> PM_TYPE_STRING.  The heuristic's probably fine, but if we want
> more generality, we could as well do something like this to
> describe a scalar:
> 
> { "foo": { "xstring":
>      { "pcp-name":"foo.bar.xstring",
>        "pcp-type":"string",          // esp. if pmda offers to cast
>        "pcp-units":"MBytes/sec",     // need an inverter for pmUnitsStr(3)
>        "pcp-semantics":"instant",    // need an inverter for PM_SEM_*
>        "pcp-shorthelp":"short help", 
>        "pcp-longhelp":"long help" 
>      }
> } }
> 
> (Adding the json-schema fields is optional & orthogonal.)

I see what you are doing here, but I'm quite unsure. If you goal is to
handle JSON from a variety of sources, to my mind this is a step
backwards. Your more generic source isn't likely to output a schema in
that format.

> One benefit of a formal "pcp-name" field here is that the mapping from
> the JSON nesting structure need not match the pcp namespace exactly.
> It would let the json object name components be free of constraints
> like not containing dots (since we would not propagate them to pcp).

Validating names (no dots, spaces, etc. and not too long) is on my todo
list.

Currently, fields end like the the following:

stap_json.{STAP_MODULE_NAME}.{FIEILD_NAME}

Originally I had designs of allowing the user to override
{STAP_MODULE_NAME}. But then we have issues with that field being
unique. For instance if the same systemtap script was run twice, both
would try to override the field to the same value. Since we're assured
that {STAP_MODULE_NAME} is unique, I just decided to go with it.

I'm not really fond of the 'pcp-name' field idea. It means more
validation (on both sides?) in not allowing things like "foo.bar" being
a value and then "foo.bar.baz" being a value.

>>         "net_xmit_data": {
>>           "type": "array",
>>           "description": "Network transmit data indexed by ethernet device",
>>           "additionalProperties": false,
>>           "items": {
>>             "type": "object",
>>             "additionalProperties": false,
>>             "properties": {
>>               "__id": {
>>                 "type": "string",
>>                 "additionalProperties": false
>>               },
>>               "xmit_count": {
>>                 "type": "integer",
>>                 "description": "number of packets for xmit device",
>>                 "additionalProperties": false
>>               },
>> [...]
> 
> Here's an alternative formulation kind of along the previous one:
> 
> 
> { "bar": { "networks":
>      { "pcp-imdom-discriminator":"__id",  // parametrizing this
>        "type":"array",   // json-schema style identification of array-ness
>        "items": {
>           "xmit": {
>             "pcp-name":"bar.xmitfoo",
>             "pcp-type":"float",           // (stap can print fp with some 
> effort)
>             "pcp-units":"Bytes/hour",
>             "pcp-semantics":"instant",
>             "pcp-shorthelp":"short help", 
>             "pcp-longhelp":"long help" 
>             }
>       } }
> } }
> 
> 
> Again, the json-schema parts are mostly orthogonal (just kiting the
> array-ness description).
> 
> So what would something like this give us?  At the pure stap-pmda
> level, not that much extra over what the 0.1 prototype has.  But
> beyond stap, we may well be able to write some schema for more general
> json files, trivially e.g. ones that lack the "data" as top-level
> wrapper.
>
> Before I go and write a bigger example schema of some other JSON data
> we have lying around, do you see what I'm getting at?  Do you agree
> that this style is also implementable in the python pmda?

This is probably implementable, although I do lose the easy data/schema
validation provided by the stock python JSON stuff.

I guess I'm coming at this from a different angle.

- If we want this pmda to (one day) support more generic JSON sources,
we'll have to expect generic JSON schemas.

- If we'd like the systemtap side of things to be able to support other
data collectors (nagios, zabbix, etc.), it should export a fairly
generic JSON schema.

To my mind, the changes you've got here take us farther from both goals.

-- 
David Smith
dsmith@xxxxxxxxxx
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

<Prev in Thread] Current Thread [Next in Thread>