
Re: [pcp] fetchgroups api - python bindings

To: Mark Goodwin <mgoodwin@xxxxxxxxxx>
Subject: Re: [pcp] fetchgroups api - python bindings
From: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Date: Mon, 14 Dec 2015 10:42:41 -0500
Cc: myllynen@xxxxxxxxxx, pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <566E5093.6080603@xxxxxxxxxx>
References: <20151206204742.GC22561@xxxxxxxxxx> <5666E4F7.4070005@xxxxxxxxxx> <y0m4mft10vf.fsf@xxxxxxxx> <566A59A5.6090403@xxxxxxxxxx> <20151211150348.GH22434@xxxxxxxxxx> <566E5093.6080603@xxxxxxxxxx>
User-agent: Mutt/1.4.2.2i
Hi, Mark -


> [...]
> >>and rely only on the per-instance error returns?  (and mandate the
> >>currently optional error arrays in the fetchgroup_create calls)?
> >
> >It's one of those cases where we can't really mandate.  Even if we
> >make the caller pass in an int* of statuses, we can't force them to
> >check them [...]
> 
> omitting error checks would encourage poor code. 

*We* are not omitting error checks; we're permitting applications to
decide not to check certain errors, if the sentinel value is good
enough for their purposes.  Again see pmstat.
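
For concreteness, roughly what that looks like on the caller's side
(a sketch against the pmfg draft; exact signatures may still shift):

    pmFG fg;
    pmAtomValue load;
    int load_sts;        /* this per-item status is the optional part */

    pmCreateFetchGroup(&fg, PM_CONTEXT_HOST, "local:");
    pmExtendFetchGroup_item(fg, "kernel.all.load", "1 minute", NULL,
                            &load, PM_TYPE_DOUBLE, &load_sts);
    pmFetchGroup(fg);

    /* a pmstat-style caller just prints load.d and lets the sentinel
       stand in for errors; a stricter caller tests load_sts first */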

> And using an ambiguous sentinel to indicate either an error or a
> valid value seems wrong. e.g.  consider
> network.interface.total.errors ...

Yes, but -some- value needs to be stored so we don't leave things
uninitialized.

> maybe we could keep the per instance err arrays internally (if not
> passed as args), and provide a function to check for errors for a
> particular fetchgroup instance/value

That wouldn't really help, since we can't force an application to call
that function.


> >>- Need some API documentation and more examples of the python binding in
> >>pmFetchGroup(3)
> >
> >Hm, where are python APIs documented in general?  Not in man pages
> >AFAIK.
> 
> As Lukas pointed out, in the code itself, including example code. 

That was already done.

> And in the Programmer's Guide. 

Will look into it.  It might need little more than copy & pasting of
the man page or sample code.

> The section 3 man page could also document the python binding and
> provide usage examples. [...]

That would be new; there's no pcp precedent for python api
documentation in the man3 pages.


> >[...]  why not have PMAPI do a pmReconnectContext underneath
> >us all the time?
> 
> good idea [...]

Opened http://oss.sgi.com/bugzilla/show_bug.cgi?id=1131 .
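
The shape of it would be something like (a sketch only, not committed
code; pmReconnectContext(3) is the existing call):

    int sts = pmFetch(numpmid, pmidlist, &result);
    if (sts == PM_ERR_IPC) {                 /* lost the pmcd connection */
        if (pmReconnectContext(ctx) >= 0)
            sts = pmFetch(numpmid, pmidlist, &result);   /* retry once */
    }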


> >>- how come no support for PM_TYPE_AGGREGATE (and events)?
> >
> >AGGREGATE would be a possibility, using the pmAtomValue vbp pointer, I
> >guess, but it seemed far-fetched as a beneficiary of rate/unit
> >conversion.  The only real PMDAs that provide aggregate data at the
> >moment are windows-events and systemd (which also offers non-blob
> >alternatives for the same data).  How about we leave this as a todo,
> >in case it becomes interesting?
> 
> Well, it may have just become interesting! as per Marko's reply

Note he was talking about PM_TYPE_EVENT.  That is in no way trivial to
decode.  Each pmFetch can result in a vector of event records, each of
which is a tuple of arbitrary metrics, potentially recursively.  I
have not yet gotten my head around what a simplified pmfg-flavoured
API for this could look like.  It's not just one pmAtomValue per
metric/instance, it's a tree of diverse ones!
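
For a sense of the shape of the problem, here is what decoding a
single event-typed value already takes at the raw pmapi level (real
calls; vsp/idx assumed to come from an ordinary pmFetch result):

    pmResult **rset;
    int nrecords = pmUnpackEventRecords(vsp, idx, &rset);
    if (nrecords >= 0) {
        for (int r = 0; r < nrecords; r++)
            for (int m = 0; m < rset[r]->numpmid; m++) {
                /* rset[r]->vset[m] is a pmValueSet for some arbitrary
                   metric - possibly itself PM_TYPE_EVENT */
            }
        pmFreeEventResult(rset);
    }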


> >>- overload pmCreateFetchGroup() to take either a context, or a
> >>source string, (defaulting to "local:"). And then provide method to
> >>return the context for use by other pmapi functions.
> >
> >Does that really seem like it would save anything?  The context
> >creation is just one function call already.  And if the pmNewContext
> >failed, one can ignore its rc anyway and let the following
> >pmCreateFetchGroup return the PM_ERR_NOCONTEXT.
> 
> well it saves one or two lines of code (no big deal), but perhaps
> more importantly it reinforces one fg per context.

Hm, I'm starting to like the sound of that.  It would make it harder
to accidentally misuse (share) the pmfg-dedicated contexts, and would
moot the multithreading / sharing-detection-error concerns.
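
i.e. the fetchgroup would own its context outright and hand it back
on request, something like (a sketch of the proposed shape; the
getter name here is illustrative, nothing final):

    pmFG fg;
    int sts = pmCreateFetchGroup(&fg, PM_CONTEXT_HOST, "local:");
    /* sts < 0 covers context-creation failure too (PM_ERR_NOCONTEXT &c) */

    int ctx = pmGetFetchGroupContext(fg);    /* for other pmapi calls */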


> [...]
> some kind of common filtering API would make sense, but not urgently.
> (add to pmcc perhaps)

Yeah, will think about it more later.  There is probably some benefit
to pushing some of that filtering down to the pmfg layer (to optimize
indom profiles).  It could be added to the API later without breaking
the current code.
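
The indom-profile optimization being: a pmfg-level filter could be
translated into instance-profile calls, so pmcd never ships the
unwanted instances at all.  A sketch with the existing pmapi calls
(indom and inst as obtained from pmLookupDesc / pmLookupInDom):

    pmDelProfile(indom, 0, NULL);      /* exclude every instance ...   */
    pmAddProfile(indom, 1, &inst);     /* ... then admit the wanted one */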


> >>- heaps more QA :
> >>     qa for multiple fetchgroups from the same context
> >
> >Already documented as improper (esp. without a functional pmDupContext).
> 
> then an error should be returned right?

It can't easily be detected.  pmfg keeps no global state, so it has
no way of knowing what other pmfg instances might be doing with the
context, or indeed what the application might be doing with the
context through lower-level pmapi calls.


> >Well, if PM_ERR_VALUE is not the right error number for missing
> >values, what is?  Or shall we send back a 0 sentinel value? :-)
> 
> PM_ERR_VALUE is more for values that are not fetchable for whatever reason
> (inst went away, whatever). Here we need two values for the rate conversion
> and the values _are_ available, so maybe PM_ERR_AGAIN? or a new err code?
> (since we're extending the pmapi here)

PM_ERR_AGAIN makes sense to me (for the rate conversion case).
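
The caller-side idiom would then be (a sketch; the PM_ERR_AGAIN
semantics are exactly the proposal here, and rate_value / rate_sts
are assumed registered via pmExtendFetchGroup_item as usual):

    pmFetchGroup(fg);
    if (rate_sts == PM_ERR_AGAIN)
        printf("disk rate: (need a second sample)\n");
    else if (rate_sts < 0)
        printf("disk rate: %s\n", pmErrStr(rate_sts));
    else
        printf("disk rate: %.2f/s\n", rate_value.d);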


> [...]  Perhaps a hidden first fetch is better - proper tools will
> ensure two fetches before reporting anything anyway; this is the
> first time any PCP API has actually done rate conversion
> automatically - it's always been left to the caller AFAIK [...]

The derived-metric rate() operator does it too, and models missing
history as an absent result in the pmResult, i.e., numval=0 in the
pmValueSet, which is an error to the application.

So the alternatives seem to be:

a) a fixed sentinel value
b) a hidden first fetch
c) a hypothetical rate (e.g. current-count / system-uptime, the way
   /usr/bin/iostat presents its first row of output)
d) PM_ERR_VALUE  (status quo)
e) PM_ERR_AGAIN

To me, (b) ends up a lot like (a): the computed rate value would be
meaningless, being taken between two arbitrarily close timestamps.

Note also that the case of a counter value disappearing and then
reappearing looks to pmfg a lot like the first fetch.  It would seem
desirable to have the same policy for both cases.  (The (c) option,
for example, would not sensibly apply to the disappear-reappear case.)

It's an interesting dual of the "ambiguous sentinel / forced error
checking" philosophy at the top.


- FChE
