pcp
[Top] [All Lists]

Re: [pcp] fetchgroups api - python bindings

To: Mark Goodwin <mgoodwin@xxxxxxxxxx>
Subject: Re: [pcp] fetchgroups api - python bindings
From: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Date: Fri, 11 Dec 2015 10:03:48 -0500
Cc: myllynen@xxxxxxxxxx, pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <566A59A5.6090403@xxxxxxxxxx>
References: <20151206204742.GC22561@xxxxxxxxxx> <5666E4F7.4070005@xxxxxxxxxx> <y0m4mft10vf.fsf@xxxxxxxx> <566A59A5.6090403@xxxxxxxxxx>
User-agent: Mutt/1.4.2.2i
Hi, Mark -

Wow, thanks for your effort & comments.


> [...]
> - sentinal value of 0 as an alternative error indicator for integer metrics,
> as explained at the end of the pmfetchgroup(3) man page :

> 0 is a very valid value for many metrics (especially those with
> instant semantics) so is hardly a good choice for indicating "no
> value available".  Perhaps just do away with this

Note that other numbers like -1 seem tempting, but from other actual
usage, like the pmstat conversion, 0 is useful.

> and rely only on the per-instance error returns?  (and mandate the
> currently optional error arrays in the fetchgroup_create calls)?

It's one of those cases where we can't really mandate.  Even if we
make the caller pass in an int* of statuses, we can't force them to
check them, unless perhaps we do something like deliberately messing
up the data values so they are unreliable.  That would seem harsh.
(Similarly, we can't force applications to check pmFetch() or
pmResult-embedded error codes.)


> - Need some API documentation and more examples of the python binding in 
> pmFetchGroup(3)

Hm, where are python APIs documented in general?  Not in man pages
AFAIK.


> - Internal exception handling to pmReconnectContext, especially for
> pmFetchGroup() but also for all other fetchgroup API
> functions. Otherwise we'll end up with duplicated code to do this in
> new clients. It could be transparent (?)

I've been wondering about pmReconnectContext too.  The documentation
does not provide much guidance as to what sorts of invariants we are
permitted to assume after a reconnection.  The key question is: can we
skip any PMNS/indom lookups and reuse earlier results?

If no, then we'd have to save the pmExtendFetchGroup* parameters, for
a post-reconnect rescan.  And then we'd have to have a way of
signaling new errors that may occur.

If yes, then ... why not have PMAPI do a pmReconnectContext underneath
us all the time?


> - We can extract inst id, inst name, values and tiemstamps. But how
> about helptext and some of the other descriptor fields? Or just use
> the existing pmapi functions off the contextfor this?

Yeah.


> - how come no support for PM_TYPE_AGGREGATE (and events)?

AGGREGATE would be a possibility, using the pmAtomValue vbp pointer, I
guess, but it seemed far-fetched as a beneficiary of rate/unit
conversion.  The only real PMDAs that provide aggregate data at the
moment are windows-events and systemd (which also offers non-blob
alternatives for the same data).  How about we leave this as a todo,
in case it becomes interesting?

EVENT is so complicated (requiring further recursive processing for
the nested fields, many pmAtomValue instances, etc.) that it just
doesn't seem like a fit for a simplified API at all.


> - Do we want a c++ binding (similar to API to py)?

Well, possible, but libpcp doesn't speak c++ AIUI.  Where would one
plop one?  And it's not like the pmfg C API is that bad from C++
clients: see pmmgr.


> Also, the fg py binding might probably slot into the pmcc classes
> better than pmapi, but then under the hood you've extended libpcp so
> I guess it belongs in pmapi.py, so we might be able to eventually
> delete some of the pmcc helpers (?)

One could rework pmcc to use pmfg perhaps.  Or leave it alone as a
legacy API.


> - overload pmCreateFetchGroup() to take either a context, or a
> source string, (defaulting to "local:"). And then provide method to
> return the context for use by other pmapi functions.

Does that really seem like it would save anything?  The context
creation is just one function call already.  And if the pmNewContext
failed, one can ignore its rc anyway and let the following
pmCreateFetchGroup return the PM_ERR_NOCONTEXT.


> - consider another function that is a cross between
> pmExtendFetchGroup_item and _indom supplying a regex for a subset of
> instance names (i.e. use an indom profile) Would be very handy for
> tools that want to filter their reports, and more scalable and
> efficient with very large indoms since they'd actually use the
> profile, e.g. massive disk farms with multipath scsi. And proc.

That's an interesting idea.  One problem is that a fetchgroup can
refer to multiple metrics against the same indom (with different set
of instances).  As soon as you have two metrics with different indom
subsets, we need to grow to the union of them anyway in the context
instance-profile, and then filter the results on the client side.  So
we'd have to do the regex processing -twice-.  Anyway, it's doable
just messy; do you think it is worth doing urgently?


> - heaps more QA :
>     qa for multiple fetchgroups from the same context

Already documented as improper (esp. without a functional pmDupContext).


>     qa for multiple fetchgroups using multiple contexts with same
>     and different host (ditto in archive mode for qa determinism)

Since the C pmfg implementation has no internal global state at all,
this is almost trivially true.  Independent fetchgroups would work
against independent contexts exactly the same way as the classical
PMAPI calls would work against them.  The "almost" part above is just
for the pmfg internal pmUseContext save/restore stuff, which is
indeed worth testing.


>     qa for pmFetchGroupSetMode in archive mode. 

OK.


>     Some API code for pmtime would help too.

What do you have in mind?


>     qa for multithreaded use (with separate fetch group in each
>     thread).  Not safe to share a fetchgroup handle between threads
>     (that's fine and made clear), but what about separate fg handles
>     bound to the same context?

Again, this is already forbidden.


>     qa with valgrind to check pmDestroyFetchGroup frees everything
>     for all the above tests and also that regular tools using this
>     API don't grow unnecessarily.

Basic valgrinding already done (qa/802).

>     qa for rate conversion of counter metrics

Already done by cunning choice of metrics in the qa scripts.

>     (and clearer err reporting)

(See below.)

>     qa coverage on all supported platforms

Not sure what the source code can do there.  The only
platform-specificity I'm aware of that I touched was in pmstat's
metric-fallback logic, which I replicated in simplified form.


> - The following doesn't work - the error message could perhaps be
> improved to report that two fetches are needed for rate conversion
> of metrics with counter semantics, rather than just throwing an
> obscure exception about nested lambdas and missing metric values :P
> [...]

Well, if PM_ERR_VALUE is not the right error number for missing
values, what is?  Or shall we send back a 0 sentinel value? :-) Or do
an initial hidden pmFetch?  (But even then one can have missing values
and thus impossible rate conversions.)

(The mention of nested lambdas is just the python traceback which is
out of our control.)


- FChE

<Prev in Thread] Current Thread [Next in Thread>