
Re: [pcp] fetchgroups api - python bindings

To: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: [pcp] fetchgroups api - python bindings
From: Mark Goodwin <mgoodwin@xxxxxxxxxx>
Date: Mon, 14 Dec 2015 15:16:03 +1000
Cc: myllynen@xxxxxxxxxx, pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <20151211150348.GH22434@xxxxxxxxxx>
References: <20151206204742.GC22561@xxxxxxxxxx> <5666E4F7.4070005@xxxxxxxxxx> <y0m4mft10vf.fsf@xxxxxxxx> <566A59A5.6090403@xxxxxxxxxx> <20151211150348.GH22434@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0
On 12/12/2015 01:03 AM, Frank Ch. Eigler wrote:
Hi, Mark -

Wow, thanks for your effort & comments.


[...]
- sentinel value of 0 as an alternative error indicator for integer metrics,
as explained at the end of the pmfetchgroup(3) man page:

0 is a very valid value for many metrics (especially those with
instant semantics) so is hardly a good choice for indicating "no
value available".  Perhaps just do away with this

Note that other numbers like -1 seem tempting, but from other actual
usage, like the pmstat conversion, 0 is useful.

and rely only on the per-instance error returns?  (and mandate the
currently optional error arrays in the fetchgroup_create calls)?

It's one of those cases where we can't really mandate.  Even if we
make the caller pass in an int* of statuses, we can't force them to
check them [...]

omitting error checks would encourage poor code. And using an ambiguous
sentinel to indicate either an error or a valid value seems wrong, e.g.
consider network.interface.total.errors ... maybe we could keep the
per-instance err arrays internally (if not passed as args), and provide a
function to check for errors for a particular fetchgroup instance/value
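
Something like this sketch maybe, assuming the closure style of the
prototype python binding (extend_indom() handing back (inst, name, value)
tuples, with value() raising pmErr per-instance rather than returning a
sentinel):

    import time
    from pcp import pmapi
    import cpmapi as c_api

    fg = pmapi.fetchgroup(c_api.PM_CONTEXT_HOST, "local:")
    # counter metric: rate-converted, and 0.0 is a perfectly valid rate
    errs = fg.extend_indom("network.interface.total.errors",
                           c_api.PM_TYPE_DOUBLE)
    fg.fetch()                    # first sample: no rate available yet
    time.sleep(1)
    fg.fetch()                    # second sample: rates now computable
    for inst, name, value in errs():
        try:
            print("%-12s %.1f err/s" % (name, value()))
        except pmapi.pmErr as e:  # per-instance error, kept distinct from 0
            print("%-12s no value (%s)" % (name, e.message()))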

- Need some API documentation and more examples of the python binding in
pmFetchGroup(3)

Hm, where are python APIs documented in general?  Not in man pages
AFAIK.

As Lukas pointed out, in the code itself, including example code. And in
the Programmer's Guide. The section 3 man page could also document the
python binding and provide usage examples. Lots of scope here
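
e.g. the man page could carry something as small as this sketch (written
against the prototype python binding; mtype defaulting to the metric's own
type is my assumption about the binding):

    from pcp import pmapi
    import cpmapi as c_api

    fg = pmapi.fetchgroup(c_api.PM_CONTEXT_HOST, "local:")
    ts = fg.extend_timestamp()
    load = fg.extend_item("kernel.all.load", instance="1 minute")
    fg.fetch()
    print("%s 1-min load: %s" % (ts(), load()))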

[...]  why not have PMAPI do a pmReconnectContext underneath
us all the time?

good idea - I've always thought it should, but only after a successful
connection has already been established and subsequently dropped. IIRC
pmReconnectContext was added in response to users complaining their
long-running charts stopped scrolling after they bounced a server. I think
there's more historical context to this ... can't remember the details
though.
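
For contrast, the reconnect dance a long-running tool does by hand today -
a sketch, assuming the binding exposes the underlying context via
get_context():

    import time
    from pcp import pmapi
    import cpmapi as c_api

    fg = pmapi.fetchgroup(c_api.PM_CONTEXT_HOST, "somehost")
    nprocs = fg.extend_item("kernel.all.nprocs")
    while True:
        try:
            fg.fetch()
            print(nprocs())
        except pmapi.pmErr as e:
            if e.args[0] == c_api.PM_ERR_IPC:   # pmcd bounced
                fg.get_context().pmReconnectContext()
            else:
                raise
        time.sleep(5)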

- We can extract inst id, inst name, values and timestamps. But how
about helptext and some of the other descriptor fields? Or just use
the existing pmapi functions off the context for this?

Yeah.
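
i.e. something like this sketch, assuming get_context() plus the usual
wrappers (pmLookupName, pmLookupText) are enough:

    from pcp import pmapi
    import cpmapi as c_api

    fg = pmapi.fetchgroup(c_api.PM_CONTEXT_HOST, "local:")
    fg.extend_item("mem.util.free")
    ctx = fg.get_context()               # the fetchgroup's own context
    pmids = ctx.pmLookupName("mem.util.free")
    print(ctx.pmLookupText(pmids[0], c_api.PM_TEXT_ONELINE))  # helptext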


- how come no support for PM_TYPE_AGGREGATE (and events)?

AGGREGATE would be a possibility, using the pmAtomValue vbp pointer, I
guess, but it seemed far-fetched as a beneficiary of rate/unit
conversion.  The only real PMDAs that provide aggregate data at the
moment are windows-events and systemd (which also offers non-blob
alternatives for the same data).  How about we leave this as a todo,
in case it becomes interesting?

Well, it may have just become interesting, as per Marko's reply!

- Do we want a c++ binding (similar to API to py)?

Well, possible, but libpcp doesn't speak c++ AIUI.  Where would one
plop one?  And it's not like the pmfg C API is that bad from C++
clients: see pmmgr.

yeah, fair enough - drop this idea for now. It just seemed to me the
py binding is so much nicer due to default args etc. The code ends up
nice and terse. C++ wrappers for just the pmfg calls would be good

One could rework pmcc to use pmfg perhaps.  Or leave it alone as a
legacy API.

I wasn't thinking of reimplementing pmcc to use fetchgroup, but rather
that future extensions to pmcc could use it, with new convenience classes
factored out as handy common code (which is kind of where pmcc originated
from)

- overload pmCreateFetchGroup() to take either a context, or a
source string, (defaulting to "local:"). And then provide method to
return the context for use by other pmapi functions.

Does that really seem like it would save anything?  The context
creation is just one function call already.  And if the pmNewContext
failed, one can ignore its rc anyway and let the following
pmCreateFetchGroup return the PM_ERR_NOCONTEXT.

well it saves one or two lines of code (no big deal), but perhaps
more importantly it reinforces one fg per context.
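
The hypothetical shape would read something like:

    from pcp import pmapi
    import cpmapi as c_api

    # hypothetical overload: the fetchgroup builds and owns its context
    # from a source string (defaulting to "local:") ...
    fg = pmapi.fetchgroup(c_api.PM_CONTEXT_HOST, "local:")
    # ... and hands the context back for use with other pmapi functions
    ctx = fg.get_context()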

- consider another function that is a cross between
pmExtendFetchGroup_item and _indom, supplying a regex for a subset of
instance names (i.e. use an indom profile). Would be very handy for
tools that want to filter their reports, and more scalable and
efficient with very large indoms since they'd actually use the
profile, e.g. massive disk farms with multipath scsi. And proc.

That's an interesting idea.  One problem is that a fetchgroup can
refer to multiple metrics against the same indom (with different sets
of instances).  As soon as you have two metrics with different indom
subsets, we need to grow to the union of them anyway in the context
instance-profile, and then filter the results on the client side.  So
we'd have to do the regex processing -twice-.  Anyway, it's doable,
just messy; do you think it is worth doing urgently?

some kind of common filtering API would make sense, but not urgently.
(add to pmcc perhaps)
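
In the meantime the filtering is easy enough (if inefficient) client-side -
a sketch that fetches the whole indom and filters names with a regex, so
none of the profile/scalability win yet:

    import re
    import time
    from pcp import pmapi
    import cpmapi as c_api

    fg = pmapi.fetchgroup(c_api.PM_CONTEXT_HOST, "local:")
    iops = fg.extend_indom("disk.dev.total", c_api.PM_TYPE_DOUBLE)
    want = re.compile(r"^sd[a-c]$")     # subset of a potentially huge indom
    fg.fetch()
    time.sleep(1)
    fg.fetch()                          # counters need two samples for rates
    for inst, name, value in iops():
        if want.match(name):
            print("%-8s %.1f iops" % (name, value()))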

- heaps more QA :
     qa for multiple fetchgroups from the same context

Already documented as improper (esp. without a functional pmDupContext).

then an error should be returned, right?

     qa for multiple fetchgroups using multiple contexts with same
     and different host (ditto in archive mode for qa determinism)

Since the C pmfg implementation has no internal global state at all,
this is almost trivially true.  Independent fetchgroups would work
against independent contexts exactly the same way as the classical
PMAPI calls would work against them.  The "almost" part above is just
for the pmfg internal pmUseContext save/restore stuff, which is
indeed worth testing.

ok, good to test then - something like the sketch below perhaps.
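
e.g. a QA case could be as small as this (archive paths and metric
purely illustrative):

    from pcp import pmapi
    import cpmapi as c_api

    # two independent fetchgroups, each owning its own archive context;
    # interleaved fetches must not interfere
    fg1 = pmapi.fetchgroup(c_api.PM_CONTEXT_ARCHIVE, "archives/qa-host-a")
    fg2 = pmapi.fetchgroup(c_api.PM_CONTEXT_ARCHIVE, "archives/qa-host-b")
    v1 = fg1.extend_item("hinv.ncpu")
    v2 = fg2.extend_item("hinv.ncpu")
    fg1.fetch()
    fg2.fetch()
    print(v1(), v2())
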
     qa for pmFetchGroupSetMode in archive mode.

OK.


     Some API code for pmtime would help too.

What do you have in mind?

connect to pmtime(1) to control the main loop. There are currently no py
bindings for this, so it's not really a pmfg rfe per se.

     qa for multithreaded use (with separate fetch group in each
     thread).  Not safe to share a fetchgroup handle between threads
     (that's fine and made clear), but what about separate fg handles
     bound to the same context?

Again, this is already forbidden.

needs err/exception handling then - the safe pattern is sketched below
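
i.e. one context-owning fetchgroup per thread, nothing pmfg-related
shared - a sketch (second hostname illustrative):

    import threading
    from pcp import pmapi
    import cpmapi as c_api

    def sample(host):
        # each thread builds its own fetchgroup (and thus its own context)
        fg = pmapi.fetchgroup(c_api.PM_CONTEXT_HOST, host)
        v = fg.extend_item("kernel.all.nprocs")
        fg.fetch()
        print("%s: %s" % (host, v()))

    threads = [threading.Thread(target=sample, args=(h,))
               for h in ("local:", "otherhost")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
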
     qa with valgrind to check pmDestroyFetchGroup frees everything
     for all the above tests and also that regular tools using this
     API don't grow unnecessarily.

Basic valgrinding already done (qa/802).

     qa for rate conversion of counter metrics

Already done by cunning choice of metrics in the qa scripts.

     (and clearer err reporting)

(See below.)

     qa coverage on all supported platforms

Not sure what the source code can do there.  The only
platform-specificity I'm aware of that I touched was in pmstat's
metric-fallback logic, which I replicated in simplified form.


- The following doesn't work - the error message could perhaps be
improved to report that two fetches are needed for rate conversion
of metrics with counter semantics, rather than just throwing an
obscure exception about nested lambdas and missing metric values :P
[...]

Well, if PM_ERR_VALUE is not the right error number for missing
values, what is?  Or shall we send back a 0 sentinel value? :-)

PM_ERR_VALUE is more for values that are not fetchable for whatever reason
(inst went away, whatever). Here we need two values for the rate conversion
and the values _are_ available. So maybe PM_ERR_AGAIN? or a new err code?
(since we're extending the pmapi here)

Or do
an initial hidden pmFetch?  (But even then one can have missing values
and thus impossible rate conversions.)

an initial hidden fetch would solve the issue, but the first rate-converted
value would likely use the wrong time delta (the API doesn't know the
update interval used by a tool).

(The mention of nested lambdas is just the python traceback which is
out of our control.)


my point was the resulting py exception is totally obscure - a naive
programmer will struggle to understand why they're getting these bizarre
errors for metrics that have counter semantics using the pmfg API.
Perhaps a hidden first fetch is better - proper tools will ensure two
fetches before reporting anything anyway; this is the first time any PCP
API has actually done rate conversion automatically - it's always been
left to the caller AFAIK
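
For reference, the two-fetch pattern such tools end up with anyway - a
sketch where the priming fetch hides the missing first rate and a guard
keeps the per-value error readable:

    import time
    from pcp import pmapi
    import cpmapi as c_api

    fg = pmapi.fetchgroup(c_api.PM_CONTEXT_HOST, "local:")
    intr = fg.extend_item("kernel.all.intr", c_api.PM_TYPE_DOUBLE)
    fg.fetch()                       # priming fetch: no rate available yet
    while True:
        time.sleep(1)
        fg.fetch()
        try:
            print("%.1f intr/s" % intr())
        except pmapi.pmErr:          # e.g. value disappeared mid-run
            print("no value this interval")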

Regards
-- Mark
