pcp
[Top] [All Lists]

fche/for-merge updates (was Re: [pcp] graphite interfacing prototype)

To: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: fche/for-merge updates (was Re: [pcp] graphite interfacing prototype)
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Mon, 14 Apr 2014 04:17:47 -0400 (EDT)
Cc: pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <20140414021313.GH14108@xxxxxxxxxx>
References: <20140414021313.GH14108@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: jBeK0Ew1KmDcoyOtnTrqoVkZkNU/SA==
Thread-topic: fche/for-merge updates (was Re: [pcp] graphite interfacing prototype)
Hi Frank!

----- Original Message -----
> Hi -
> [...]

Thanks for looking into the graphite support, haven't reviewed it yet
but have fwd'd your post on to several people who have asked for this
and feedback has been universally positive & grateful!  :)

> It required correcting a few problems in the python bindings, which
> would be really handy to pull from the fche/for-merge branch into
> release 3.9.2.

BTW, there are several changes in fche/for-merge that you've not put
in the summary - any reason for that?  Some of them look useful and
possibly merge-ready too.  Others, maybe not ready yet ... hmmm, who
knows?  It makes it impossible to just git pull into the dev branch,
anyway, which slows my workflow.

Comments inline... (I'm still hopeful Ken will review the new qa/666
and offer suggestions - 6 minutes is still a really, really long time
for a test to run - its >5 minutes longer than most other tests and
about double the next slowest test.  And *that* other test is already
a big problem IMO so I'm very reluctant to add more QA laggards).

> commit 1d235890e44e53d738a6ecc05b08cc2bc7ff2e1b (HEAD, origin/fche/for-merge,
> fche/for-merge)
> Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
> Date:   Sat Apr 12 23:57:34 2014 -0400
> 
>     pcp pmapi: fix pmConvScale parametrization
>     
>     The python binding to pmConvScale should be given a general pmUnits
>     struct as a target scale, as at the C PMAPI level, not a parameter
>     that limits us to "1 $space"-units.

Good find, however this is an API breaking change.  A fully backward
compatible solution would be to use type(units) and switch behaviour
based on the type of the final parameter passed in.

>     While in the vicinity, fix pmLookupName's exception message to
>     identify the metrics that failed resolution, instead of an internal
>     magic python char* rune.

Hmm, not so sure about this - looking inside the libpcp code, errors
returned here are not related to the names at all, right?  (they are
runtime errors, so things like PDU I/O errors, and so on).

Passing out the names doesn't really help to understand the failure,
AFAICT?  It looks like we should just treat this one the same as all
the other pmErr exceptions (IOW, neither pmid/names list passed out).

Name lookup failures (i.e. failure related to the names themselves,
not other errors) are the area of auto-reporting via "relaxed mode",
which appears to be fine.  (or did it not suit/help here?)

What was the error case you came across here, OOC?

> commit 3749b110f1d4e3dd5245779a68a21efdd98df969
> Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
> Date:   Sat Apr 12 22:00:28 2014 -0400
> 
>     python pmapi.c: Don't crash on exceptions from options_callback()
>     
>     In case the python routine being invoked exits with an exception
>     (result is NULL), we shouldn't try to Py_DECREF the bad boy.

Good catch!

> commit 163db2e7ae96437a33ca8a0535af8bde35e95a34
> Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
> Date:   Sat Apr 12 21:46:51 2014 -0400
> 
>     libpcp __pmSetProgname: copy incoming string
>     
>     __pmSetProgname, undocumented but widely loved, is used to being given
>     system argv[0] as a parameter, which it has saved verbatim under the
>     assumption that the underlying string can never change.  This assumption
>     is broken by the python pmoptions bindings, wherein random python string
>     copies are processed, and occasionally handed to __pmSetProgname.  This
>     can easily corrupt pmUsageMessage().

Wow, amazed that I have not ever observed this!  Do any/all of
tests 739, 741, 742, 743, 991 fail for you?  Its clearly busted
(the python code, I mean) - good diagnosis, though I'm not super
keen on the fix...

>     In order to make it safer for mankind, __pmSetProgname henceforth copies
>     the incoming const char*'s with strdup().

So, I'm trying to ignore the fact that you're once again introducing
GNU coding style, into the middle of libpcp here (seriously?  oh the
humanity!).

But lets be more objective - all the people using valgrind on C PMAPI
tools will gnash their teeth with this change, as they will start to
get warnings now.  We need to be absolutely sure this is the only way
to fix this, and there appear to be other good options (below).

If we were to make this kind of change the qa/valgrind-suppressions
file would need to be updated.  Were there any failing QA tests after
this change?  Shame on myself/Ken if not...?

[ Thats not rhetorical - could well be no new failures - I think we're
reporting only "definitely lost" memory, not "maybe lost" in QA, so
this would possibly sneak under the radar.  Anyway, just a note - there
exists a PCP valgrind suppressions file - in case of any future changes
of this ilk (e.g. pmGetConfig behaves like this, so its in there). ]

At the end of the day, this is a python API problem, ideally we'd solve
it there and not affect the C or other language tools.  Does the patch
attached resolve the problem you observed?  Do we have QA coverage of
the problem with those existing tests I wrote (above) or do we need to
add something more?  (any hints re increasing chances of hitting it?)

thanks!

--
Nathan

Attachment: patchlet
Description: Text Data

<Prev in Thread] Current Thread [Next in Thread>