pcp
[Top] [All Lists]

Re: fche/for-merge updates (was Re: [pcp] graphite interfacing prototype

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: fche/for-merge updates (was Re: [pcp] graphite interfacing prototype)
From: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Date: Mon, 14 Apr 2014 10:17:32 -0400
Cc: pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <466664093.5020375.1397463467942.JavaMail.zimbra@xxxxxxxxxx>
References: <20140414021313.GH14108@xxxxxxxxxx> <466664093.5020375.1397463467942.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mutt/1.4.2.2i
Hi, Nathan -

> [...]
> BTW, there are several changes in fche/for-merge that you've not put
> in the summary - any reason for that?  Some of them look useful and
> possibly merge-ready too.  Others, maybe not ready yet ... hmmm, who
> knows?  It makes it impossible to just git pull into the dev branch,
> anyway, which slows my workflow.

I considered all the for-merge bits ready to go, just didn't want
to advertise them in the context of the python work.


> Comments inline... (I'm still hopeful Ken will review the new qa/666
> and offer suggestions - 6 minutes is still a really, really long time
> for a test to run - its >5 minutes longer than most other tests and
> about double the next slowest test.  And *that* other test is already
> a big problem IMO so I'm very reluctant to add more QA laggards).

Fair enough, but I'm reluctant to keep working on it more without a
definite "good enough" target.  (Of course one could reduce test
coverage to save time, but that's not dignified.)


> [...]
> >     While in the vicinity, fix pmLookupName's exception message to
> >     identify the metrics that failed resolution, instead of an internal
> >     magic python char* rune.
> 
> Hmm, not so sure about this - looking inside the libpcp code, errors
> returned here are not related to the names at all, right?  (they are
> runtime errors, so things like PDU I/O errors, and so on).

In this particular case, the error parameter was for an informative
name, but the object passed back was the ctypes char* rather than the
python string, which made the error message unnecessarily opaque.


> Passing out the names doesn't really help to understand the failure,
> AFAICT?  It looks like we should just treat this one the same as all
> the other pmErr exceptions (IOW, neither pmid/names list passed out).
> [...]

There are many cases where the pcp/pmapi.py code does something like:

      raise pmErr, (status, nameA)

In these cases, returning legible parameters (a native python object
rather than nameA) would be helpful.

> What was the error case you came across here, OOC?

Sending bad metric names to pcp-graphite.py, either before or after
the PMNS traversal code-additions.


> [...]
> > commit 163db2e7ae96437a33ca8a0535af8bde35e95a34
> > Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
> > Date:   Sat Apr 12 21:46:51 2014 -0400
> > 
> >     libpcp __pmSetProgname: copy incoming string
> [...]
>
> Wow, amazed that I have not ever observed this!  Do any/all of
> tests 739, 741, 742, 743, 991 fail for you?  [...]

No.

> >     In order to make it safer for mankind, __pmSetProgname henceforth copies
> >     the incoming const char*'s with strdup().
> 
> So, I'm trying to ignore the fact that you're once again introducing
> GNU coding style, into the middle of libpcp here (seriously?  oh the
> humanity!).

Oops, you've uncovered my secret scheme to eventually convert the
whole codebase!


> But lets be more objective - all the people using valgrind on C
> PMAPI tools will gnash their teeth with this change, as they will
> start to get warnings now.  [...]  If we were to make this kind of
> change the qa/valgrind-suppressions file would need to be updated.

Did you see an actual warning from this change?  For programs that
call __pmSetProgname only once (as is typical), there is no "leak"
in that a global variable maintains a reference to the object.  I
can't get valgrind to even mention it.

> [...] At the end of the day, this is a python API problem, ideally
> we'd solve it there and not affect the C or other language tools.

I disagree.  There is no guarantee that a program's original argv[0]
pointer needs to point at a stable string.

> Does the patch attached resolve the problem you observed?

I'm not sure it's sufficient, and IMHO the patch is worse: it still
introduces the same kind of leak, and is now far removed from the
place in libpcp that holds onto that pointer and makes the poor
assumption.


> Do we have QA coverage of the problem with those existing tests I
> wrote (above) or do we need to add something more?  (any hints re
> increasing chances of hitting it?)

You could strcpy(argv[0], "bozo the clown"); in a C PMAPI program
sometime after __pmSetProgname().


The only other patch in fche/for-merge was this:

commit e54d33f7fde4208a01107cd807292e9aaeffd3f6
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sat Apr 12 14:32:46 2014 -0400

    pcp.sh: add Available Commands: list to usage message, like already 
documented

I see that you have your own version in /dev now.  FWIW, I like this
one better because its logic matches that of what is done later in
pcp.sh, namely the ordering and executability checking, and does
not litter in $tmp.  YMMV, HTH, HAND, & c.


- FChE

<Prev in Thread] Current Thread [Next in Thread>