----- Original Message -----
> > [...]
> > BTW, there are several changes in fche/for-merge that you've not put
> > in the summary - any reason for that? [...]
>
> I considered all the for-merge bits ready to go, just didn't want
> to advertise them in the context of the python work.
(nor in any other context either, though)
The guidelines in HACKING suggest this is a good, common methodology
to follow (2nd paragraph, last sentence). Otherwise, confusion and
misinterpretation-of-intent will result on my end - I am but a simple
lad, easily confused. Sending the report to the list gives everyone
else a good opportunity to review too (not just me all the time).
Thanks for helping out with that!
> > Comments inline... (I'm still hopeful Ken will review the new qa/666
> > and offer suggestions - 6 minutes is still a really, really long time
> > for a test to run - its >5 minutes longer than most other tests and
> > about double the next slowest test. And *that* other test is already
> > a big problem IMO so I'm very reluctant to add more QA laggards).
>
> Fair enough, but I'm reluctant to keep working on it more without a
> definite "good enough" target.
As mentioned on IRC, splitting the test into smaller parts may be one
strategy. For reference, the pmlogger check/daily tests are exercising
similar sorts of functionality (sorta) and there are ~15 tests there,
which are in the several-seconds to one-or-two-minutes-max range.
Those are the kinds of timeframes and number of tests I'd hope to see
for an important daemon like pmmgr. Some valgrind testing would be a
welcome addition too - its long-running, makes sense to check memory
use. And so on. Like all the other daemons - use those other tests
as references for what is normal here.
> (Of course one could reduce test coverage to save time,
> but that's not dignified.)
*nod* - goes without saying. Everyone loses if you do that.
> > [...]
> > Hmm, not so sure about this - looking inside the libpcp code, errors
> > returned here are not related to the names at all, right? (they are
> > runtime errors, so things like PDU I/O errors, and so on).
>
> In this particular case, the error parameter was for an informative
> name, but the object passed back was the ctypes char* rather than the
> python string, which made the error message unnecessarily opaque.
Understood. My question, though, was does passing back out parameters
passed in help at all here, not whether the existing code was useful.
> > Passing out the names doesn't really help to understand the failure,
> > AFAICT? It looks like we should just treat this one the same as all
> > the other pmErr exceptions (IOW, neither pmid/names list passed out).
> > [...]
> > What was the error case you came across here, OOC?
>
> Sending bad metric names to pcp-graphite.py, either before or after
> the PMNS traversal code-additions.
Right - ah, I see whats happened here. I was specifically asking about
pmLookupName, thinking it should be going through the second error code
handling path on name lookup failure, where it already returns the names
as a result of this (from pmLookupName(3))...
The result from pmLookupName will be the number of names translated in
the absence of errors, else an error code less than zero. When errors
are encountered, the corresponding value in pmidlist will be
PM_ID_NULL.
I think you are seeing a case where every single name fails to resolve,
right? This case (I think? hence my unanswered questions), is not
distinguished from other errors like networking errors, etc. For this
latter case, its probably unhelpful to misguide the user with an error
message containing details irrelevant to the failure.
To avoid this, the list could be conditionally appended if PM_ERR_NAME
is the error. Could you try that out with your test case? (and please
make a qa test addition to exercise these cases - both Stan and I have
been hacking in here, it appears to be a problematic area). Thanks!
Beware though - it could be a really long list of names - are we sure
we want to do this? (may result in potentially long list of names that
we dump on stderr, typically). Not sure, not sure at all.
> > [...]
> > > commit 163db2e7ae96437a33ca8a0535af8bde35e95a34
> > > Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
> > > Date: Sat Apr 12 21:46:51 2014 -0400
> > >
> > > libpcp __pmSetProgname: copy incoming string
> > [...]
> >
> > Wow, amazed that I have not ever observed this! Do any/all of
> > tests 739, 741, 742, 743, 991 fail for you? [...]
>
> No.
OK - so we have no tests that demonstrate the problem, on any python
version / test platforms? That should be addressed.
> > But lets be more objective - all the people using valgrind on C
> > PMAPI tools will gnash their teeth with this change, as they will
> > start to get warnings now. [...] If we were to make this kind of
> > change the qa/valgrind-suppressions file would need to be updated.
>
> Did you see an actual warning from this change? For programs that
> call __pmSetProgname only once (as is typical), there is no "leak"
> in that a global variable maintains a reference to the object. I
> can't get valgrind to even mention it.
Nope, this assertion was based on the way pmGetConfig behaves and is
known to trigger valgrind warnings. But on review, that is slightly
different and the permanent pointer reference will indeed keep any
valgrind warnings at bay - you're right! (sorry, my mistake)
> > Does the patch attached resolve the problem you observed?
>
> I'm not sure it's sufficient, and IMHO the patch is worse
So ... you didn't try it out for me on your test case? (please do &
let me know, and could you add the python test case to the testsuite
please? this discussion is made more difficult because we're talking
about abstract instead of concrete test cases).
libpcp has taken this pragmatic approach for so many years - I think
its just fine. There are no performance tools that I've come across
that would ever want to change their argv[0] after starting. So, I'm
going for "practicality beats purity" and limiting the scope of the
change, particularly as we are so close to a release.
> [...]
> pcp.sh: add Available Commands: list to usage message, like already
> documented
>
> I see that you have your own version in /dev now. FWIW, I like this
> one better because its logic matches that of what is done later in
> pcp.sh, namely the ordering and executability checking, and does
> not litter in $tmp. YMMV, HTH, HAND, & c.
>
*nod* - I really like the way your variant checks the executable bits,
I've observed that problem as well. I'm not seeing value in ensuring
the path evaluation order is maintained from elsewhere - keeping the
list sorted makes it easier to find things in it. The other version
also neatly formats the output over multiple lines, handy as the list
grows also.
This script already uses numerous tmpfiles and cleans 'em up already,
so I wouldn't worry too much about that.
So in the end, I've taken another look though and have attempted to
produce a best-of-both-worlds approach for this release.
cheers.
--
Nathan
|