Marko,
Sorry there was so much pain for you here. I've been running PCP QA
every day for 20+ years, currently across 30+ machines, so there is
ample evidence to suggest your grief does not need to be systemic.
There appear to be several issues early on in your saga that might
explain the differences in our experiences ...
On 16/12/15 18:52, Marko Myllynen wrote:
> Hi,
> here are the issues I mentioned earlier, found when trying to run the QA
> test I wanted (1069) against the latest git code:
> 0) I started with:
> ./configure --prefix=/tmp/pcp && make && make install
> export LD_LIBRARY_PATH=/tmp/pcp/lib
> export PATH=/tmp/pcp/bin:$PATH
> export PCP_CONF=/tmp/pcp/etc/pcp.conf
You're already in uncharted waters here ... I've never tested the
--prefix=... option for a build and would not be surprised if it has
problems.
I don't think this should be necessary, nor should it be attempted ...
the PCP QA suite is designed with a philosophy that it is trying to
exercise the code in a context that is as close as possible to that
which an end-user would experience. This suggests the software to be
tested should be installed in the "usual" places, with the "usual"
permissions and operate on the "usual" ports. The qa/README file should
make this clear, and it does not (or did not until a few moments ago).
Yes, PCP QA should not run alongside a production deployment. But all
our QA machines are (a) dedicated to that function, or (b) belong to
developers.
> Initially I thought I could avoid doing anything at the system level by
> also setting
> export PMCD_PORT=44444
> export PMLOGGER_PORT=33333
Same comments as above about the non-standard execution environment.
> but when the system pmcd (on port 44321) was not running during chk.setup I saw:
OK, here is the next source of your pain. chk.setup and some of the
defaults in common.config date from the SGI days (hence the initial
attempt to contact the long-gone host hilo.sgi.com) ... this needs to be
pulled apart and reworked.
> make[1]: *** [foo.0] Error 1
> make: *** [setup] Error 2
> ...
> Contacting local pmcd at localhost ... no response (fatal)
> ...
> I suspect this is because there was no pmcd listening on port $PMCD_PORT
> at this stage.
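That suspicion is easy to confirm before running chk.setup: check whether anything accepts TCP connections on the port. Here is a minimal bash sketch, assuming bash's /dev/tcp redirection is available. port_open is a hypothetical helper, not part of PCP, and a successful connect only shows that something is listening on the port, not that it is pmcd.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PCP): succeed if something on HOST
# accepts a TCP connection on PORT. Relies on bash's /dev/tcp
# redirection, so it will not work under a plain POSIX sh.
port_open() {
    local host=$1 port=$2
    # Open fd 3 read/write inside a subshell; a failed connect makes the
    # subshell exit non-zero, and the fd is closed when the subshell exits.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# 44321 is the standard pmcd port; honour PMCD_PORT if it is set.
port=${PMCD_PORT:-44321}
if port_open localhost "$port"; then
    echo "something is listening on port $port"
else
    echo "nothing is listening on port $port (is pmcd running?)" >&2
fi
```

With PMCD_PORT=44444 exported and only the system pmcd running, this should report nothing listening on 44444, which would match the "no response (fatal)" above.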
> 1) Then I had system pmcd+pmlogger running from the standard
> installation, *_PORT not set and I added pmdasample there as instructed
> in qa/README
OK, that should have worked better, except for the --prefix comments above.
> 2) I didn't adjust common.config or qa_hosts.master, as 1069 doesn't
> need remote hosts
OK
> 3) I did:
> cd /tmp/pcp/var/lib/pcp/testsuite
> ./chk.setup
> The script asks for the sudo password a few times without giving any
> reason (so I didn't enter it). Then it says:
The first is running sudo -E to see if -E works ... this could be
re-engineered (and I've now done that).
> ./mk.qa_hosts: no #order line matches this host "localhost", local
> testing only
> which looks ok. However, clienttimeout.c failed to compile; perhaps it
> could be compiled during the earlier build steps, with the correct
> include paths etc.
> clienttimeout.c: In function ‘main’:
> clienttimeout.c:90:2: warning: implicit declaration of function
> ‘__pmSetConnectTimeout’ [-Wimplicit-function-declaration]
> if ((sts = __pmSetConnectTimeout(conn_timeout)) < 0) {
> ...
Almost for sure this is --prefix infection ... the PCP headers have
probably been installed in a place that gcc and the makefiles don't know
about.
And I think everything from here on down is also a consequence of using
--prefix in the configure and build.
...
> All in all, getting to this point took more time than writing the
> xlsx output support for pmrep, which perhaps explains why I haven't
> contributed any QA tests so far. And allowing sudo for scripts that
> touch both local and system scripts and services is too risky, and
> something I won't allow again in the future.
It should not have been this hard. I think you took the wrong fork in
the road very early on (in the absence of any obvious guidance) and the
wheels began to wobble before dropping off and dumping you in the ditch.
The use of sudo in the PCP QA Suite when run against a normal PCP
installation (everything in the expected place) is completely safe in my
experience ... I'm running more than 50,000 PCP QA tests per week and
have not had a single system trashed in the process.
I'll fix the chk.setup issues (mostly defaults in common.config that are
no longer sensible, and don't need to be set, especially in your use case
of "I just want to run test 1069").
Hopefully I can encourage you to try once more in the light of these
comments ... we need to make sure Marko's experience is not the norm for
a PCP QA newbie.