Re: [pcp] Issues running QA

To: myllynen@xxxxxxxxxx, pcp developers <pcp@xxxxxxxxxxx>
Subject: Re: [pcp] Issues running QA
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Thu, 17 Dec 2015 08:26:37 +1100
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <5671184C.1@xxxxxxxxxx>
References: <5671184C.1@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0
Marko,

Sorry there was so much pain for you here. I've been running PCP QA every day for 20+ years, currently across 30+ machines, so there is ample evidence to suggest your grief does not need to be systemic.

There appear to be several issues early on in your saga that might explain the differences in our experiences ...

On 16/12/15 18:52, Marko Myllynen wrote:
Hi,

here are the issues mentioned earlier I found when trying to run the QA
test I wanted (1069) against the latest git code:

0) I started with:

./configure --prefix=/tmp/pcp && make && make install
export LD_LIBRARY_PATH=/tmp/pcp/lib
export PATH=/tmp/pcp/bin:$PATH
export PCP_CONF=/tmp/pcp/etc/pcp.conf

You're already in uncharted waters here ... I've never tested the --prefix=... option for a build and would not be surprised if it has problems.

I don't think this should be necessary, nor should it be attempted ... the PCP QA suite is designed with a philosophy that it is trying to exercise the code in a context that is as close as possible to that which an end-user would experience. This suggests the software to be tested should be installed in the "usual" places, with the "usual" permissions and operate on the "usual" ports. The qa/README file should make this clear, and it does not (or did not until a few moments ago).
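A quick way to see whether an installation sits in the "usual" places is to look at the directories recorded in pcp.conf. The following is only a sketch: the helper name pcp_conf_value is made up for illustration, and it assumes pcp.conf holds plain KEY=value lines and lives at /etc/pcp.conf unless $PCP_CONF says otherwise.

```sh
# Hypothetical helper (not part of PCP): print the value of one key
# from a pcp.conf-style file of KEY=value lines.
# Usage: pcp_conf_value KEY [FILE]
pcp_conf_value() {
    key=$1
    # Assumption: fall back to $PCP_CONF, then /etc/pcp.conf.
    conf=${2:-${PCP_CONF:-/etc/pcp.conf}}
    # Print whatever follows "KEY=" on the first matching line.
    sed -n "s/^${key}=//p" "$conf" | head -n 1
}
```

For example, `pcp_conf_value PCP_BIN_DIR` printing something under /tmp/pcp rather than a system directory would be a hint that QA is running against a relocated install.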

Yes, PCP QA should not run alongside a production deployment. But all our QA machines are (a) dedicated to that function, or (b) belong to developers.

Initially I thought I could avoid doing anything on system level by
setting also

export PMCD_PORT=44444
export PMLOGGER_PORT=33333

Same comments as above about non-standard execution environment.

but when system pmcd (on port 44321) was not running during chk.setup I saw:

OK, here is the next source of your pain. chk.setup (and some of the defaults in common.config) date from the SGI days (hence the initial attempt to contact the long-gone host hilo.sgi.com) ... this needs to be pulled apart and reworked.

make[1]: *** [foo.0] Error 1
make: *** [setup] Error 2
...
Contacting local pmcd at localhost ... no response (fatal)
...

I suspect this is because there was no pmcd listening on port $PMCD_PORT at this stage.
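One way to test that suspicion before running chk.setup is to probe the port directly. This is a rough sketch, not what chk.setup does: the function name pmcd_port_open is invented here, it relies on bash's /dev/tcp pseudo-device (so it needs bash, not a plain POSIX sh), and it only shows that something accepts TCP connections on the port, not that it is pmcd.

```bash
# Hypothetical helper: succeed if something accepts a TCP connection
# on the given host and port.
# Usage: pmcd_port_open [HOST] [PORT]
pmcd_port_open() {
    host=${1:-localhost}
    # Default to $PMCD_PORT if set, else the standard pmcd port 44321.
    port=${2:-${PMCD_PORT:-44321}}
    # Open (and immediately drop) a connection in a subshell; the
    # subshell's exit status tells us whether the connect worked.
    (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}
```

With PMCD_PORT=44444 exported, `pmcd_port_open || echo "nothing listening"` would report whether anything is answering on the non-standard port.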

1) Then I had system pmcd+pmlogger running from the standard
installation, *_PORT not set and I added pmdasample there as instructed
in qa/README

OK, that should have worked better, except for the --prefix comments above.

2) I didn't adjust common.config nor qa_hosts.master as 1069 doesn't
need remote hosts

OK

3) I did:

cd /tmp/pcp/var/lib/pcp/testsuite
./chk.setup

The script asks for password for sudo for few times without telling any
reason for that (so I didn't enter it). Then it says:

The first is running sudo -E to see if -E works ... this could be re-engineered (and I've now done that).
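For illustration only, a probe of this kind can be done without any password prompt by using sudo's -n (non-interactive) option. The sketch below is not the code in chk.setup; the function name and the QA_PROBE variable are made up, and a "no" answer cannot distinguish "a password would be needed" from "-E is not permitted".

```sh
# Hypothetical probe: report whether "sudo -E" passes the caller's
# environment through, without ever prompting for a password.
# Prints one of: yes, no, no-sudo
sudo_keeps_env() {
    command -v sudo >/dev/null 2>&1 || { echo "no-sudo"; return 0; }
    # -n: fail instead of prompting; -E: ask to preserve the environment.
    if QA_PROBE=ok sudo -n -E sh -c '[ "$QA_PROBE" = ok ]' 2>/dev/null; then
        echo "yes"
    else
        echo "no"
    fi
}
```

A script could print the result along with a one-line reason before any interactive sudo call, so the user knows why a password is being requested.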

./mk.qa_hosts: no #order line matches this host "localhost", local
testing only
which looks ok. However, clienttimeout.c failed to compile, perhaps it
could be compiled already during earlier steps with correct include
paths etc.

clienttimeout.c: In function ‘main’:
clienttimeout.c:90:2: warning: implicit declaration of function
‘__pmSetConnectTimeout’ [-Wimplicit-function-declaration]
   if ((sts = __pmSetConnectTimeout(conn_timeout)) < 0) {
...

Almost certainly this is --prefix infection ... the PCP headers have probably been installed in a place that gcc and the makefiles don't know about.
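To check which copy of a header the compiler actually picks up (if any), the preprocessor's line markers can be inspected. A minimal sketch, assuming a gcc- or clang-compatible compiler reachable as $CC or cc; the function name resolve_header is hypothetical, and pcp/pmapi.h is used below only as a likely header to test.

```sh
# Hypothetical helper: print the path the compiler resolves for
# "#include <HEADER>", or nothing if the header is not found.
# Usage: resolve_header HEADER [extra compiler options, e.g. -I dir]
resolve_header() {
    hdr=$1
    shift
    printf '#include <%s>\n' "$hdr" |
        ${CC:-cc} -E -x c "$@" - 2>/dev/null |
        # Preprocessor line markers look like:  # 1 "/path/to/file" ...
        sed -n 's/^# [0-9][0-9]* "\(.*\)".*/\1/p' |
        grep -F "/$hdr" | head -n 1
}
```

Comparing `resolve_header pcp/pmapi.h` with `resolve_header pcp/pmapi.h -I/tmp/pcp/include` would show whether the default search path finds no PCP headers, or finds an older system copy instead of the freshly installed one.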

And I think everything from here on down is also a consequence of using --prefix in the configure and build.

...
All in all, to get to this point it took more time than to write the
xlsx output support for pmrep so perhaps this explains why I haven't
contributed any QA tests so far. And allowing sudo for scripts which
play with both local and system scripts and services is too risky and
something I won't allow again in the future.

It should not have been this hard. I think you took the wrong fork in the road very early on (in the absence of any obvious guidance) and the wheels began to wobble before dropping off and dumping you in the ditch.

The use of sudo in the PCP QA Suite when run against a normal PCP installation (everything in the expected place) is completely safe in my experience ... I'm running more than 50,000 PCP QA tests per week and have not had a single system trashed in the process.

I'll fix the chk.setup issues (mostly defaults in common.config that are no longer sensible, and that should not have to be set at all, especially in your use case of "I just want to run test 1069").

Hopefully I can encourage you to try once more in the light of these comments ... we need to make sure Marko's experience is not the norm for a PCP QA newbie.
