Hi,
here are the issues I mentioned earlier, which I found when trying to
run the QA test I wanted (1069) against the latest git code:
0) I started with:
./configure --prefix=/tmp/pcp && make && make install
export LD_LIBRARY_PATH=/tmp/pcp/lib
export PATH=/tmp/pcp/bin:$PATH
export PCP_CONF=/tmp/pcp/etc/pcp.conf
Initially I thought I could avoid doing anything at the system level by
also setting
export PMCD_PORT=44444
export PMLOGGER_PORT=33333
but when the system pmcd (on port 44321) was not running during
chk.setup, I saw:
make[1]: *** [foo.0] Error 1
make: *** [setup] Error 2
...
Contacting local pmcd at localhost ... no response (fatal)
...
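A minimal sketch of the environment setup from step 0, wrapped in a
function so the prefix is given only once. The helper names
pcp_private_env and resolves_under are hypothetical (they are not part of
PCP); the second one only checks where PATH lookup finds a command, which
is one quick way to notice that a system binary is being picked up
instead of the private one.

```shell
# Hypothetical helper: point the current shell at a PCP install under $1.
# Mirrors the exports above; like them, it overwrites LD_LIBRARY_PATH.
pcp_private_env() {
    prefix=$1
    export LD_LIBRARY_PATH="$prefix/lib"
    export PATH="$prefix/bin:$PATH"
    export PCP_CONF="$prefix/etc/pcp.conf"
}

# Hypothetical helper: succeed only if command $2 resolves to a path
# under prefix $1 according to the current PATH.
resolves_under() {
    cmd_path=$(command -v "$2") || return 1
    case "$cmd_path" in
        "$1"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, after "pcp_private_env /tmp/pcp", running
"resolves_under /tmp/pcp pminfo || echo 'pminfo is not from /tmp/pcp'"
would flag a lookup that lands outside the private prefix. It says
nothing about which pmcd the tools then try to contact.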
1) Then I had the system pmcd+pmlogger running from the standard
installation, with *_PORT not set, and I added pmdasample there as
instructed in qa/README.
2) I adjusted neither common.config nor qa_hosts.master, as 1069 doesn't
need remote hosts.
3) I did:
cd /tmp/pcp/var/lib/pcp/testsuite
./chk.setup
The script asks for the sudo password a few times without giving any
reason for it (so I didn't enter it). Then it says:
./mk.qa_hosts: no #order line matches this host "localhost", local
testing only
which looks OK. However, clienttimeout.c failed to compile; perhaps it
could be compiled during the earlier build steps, with the correct
include paths etc.
clienttimeout.c: In function ‘main’:
clienttimeout.c:90:2: warning: implicit declaration of function
‘__pmSetConnectTimeout’ [-Wimplicit-function-declaration]
if ((sts = __pmSetConnectTimeout(conn_timeout)) < 0) {
...
After manually getting it to compile (by commenting out everything, as
I'm not planning to test client timeouts) and after very long pauses
(the script takes almost 10 minutes here in total, compared to 3 minutes
for configure/make/make install), the script, despite the earlier "local
testing only", goes on to contact hilo.sgi.com and bozo-laptop, which
give no response.
In the end, however, pmcd on localhost was detected OK. PCP_PLATFORM and
PCP_VERSION were also detected OK.
4) ./check 1069 again starts by asking for the sudo password (which I
didn't enter) and the test fails.
Then I took a risk and enabled sudo and started from scratch.
3) Same as before.
4) For ./check 1069 I then saw:
PMDA simple is not responding
And:
/tmp/pcp/var/log/pcp/pmcd/pmcd.log: not found
/tmp/pcp/var/log/pcp/pmcd/simple.log: not found
Restarting PMCD ...
PMCD process ... 31256
/tmp/pcp/share/pcp/lib/pmcd:
Warning: found no /tmp/pcp/var/run/pcp/pmcd.pid
and no /tmp/pcp/var/log/pcp/pmcd/pmcd.log.
Assuming an uninstall from a chroot: pmcd not killed.
If this is incorrect, "pmsignal -s TERM 31256" can be used.
/tmp/pcp/share/pcp/lib/pmlogger: Warning: Performance Co-Pilot archive
logger(s) not permanently enabled.
To enable pmlogger, run the following as root:
# /bin/systemctl enable pmlogger.service
Starting pmlogger ...
Trying to re-install PMDA simple from /tmp/pcp/var/lib/pcp/pmdas/simple ...
FYI ... here are the PMCD logs
/tmp/pcp/var/log/pcp/pmcd/pmcd.log: not found
/tmp/pcp/var/log/pcp/pmcd/simple.log: not found
mktemp: failed to create directory via template
‘/tmp/pcp/var/tmp/pmdaproc.XXXXXXXXX’: No such file or directory
Cannot make PMDA simple work, ... giving up!
After I installed pmdasimple and ran mkdir -p /tmp/pcp/var/log/pcp/pmcd
/tmp/pcp/var/tmp, the script bailed out after pmlogger issues.
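The manual mkdir step can be sketched as a small pre-flight helper that
creates whatever directories are missing under the private prefix before
./check runs. The function name ensure_qa_dirs is hypothetical, and the
directory list is simply whatever is passed in; the two paths in the
usage line are the ones that were missing here, not a complete list of
what the QA scripts expect.

```shell
# Hypothetical helper: create directories under a private PCP prefix.
# Usage: ensure_qa_dirs PREFIX RELATIVE_DIR...
ensure_qa_dirs() {
    prefix=$1
    shift
    for rel in "$@"; do
        if [ ! -d "$prefix/$rel" ]; then
            # Report each directory that had to be created.
            echo "creating $prefix/$rel"
            mkdir -p "$prefix/$rel" || return 1
        fi
    done
}
```

Usage for this case: ensure_qa_dirs /tmp/pcp var/log/pcp/pmcd var/tmp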
On the third attempt from scratch, with the above settings, leftover
processes killed, the system pmlogger stopped and
/tmp/pcp/var/log/pcp/pmlogger/localhost created, I saw this during
./check 1069:
./check 1069
PMDA probe: pminfo -h localhost -f sample.milliseconds
PMDA probe: pminfo -h localhost -f sampledso.milliseconds
PMDA probe: pminfo -h localhost -f simple.numfetch
Primary pmlogger not running ...
chkconfig pmlogger on, and restart PMCD
PMCD process ... 6041
/tmp/pcp/share/pcp/lib/pmcd:
Warning: found no /tmp/pcp/var/run/pcp/pmcd.pid
and no /tmp/pcp/var/log/pcp/pmcd/pmcd.log.
Assuming an uninstall from a chroot: pmcd not killed.
If this is incorrect, "pmsignal -s TERM 6041" can be used.
Starting pmlogger ...
Arrgghhh ... pmlogger (primary) failed to start after 20 seconds
pmlogger log (/tmp/pcp/var/log/pcp/pmlogger/localhost/pmlogger.log) ...
cat: /tmp/pcp/var/log/pcp/pmlogger/localhost/pmlogger.log: No such file
or directory
pmlc output ...
pmlc -P
Unable to connect to primary pmlogger at local:: Connection refused
...
At some point the QA scripts had added -T 3 to the system pmcd.options,
but I had removed it when starting from scratch. For some reason the
scripts no longer add it, so I had to add it manually. Then I finally
saw:
...
1069 - output mismatch (see 1069.out.bad)
89,95c89
< s.combo
< util
< N/A
< N/A
< 1.001
< 1.001
< 1.001
---
> Failed to register derived metric: Invalid syntax (expected
> metric=expression).
Check local PMCD is still alive ...
...
But this is something other than a pmrep issue: after some debugging I
identified the failing command:
pmrep -s 5 -t 2 --archive
/tmp/pcp/var/lib/pcp/testsuite/archives/sample-secs -z -e "sample.combo
= sample.seconds + (sample.milliseconds / 1000)" sample.combo
which indeed fails, but if I change the shebang from "/usr/bin/pcp
python" to "/usr/bin/env python2" (or python3), it works as expected
(meaning that we're again using the system PCP, not the freshly
installed PCP). Then, finally, I managed to run this one QA test for
pmrep without errors.
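Since the shebang turned out to decide which PCP gets used, a quick way
to inspect it is sketched below. The function name script_interpreter is
hypothetical; it only prints the text after "#!" on the first line of a
file and does not try to resolve what that interpreter then loads.

```shell
# Hypothetical helper: print the interpreter line of a script (the text
# after "#!"), or fail if the file does not start with a shebang.
script_interpreter() {
    first_line=$(head -n 1 "$1") || return 1
    case "$first_line" in
        '#!'*) printf '%s\n' "${first_line#??}" ;;  # drop the leading "#!"
        *) return 1 ;;
    esac
}
```

Running script_interpreter "$(command -v pmrep)" shows which interpreter
line the pmrep found first in PATH carries, which makes a hard-coded
/usr/bin path easy to spot.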
All in all, getting to this point took more time than writing the xlsx
output support for pmrep, so perhaps this explains why I haven't
contributed any QA tests so far. Allowing sudo for scripts which play
with both local and system scripts and services is too risky, and it is
something I won't allow again in the future.
Thanks,
--
Marko Myllynen