On 02/09/14 14:30, Nathan Scott wrote:
> ...
> But, can you connect to the pmcd PID 30540 with gdb and get a stacktrace?
> I'd be interested to know what it's up to, it should have exited.
pmcd & libpcp are a bit short on symbols ...
Loaded symbols for /usr/lib/x86_64-linux-gnu/libdb-5.1.so
warning: no loadable sections found in added symbol-file system-supplied
DSO at 0x7fff82bfe000
0x00007f4c82f54e03 in __select_nocancel ()
at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) where
#0 0x00007f4c82f54e03 in __select_nocancel ()
at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f4c834801af in ?? () from /usr/lib/libpcp.so.3
#2 0x00007f4c838d2297 in ?? ()
#3 0x00007f4c838d114b in main ()
(gdb)
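For anyone repeating this step, the same backtrace can be captured without an interactive session. A minimal sketch, assuming gdb is installed and you are allowed to ptrace the target (for example via sudo); stack_of is a hypothetical helper name, not a PCP tool. Frames shown as ?? would presumably resolve only if debug symbols for pmcd and libpcp were installed.

```shell
#!/bin/sh
# Sketch only: stack_of is a hypothetical helper, not part of PCP.
# Attaches gdb to a running PID, prints a backtrace for every thread,
# then detaches and exits (-batch), leaving the process running.
stack_of() {
    gdb -batch -p "$1" -ex 'thread apply all bt' 2>&1
}

# Example (PID taken from the log in this thread):
#   stack_of 30540
```

Using batch mode keeps the attach short, which matters a little here because a process stopped under gdb cannot act on signals until it is resumed.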
Looks like it never got the signal ... here's the pmcd log file at the
point of the hang
kenj@vm20:~/src/pcp/qa$ cat /tmp/861-30526.log
Log for pmcd on vm20 started Tue Sep 2 10:15:55 2014
active agent dom   pid  in out ver protocol parameters
============ === ===== === === === ======== ==========
pmcd           2                 2 dso i:5  lib=/var/lib/pcp/pmdas/pmcd/pmda_pmcd.so entry=pmcd_init [0x7f4c81312d60]
Host access list empty: host-based access control turned off
User access list empty: user-based access control turned off
Group access list empty: group-based access control turned off
pmcd: PID = 30540, PDU version = 2
pmcd request port(s):
sts fd   port  family address
=== ==== ===== ====== =======
ok  1026       unix   /tmp/861-30526.pmcd.socket
ok  1024  9876 inet   INADDR_ANY
ok  1025  9876 ipv6   INADDR_ANY
So
kenj@vm20:~/src/pcp/qa$ tail -f !$
tail -f /tmp/861-30526.log
Group access list empty: group-based access control turned off
pmcd: PID = 30540, PDU version = 2
pmcd request port(s):
sts fd   port  family address
=== ==== ===== ====== =======
ok  1026       unix   /tmp/861-30526.pmcd.socket
ok  1024  9876 inet   INADDR_ANY
ok  1025  9876 ipv6   INADDR_ANY
[Tue Sep 2 15:21:07] pmcd(30540) Info: pmcd caught SIGTERM from
pid=8258 uid=0
[Tue Sep 2 15:21:07] pmcd(30540) Info: pmcd Shutdown
Log finished Tue Sep 2 15:21:07 2014
And in another window I did
# sudo kill -TERM 30540
and the "caught SIGTERM" lines appeared immediately after, and qa/861 fails
with
[91%] 861 - output mismatch (see 861.out.bad)
3,7c3
<
< pmcd.hostname PMID: 2.0.21
< Data Type: string InDom: PM_INDOM_NULL 0xffffffff
< Semantics: discrete Units: none
< value "nosuchhost.com"
---
> pminfo: Cannot connect to PMCD on host "local:": Connection refused
Check local PMCD is still alive ...
PMDA probe: pminfo -h vm20 -f sample.milliseconds
PMDA probe: pminfo -h vm20 -f sampledso.milliseconds
PMDA probe: pminfo -h vm20 -f simple.numfetch
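
The manual check described above (send SIGTERM from another window, then see whether pmcd goes away) could be scripted roughly as follows. This is a sketch under assumptions: term_and_wait is a hypothetical helper, not something in the PCP QA suite, and the 10 second default timeout is arbitrary.

```shell
#!/bin/sh
# Sketch only: term_and_wait is a hypothetical helper, not part of PCP.
# Sends SIGTERM to a PID and polls until the process is gone or a
# timeout expires.
# Returns 0 if the process exited, 1 if it is still alive after the
# timeout, 2 if it was not running (or could not be signalled) at all.
term_and_wait() {
    pid=$1
    timeout=${2:-10}    # seconds to wait before giving up

    # kill -0 delivers no signal; it only checks that the PID exists
    # and that we are permitted to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "pid $pid not running"
        return 2
    fi

    kill -TERM "$pid" || return 2

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "pid $pid still alive after ${timeout}s"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "pid $pid exited after SIGTERM"
    return 0
}

# Example (run as a user allowed to signal pmcd):
#   term_and_wait 30540 10
```

A return of 1 would correspond to a hang where the signal is delivered but the process does not exit; a return of 0 matches what the log above shows once SIGTERM actually arrives.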