Re: [pcp] qa/861 hanging

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] qa/861 hanging
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Tue, 02 Sep 2014 15:26:29 +1000
Cc: PCP <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1930154057.42145974.1409632207874.JavaMail.zimbra@xxxxxxxxxx>
References: <540533AF.3030308@xxxxxxxxxxxxxxxx> <1946971949.42117047.1409627496431.JavaMail.zimbra@xxxxxxxxxx> <54054425.9000804@xxxxxxxxxxxxxxxx> <1930154057.42145974.1409632207874.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0
On 02/09/14 14:30, Nathan Scott wrote:
> ...
> But, can you connect to the pmcd PID 30540 with gdb and get a stacktrace?
> I'd be interested to know what it's up to, it should have exited.


pmcd & libpcp are a bit short on symbols ...

Loaded symbols for /usr/lib/x86_64-linux-gnu/libdb-5.1.so

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff82bfe000
0x00007f4c82f54e03 in __select_nocancel ()
    at ../sysdeps/unix/syscall-template.S:81
81      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) where
#0  0x00007f4c82f54e03 in __select_nocancel ()
    at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f4c834801af in ?? () from /usr/lib/libpcp.so.3
#2  0x00007f4c838d2297 in ?? ()
#3  0x00007f4c838d114b in main ()
(gdb)

Looks like it never got the signal ... here's the pmcd log file at the point of the hang

kenj@vm20:~/src/pcp/qa$ cat /tmp/861-30526.log
Log for pmcd on vm20 started Tue Sep  2 10:15:55 2014


active agent dom   pid  in out ver protocol parameters
============ === ===== === === === ======== ==========
pmcd           2                   2 dso i:5  lib=/var/lib/pcp/pmdas/pmcd/pmda_pmcd.so entry=pmcd_init [0x7f4c81312d60]

Host access list empty: host-based access control turned off
User access list empty: user-based access control turned off
Group access list empty: group-based access control turned off


pmcd: PID = 30540, PDU version = 2
pmcd request port(s):
  sts fd   port  family address
  === ==== ===== ====== =======
  ok  1026       unix   /tmp/861-30526.pmcd.socket
  ok  1024  9876 inet   INADDR_ANY
  ok  1025  9876 ipv6   INADDR_ANY


So

kenj@vm20:~/src/pcp/qa$ tail -f !$
tail -f /tmp/861-30526.log
Group access list empty: group-based access control turned off


pmcd: PID = 30540, PDU version = 2
pmcd request port(s):
  sts fd   port  family address
  === ==== ===== ====== =======
  ok  1026       unix   /tmp/861-30526.pmcd.socket
  ok  1024  9876 inet   INADDR_ANY
  ok  1025  9876 ipv6   INADDR_ANY
[Tue Sep  2 15:21:07] pmcd(30540) Info: pmcd caught SIGTERM from pid=8258 uid=0
[Tue Sep  2 15:21:07] pmcd(30540) Info: pmcd Shutdown

Log finished Tue Sep  2 15:21:07 2014

And in another window I did # sudo kill -TERM 30540

and the "caught SIGTERM" lines appear immediately after, and qa/861 fails with

[91%] 861 - output mismatch (see 861.out.bad)
3,7c3
<
< pmcd.hostname PMID: 2.0.21
<     Data Type: string  InDom: PM_INDOM_NULL 0xffffffff
<     Semantics: discrete  Units: none
<     value "nosuchhost.com"
---
> pminfo: Cannot connect to PMCD on host "local:": Connection refused
Check local PMCD is still alive ...
PMDA probe: pminfo -h vm20 -f sample.milliseconds
PMDA probe: pminfo -h vm20 -f sampledso.milliseconds
PMDA probe: pminfo -h vm20 -f simple.numfetch
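
For anyone wanting to repeat the manual kill -TERM check above without a second window, here is a minimal sketch in POSIX shell. The helper name term_and_wait and its timeout argument are hypothetical; this is not part of PCP or its QA suite, and the demo targets a throwaway sleep process rather than a real pmcd.

```shell
#!/bin/sh
# Hypothetical helper (not from PCP): send SIGTERM to a PID, then poll
# until the process is gone or the timeout (in seconds) runs out.
# Returns 0 if the process exited, 1 if it is still alive after the
# timeout, 2 if the signal could not be sent at all.
term_and_wait() {
    pid=$1
    timeout=${2:-10}
    kill -TERM "$pid" 2>/dev/null || return 2
    i=0
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$i" -ge "$timeout" ]; then
            # Still running: a stack trace could be taken at this point, e.g.
            #   gdb -p "$pid" -batch -ex where
            return 1
        fi
        sleep 1
        i=$((i + 1))
    done
    return 0
}

# Demo against a background sleep, standing in for the hung daemon.
sleep 300 &
demo_pid=$!
term_and_wait "$demo_pid" 5
demo_status=$?
echo "term_and_wait returned $demo_status"
```

Against a real pmcd this would need to run as root or as the pmcd user, e.g. term_and_wait 30540 10, and a return of 1 would be the cue to attach gdb as in the session earlier in this message.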
