
Re: [pcp] telnet-probe hanging in qa/835 ... spreading, now qa/443

To: pcp@xxxxxxxxxxx
Subject: Re: [pcp] telnet-probe hanging in qa/835 ... spreading, now qa/443
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Tue, 23 Sep 2014 08:19:17 +1000
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <46250282.53083281.1411381765101.JavaMail.zimbra@xxxxxxxxxx>
References: <46250282.53083281.1411381765101.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1

On 22/09/14 20:29, Nathan Scott wrote:
> Hi all,
>
> I'm seeing a new, reliable telnet-probe hang in qa/835 ... anyone else
> come across this one?  Haven't dug deeper yet, will do so tomorrow.

Hmm ... I'm not even getting to qa/835 now.

My last 3 QA runs are hung in qa/443 ... neither this test nor pmevent has been subject to recent changes.

kenj@bozo-vm:~/src/pcp/qa$ pstree 23294
check─┬─sh───pmevent
      └─sh───sed

kenj@vm00:~/src$ pstree 12770
check─┬─sh───pmevent
      └─sh───sed

kenj@grundy:~$ pstree 27358
check─┬─sh───pmevent
      └─sh───sed

And here is the problem ... pmevent is not getting an error back for the bad -h arg ... and loops forever using the local pmcd as a context.


kenj@bozo:~$ pmevent -h no.such.host sample.event.records
host:      bozo
samples:   all
sample.event.records[fungus]: 2 event records
  08:15:48.916 --- event record [0] flags 0x1 (point) ---
    sample.event.type 1
  08:15:49.916 --- event record [1] flags 0x1 (point) ---
    sample.event.type 2
    sample.event.param_64 -3
sample.event.records[bogus]: 1 event records
  08:15:58.916 --- event record [0] flags 0x1 (point) ---
    sample.event.param_string "fetch #286"
sample.event.records[fungus]: 0 event records
sample.event.records[bogus]: 2 event records
  08:15:59.919 --- event record [0] flags 0x1 (point) ---
    sample.event.param_string "fetch #288"
  08:15:59.919 --- event record [1] flags 0x1 (point) ---
    sample.event.param_string "bingo!"
^C

And here is the root cause ...

kenj@bozo:~$ pmevent -Dcontext -h no.such.host sample.event.records
__pmSetSocketIPC: fd=3
IPC table fd(PDU version):
__pmDecodeXtendError: got error PDU (code=0, datum=385876226, version=2)
__pmSetVersionIPC: fd=3 version=2
IPC table fd(PDU version): 3(2,1)
__pmSendCreds: #0 = 1020000
__pmConnectPMCD(no.such.host): pmcd connection port=44321 fd=3 PDU version=2
IPC table fd(PDU version): 3(2,1)
pmNewContext(1, no.such.host) -> 0

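Someone's broken pmNewContext(), it appears to me ... it returns 0 (a valid context handle) for a host that does not exist, instead of a negative error code.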
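
A bad -h argument turning into an endless loop can be caught without tying up a whole QA run. The following is only a rough sketch, not part of the PCP QA suite: the function name expect_prompt_failure is made up for illustration, and it assumes timeout(1) from GNU coreutils is available. It runs a command that is supposed to fail and reports whether it failed, succeeded, or had to be killed.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCP): run a command that is expected to
# fail, and report whether it failed, succeeded, or hung.
# Usage: expect_prompt_failure SECONDS COMMAND [ARGS...]
expect_prompt_failure() {
    limit=$1
    shift
    # timeout(1) exits with 124 when the time limit expires, and with
    # 126 or 127 when the command could not be invoked or was not found.
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    case $rc in
        0)       echo "unexpected success"; return 1 ;;
        124)     echo "hung (killed after ${limit}s)"; return 1 ;;
        126|127) echo "could not run command (status $rc)"; return 1 ;;
        *)       echo "failed as expected (status $rc)"; return 0 ;;
    esac
}

# The case from this message, left commented out because it needs a PCP
# installation with a local pmcd running:
# expect_prompt_failure 10 pmevent -h no.such.host sample.event.records
```

With the behaviour shown above, the commented-out pmevent call would be expected to report a hang, since pmevent keeps fetching from the local pmcd until interrupted. Once the bad host is rejected again it should report a failure instead.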
