
[Bug 1158] pmcd observed to return PMNS_IDS PDU in response to FETCH

To: pcp@xxxxxxxxxxx
Subject: [Bug 1158] pmcd observed to return PMNS_IDS PDU in response to FETCH
From: bugzilla-daemon@xxxxxxxxxxx
Date: Thu, 28 Jul 2016 23:58:45 +0000
Auto-submitted: auto-generated
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <bug-1158-835@xxxxxxxxxxxxxxxx/bugzilla/>
References: <bug-1158-835@xxxxxxxxxxxxxxxx/bugzilla/>

Comment # 7 on bug 1158 from
A bit more analysis confirms that -Dfetch is triggering the problem reported
here. I'd like to know what *original* problem motivated running with
-Dfetch.

The normal pattern for pmwebd looks a lot like this (trace has been annotated
with a special tool).

[10995]pmXmitPDU: DESC_REQ fd=8 len=16 
000:       10     7004        0 4800000f 
<pmid 60.0.72>
[10995]pmGetPDU: DESC fd=8 len=32 from=0 
000:       20     7005        0 4800000f  1000000  100000f  1000000   200001 
<pmid 60.0.72 type U32 indom 60.1 sem counter units millisec>
[10995]pmXmitPDU: PMNS_IDS fd=8 len=24 
000:       18     700d        0        0  1000000 4800000f 
<pmid[0] 60.0.72>
[10995]pmGetPDU: PMNS_NAMES fd=8 len=52 from=0 
000:       34     700e        0 18000000        0  1000000 17000000 6b736964 
<name[0] disk.dev.read_rawactive>
008: 7665642e 6165722e 61725f64 74636177 7e657669 

Note the same metric disk.dev.read_rawactive (pmid 60.0.72) is involved in the
3 consecutive PDU exchanges.
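
For anyone reading the raw hex without the annotation tool, here is a minimal
sketch of how the pmid words map to the dotted form. It assumes the usual pmID
bit layout (9-bit domain, 12-bit cluster, 10-bit item below one flag bit) and
that the payload words in the dump are shown byte-swapped relative to network
order, which is what the values above suggest. decode_pmid is a hypothetical
helper, not part of libpcp.

```python
def decode_pmid(dump_word: str) -> str:
    """Turn a payload word from the dump (e.g. '4800000f') into 'domain.cluster.item'.

    Assumption: the dump prints payload words byte-swapped, so the word is
    swapped back before the bit fields are extracted.
    """
    raw = int(dump_word, 16)
    value = int.from_bytes(raw.to_bytes(4, "little"), "big")
    domain = (value >> 22) & 0x1FF   # 9 bits
    cluster = (value >> 10) & 0xFFF  # 12 bits
    item = value & 0x3FF             # 10 bits
    return f"{domain}.{cluster}.{item}"


if __name__ == "__main__":
    for word in ("4800000f", "1600000f", "4900000f", "1400000f"):
        print(word, "->", decode_pmid(word))
```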

Now prior to the IPC failure we see ...

[10995]pmXmitPDU: DESC_REQ fd=8 len=16
000:       10     7004        0 1600000f 
<pmid 60.0.22>
[10995]pmGetPDU: DESC fd=8 len=32 from=0
000:       20     7005        0 4900000f  1000000  100000f  1000000   200001 
<pmid 60.0.73 type U32 indom 60.1 sem counter units millisec>
[10995]pmXmitPDU: PMNS_IDS fd=8 len=24
000:       18     700d        0        0  1000000 1600000f 
<pmid[0] 60.0.22>
[10995]pmGetPDU: DESC fd=8 len=32 from=0
000:       20     7005        0 1600000f  3000000 ffffffff  1000000   200001 
<pmid 60.0.22 type U64 indom PM_INDOM_NULL sem counter units millisec>
[10995]pmXmitPDU: DESC_REQ fd=8 len=16
000:       10     7004        0 1400000f 
<pmid 60.0.20>
[10995]pmGetPDU: PMNS_NAMES fd=8 len=48 from=0
000:       30     700e        0 13000000        0  1000000 12000000 6e72656b 
<name[0] kernel.all.cpu.sys>
008: 612e6c65 632e6c6c 732e7570 7e7e7379 

The first DESC_REQ (for pmid 60.0.22) returns the wrong pmDesc (the one for
pmid 60.0.73); the real response does not arrive until one exchange later.

Likewise, the PMNS_IDS request is answered not by the next response, but by
the one after it.
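
The mismatch can be spotted mechanically. The following is only a sketch,
using the request/response pairs from the two traces above as data; the
EXPECTED table and find_mismatches are hypothetical names for illustration,
and this is not how libpcp itself validates PDUs.

```python
# Response PDU type expected for each request type seen in the traces.
EXPECTED = {"DESC_REQ": "DESC", "PMNS_IDS": "PMNS_NAMES"}


def find_mismatches(exchanges):
    """Return (index, reason) for every exchange whose response does not fit.

    Each exchange is (req_type, req_pmid, resp_type, resp_pmid); resp_pmid is
    None when the response carries no pmid (PMNS_NAMES carries names).
    """
    problems = []
    for i, (req_type, req_pmid, resp_type, resp_pmid) in enumerate(exchanges):
        if EXPECTED[req_type] != resp_type:
            problems.append((i, "wrong PDU type"))
        elif resp_pmid is not None and resp_pmid != req_pmid:
            problems.append((i, "wrong pmid"))
    return problems


normal = [
    ("DESC_REQ", "60.0.72", "DESC", "60.0.72"),
    ("PMNS_IDS", "60.0.72", "PMNS_NAMES", None),
]
failing = [
    ("DESC_REQ", "60.0.22", "DESC", "60.0.73"),
    ("PMNS_IDS", "60.0.22", "DESC", "60.0.22"),
    ("DESC_REQ", "60.0.20", "PMNS_NAMES", None),
]

if __name__ == "__main__":
    print(find_mismatches(normal))
    print(find_mismatches(failing))
```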

This will happen if there is an extra response PDU in the socket at some point
because the client (pmwebd in this case) has sent a request and timed out
before the response arrives, but pmcd sends the response a bit later.

From this point on, request #K to pmcd will see the response PDU from request
#(K-1) ... and the game is eventually over.
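
To make the off-by-one concrete, here is a toy simulation of that sequence. It
is a sketch under simplifying assumptions: the socket is modelled as a FIFO
queue, exactly one response arrives late, and the client does not drain the
socket after a timeout. run_exchange and the reply strings are made up for
illustration and do not correspond to any pmcd or libpcp code.

```python
from collections import deque


def run_exchange(requests, late_index):
    """Pair each request with the response the client actually reads.

    The response to requests[late_index] arrives only after the client has
    timed out, so it stays queued and is read by the next request instead.
    """
    socket_buffer = deque()  # responses written by the server, not yet read
    seen = []
    for k, req in enumerate(requests):
        response = f"reply-to-{req}"
        if k == late_index:
            # Client gives up waiting; the late response lands in the socket.
            seen.append((req, None))
            socket_buffer.append(response)
            continue
        socket_buffer.append(response)
        # Client reads the oldest unread PDU, which may be a stale one.
        seen.append((req, socket_buffer.popleft()))
    return seen


if __name__ == "__main__":
    for req, resp in run_exchange(["A", "B", "C", "D"], late_index=1):
        print(req, "->", resp)
```

With the second request timing out, every later request reads the reply meant
for the one before it, which matches the pattern in the failing trace.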

