Comment # 7
on bug 1158
from Ken McDonell
A bit more analysis confirms that -Dfetch is triggering the problem reported
here ... I'd like to know what the *original* problem was that motivated
running with -Dfetch.
The normal pattern for pmwebd looks a lot like this (trace has been annotated
with a special tool).
[10995]pmXmitPDU: DESC_REQ fd=8 len=16
000: 10 7004 0 4800000f
<pmid 60.0.72>
[10995]pmGetPDU: DESC fd=8 len=32 from=0
000: 20 7005 0 4800000f 1000000 100000f 1000000 200001
<pmid 60.0.72 type U32 indom 60.1 sem counter units millisec>
[10995]pmXmitPDU: PMNS_IDS fd=8 len=24
000: 18 700d 0 0 1000000 4800000f
<pmid[0] 60.0.72>
[10995]pmGetPDU: PMNS_NAMES fd=8 len=52 from=0
000: 34 700e 0 18000000 0 1000000 17000000 6b736964
<name[0] disk.dev.read_rawactive>
008: 7665642e 6165722e 61725f64 74636177 7e657669
Note the same metric disk.dev.read_rawactive (pmid 60.0.72) is involved in the
3 consecutive PDU exchanges.
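The pmid annotations can be cross-checked by hand from the hex words in the dump. The sketch below is only an illustration, not part of the annotation tool: it assumes the payload words are network-order bytes printed as little-endian host words, and that a pmID is laid out as 1 flag bit, 9 domain bits, 12 cluster bits and 10 item bits. The function name decode_pmid is made up for this example.

```python
import struct

def decode_pmid(dump_word):
    """Turn a payload word as printed in the trace into 'domain.cluster.item'.

    Assumption: the dump shows network-order bytes read as a little-endian
    host word, so the bytes are swapped back before the bit fields are
    extracted.
    """
    (net,) = struct.unpack(">I", struct.pack("<I", dump_word))
    domain = (net >> 22) & 0x1FF   # 9 bits (assumed pmID layout)
    cluster = (net >> 10) & 0xFFF  # 12 bits
    item = net & 0x3FF             # 10 bits
    return f"{domain}.{cluster}.{item}"

if __name__ == "__main__":
    # Words taken from the DESC_REQ and DESC lines in the traces.
    for word in (0x4800000F, 0x1600000F, 0x4900000F, 0x1400000F):
        print(f"{word:08x} -> pmid {decode_pmid(word)}")
```

Under those assumptions the words 4800000f, 1600000f, 4900000f and 1400000f decode to 60.0.72, 60.0.22, 60.0.73 and 60.0.20, which agrees with the annotations.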
Now prior to the IPC failure we see ...
[10995]pmXmitPDU: DESC_REQ fd=8 len=16
000: 10 7004 0 1600000f
<pmid 60.0.22>
[10995]pmGetPDU: DESC fd=8 len=32 from=0
000: 20 7005 0 4900000f 1000000 100000f 1000000 200001
<pmid 60.0.73 type U32 indom 60.1 sem counter units millisec>
[10995]pmXmitPDU: PMNS_IDS fd=8 len=24
000: 18 700d 0 0 1000000 1600000f
<pmid[0] 60.0.22>
[10995]pmGetPDU: DESC fd=8 len=32 from=0
000: 20 7005 0 1600000f 3000000 ffffffff 1000000 200001
<pmid 60.0.22 type U64 indom PM_INDOM_NULL sem counter units millisec>
[10995]pmXmitPDU: DESC_REQ fd=8 len=16
000: 10 7004 0 1400000f
<pmid 60.0.20>
[10995]pmGetPDU: PMNS_NAMES fd=8 len=48 from=0
000: 30 700e 0 13000000 0 1000000 12000000 6e72656b
<name[0] kernel.all.cpu.sys>
008: 612e6c65 632e6c6c 732e7570 7e7e7379
The first DESC_REQ (for pmid 60.0.22) returns the wrong pmDesc (the one for
pmid 60.0.73); the real response does not come until a bit later.
Likewise the PMNS_IDS request is answered not in the next response, but in the
one after.
This will happen if an extra response PDU ends up in the socket at some point:
the client (pmwebd in this case) sends a request and times out before the
response arrives, and pmcd then sends the response a bit later.
From this point on, request #K to pmcd will see the response PDU from request
#(K-1) ... and the game is eventually over.
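The off-by-one effect can be reproduced with a toy model. This is a sketch under simplifying assumptions, not pmwebd or pmcd code: the socket is modelled as a FIFO queue, exactly one request times out, and its response is delivered just before the next request is answered. The names run, late and the reply-to-... strings are hypothetical.

```python
from collections import deque

def run(requests, timed_out_index):
    """Return (request, response_seen_by_client) pairs.

    Toy model: the client gives up on one request before the server
    answers it, and the late answer lands in the socket afterwards.
    """
    socket = deque()  # responses waiting to be read by the client
    late = None       # response the server has not delivered yet
    seen = []
    for k, req in enumerate(requests):
        if late is not None:
            # The delayed response arrives after the client stopped waiting.
            socket.append(late)
            late = None
        reply = f"reply-to-{req}"
        if k == timed_out_index:
            # Client times out; the server will send this reply later.
            late = reply
            seen.append((req, None))
            continue
        socket.append(reply)
        # The client reads the oldest PDU in the socket, assuming it
        # answers the request it has just sent.
        seen.append((req, socket.popleft()))
    return seen

if __name__ == "__main__":
    for req, resp in run(["A", "B", "C", "D"], timed_out_index=1):
        print(f"request {req}: saw {resp}")
```

In this model, once request B times out, request C reads the reply to B and request D reads the reply to C, which is the same shift seen in the second trace.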