Comment # 6
on bug 1158
from Ken McDonell
Just noticed that PDU trace output is suppressed in __pmDumpResult (happened
about 2 years ago, have not had to look there lately!).
So there are a bucketload of PDU exchanges ...
73748 [10995]pmXmitPDU: FETCH fd=8 len=168
...
74986 9840: 8588cc18
<IN HERE>
74987 pmResult dump from 0x7f5b9c393860 timestamp: 1469563519.653328 20:05:19.653 numpmid: 35
and if these timed out or other badness happened we'd be none the wiser, other
than the <noname> metric names and ??? instance names in the pmDumpResult
output that follows.
This means the PDU exchanges after the pmDumpResult output make no sense at
all!
773084 inst [1 or ???] value 3873205428 -5.2046292e+23 0xe6dc6cb4
<HERE>
773085 [10995]pmXmitPDU: DESC_REQ fd=8 len=16
773086 000: 10 7004 0 1600000f
773087 [10995]pmGetPDU: DESC fd=8 len=32 from=0
773088 000: 20 7005 0 4900000f 1000000 100000f 1000000 200001
The request is for PMID 1600000f, but the answer comes back for PMID 4900000f.
773089 [10995]pmXmitPDU: PMNS_IDS fd=8 len=24
773090 000: 18 700d 0 0 1000000 1600000f
773091 [10995]pmGetPDU: DESC fd=8 len=32 from=0
773092 000: 20 7005 0 1600000f 3000000 ffffffff 1000000 200001
Protocol botch ... this looks like the answer for the DESC_REQ above.
773093 [10995]pmXmitPDU: DESC_REQ fd=8 len=16
773094 000: 10 7004 0 1400000f
773095 [10995]pmGetPDU: PMNS_NAMES fd=8 len=48 from=0
773096 000: 30 700e 0 13000000 0 1000000 12000000 6e72656b
773097 008: 612e6c65 632e6c6c 732e7570 7e7e7379
773098 [10995]pmXmitPDU: DESC_REQ fd=8 len=16
773099 000: 10 7004 0 2000000f
773100 [10995]pmGetPDU: DESC fd=8 len=32 from=0
773101 000: 20 7005 0 1400000f 3000000 ffffffff 1000000 200001
And more of the same.
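For anyone wanting to scan the rest of the log for the same pattern, here is a rough sketch of a checker. It is not part of PCP. The request-to-reply pairing table is my assumption, read off the PDU type names in the trace above, and find_mismatches, EXPECTED_REPLY and PDU_LINE are made-up names. It only compares PDU types, so it would not flag the first exchange above, where the type is right but the PMID is wrong.

```python
import re

# Assumed pairing of request type -> expected reply type (not taken from PCP source).
EXPECTED_REPLY = {
    "DESC_REQ": "DESC",
    "PMNS_IDS": "PMNS_NAMES",
}

# Matches trace lines such as "773085 [10995]pmXmitPDU: DESC_REQ fd=8 len=16".
PDU_LINE = re.compile(r"^(\d+)\s+\[\d+\](pmXmitPDU|pmGetPDU):\s+(\S+)\s+fd=(\d+)")


def find_mismatches(lines):
    """Return (log_line_no, sent_type, received_type) for each unexpected reply."""
    pending = {}      # fd -> type of the last PDU sent on that fd
    mismatches = []
    for line in lines:
        m = PDU_LINE.match(line)
        if m is None:
            continue  # hex dump or commentary, not a PDU header line
        lineno, direction, pdu_type, fd = m.groups()
        if direction == "pmXmitPDU":
            pending[fd] = pdu_type
        else:
            sent = pending.pop(fd, None)
            expected = EXPECTED_REPLY.get(sent)
            if expected is not None and pdu_type != expected:
                mismatches.append((int(lineno), sent, pdu_type))
    return mismatches


# PDU header lines copied from the trace above (hex dumps omitted).
TRACE = """\
773085 [10995]pmXmitPDU: DESC_REQ fd=8 len=16
773087 [10995]pmGetPDU: DESC fd=8 len=32 from=0
773089 [10995]pmXmitPDU: PMNS_IDS fd=8 len=24
773091 [10995]pmGetPDU: DESC fd=8 len=32 from=0
773093 [10995]pmXmitPDU: DESC_REQ fd=8 len=16
773095 [10995]pmGetPDU: PMNS_NAMES fd=8 len=48 from=0
773098 [10995]pmXmitPDU: DESC_REQ fd=8 len=16
773100 [10995]pmGetPDU: DESC fd=8 len=32 from=0
""".splitlines()

for lineno, sent, got in find_mismatches(TRACE):
    print(f"log line {lineno}: sent {sent}, received {got}")
```

On the excerpt above this flags log lines 773091 and 773095. Catching the PMID mismatches as well would mean decoding the hex dump lines, which this sketch deliberately skips.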
I suspect a PDU has been sent from pmcd after pmweblog detected a timeout and
marked a context as expired or bad ... and thereafter the synchronous
pmcd-client protocol is all out of whack.
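
To illustrate the theory, here is a toy model (not PCP code) of a strictly synchronous client reading from a FIFO channel that already holds one stale reply. The names run_client and stale_reply are made up, and treating the DESC for PMID 4900000f as the late reply to an abandoned request is purely my guess.

```python
from collections import deque


def run_client(requests, stale_reply=None):
    """Pair each request with whatever reply is first in a FIFO 'socket'."""
    channel = deque()
    if stale_reply is not None:
        # Late reply to a request the client already gave up on (assumption).
        channel.append(stale_reply)
    pairs = []
    for req in requests:
        channel.append(f"reply-to:{req}")        # server answers every request in order
        pairs.append((req, channel.popleft()))   # client takes the first reply it sees
    return pairs


requests = ["DESC_REQ 1600000f", "PMNS_IDS 1600000f", "DESC_REQ 1400000f"]

for req, reply in run_client(requests, stale_reply="reply-to:DESC_REQ 4900000f"):
    print(f"{req:20s} <- {reply}")
```

With the stale reply in the queue, every request gets the answer meant for the previous one, which is the same shape as the trace: the DESC_REQ for 1600000f gets the answer for 4900000f, the PMNS_IDS gets the DESC for 1600000f, and so on. Without the stale reply the pairs line up, so a single unread PDU is enough to shift everything that follows.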