I am seeing qa/183 failing across lots of hosts in a full run, i.e.
"$ check" (no args).
On investigation I see the logger and trace PMDAs installed (which is
odd), and pmnewlog seems to be having trouble talking to pmlogger via
pmlc to get info about the "logger" metrics ...
Problem with lookup for metric "logger" ...
Reason: No PMCD agent for domain of request
So, in an attempt to diagnose this, I tried to Remove the logger PMDA,
and this happened ...
[Sun Jul 28 07:08:46] pmcd(23023) Info: CleanupAgent ...
Cleanup "logger" agent (dom 106): unconfigured, exit(1)
->PMCD event trace: starting at Sun Jul 28 07:08:46 2013
-> New client: [1] -- unknown
?-> Xmit: ERROR PDU, fd=1028, err=0: No error
-> Recv: CREDS PDU, fd=1028, pdubuf=0xb8424000
-> Recv: CREDS PDU, fd=1028, pdubuf=0x1
-> Recv: PMNS_TRAVERSE PDU, fd=1028, pdubuf=0xb8422000
-> Xmit: PMNS_NAMES PDU, fd=1028, numpmid=1
-> Recv: PMNS_NAMES PDU, fd=1028, pdubuf=0xb8424000
-> Xmit: PMNS_IDS PDU, fd=1028, numpmid=1
-> Recv: PROFILE PDU, fd=1028, pdubuf=0xb8422000
-> Recv: FETCH PDU, fd=1028, pdubuf=0xb8424000
-> Xmit: RESULT PDU, fd=1028, numpmid=1
-> Recv: DESC_REQ PDU, fd=1028, pdubuf=0xb8422000
-> Xmit: DESC PDU, fd=1028, pmid=2.0.7
-> End client: fd=1028
-> Xmit: ERROR PDU, fd=10, err=-12391: Not Connected
-> Xmit: ERROR PDU, fd=12, err=-12391: Not Connected
-> Xmit: ERROR PDU, fd=16, err=-12391: Not Connected
-> Xmit: ERROR PDU, fd=18, err=-12391: Not Connected
-> Xmit: ERROR PDU, fd=20, err=-12391: Not Connected
-> Drop PMDA: domain=106, infd=16, outfd=17
[Sun Jul 28 07:08:46] pmcd(23023) Error: Unexpected signal 11 ...
Dumping to core ...
Now this is a non-negotiable release blocker.
pmcd is not allowed to dump core ... we've spent 10 years getting to
this point, and we're going to keep it that way.
The "New client" message is also a worry: "unknown" followed by a
newline and then "?" is neither expected nor helpful.
And finally we've lost the procedure call traceback ... the relevant
code is guarded by
#if HAVE_TRACE_BACK_STACK
but NOTHING appears to define HAVE_TRACE_BACK_STACK under any
circumstances ... can anyone explain what happened here?
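For reference, here is a minimal sketch of the kind of wiring that seems
to be missing. It is not the pmcd code. It assumes glibc's backtrace()
and backtrace_symbols_fd() from <execinfo.h>, and it turns on
HAVE_TRACE_BACK_STACK whenever __GLIBC__ is defined purely for
illustration; in a real build that symbol would more plausibly come from
a configure test. dump_traceback() and MAX_FRAMES are hypothetical names.

```c
#include <stdio.h>   /* on glibc this pulls in <features.h>, so __GLIBC__ is visible */
#include <unistd.h>

/*
 * Assumption for this sketch: enable the traceback on glibc.  A real
 * build would more likely set HAVE_TRACE_BACK_STACK from a configure
 * test for <execinfo.h> and backtrace().
 */
#if defined(__GLIBC__) && !defined(HAVE_TRACE_BACK_STACK)
#define HAVE_TRACE_BACK_STACK 1
#endif

#if HAVE_TRACE_BACK_STACK
#include <execinfo.h>
#endif

#define MAX_FRAMES 64	/* hypothetical depth limit */

/*
 * Hypothetical helper: write the current call stack to fd.
 * Returns the number of frames written, or -1 if traceback
 * support was not compiled in.
 */
int
dump_traceback(int fd)
{
#if HAVE_TRACE_BACK_STACK
    void	*frames[MAX_FRAMES];
    int		n;

    n = backtrace(frames, MAX_FRAMES);
    /* writes straight to fd, no malloc() unlike backtrace_symbols() */
    backtrace_symbols_fd(frames, n, fd);
    return n;
#else
    static const char	msg[] = "procedure call traceback not available\n";
    ssize_t		w;

    w = write(fd, msg, sizeof(msg) - 1);
    (void)w;
    return -1;
#endif
}
```

Note that an identifier that is never defined silently evaluates to 0 in
an #if, which is how the traceback code could vanish without any
diagnostic; building with gcc -Wundef would flag that case.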