I had suspected, without any proof, that PCP QA was running much slower.
I started to look at 169 failing with an error return of "Timeout waiting
for a response from PMCD" rather than "IPC protocol failure", which I
thought was a minor issue, but is in fact a regression ... when pmcd
times out on the pmda ipc, it used to (and should) send an ipc error
response to the client waiting on the pmda response.
This no longer happens ... the pmda timeout happens, but the client is
left hanging until its own timeout on the pmcd ipc goes off ... this is
wrong.
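
To make the expected behaviour concrete, here is a minimal sketch. It is
not pmcd's code: relay_agent_reply() and the RELAY_* codes are made-up
names, and plain pipes or sockets stand in for the pmda and client
connections. The point is only that the client is told about the agent
timeout straight away.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical status codes for this sketch; not PCP's PM_ERR_* values. */
enum { RELAY_OK = 0, RELAY_AGENT_TIMEOUT = -1, RELAY_IO_ERROR = -2 };

/*
 * Wait up to timeout_ms for the agent to answer on agent_fd.
 * On success the answer is forwarded to client_fd.
 * On timeout an error status is written to client_fd immediately, so the
 * client does not sit there until its own timer fires.
 */
static int relay_agent_reply(int agent_fd, int client_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = agent_fd, .events = POLLIN };
    char buf[256];
    int ready;

    /* A negative timeout would make poll() block forever. */
    assert(timeout_ms >= 0);

    ready = poll(&pfd, 1, timeout_ms);
    if (ready == 0) {
        int status = RELAY_AGENT_TIMEOUT;
        /* Agent timed out: report it to the waiting client now. */
        if (write(client_fd, &status, sizeof(status)) != (ssize_t)sizeof(status))
            return RELAY_IO_ERROR;
        return RELAY_AGENT_TIMEOUT;
    }
    if (ready < 0)
        return RELAY_IO_ERROR;

    /* Agent answered (or hung up): forward whatever we can read. */
    ssize_t n = read(agent_fd, buf, sizeof(buf));
    if (n <= 0 || write(client_fd, buf, (size_t)n) != n)
        return RELAY_IO_ERROR;
    return RELAY_OK;
}
```

In this sketch the caller sees RELAY_AGENT_TIMEOUT and the client end
already has the error status waiting to be read.
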
But much more seriously, in the process of investigating this, I turned
on all diags for pmcd and arrggghhh .... millions of lines of output of
the form
__pmDataIPC: fd=974
__pmDataIPC: fd=974, data=0xb84623e0(sz=8)
where fd increments from 0 to 1027 (or thereabouts) and this repeats 56
times in the short life of pmcd for qa/169.
This looks like a problem with the fds for client ipc moving up into
the large 1024+ range and some sort of iteration over all possible fds.
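
For illustration only, the kind of loop I suspect looks something like
the sketch below. This is a guess at the pattern, not code from pmcd:
scan_all_fds() is a made-up name, and I am assuming a select()-style
fd_set. The cost is driven by the highest fd, not by the number of
clients, and FD_ISSET() is undefined for any fd at or above FD_SETSIZE
(commonly 1024), hence the guard.

```c
#include <assert.h>
#include <sys/select.h>

/*
 * Hypothetical helper: walk every fd slot from 0 to maxfd, the way a
 * naive "check all possible fds" loop would, and count the active ones.
 * Returns the number of slots visited; *active gets the number of fds
 * found in the set.
 */
static int scan_all_fds(const fd_set *set, int maxfd, int *active)
{
    int visited = 0;

    assert(maxfd >= 0);
    *active = 0;
    for (int fd = 0; fd <= maxfd; fd++) {
        visited++;
        /* fds at or beyond FD_SETSIZE cannot be tested safely. */
        if (fd < FD_SETSIZE && FD_ISSET(fd, set))
            (*active)++;
    }
    return visited;
}
```

With one client on fd 974 and maxfd at 1027, this visits 1028 slots to
find a single active descriptor, every time the loop runs.
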
This needs to be fixed before any 3.7.0 release is contemplated.