On 11/03/13 19:49, Ken McDonell wrote:
> On 10/03/13 07:21, Ken McDonell wrote:
>> I had suspected, without any proof that PCP QA was running much slower.
>>
>> ...
> BUT if you change 169 so that the pmcd tracing is not buffered, i.e.
>
> pmstore pmcd.control.tracenobuf 1
>
> after pmcd is reconfigured, then the test passes 20 out of 20 attempts.
OK, I've spent many hours on this one and finally cracked it ... the core of
the problem is this turdlet in the pmcd code ...
#if 0 /* TODO: IPv6 -- how to trace an ip address?? */
pmcd_trace(TR_ADD_CLIENT, ClientIPAddr(&client[i]), fd, client[i].seq);
#else /* For now so that the output is not completely missing. */
pmcd_trace(TR_ADD_CLIENT, 0, fd, client[i].seq);
#endif
Combine this with a call to gethostbyaddr() in pmcd's TR_ADD_CLIENT trace code
(doing a reverse DNS lookup at this point is a fundamentally bad idea, but
that is an earlier design error), and the gethostbyaddr() call always times out
looking up the address "0" after about 5 seconds ... and bingo, you have the
basis for the qa/169 failure.
When pmcd's tracing is enabled and unbuffered, this delay is hidden in each
client connection and the QA test passes.
When pmcd's tracing is buffered and only reported on a serious error, the
delay happens _after_ the PMDA timeout, and the extra 5 seconds is enough for
the client's PDU request timeout to fire _before_ pmcd has cleaned up the bad
PMDA and returned PM_ERR_IPC to the client ... so the test fails.
I've changed the reporting to report the IP address and added code for the
{ipv4,ipv6}x{secure sockets yes, no} cases (I've checked both of the ipv4 cases,
but have no way to test the ipv6 cases, so some extra eyes and qa would be
appreciated there).
The commit is coming soon (once I've rerun qa/169 on all my QA machines).