
Re: [pcp] URGENT potentially serious regression in 3.7.0

To: PCP Mailing List <pcp@xxxxxxxxxxx>
Subject: Re: [pcp] URGENT potentially serious regression in 3.7.0
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Tue, 02 Apr 2013 16:52:54 +1100
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <513D9A93.6080806@xxxxxxxxxxxxxxxx>
References: <513B99E4.7030007@xxxxxxxxxxxxxxxx> <513D9A93.6080806@xxxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130308 Thunderbird/17.0.4
On 11/03/13 19:49, Ken McDonell wrote:
> On 10/03/13 07:21, Ken McDonell wrote:
>> I had suspected, without any proof that PCP QA was running much slower.
>>
>> ...
> BUT if you change 169 so that the pmcd tracing is not buffered, i.e.
> 
> pmstore pmcd.control.tracenobuf 1
> 
> after pmcd is reconfigured, then the test passes 20 out of 20 attempts.

OK, I've spent many hours on this one and finally cracked it ... the core of 
the problem is this turdlet in the pmcd code ...

#if 0 /* TODO: IPv6 -- how to trace an ip address?? */
    pmcd_trace(TR_ADD_CLIENT, ClientIPAddr(&client[i]), fd, client[i].seq);
#else /* For now so that the output is not completely missing. */
    pmcd_trace(TR_ADD_CLIENT, 0, fd, client[i].seq);
#endif

Combine this with a call to gethostbyaddr() in pmcd's TR_ADD_CLIENT trace code 
(a fundamentally bad idea to be doing a reverse DNS lookup at this point, but 
that is an earlier design error), and the gethostbyaddr() call always times out 
after about 5 seconds looking up the address "0" ... and bingo, you have the 
basis for the qa/169 failure.

When pmcd's tracing is enabled and unbuffered, this delay is hidden in each 
client connection and the QA test passes.

When pmcd's tracing is buffered and only reported on a serious error, the 
delay happens _after_ the PMDA timeout, and the extra 5 seconds is enough for 
the client's PDU request to time out _before_ pmcd has cleaned up the bad PMDA 
and returned PM_ERR_IPC to the client ... so the test fails.

I've changed the reporting to report the IP address and added code for the 
{ipv4,ipv6}x{secure sockets yes, no} cases (I've checked both the ipv4 cases, 
but have no way to test the ipv6 cases, so some extra eyes and qa would be 
appreciated there).

The commit is coming soon (once I've rerun qa/169 on all my QA machines).
