Re: Connection timeouts and getaddrinfo

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: Connection timeouts and getaddrinfo
From: Dave Brolley <brolley@xxxxxxxxxx>
Date: Tue, 03 May 2016 12:38:32 -0400
Cc: pcp <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1046898355.44816857.1462256769212.JavaMail.zimbra@xxxxxxxxxx>
References: <1046898355.44816857.1462256769212.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

On 05/03/2016 02:26 AM, Nathan Scott wrote:
> Hi Dave,
>
> I came across this quirky libpcp networking behaviour today as
> I was looking further into Rares' recently reported issue ...
>
> $ time /usr/libexec/pcp/bin/pmcd_wait -h oss.sgi.com -t 2
>
> real    0m6.349s
> user    0m0.001s
> sys     0m0.005s
>
> The -t 2 there sets PMCD_CONNECT_TIMEOUT.  So, what I think we
> see here (timeout taking 3x longer than expected) is that the
> getaddrinfo loop in __pmAuxConnectPMCDPort causes the timeout
> to be (re-)applied for each address returned.  strace shows we
> definitely see 3 connect() attempts in the above example.

Yes, that's definitely what's happening.

> Not sure what the correct behaviour should be here - thoughts?
> Seems like it's probably not doing what users would expect atm.

The delay is applied during the call to __pmSelectWrite(). One thing we could try would be to open a socket for each address, use select() to wait on all of them at once, and choose the first one that becomes writable. If the timeout expires, then we can assume that they all timed out, and we will have applied the timeout only once for all of the addresses. I can't think of another way to apply one timeout while trying all of the addresses.
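
To make the idea concrete, here is a rough sketch using plain POSIX sockets rather than the libpcp wrappers. It is not existing libpcp code: connect_first() and MAX_TRIES are made-up names for illustration, and the real change would have to fit into __pmAuxConnectPMCDPort and its helpers. It starts a non-blocking connect() on every address, then waits with select() against a single deadline, so the timeout is spent once no matter how many addresses getaddrinfo returns.

```c
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define MAX_TRIES 16    /* arbitrary cap on parallel attempts (assumption) */

/* Monotonic clock in seconds, used to keep one overall deadline. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper: try every address in 'list' at once and return the
 * first socket that connects, or -1 if none does within timeout_sec.
 * The returned socket is left in non-blocking mode.
 */
int connect_first(const struct addrinfo *list, int timeout_sec)
{
    int fds[MAX_TRIES], nfds = 0, winner = -1, i;
    double deadline = now_sec() + timeout_sec;
    const struct addrinfo *ai;

    /* Start a non-blocking connect() for each address. */
    for (ai = list; ai != NULL && nfds < MAX_TRIES; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (fd >= FD_SETSIZE) {         /* cannot be used with select() */
            close(fd);
            continue;
        }
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0 ||
            errno == EINPROGRESS)
            fds[nfds++] = fd;
        else
            close(fd);
    }

    /* Wait on all pending sockets, sharing one deadline between them. */
    while (winner < 0 && nfds > 0) {
        double left = deadline - now_sec();
        struct timeval tv;
        fd_set wfds;
        int maxfd = -1, n;

        if (left <= 0)
            break;                      /* the single timeout has expired */
        tv.tv_sec = (long)left;
        tv.tv_usec = (long)((left - (long)left) * 1e6);
        FD_ZERO(&wfds);
        for (i = 0; i < nfds; i++) {
            FD_SET(fds[i], &wfds);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }
        n = select(maxfd + 1, NULL, &wfds, NULL, &tv);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;                      /* timeout or select() error */

        /* A writable socket has finished connecting; SO_ERROR says how. */
        for (i = 0; i < nfds; ) {
            int err = 0;
            socklen_t len = sizeof(err);
            if (!FD_ISSET(fds[i], &wfds)) {
                i++;
                continue;
            }
            if (getsockopt(fds[i], SOL_SOCKET, SO_ERROR, &err, &len) == 0 &&
                err == 0) {
                winner = fds[i];
                break;
            }
            close(fds[i]);              /* this attempt failed, drop it */
            fds[i] = fds[--nfds];
        }
    }

    /* Abandon every attempt except the winner. */
    for (i = 0; i < nfds; i++)
        if (fds[i] != winner)
            close(fds[i]);
    return winner;
}
```

A caller would pass the list returned by getaddrinfo() along with the PMCD_CONNECT_TIMEOUT value. Note that the sketch closes the losing sockets even if they have already connected.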

The downside is that PMCD will see several connections, some (most?) of which will succeed and then be abandoned. I assume that these will get logged in a similar way to the pmprobe connections that fche opened a bug about.

Dave
