pcp
[Top] [All Lists]

Re: [pcp] errors from socket code on Mac OS X

To: Nathan Scott <nathans@xxxxxxxxxx>, Dave Brolley <brolley@xxxxxxxxxx>
Subject: Re: [pcp] errors from socket code on Mac OS X
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Wed, 6 Jul 2016 17:05:50 +1000
Cc: PCP <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <2068385288.4119706.1467774342414.JavaMail.zimbra@xxxxxxxxxx>
References: <577C1045.1040108@xxxxxxxxxxxxxxxx> <577C1D0A.6040300@xxxxxxxxxx> <2068385288.4119706.1467774342414.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0
On 06/07/16 13:05, Nathan Scott wrote:
> Hi guys,
> 
> ----- Original Message -----
>> On 07/05/2016 03:53 PM, Ken McDonell wrote:
>>> I'm seeing this ...
>>>
>>> [DATE] pmcd(PID) Error: auxconnect.c:__pmSockAddrInit: Invalid address
>>> family: 0
>>> [DATE] pmcd(PID) Error: auxconnect.c:__pmSockAddrCompare: Invalid address
>>> family: 0
>>>
>>> in about half the failing tests on Mac OS X.
>>>
>>> Does anyone know how or why we'd be traversing the libpcp socket code for
>>> an AF of 0?
>>>
>>> Seems like a missing or broken guard somewhere higher up the call stack,
>>> but I have not been able to diagnose this, so I'm seeking assistance from
>>> those who know more.
>>>
>>> Thanks for any hints or suggestions.
>>>
>> This sequence of errors suggests to me that __pmSockAddrIsLoopBack() is
>> being called with an address containing family==0.
>>
>>    __pmSockAddrIsLoopBack first extracts the family from the given
>> address and the calls __pmLoopBackAddress(family), which in turn calls
>> __pmSockAddrInit() using that family (first error).
>>
>>    It then calls __pmSockAddrCompare() with the original address and the
>> manufactured loopback address, both of which will now have family==0
>> (second error).
>>
>> Possible candidates:
>> __pmAccAddClient(new client adress)
>>     __pmSockAddrIsLoopBack(const __pmSockAddr *addr)
>>
>> HandleClientInput(__pmFdSet *fdsPtr)
>>     DoCreds(addr from client table)
>>       __pmSockAddrIsLoopBack(const __pmSockAddr *addr)
>>
>> VerifyClient(addr from client table)
>>     __pmSockAddrIsLoopBack(const __pmSockAddr *addr)
>>
>> I hope this helps,
> 
> Ayup, certainly did.
> 
> I poked at this a bit today (hmm, "lldb" now eh? fun) ... and I think it may
> be that accept is not filling in the family Dave.  In your list above Dave I
> was able to reproduce it from __pmAccAddClient.  We have this libpcp code:
> 
> void
> __pmCheckAcceptedAddress(__pmSockAddr *addr)
> {
> #if defined(HAVE_STRUCT_SOCKADDR_UN)
>      /*
>       * accept(3) doesn't set the peer address for unix domain sockets.
>       * We need to do it ourselves. The address family
>       * is set, so we can use it to test. There is only one unix domain socket
>       * open, so we know its path.
>       */
>      if (__pmSockAddrGetFamily(addr) == AF_UNIX)
>          __pmSockAddrSetPath(addr, localSocketPath);
> #endif
> }
> 
> (via __pmAccept)
> 
> ... and it looks like we are seeing a sockaddr that is (still) completely
> zeroed after we accept on the fd in pmcd/client.c AcceptNewClient.  The
> attached patch seems to tidy it up for me ... whaddya think Dave?  Are we
> likely to see other places where this happens, I wonder?

Independently (because I don't read email often enough when we have a house 
full of grandchildren), I came to the same conclusion ... accept() is returning 
a valid fd and sa_family is AF_UNSPEC (0).

Here is a snippet of my augmented output running qa/066

> __pmAccept: fd=9 sa_len=16 sa_family=1 sa_data[...]=0000.0000.0000.0000
> AcceptNewClient:__pmAccept returns: fd=9 addr=/var/run/pcp/pmcd.socket
> CheckNewClient: cp->addr=/var/run/pcp/pmcd.socket
> __pmAccAddClient: hostid=/var/run/pcp/pmcd.socket
> getClientIds: hostid=/var/run/pcp/pmcd.socket
> __pmAccept: fd=9 sa_len=16 sa_family=2 sa_data[...]=fffffff93b.7f00.0001.0000
> AcceptNewClient:__pmAccept returns: fd=9 addr=127.0.0.1
> CheckNewClient: cp->addr=127.0.0.1
> __pmAccAddClient: hostid=127.0.0.1
> getClientIds: hostid=127.0.0.1
> __pmAccept: fd=10 sa_len=0 sa_family=0 sa_data[...]=0000.0000.0000.0000
> AcceptNewClient:__pmAccept returns: fd=10 addr=(null)
> CheckNewClient: cp->addr=(null)
> __pmAccAddClient: hostid=(null)
> getClientIds: hostid=(null)
> __pmAccDelClient: hostid=(null)
> getClientIds: hostid=(null)

Note the 3rd __pmAccept!

I have a minimal patch that makes the warnings go away without introducing any 
apparent badness ... I'll commit and post when I have a bit more QA time.

<Prev in Thread] Current Thread [Next in Thread>