On 06/07/16 13:05, Nathan Scott wrote:
> Hi guys,
>
> ----- Original Message -----
>> On 07/05/2016 03:53 PM, Ken McDonell wrote:
>>> I'm seeing this ...
>>>
>>> [DATE] pmcd(PID) Error: auxconnect.c:__pmSockAddrInit: Invalid address
>>> family: 0
>>> [DATE] pmcd(PID) Error: auxconnect.c:__pmSockAddrCompare: Invalid address
>>> family: 0
>>>
>>> in about half the failing tests on Mac OS X.
>>>
>>> Does anyone know how or why we'd be traversing the libpcp socket code for
>>> an AF of 0?
>>>
>>> Seems like a missing or broken guard somewhere higher up the call stack,
>>> but I have not been able to diagnose this, so I'm seeking assistance from
>>> those who know more.
>>>
>>> Thanks for any hints or suggestions.
>>>
>> This sequence of errors suggests to me that __pmSockAddrIsLoopBack() is
>> being called with an address containing family==0.
>>
>> __pmSockAddrIsLoopBack first extracts the family from the given
>> address and the calls __pmLoopBackAddress(family), which in turn calls
>> __pmSockAddrInit() using that family (first error).
>>
>> It then calls __pmSockAddrCompare() with the original address and the
>> manufactured loopback address, both of which will now have family==0
>> (second error).
>>
>> Possible candidates:
>> __pmAccAddClient(new client adress)
>> __pmSockAddrIsLoopBack(const __pmSockAddr *addr)
>>
>> HandleClientInput(__pmFdSet *fdsPtr)
>> DoCreds(addr from client table)
>> __pmSockAddrIsLoopBack(const __pmSockAddr *addr)
>>
>> VerifyClient(addr from client table)
>> __pmSockAddrIsLoopBack(const __pmSockAddr *addr)
>>
>> I hope this helps,
>
> Ayup, certainly did.
>
> I poked at this a bit today (hmm, "lldb" now eh? fun) ... and I think it may
> be that accept is not filling in the family Dave. In your list above Dave I
> was able to reproduce it from __pmAccAddClient. We have this libpcp code:
>
> void
> __pmCheckAcceptedAddress(__pmSockAddr *addr)
> {
> #if defined(HAVE_STRUCT_SOCKADDR_UN)
> /*
> * accept(3) doesn't set the peer address for unix domain sockets.
> * We need to do it ourselves. The address family
> * is set, so we can use it to test. There is only one unix domain socket
> * open, so we know its path.
> */
> if (__pmSockAddrGetFamily(addr) == AF_UNIX)
> __pmSockAddrSetPath(addr, localSocketPath);
> #endif
> }
>
> (via __pmAccept)
>
> ... and it looks like we are seeing a sockaddr that is (still) completely
> zeroed after we accept on the fd in pmcd/client.c AcceptNewClient. The
> attached patch seems to tidy it up for me ... whaddya think Dave? Are we
> likely to see other places where this happens, I wonder?
Independently (because I don't read email often enough when we have a house
full of grandchildren), I came to the same conclusion ... accept() is returning
a valid fd and sa_family is AF_UNSPEC (0).
Here is a snippet of my augmented output running qa/066
> __pmAccept: fd=9 sa_len=16 sa_family=1 sa_data[...]=0000.0000.0000.0000
> AcceptNewClient:__pmAccept returns: fd=9 addr=/var/run/pcp/pmcd.socket
> CheckNewClient: cp->addr=/var/run/pcp/pmcd.socket
> __pmAccAddClient: hostid=/var/run/pcp/pmcd.socket
> getClientIds: hostid=/var/run/pcp/pmcd.socket
> __pmAccept: fd=9 sa_len=16 sa_family=2 sa_data[...]=fffffff93b.7f00.0001.0000
> AcceptNewClient:__pmAccept returns: fd=9 addr=127.0.0.1
> CheckNewClient: cp->addr=127.0.0.1
> __pmAccAddClient: hostid=127.0.0.1
> getClientIds: hostid=127.0.0.1
> __pmAccept: fd=10 sa_len=0 sa_family=0 sa_data[...]=0000.0000.0000.0000
> AcceptNewClient:__pmAccept returns: fd=10 addr=(null)
> CheckNewClient: cp->addr=(null)
> __pmAccAddClient: hostid=(null)
> getClientIds: hostid=(null)
> __pmAccDelClient: hostid=(null)
> getClientIds: hostid=(null)
Note the 3rd __pmAccept!
I have a minimal patch that makes the warnings go away without introducing any
apparent badness ... I'll commit and post when I have a bit more QA time.
|