----- Original Message -----
> Here it is.
>
> 1. reproducible for me in qa/323 but only if some other tests are run first,
> e.g. a full run and I've made it happen again with check 101-200 323
Good stuff.
> 2. traceback
>
> Procedure call traceback ...
> 0x7fe42c8924a0 [/lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fe42c8924a0]]
> 0x7fe42ce6a904 [/usr/lib/libpcp.so.3(__pmSockAddrGetFamily+0x4)
> [0x7fe42ce6a904]]
> 0x7fe42ce55dae [/usr/lib/libpcp.so.3(__pmSockAddrIsLoopBack+0x1e)
> [0x7fe42ce55dae]]
> 0x7fe42ce6287f [/usr/lib/libpcp.so.3(+0x4387f) [0x7fe42ce6287f]]
> 0x7fe42ce644f8 [/usr/lib/libpcp.so.3(__pmAccAddClient+0x48)
> [0x7fe42ce644f8]]
> 0x7fe42d2b6104 [/usr/lib/pcp/bin/pmcd(ParseRestartAgents+0x884)
> [0x7fe42d2b6104]]
> 0x7fe42d2b0494 [/usr/lib/pcp/bin/pmcd(SignalRestart+0x74) [0x7fe42d2b0494]]
> 0x7fe42d2b116f [/usr/lib/pcp/bin/pmcd(+0x716f) [0x7fe42d2b116f]]
> 0x7fe42d2af9bb [/usr/lib/pcp/bin/pmcd(main+0x5bb) [0x7fe42d2af9bb]]
> 0x7fe42c87d76d [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)
> [0x7fe42c87d76d]]
> 0x7fe42d2afb99 [/usr/lib/pcp/bin/pmcd(+0x5b99) [0x7fe42d2afb99]]
>
> Dumping to core ...
>
So ... we're in the delayed responding-to-SIGHUP code in the main loop, we've
gone through all of the config re-parsing, agent terminating, restarting, and
general fiddling about, and have arrived at re-evaluating client access. We
do the __pmAccAddClient call on every(*) client in the client array and we're
peeking inside the sockaddr pointer (it recently became a pointer to an aux
data structure - IIRC it used to be inline in the client structure, iow no
sigsegv used to be likely, even from a dodgy array entry access). If we continue to
follow the call path it would appear that __pmSockAddrIsLoopBack and its call
to __pmSockAddrGetFamily are the very first paths where we attempt to crack
open the now-not-inline sock-addr-pointer, and a fine explosion ensues.
(*) Every client? That doesn't look right - does the patch below help at all?
Theory: some client has been disconnected, at some point earlier, and its slot
in the client array has not yet been reclaimed. As a result, its addr pointer
is still null, and the CheckClientAccess call will fall over such an entry.
[ There are several opportunities for race conditions here, depending on order
of client disconnection, timing of arrival of sighup, etc... seems promising ]
diff --git a/src/pmcd/src/config.c b/src/pmcd/src/config.c
index c666889..472db7f 100644
--- a/src/pmcd/src/config.c
+++ b/src/pmcd/src/config.c
@@ -2511,6 +2511,8 @@ ParseRestartAgents(char *fileName)
     for (i = 0; i < nClients; i++) {
         ClientInfo *cp = &client[i];
+        if (cp->status.connected == 0)
+            continue;
         if ((sts = CheckClientAccess(cp)) >= 0)
             sts = CheckAccountAccess(cp);
         if (sts < 0) {
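
For anyone wanting to poke at the theory outside pmcd, here is a rough standalone sketch of the same guard. Everything in it is a stand-in: the ClientInfo and SockAddr structs are cut-down hypothetical versions (only the status.connected flag comes from the patch), and check_access() and recheck_clients() are made-up names that mimic "dereference the addr pointer" rather than the real CheckClientAccess path. It assumes a disconnected slot has a NULL addr, as per the theory above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins; not the real pmcd/libpcp types. */
typedef struct {
    int family;
} SockAddr;

typedef struct {
    struct { unsigned int connected : 1; } status;
    SockAddr *addr;     /* assumed NULL once the client has gone away */
} ClientInfo;

/* Stand-in for the access check: it dereferences addr unconditionally,
 * which is where a stale slot would blow up. */
static int check_access(const ClientInfo *cp)
{
    return cp->addr->family >= 0 ? 0 : -1;
}

/* Re-check every slot, skipping those not currently connected.
 * Returns the number of clients that passed the check. */
int recheck_clients(const ClientInfo *clients, int n)
{
    int i, passed = 0;

    assert(clients != NULL || n == 0);
    for (i = 0; i < n; i++) {
        const ClientInfo *cp = &clients[i];

        if (cp->status.connected == 0)
            continue;           /* unreclaimed slot: addr may be NULL */
        if (check_access(cp) >= 0)
            passed++;
    }
    return passed;
}
```

Dropping the connected test from the loop and feeding it an array with one disconnected slot (addr NULL) should reproduce the same kind of null dereference as in the traceback.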
cheers.
--
Nathan