
Re: [pcp] pmcd dumping core - multiple issues

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Subject: Re: [pcp] pmcd dumping core - multiple issues
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Mon, 29 Jul 2013 02:45:09 -0400 (EDT)
Cc: PCP Mailing List <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <51F44856.3090308@xxxxxxxxxxxxxxxx>
References: <51F44856.3090308@xxxxxxxxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: syeR+oMcwU4nBuJiG69AiltFpQW6qA==
Thread-topic: pmcd dumping core - multiple issues

----- Original Message -----
> I am seeing qa/183 failing across lots of hosts in a full run, i.e. $
> check (no args)
> 
> ...
> [Sun Jul 28 07:08:46] pmcd(23023) Error: Unexpected signal 11 ...
> 
> Dumping to core ...

I'm having no luck reproducing this locally - this test has now run
in a loop thousands of times successfully, and the dopey thing will
not fail.

> 
> The New client message is also a worry -- unknown \n? is neither
> expected nor helpful.
> 

And I can't seem to find where that message is coming from either
- there does not seem to be any 'New client' message in the sources
of libpcp, pmcd, nspr, nss, or libsasl.  But it's definitely there...

$ strings pmcd | grep 'New '
New client: [%d] 

Ah but wait - it's hiding over in libpcp_pmcd.a ...

                case TR_ADD_CLIENT:
                    {
                        ClientInfo      *cip;

                        fprintf(f, "New client: [%d] ", trace[p].t_who);
                        cip = GetClient(trace[p].t_who);
                        if (cip == NULL) {
                            fprintf(f, "-- unknown\n?");
                        }

Well that's one mystery solved - the "\n?" is a typo in the error
message (the '?' was presumably meant to come before the newline).
OOC, are the failing systems all secure-sockets builds?  Or not?
(any pattern there?)
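
For what it's worth, a minimal sketch of how the corrected message
might look, assuming the intended text was "-- unknown?" followed by
a newline.  print_new_client() and its client_known flag are
hypothetical stand-ins (for the TR_ADD_CLIENT case and the
GetClient() != NULL check), not the real libpcp_pmcd code, and the
known-client branch is elided in the excerpt above so it just ends
the line here.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not the libpcp_pmcd source: prints the
 * "New client" trace line for client slot `who`.
 * `client_known` stands in for GetClient(who) != NULL.
 */
static void
print_new_client(FILE *f, int who, int client_known)
{
    fprintf(f, "New client: [%d] ", who);
    if (!client_known) {
        /* '?' now comes before the newline, not after it */
        fprintf(f, "-- unknown?\n");
    }
    else {
        /* the real code's output for a known client is not shown
         * in the excerpt, so just terminate the line */
        fprintf(f, "\n");
    }
}
```

With this ordering the '?' stays on the "New client" line rather than
leaking onto the start of the next trace entry.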

> And finally we've lost the procedure call traceback ... the relevant
> code is guarded by
> #if HAVE_TRACE_BACK_STACK
> but NOTHING appears to define HAVE_TRACE_BACK_STACK under any
> circumstances ... can anyone explain what happened here?

If you have that backtrace call coded up, could you push that through?
I have some other test systems here I can access to try to reproduce it.
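
In case it helps, here is a rough sketch of the sort of thing I have
in mind, assuming a glibc system where <execinfo.h> provides
backtrace() and backtrace_symbols_fd().  This is not the code that
used to sit under HAVE_TRACE_BACK_STACK; the names dump_traceback(),
sigsegv_handler() and install_traceback_handler() are made up for
illustration, and the message text is only a placeholder.

```c
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd(): glibc */
#include <signal.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Hypothetical helper: write the current call stack to fd and
 * return the number of frames captured. */
static int
dump_traceback(int fd)
{
    void *frames[MAX_FRAMES];
    int nframes = backtrace(frames, MAX_FRAMES);

    /* writes one line per frame straight to fd, without malloc */
    backtrace_symbols_fd(frames, nframes, fd);
    return nframes;
}

/* Hypothetical handler: report the traceback, then re-raise the
 * signal with the default action so a core file is still produced. */
static void
sigsegv_handler(int sig)
{
    static const char msg[] = "Procedure call traceback:\n";
    ssize_t rc = write(STDERR_FILENO, msg, sizeof(msg) - 1);

    (void)rc;
    dump_traceback(STDERR_FILENO);
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Returns 0 on success, -1 if the handler could not be installed. */
static int
install_traceback_handler(void)
{
    return signal(SIGSEGV, sigsegv_handler) == SIG_ERR ? -1 : 0;
}
```

Calling install_traceback_handler() once at startup would be enough
for a quick experiment; symbol names in the output tend to be more
useful when the binary is linked with -rdynamic.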

cheers.

--
Nathan
