
Re: [pcp] pcp-2.7.8-20081117 sig 11 in AcceptNewClient

To: Nathan Scott <nscott@xxxxxxxxxx>
Subject: Re: [pcp] pcp-2.7.8-20081117 sig 11 in AcceptNewClient
From: Scott Emery <emery@xxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 26 Feb 2009 16:14:35 -0600
Cc: Scott Emery <emery@xxxxxxx>, pcp@xxxxxxxxxxx, emery@xxxxxxx
In-reply-to: Your message of "Fri, 27 Feb 2009 07:55:54 +1100." <1235681754.4166.14.camel@xxxxxxxxxxxxxxxxxx>
In message <1235681754.4166.14.camel@xxxxxxxxxxxxxxxxxx>, Nathan Scott writes:
>Hi Scott,
>
>Could you mail "core" and "pmcd" to me please?
>

        Mail attachments are tricky for me.  Does uuencode/uudecode work
for you?  If I could upload to an anonymous FTP server, that would be most
convenient.  The data is up at SGI.

>> [Thu Feb 26 00:41:01] pmcd(22456) Error: Unexpected signal 11 ...
>
>OK, pmcd took SIGSEGV ... (by definition, this is not the fault
>of their Perl PMDA, BTW, which is a separate process).
>

        Agreed.

>> [Wed Feb 25 22:07:49] lustre(22470) Info: lustre_refresh_fsnames()
>> Use of uninitialized value in hash element at /var/lib/pcp/pmdas/lustre/pmdalustre.pl line 292.
>> Use of uninitialized value in concatenation (.) or string at /var/lib/pcp/pmdas/lustre/pmdalustre.pl line 293.
>> Use of uninitialized value in hash element at /var/lib/pcp/pmdas/lustre/pmdalustre.pl line 293.
>
>That said, the above does point to bugs in the PMDA too.  FWIW
>(not much help here), Martin's tree has a Lustre PMDA too:
>http://oss.sgi.com/projects/pcp/source.html points to his
>git tree.  I don't think the PMDA is the cause of their
>failure here, though.
>

        I have reported that part to them.

>> service100 /var/log/pcp/pmcd # gdb pmcd core
>> warning: exec file is newer than core file.
>
>That's just because you copied it from $PCP_BINADM_DIR, right?
>

        Yes.

>> #4  0x0000000000410823 in AcceptNewClient (reqfd=0) at client.c:69
>
>A "git-checkout pcp-2.7.8-20081117" points at this line:
>
>    FD_SET(fd, &clientFds);
>    __pmSetVersionIPC(fd, UNKNOWN_VERSION);     /* before negotiation */
>>>> client[i].fd = fd;
>    client[i].status.connected = 1;
>    client[i].status.changes = 0;
>
>Can you double-check that the code you built from matches that?
>The only address there that could have SIGSEGV'd is client[i];
>we accessed client[i].addr a few lines higher up in the accept
>call ... very odd.  Hopefully I should be able to diagnose further
>with the binary & core file.
>

        I see no reason why it would differ.

>cheers.
>
>--
>Nathan
>
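For readers following Nathan's reasoning about the quoted AcceptNewClient fragment, the "allocate a slot, then fill client[i]" pattern can be sketched in C. This is a minimal sketch under stated assumptions, not the actual pmcd source: `ClientInfo`, `client`, `nClients`, `NewClientSlot()` and `RegisterClient()` are hypothetical names, the table is assumed to grow with realloc(), and the sketch takes an already-accepted fd instead of calling accept() itself. It also guards FD_SET, because POSIX leaves FD_SET undefined for descriptors outside 0..FD_SETSIZE-1. Whether that has anything to do with this particular crash is not established here.

```c
/* Hypothetical sketch of a pmcd-style client table.
 * ClientInfo, client, nClients, NewClientSlot and RegisterClient are
 * illustrative names (assumptions), not the real pmcd code. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>

typedef struct {
    int fd;                          /* -1 while the slot is free */
    struct sockaddr_storage addr;    /* accept() would fill this in */
    struct {
        int connected;
        int changes;
    } status;
} ClientInfo;

static ClientInfo *client = NULL;    /* growable table, addressed by index */
static int nClients = 0;             /* number of allocated slots */
static fd_set clientFds;             /* descriptors select() should watch */

/* Return the index of a free slot, growing the table when full.
 * realloc() may move the table, so only the index (never a saved
 * &client[i]) remains valid across a call to this function. */
static int NewClientSlot(void)
{
    for (int i = 0; i < nClients; i++)
        if (client[i].fd == -1)
            return i;

    int newSize = nClients ? 2 * nClients : 4;
    ClientInfo *p = realloc(client, newSize * sizeof(*p));
    if (p == NULL)
        return -1;
    client = p;
    for (int i = nClients; i < newSize; i++) {
        memset(&client[i], 0, sizeof(client[i]));
        client[i].fd = -1;
    }
    int slot = nClients;
    nClients = newSize;
    return slot;
}

/* Record an already-accepted fd, in the same order as the quoted
 * fragment: FD_SET first, then the client[i] fields. */
static int RegisterClient(int fd)
{
    static int initialised = 0;
    if (!initialised) {
        FD_ZERO(&clientFds);
        initialised = 1;
    }
    /* FD_SET outside [0, FD_SETSIZE) is undefined behaviour. */
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    int i = NewClientSlot();
    if (i < 0)
        return -1;
    assert(i < nClients);            /* client[i] lies inside the table */

    FD_SET(fd, &clientFds);
    client[i].fd = fd;
    client[i].status.connected = 1;
    client[i].status.changes = 0;
    return i;
}
```

Addressing slots by index rather than by a cached pointer means growing the table cannot leave a dangling reference. The assert states the invariant Nathan is relying on: once client[i] has been touched (by accept() writing client[i].addr in the real code), the same client[i] should still be valid a few lines later.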
