
Re: [pcp] pcp-2.7.8-20081117 sig 11 in AcceptNewClient

To: Scott Emery <emery@xxxxxxx>
Subject: Re: [pcp] pcp-2.7.8-20081117 sig 11 in AcceptNewClient
From: Nathan Scott <nscott@xxxxxxxxxx>
Date: Fri, 27 Feb 2009 07:55:54 +1100
Cc: pcp@xxxxxxxxxxx
In-reply-to: <200902261811.n1QIBvJJ26790543@xxxxxxxxxxxxxxxxxxxxx>
References: <200902261811.n1QIBvJJ26790543@xxxxxxxxxxxxxxxxxxxxx>

Hi Scott,

On Thu, 2009-02-26 at 12:11 -0600, Scott Emery wrote:
> obtained pcp 2.7.8 from git to get at the perl PMDA bits.  Built their
> own perl-based Lustre PMDA.  This combination worked for many weeks.  Then
> I configured pmlogger.
> 
> service100 /var/log/pcp/pmcd # ls -altr
> total 5992
> -rw-r--r-- 1 root root     106 Nov 19 11:24 simple.log
> drwxr-xr-x 6 root root    4096 Dec  3 15:11 ..
> -rw-r--r-- 1 root root     790 Feb 25 18:41 pmcd.log.prev
> -rw-r--r-- 1 root root     939 Feb 25 18:41 lustre.log.prev
> -rw-r--r-- 1 root root     790 Feb 26 00:41 pmcd.log
> -rw-r--r-- 1 root root     939 Feb 26 00:41 lustre.log
> -rw------- 1 root root 5828608 Feb 26 00:41 core
> -rwxr-xr-x 1 root root  200006 Feb 26 07:32 pmcd

Could you mail "core" and "pmcd" to me please?

> [Thu Feb 26 00:41:01] pmcd(22456) Error: Unexpected signal 11 ...

OK, pmcd took SIGSEGV ... (by definition, this is not the fault
of their Perl PMDA, BTW, which is a separate process).

> [Wed Feb 25 22:07:49] lustre(22470) Info: lustre_refresh_fsnames()
> Use of uninitialized value in hash element at 
> /var/lib/pcp/pmdas/lustre/pmdalustre.pl line 292.
> Use of uninitialized value in concatenation (.) or string at 
> /var/lib/pcp/pmdas/lustre/pmdalustre.pl line 293.
> Use of uninitialized value in hash element at 
> /var/lib/pcp/pmdas/lustre/pmdalustre.pl line 293.

The warnings above do point to bugs in the PMDA as well, though.
FWIW (not much help here), Martin's tree has a Lustre PMDA too:
http://oss.sgi.com/projects/pcp/source.html points to his
git tree.  I don't think the PMDA is the cause of their
failure here though.

> service100 /var/log/pcp/pmcd # gdb pmcd core
> warning: exec file is newer than core file.

That's just because you copied it from $PCP_BINADM_DIR, right?

> #4  0x0000000000410823 in AcceptNewClient (reqfd=0) at client.c:69

A "git-checkout pcp-2.7.8-20081117" points at this line:

    FD_SET(fd, &clientFds);
    __pmSetVersionIPC(fd, UNKNOWN_VERSION);     /* before negotiation */
>>> client[i].fd = fd;
    client[i].status.connected = 1;
    client[i].status.changes = 0;

Can you double-check that the source you built from matches that?
The only address there that could have SIGSEGV'd is client[i] -
but we already accessed client[i].addr a few lines higher up, in
the accept call ... very odd.  Hopefully I'll be able to diagnose
further with the binary & core file.
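
To make the reasoning above concrete, here is a minimal sketch of the
access pattern being described.  It is not the pmcd source: ClientInfo,
MAX_CLIENTS, fake_accept(), accept_new_client() and slot_contains() are
hypothetical stand-ins, and the table is a fixed-size static array only
to keep the example self-contained.  The point is that the address
written by the accept step and the fd store on the reported line both
land inside the same client[i] slot, so a fault on the second access
after the first succeeded is surprising.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical, simplified stand-in for one entry in a client table. */
typedef struct {
    int fd;
    struct sockaddr_in addr;
    struct {
        unsigned connected : 1;
        unsigned changes   : 3;
    } status;
} ClientInfo;

#define MAX_CLIENTS 4
static ClientInfo client[MAX_CLIENTS];

/* Stand-in for accept(): fills in the peer address in user space and
 * returns a made-up descriptor number. */
static int fake_accept(struct sockaddr_in *peer)
{
    memset(peer, 0, sizeof(*peer));
    peer->sin_family = AF_INET;
    return 7;
}

/* Mirrors the order of accesses discussed above for slot i. */
static int accept_new_client(int i)
{
    int fd;

    if (i < 0 || i >= MAX_CLIENTS)
        return -1;                        /* slot index out of range */

    fd = fake_accept(&client[i].addr);    /* first touch of client[i] */
    client[i].fd = fd;                    /* the line the backtrace names */
    client[i].status.connected = 1;
    client[i].status.changes = 0;
    return fd;
}

/* True if p points inside slot i of the table. */
static int slot_contains(int i, const void *p)
{
    const char *lo = (const char *)&client[i];
    const char *hi = (const char *)(&client[i] + 1);
    const char *q  = (const char *)p;

    return q >= lo && q < hi;
}
```

One caveat on the sketch: fake_accept() writes the address in user
space, whereas the real accept() has the kernel copy the address out,
and a bad pointer there is normally reported as EFAULT rather than as
a signal.  So a successful accept is not quite the same proof that
client[i] was valid as a direct store would be.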

cheers.

--
Nathan
