
Re: [pcp] pmcd dumps core on containers test

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] pmcd dumps core on containers test
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Fri, 13 Feb 2015 06:42:06 +1100
Cc: PCP <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <245733123.4207829.1423733264690.JavaMail.zimbra@xxxxxxxxxx>
References: <54DBC8A0.8010805@xxxxxxxxxxxxxxxx> <911908095.3825780.1423693506632.JavaMail.zimbra@xxxxxxxxxx> <54DBDAC2.5020204@xxxxxxxxxxxxxxxx> <245733123.4207829.1423733264690.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
On 12/02/15 20:27, Nathan Scott wrote:
> ...  Can you do a debug build on
> one of the failing hosts (with -g, without -O2 in builddefs) and send
> through the stack trace once more?  (without all the "optimized away"
> parameter values).

Did that and the test no longer fails.

This is smelling like yet another gcc optimizer bug.

kenj@vm18:~/src/pcp/src/libpcp_pmda/src$ gcc --version
gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1

For a long time I turned off -O2 optimizations on Ubuntu because their gcc 
version was demonstrably broken ... that problem went away (not sure how or 
why), but it seems to have reappeared, or something similar has popped up.

I did these experiments to narrow the scope.
 
        libpcp  libpcp_pmda     libpcp_pmcd     pmcd    pminfo --container=test
build   -g -O2  -g -O2          -g -O2          -g -O2  fails
build   -g      -g              -g              -g      passes
build   -g -O2  -g              -g              -g      passes
build   -g -O2  -g              -g              -g -O2  passes
build   -g -O2  -g              -g -O2          -g -O2  passes
build   -g -O2  -g -O2          -g -O2          -g -O2  fails

So it is libpcp_pmda that has the bad code.

Digging further, with all of the rest of libpcp_pmda compiled -O2, the problem 
appears and disappears depending on whether pduroot.c is compiled -O2 or not.

Hmm ... pduroot.c seems benign, but wait ... pduroot.h contains the PDU buffer 
typedef with a name[0] field ... this looks like a smoking gun.  If -O2 is 
confused by the pointer assignment, thinks pdu->name is of size 0, and then 
checks that size before doing the inlined strncpy, then maybe KABOOM.

A small rework of pduroot.h and associated code in pduroot.c and we're laughing 
... all the container QA tests pass.
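
For illustration only, here is a minimal sketch of the general kind of rework 
that avoids a name[0] trailing field.  This is not the actual pduroot.h or 
pduroot.c change; demo_pdu_t, demo_pdu_alloc and the field names are all 
hypothetical.  It assumes a C99 compiler, uses a flexible array member in place 
of the zero-length array, and copies with an explicit length so nothing depends 
on what the compiler believes the declared size of name to be.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical PDU layout, NOT the real pduroot.h definition. */
typedef struct {
    int  type;
    int  namelen;
    char name[];    /* C99 flexible array member, instead of name[0] */
} demo_pdu_t;

/* Allocate a PDU with room for the name plus its terminating NUL. */
demo_pdu_t *demo_pdu_alloc(const char *name)
{
    size_t len = strlen(name);
    demo_pdu_t *pdu = malloc(sizeof(*pdu) + len + 1);

    if (pdu == NULL)
        return NULL;
    pdu->type = 0;
    pdu->namelen = (int)len;
    /* Explicit length: no reliance on the declared size of name. */
    memcpy(pdu->name, name, len + 1);
    return pdu;
}
```

The caller releases the buffer with free() when done.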

I'll push my code changes upstream for review.
