On 12/02/15 20:27, Nathan Scott wrote:
> ... Can you do a debug build on
> one of the failing hosts (with -g, without -O2 in builddefs) and send
> through the stack trace once more? (without all the "optimized away"
> parameter values).
Did that and the test no longer fails.
This is smelling like yet another gcc optimizer bug.
kenj@vm18:~/src/pcp/src/libpcp_pmda/src$ gcc --version
gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1
For a long time I turned off -O2 optimizations on Ubuntu because their gcc
version was demonstrably broken ... that problem went away (not sure how or why)
but seems to have reappeared or something similar has popped up.
I did these experiments to narrow the scope.
        libpcp   libpcp_pmda  libpcp_pmcd  pmcd     pminfo --container=test
build   -g -O2   -g -O2       -g -O2       -g -O2   fails
build   -g       -g           -g           -g       passes
build   -g -O2   -g           -g           -g       passes
build   -g -O2   -g           -g           -g -O2   passes
build   -g -O2   -g           -g -O2       -g -O2   passes
build   -g -O2   -g -O2       -g -O2       -g -O2   fails
So it is libpcp_pmda that has the bad code.
Digging further, with all of the rest of libpcp_pmda compiled -O2, the problem
appears and disappears depending on whether pduroot.c is compiled -O2 or not.
Hmm ... pduroot.c seems benign, but wait ... pduroot.h contains the pdu buffer
typedef with a name[0] field ... this looks like a smoking gun. If -O2 is
confused by the pointer assignment and thinks pdu->name is of size 0 and then
checks before doing the inlined strncpy, then maybe KABOOM.
A small rework of pduroot.h and associated code in pduroot.c and we're laughing
... all the container QA tests pass.
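
For anyone hitting the same thing elsewhere, here is a minimal sketch of one
common way to avoid the name[0] hazard. It is not necessarily the rework made
to pduroot.h: demo_pdu_t, demo_pdu_new and the field names are hypothetical,
made up for illustration. The idea is to declare the trailing buffer as a C99
flexible array member and to copy with an explicit length, so the compiler is
never told the destination has size 0.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical PDU layout for illustration, NOT the real pduroot.h typedef. */
typedef struct {
    int  type;      /* illustrative header field */
    int  namelen;   /* bytes stored in name[], including the trailing NUL */
    char name[];    /* C99 flexible array member, instead of name[0] */
} demo_pdu_t;

/*
 * Allocate a PDU with room for the name and copy the name in.
 * The copy length is computed once and used for both the allocation
 * and the memcpy, so the two cannot disagree.
 * Returns NULL if the allocation fails; the caller frees the result.
 */
static demo_pdu_t *
demo_pdu_new(int type, const char *name)
{
    size_t      len = strlen(name) + 1;
    demo_pdu_t  *pdu = malloc(offsetof(demo_pdu_t, name) + len);

    if (pdu == NULL)
        return NULL;
    pdu->type = type;
    pdu->namelen = (int)len;
    memcpy(pdu->name, name, len);
    return pdu;
}
```

A caller would do something like demo_pdu_new(7, "test") and free() the result
when done. Whether this matches the failure mode suspected above depends on
how the optimizer and any fortified string routines treat a [0] array on that
gcc version, which I have not verified.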
I'll push my code changes upstream for review.