Way back in May, I mentioned that SGI has an 'ancient' bug open
regarding pmlogreduce:
http://oss.sgi.com/archives/pcp/2011-05/msg00054.html
Commit 5f08513 fixed a memory leak that was thought to be possibly
related, but no root cause was identified.
Well (of course) the bug has resurfaced. Here's the signature
(very similar to the report from May):
8<--------------------------- [snip] ---------------------------
basil:/var/tmp/984708 # /usr/lib64/pcp/bin/pmlogreduce -A 1min -t 1hour 20110926 tmp
*** glibc detected *** /usr/lib64/pcp/bin/pmlogreduce: munmap_chunk(): invalid pointer: 0x000000000091dac0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x75018)[0x7f30cc5a5018]
/usr/lib64/libpcp.so.3(__pmFreeResultValues+0x2d5)[0x7f30cc89c9f5]
/usr/lib64/libpcp.so.3(pmFreeResult+0x44)[0x7f30cc89ca53]
/usr/lib64/libpcp.so.3(+0x2a493)[0x7f30cc8b8493]
/usr/lib64/libpcp.so.3(+0x2b258)[0x7f30cc8b9258]
/usr/lib64/libpcp.so.3(__pmLogFetchInterp+0xdc4)[0x7f30cc8ba14b]
/usr/lib64/libpcp.so.3(__pmLogFetch+0x66)[0x7f30cc8b5c28]
/usr/lib64/libpcp.so.3(pmFetch+0x116)[0x7f30cc89c3a8]
/usr/lib64/pcp/bin/pmlogreduce[0x4029b2]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f30cc54ebc6]
/usr/lib64/pcp/bin/pmlogreduce[0x4022a9]
======= Memory map: ========
00400000-00407000 r-xp 00000000 08:02 38123551 /usr/lib64/pcp/bin/pmlogreduce
00606000-00607000 r--p 00006000 08:02 38123551 /usr/lib64/pcp/bin/pmlogreduce
00607000-00608000 rw-p 00007000 08:02 38123551 /usr/lib64/pcp/bin/pmlogreduce
00608000-00b6a000 rw-p 00000000 00:00 0 [heap]
7f30cc115000-7f30cc12b000 r-xp 00000000 08:02 67152496 /lib64/libgcc_s.so.1
7f30cc12b000-7f30cc32a000 ---p 00016000 08:02 67152496 /lib64/libgcc_s.so.1
7f30cc32a000-7f30cc32b000 r--p 00015000 08:02 67152496 /lib64/libgcc_s.so.1
7f30cc32b000-7f30cc32c000 rw-p 00016000 08:02 67152496 /lib64/libgcc_s.so.1
7f30cc32c000-7f30cc32e000 r-xp 00000000 08:02 67150603 /lib64/libdl-2.11.1.so
7f30cc32e000-7f30cc52e000 ---p 00002000 08:02 67150603 /lib64/libdl-2.11.1.so
7f30cc52e000-7f30cc52f000 r--p 00002000 08:02 67150603 /lib64/libdl-2.11.1.so
7f30cc52f000-7f30cc530000 rw-p 00003000 08:02 67150603 /lib64/libdl-2.11.1.so
7f30cc530000-7f30cc685000 r-xp 00000000 08:02 67150592 /lib64/libc-2.11.1.so
7f30cc685000-7f30cc884000 ---p 00155000 08:02 67150592 /lib64/libc-2.11.1.so
7f30cc884000-7f30cc888000 r--p 00154000 08:02 67150592 /lib64/libc-2.11.1.so
7f30cc888000-7f30cc889000 rw-p 00158000 08:02 67150592 /lib64/libc-2.11.1.so
7f30cc889000-7f30cc88e000 rw-p 00000000 00:00 0
7f30cc88e000-7f30cc8df000 r-xp 00000000 08:02 7215963 /usr/lib64/libpcp.so.3
7f30cc8df000-7f30ccadf000 ---p 00051000 08:02 7215963 /usr/lib64/libpcp.so.3
7f30ccadf000-7f30ccae0000 r--p 00051000 08:02 7215963 /usr/lib64/libpcp.so.3
7f30ccae0000-7f30ccae2000 rw-p 00052000 08:02 7215963 /usr/lib64/libpcp.so.3
7f30ccae2000-7f30ccae8000 rw-p 00000000 00:00 0
7f30ccae8000-7f30ccb07000 r-xp 00000000 08:02 67150784 /lib64/ld-2.11.1.so
7f30cccbe000-7f30cccc2000 rw-p 00000000 00:00 0
7f30cccfc000-7f30ccd06000 rw-p 00000000 00:00 0
7f30ccd06000-7f30ccd07000 r--p 0001e000 08:02 67150784 /lib64/ld-2.11.1.so
7f30ccd07000-7f30ccd08000 rw-p 0001f000 08:02 67150784 /lib64/ld-2.11.1.so
7f30ccd08000-7f30ccd09000 rw-p 00000000 00:00 0
7fff469d6000-7fff469eb000 rw-p 00000000 00:00 0 [stack]
7fff469ff000-7fff46a00000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted
8<--------------------------- [snip] ---------------------------
Since the archive that causes this is large (~19 MBytes, compressed)
I haven't included it here, but can send it on request. I expect that
you may not see the same behavior, since almost any perturbation
(different glibc, etc.) will frighten the bug away.
(Merely running 'pmlogreduce' in a different directory from
where the archives are stored causes the bug to vanish. Really.
Maybe that's a good hint - it makes no sense to me, but it's
quite reproducible. Running under 'valgrind' also makes the bug
go away - and it doesn't produce any obviously related warnings,
either.)
'addr2line' identifies line 96 in src/libpcp/src/freeresult.c as the
place where things go bad:
26 void
27 __pmFreeResultValues(pmResult *result)
28 {
......
90 #ifdef PCP_DEBUG
91 if (pmDebug & DBG_TRACE_PDUBUF) {
92 fprintf(stderr, "free(" PRINTF_P_PFX "%p) vset pmid=%s\n ",
93 pvs, pmIDStr(pvs->pmid));
94 }
95 #endif
96 free(pvs);
I turned on DBG_TRACE_PDUBUF and got (after a *very* long time):
.....
free pdubuf[size]: 0x86c000[34816] 0xa4e000[34816] 0xb1b000[92160]
0x91d000[34816] 0x887000[34816] 0x791000[34816] 0xa66000[92160] 0x8b5000[34816]
0x7b2000[34816] 0xa1e000[34816] 0xb04000[92160] 0x926000[34816] 0xa86000[34816]
0xab6000[92160] 0x72b000[22528] 0x743000[9216] 0x741000[6144] 0x7a2000[31744]
0x7aa000[31744] 0x79a000[31744] 0x746000[22528] 0x73b000[22528] 0x789000[31744]
0x864000[31744] 0x8ca000[31744] 0x650000[31744] 0x762000[31744] 0x739000[5120]
0x8c2000[31744] 0x8ad000[31744] 0x75c000[22528] 0x76f000[22528] 0x74f000[17408]
0x725000[2048] 0x733000[2048] 0x64a000[22528] 0x726000[17408] 0x731000[5120]
0x775000[22528] 0x77b000[22528] 0x754000[22528] 0x734000[17408] 0x76a000[17408]
0x75a000[5120] 0x74e000[2048] 0x74d000[1024] 0x74c000[1024]
pinned pdubuf[pincnt]: 0x914000[2] 0x875000[299] 0xb32000[36] 0x8e9000[102]
0x7bb000[499] 0x87e000[114] 0xa37000[24] 0xa07000[36] 0x92f000[18]
0x7cd000[567] 0x7d6000[2] 0x7c4000[30] 0x781000[1] 0xa7d000[1]
free(0x91d000) vset pmid=60.0.46
__pmPoolFree(0x920f00) pmValueBlock pmid=60.0.44 inst=-1
__pmPoolFree(0x920f00, 12)
__pmPoolFree(0x91da60) vset pmid=60.0.44
__pmPoolFree(0x91da60, 32)
__pmPoolFree(0x91da80) vset pmid=60.0.42
__pmPoolFree(0x91da80, 32)
__pmPoolFree(0x91daa0) vset pmid=60.0.41
__pmPoolFree(0x91daa0, 32)
free(0x91dac0) vset pmid=60.0.39
*** glibc detected *** /usr/lib64/pcp/bin/pmlogreduce: munmap_chunk(): invalid pointer: 0x000000000091dac0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x75018)[0x7fb09af55018]
/usr/lib64/libpcp.so.3(__pmFreeResultValues+0x2d5)[0x7fb09b24c9f5]
......
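One quick check on the trace above, offered as a sketch rather than a
diagnosis: does the pointer glibc rejects fall inside any buffer in the
"free pdubuf[size]" list? This assumes the bracketed number is the buffer
size in bytes. The struct and function names below are hypothetical (they
are not part of libpcp), and only the first five list entries are copied in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one "base[size]" entry from the trace. */
struct buf_range {
    uintptr_t base;
    size_t size;    /* assumed to be bytes */
};

/* First five entries of the "free pdubuf[size]" list, copied from the trace. */
const struct buf_range trace_bufs[] = {
    { 0x86c000, 34816 },
    { 0xa4e000, 34816 },
    { 0xb1b000, 92160 },
    { 0x91d000, 34816 },
    { 0x887000, 34816 },
};

/* Return the index of the first range containing addr, or -1 if none does. */
int find_containing(const struct buf_range *bufs, size_t n, uintptr_t addr)
{
    assert(bufs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (addr >= bufs[i].base && addr - bufs[i].base < bufs[i].size)
            return (int)i;
    }
    return -1;
}
```

Calling find_containing(trace_bufs, 5, 0x91dac0) returns 3: the pointer is
0xac0 bytes into the 0x91d000[34816] entry, which is also the address passed
to free() as the vset for pmid=60.0.46 a few lines earlier. The same holds
for 0x91da60, 0x91da80 and 0x91daa0. Whether that overlap is related to the
abort is not established here.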
FWIW, 60.0.39 is:
# pminfo -m |grep "60.0.39"
disk.dev.write_bytes PMID: 60.0.39
Anything else I can do to help debug this?
--
Arthur