Arthur,

Any chance you can get the archive and a script to reproduce the bug
onto the host "grundy" (IP address 192.48.174.18) within the SGI DMZ?
I could then debug it directly there.

On Thu, 2011-12-08 at 12:24 -0800, Arthur Kepner wrote:
> Way back in May, I mentioned that SGI has an 'ancient' bug open
> regarding pmlogreduce:
>
> http://oss.sgi.com/archives/pcp/2011-05/msg00054.html
>
> Commit 5f08513 fixed a memory leak which was thought to possibly
> be related, but a root cause wasn't identified.
>
> Well (of course) the bug has resurfaced. Here's the signature
> (very similar to the report from May):
>
> 8<--------------------------- [snip] ---------------------------
>
> basil:/var/tmp/984708 # /usr/lib64/pcp/bin/pmlogreduce -A 1min -t 1hour 20110926 tmp
> *** glibc detected *** /usr/lib64/pcp/bin/pmlogreduce: munmap_chunk(): invalid pointer: 0x000000000091dac0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x75018)[0x7f30cc5a5018]
> /usr/lib64/libpcp.so.3(__pmFreeResultValues+0x2d5)[0x7f30cc89c9f5]
> /usr/lib64/libpcp.so.3(pmFreeResult+0x44)[0x7f30cc89ca53]
> /usr/lib64/libpcp.so.3(+0x2a493)[0x7f30cc8b8493]
> /usr/lib64/libpcp.so.3(+0x2b258)[0x7f30cc8b9258]
> /usr/lib64/libpcp.so.3(__pmLogFetchInterp+0xdc4)[0x7f30cc8ba14b]
> /usr/lib64/libpcp.so.3(__pmLogFetch+0x66)[0x7f30cc8b5c28]
> /usr/lib64/libpcp.so.3(pmFetch+0x116)[0x7f30cc89c3a8]
> /usr/lib64/pcp/bin/pmlogreduce[0x4029b2]
> /lib64/libc.so.6(__libc_start_main+0xe6)[0x7f30cc54ebc6]
> /usr/lib64/pcp/bin/pmlogreduce[0x4022a9]
> ======= Memory map: ========
> 00400000-00407000 r-xp 00000000 08:02 38123551 /usr/lib64/pcp/bin/pmlogreduce
> 00606000-00607000 r--p 00006000 08:02 38123551 /usr/lib64/pcp/bin/pmlogreduce
> 00607000-00608000 rw-p 00007000 08:02 38123551 /usr/lib64/pcp/bin/pmlogreduce
> 00608000-00b6a000 rw-p 00000000 00:00 0 [heap]
> 7f30cc115000-7f30cc12b000 r-xp 00000000 08:02 67152496 /lib64/libgcc_s.so.1
> 7f30cc12b000-7f30cc32a000 ---p 00016000 08:02 67152496 /lib64/libgcc_s.so.1
> 7f30cc32a000-7f30cc32b000 r--p 00015000 08:02 67152496 /lib64/libgcc_s.so.1
> 7f30cc32b000-7f30cc32c000 rw-p 00016000 08:02 67152496 /lib64/libgcc_s.so.1
> 7f30cc32c000-7f30cc32e000 r-xp 00000000 08:02 67150603 /lib64/libdl-2.11.1.so
> 7f30cc32e000-7f30cc52e000 ---p 00002000 08:02 67150603 /lib64/libdl-2.11.1.so
> 7f30cc52e000-7f30cc52f000 r--p 00002000 08:02 67150603 /lib64/libdl-2.11.1.so
> 7f30cc52f000-7f30cc530000 rw-p 00003000 08:02 67150603 /lib64/libdl-2.11.1.so
> 7f30cc530000-7f30cc685000 r-xp 00000000 08:02 67150592 /lib64/libc-2.11.1.so
> 7f30cc685000-7f30cc884000 ---p 00155000 08:02 67150592 /lib64/libc-2.11.1.so
> 7f30cc884000-7f30cc888000 r--p 00154000 08:02 67150592 /lib64/libc-2.11.1.so
> 7f30cc888000-7f30cc889000 rw-p 00158000 08:02 67150592 /lib64/libc-2.11.1.so
> 7f30cc889000-7f30cc88e000 rw-p 00000000 00:00 0
> 7f30cc88e000-7f30cc8df000 r-xp 00000000 08:02 7215963 /usr/lib64/libpcp.so.3
> 7f30cc8df000-7f30ccadf000 ---p 00051000 08:02 7215963 /usr/lib64/libpcp.so.3
> 7f30ccadf000-7f30ccae0000 r--p 00051000 08:02 7215963 /usr/lib64/libpcp.so.3
> 7f30ccae0000-7f30ccae2000 rw-p 00052000 08:02 7215963 /usr/lib64/libpcp.so.3
> 7f30ccae2000-7f30ccae8000 rw-p 00000000 00:00 0
> 7f30ccae8000-7f30ccb07000 r-xp 00000000 08:02 67150784 /lib64/ld-2.11.1.so
> 7f30cccbe000-7f30cccc2000 rw-p 00000000 00:00 0
> 7f30cccfc000-7f30ccd06000 rw-p 00000000 00:00 0
> 7f30ccd06000-7f30ccd07000 r--p 0001e000 08:02 67150784 /lib64/ld-2.11.1.so
> 7f30ccd07000-7f30ccd08000 rw-p 0001f000 08:02 67150784 /lib64/ld-2.11.1.so
> 7f30ccd08000-7f30ccd09000 rw-p 00000000 00:00 0
> 7fff469d6000-7fff469eb000 rw-p 00000000 00:00 0 [stack]
> 7fff469ff000-7fff46a00000 r-xp 00000000 00:00 0 [vdso]
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
> Aborted
>
> 8<--------------------------- [snip] ---------------------------
>
> Since the archive that causes this is large (~19 MBytes, compressed)
> I haven't included it here, but can send it on request. I expect that
> you may not see the same behavior, since almost any perturbation
> (different glibc, etc.) will frighten the bug away.
>
> (Merely running 'pmlogreduce' in a different directory from
> where the archives are stored causes the bug to vanish. Really.
> Maybe that's a good hint - it makes no sense to me, but it's
> quite reproducible. Running under 'valgrind' also makes the bug
> go away - and it doesn't produce any obviously related warnings,
> either.)
>
> 'addr2line' identifies line 96 in src/libpcp/src/freeresult.c as the
> place where things go bad:
>
> 26 void
> 27 __pmFreeResultValues(pmResult *result)
> 28 {
> ......
> 90 #ifdef PCP_DEBUG
> 91 if (pmDebug & DBG_TRACE_PDUBUF) {
> 92 fprintf(stderr, "free(" PRINTF_P_PFX "%p) vset pmid=%s\n",
> 93 pvs, pmIDStr(pvs->pmid));
> 94 }
> 95 #endif
> 96 free(pvs);
>
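For the frames in the backtrace that carry no symbol name, the bracketed runtime address can be turned back into an offset within the library by subtracting the start of the matching mapping from the memory map. The sketch below only illustrates that arithmetic; the function name module_offset is made up for this example, and it assumes the library's first mapping has file offset 0, as the r-xp line for libpcp.so.3 shows.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of PCP): convert a runtime address from
 * the backtrace into an offset within the mapped object.
 *
 *   addr         - bracketed address from the backtrace
 *   map_start    - start address of the mapping that contains addr
 *   map_file_off - file offset column of that mapping
 */
uint64_t module_offset(uint64_t addr, uint64_t map_start,
                       uint64_t map_file_off)
{
    assert(addr >= map_start);  /* addr must lie in this mapping */
    return addr - map_start + map_file_off;
}
```

With the libpcp.so.3 text mapping starting at 0x7f30cc88e000, the frame at 0x7f30cc8b8493 works out to 0x2a493, which agrees with the "+0x2a493" glibc printed for it. The frame at 0x7f30cc89c9f5 (__pmFreeResultValues+0x2d5) works out to 0xe9f5. An offset of this kind is what one would normally pass to addr2line together with -e /usr/lib64/libpcp.so.3.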
>
> Turned on DBG_TRACE_PDUBUF and got (after a *very* long time):
> .....
> free pdubuf[size]: 0x86c000[34816] 0xa4e000[34816] 0xb1b000[92160]
> 0x91d000[34816] 0x887000[34816] 0x791000[34816] 0xa66000[92160]
> 0x8b5000[34816] 0x7b2000[34816] 0xa1e000[34816] 0xb04000[92160]
> 0x926000[34816] 0xa86000[34816] 0xab6000[92160] 0x72b000[22528]
> 0x743000[9216] 0x741000[6144] 0x7a2000[31744] 0x7aa000[31744] 0x79a000[31744]
> 0x746000[22528] 0x73b000[22528] 0x789000[31744] 0x864000[31744]
> 0x8ca000[31744] 0x650000[31744] 0x762000[31744] 0x739000[5120]
> 0x8c2000[31744] 0x8ad000[31744] 0x75c000[22528] 0x76f000[22528]
> 0x74f000[17408] 0x725000[2048] 0x733000[2048] 0x64a000[22528] 0x726000[17408]
> 0x731000[5120] 0x775000[22528] 0x77b000[22528] 0x754000[22528]
> 0x734000[17408] 0x76a000[17408] 0x75a000[5120] 0x74e000[2048] 0x74d000[1024]
> 0x74c000[1024]
> pinned pdubuf[pincnt]: 0x914000[2] 0x875000[299] 0xb32000[36]
> 0x8e9000[102] 0x7bb000[499] 0x87e000[114] 0xa37000[24] 0xa07000[36]
> 0x92f000[18] 0x7cd000[567] 0x7d6000[2] 0x7c4000[30] 0x781000[1] 0xa7d000[1]
> free(0x91d000) vset pmid=60.0.46
> __pmPoolFree(0x920f00) pmValueBlock pmid=60.0.44 inst=-1
> __pmPoolFree(0x920f00, 12)
> __pmPoolFree(0x91da60) vset pmid=60.0.44
> __pmPoolFree(0x91da60, 32)
> __pmPoolFree(0x91da80) vset pmid=60.0.42
> __pmPoolFree(0x91da80, 32)
> __pmPoolFree(0x91daa0) vset pmid=60.0.41
> __pmPoolFree(0x91daa0, 32)
> free(0x91dac0) vset pmid=60.0.39
> *** glibc detected *** /usr/lib64/pcp/bin/pmlogreduce: munmap_chunk(): invalid pointer: 0x000000000091dac0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x75018)[0x7fb09af55018]
> /usr/lib64/libpcp.so.3(__pmFreeResultValues+0x2d5)[0x7fb09b24c9f5]
> ......
>
> FWIW, 60.0.39 is:
> # pminfo -m |grep "60.0.39"
> disk.dev.write_bytes PMID: 60.0.39
>
> Anything else I can do to help debug this?
>
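One thing that can be checked mechanically from the DBG_TRACE_PDUBUF output above is whether the pointer handed to free() falls inside one of the listed pdubufs. The sketch below is only an illustration, not PCP code: struct pdubuf, find_pdubuf and free_bufs_sample are names invented here, the table holds just the first five entries of the "free pdubuf" list, and it assumes the bracketed [size] values are byte counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one "addr[size]" entry from the trace. */
struct pdubuf {
    uint64_t base;  /* start address as printed in the trace */
    uint64_t size;  /* bracketed value, assumed to be bytes */
};

/* First five entries of the "free pdubuf[size]" list, copied from the trace. */
const struct pdubuf free_bufs_sample[] = {
    { 0x86c000, 34816 },
    { 0xa4e000, 34816 },
    { 0xb1b000, 92160 },
    { 0x91d000, 34816 },
    { 0x887000, 34816 },
};

/* Return the index of the buffer containing addr, or -1 if none does. */
int find_pdubuf(const struct pdubuf *bufs, size_t n, uint64_t addr)
{
    assert(bufs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (addr >= bufs[i].base && addr - bufs[i].base < bufs[i].size)
            return (int)i;
    }
    return -1;
}
```

Looking up 0x91dac0 in this table finds the entry 0x91d000[34816], at an offset of 0xac0 bytes. The same holds for the three addresses passed to __pmPoolFree just before it (0x91da60, 0x91da80, 0x91daa0), which sit 32 bytes apart, matching the size of 32 printed for those calls. That would be consistent with glibc rejecting the pointer, since free() expects the start of an allocation, but it does not by itself say how the vset for pmid 60.0.39 ended up on the free() path.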