Hi -
> > A brief scanny run now produces actual valgrind leak reports ...
>
> It could be this first one is part of the issue - does pmmgr create and
> destroy contexts with attributes a lot by default? (if so, at what sort
> of rates? IOW, can we explain 1GB of memory this way?)
>
> > ==17174== by 0x4E6D346: __pmParseHostAttrsSpec (spec.c:937)
> > ==17174== by 0x4E49545: pmNewContext (context.c:468)
> > ==17174== by 0x116B8A: pmmgr_job_spec::compute_hostid(std::string const&)

pmmgr normally polls/discovers at a configured rate (default 60s), and
opens a new, brief pcp context for each live target. If slave
pmlogger/pmie or *conf processes fail, this rate can be faster. (This
is why I was asking for the pmmgr.log file.) Back-of-the-envelope
calculations indicate that 1 GB over 54 days can be leaked at the
low, low rate of about 13K/minute = about 200B/second. That would seem
to require a very tight poll loop, but that's consistent with the high
CPU time seen. pmmgr nowadays refuses to loop faster than 1Hz for any
particular target (since commit 43626c97, merged in 3.10.0).
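
For anyone wanting to redo the arithmetic above, here is a rough sketch.
It assumes a decimal gigabyte (10**9 bytes) and a steady leak over the
whole 54 days. The helper names are made up for illustration, and the
per-poll figures assume a single target.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def leak_rate(total_bytes, days):
    """Return (bytes per minute, bytes per second) for a steady leak."""
    per_second = total_bytes / (days * SECONDS_PER_DAY)
    return per_second * 60, per_second


def bytes_per_poll(total_bytes, days, poll_interval_s):
    """Bytes each poll would need to leak, assuming one target."""
    _, per_second = leak_rate(total_bytes, days)
    return per_second * poll_interval_s


per_minute, per_second = leak_rate(10**9, 54)
print(f"{per_minute:.0f} B/minute, {per_second:.0f} B/second")

# Default 60s poll interval versus the 1Hz cap mentioned above.
print(f"{bytes_per_poll(10**9, 54, 60):.0f} B per poll at 60s")
print(f"{bytes_per_poll(10**9, 54, 1):.0f} B per poll at 1Hz")
```

That comes out at a little under 13K per minute and a bit over 200 bytes
per second, which is the amount each poll would have to leak if one
target were being polled at the 1Hz cap.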

> This second one is fine - it's a once-off thing, not on-going. The QA
> tests guard against reporting this (see the _run_valgrind shell code).
>
> > ==17174== by 0x4E70EA3: __pmConfig (config.c:218)
> > ==17174== by 0x4E71069: pmGetConfig (config.c:242)
> > ==17174== by 0x10DA53: main (pmmgr.cxx:1201)

... as long as that getenv() protection on config.c:215 works, so that
we leak only as many strings as there are variables in pcp.conf, I
guess that's ok. (pmGetConfig()'s documentation gives no hint that it
leaks memory, so an app would be within its rights to call it many
times.)

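
To spell out why a getenv()-style guard would bound the leak, here is a
toy model. It is not the actual config.c code: export_once(), the
"leaked" list and the TOY_* variable names are all hypothetical, with
the list standing in for heap strings that are handed to the environment
and never freed.

```python
import os

leaked = []  # stands in for heap strings that are never freed


def export_once(name, value):
    """Export name=value only if name is not already in the environment."""
    if os.environ.get(name) is not None:  # the getenv()-style guard
        return False
    leaked.append(f"{name}={value}")  # one permanent allocation per variable
    os.environ[name] = value
    return True


def load_config(settings):
    """Model one pass over the config file's variables."""
    for name, value in settings.items():
        export_once(name, value)


# Hypothetical variable names, chosen so they are unlikely to be set already.
settings = {"TOY_PCP_VAR_A": "/a", "TOY_PCP_VAR_B": "/b"}
for _ in range(1000):
    load_config(settings)

print(len(leaked))
```

However many times load_config() runs, the list grows only on the first
pass, once per variable. Without the guard it would grow on every call.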
- FChE