[pcp] Proposal for handling dynamic metric names (and hence dynamic metrics)
nscott at aconex.com
Wed Jul 8 20:20:30 CDT 2009
----- "Ken McDonell" <kenj at internode.on.net> wrote:
> I've been threatening to get this out for some time now.
> There is no code to back any of this up (yet), it really is a
> proposal ... so please let me know if you think this is a good or bad
> idea, and holes being picked in the issues covered would be most
> welcome, as would better ideas.
Looks pretty good to me. One section that seems to be missing
is "Changes for pmcd", at least for completeness and to give a
clearer description of the PDU exchanges that'd be involved, maybe?
There's a misconception: "...update global PMNS and send pmcd a
SIGHUP signal". I also thought that was how it works, but that's
not what pmdammv actually does. I think that approach is deadlock
prone: the signal to pmcd seems to cause a request to the PMDA (I
can't remember now whether I worked out which request that was), but
the PMDA is blocked in kill(2) and never responds. pmcd ends up
terminating it due to the timeout, and (amusingly) also ends up
restarting it right away because it got a SIGHUP! Perhaps Max can
remember more of the details of that mystery pmcd->pmda PDU.
What MMV actually does is send pmcd a pair of error PDUs, a
PM_ERR_NOTREADY followed by a PM_ERR_READY (see callers of
mmv_reload_maybe). HandleReadyAgents() in src/pmcd/src/pmcd.c
returns "true" when the READY PDU comes in, and pmcd then reloads
the namespace (pmcd.c around line 848).
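
To make the exchange concrete, here is a rough Python sketch of the
NOTREADY/READY handshake as I understand it. It is a toy model only,
not the real pmcd or MMV code: the ToyPmcd class, its method names and
the string constants are all made up for illustration, and the real
error codes are integers defined in the PCP headers.

```python
# Stand-in status codes (assumption: the real ones are integer error
# codes from the PCP headers, not strings).
NOTREADY = "PM_ERR_NOTREADY"
READY = "PM_ERR_READY"


class ToyPmcd:
    """Hypothetical model of pmcd's view of a single agent."""

    def __init__(self, load_namespace):
        # load_namespace is a callable returning the current metric names;
        # it stands in for whatever pmcd does to (re)load the PMNS.
        self._load_namespace = load_namespace
        self.namespace = load_namespace()
        self.agent_ready = True
        self.reloads = 0

    def handle_agent_pdu(self, code):
        """Process an error PDU from the agent.

        Returns True if the namespace was reloaded, mirroring the "true"
        result described for HandleReadyAgents() above.
        """
        if code == NOTREADY:
            # Agent is reconfiguring; remember that and wait for READY.
            self.agent_ready = False
            return False
        if code == READY and not self.agent_ready:
            # Agent is back; pick up any names it added or removed.
            self.agent_ready = True
            self.namespace = self._load_namespace()
            self.reloads += 1
            return True
        return False


# Usage: the agent adds a metric, then brackets the change with the two PDUs.
names = ["mmv.a"]
pmcd = ToyPmcd(lambda: list(names))
names.append("mmv.b")
pmcd.handle_agent_pdu(NOTREADY)
pmcd.handle_agent_pdu(READY)
```

The point of the sketch is only that the reload is driven by the READY
PDU arriving after a NOTREADY, with no signal involved.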
So, one *big* problem with this approach (in addition to the "ugly,
error prone" rationale you have already) is that the NOTREADY gets
sent back to _clients_ too. Which means that whenever an agent is
reconfigured, even if that reconfiguration has nothing to do with
the (mmv) request/pmid in question, we end up seeing errors on the
client (which confuses pmie rules, which may send spurious and bad
status out, and means pmlogger gets no data for that sample - which
is annoying if the change had nothing to do with that particular