----- Original Message -----
> I have a strawman proposal for consideration.
>
> This is long I'm afraid, but the issues are tricky and warrant some
> detailed explanation.
>
> There are 2 use cases to consider:
> 1. slow start PMDA - Martins' case and
> https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=1073658
> 2. PMDA distraction - where sometime after start up the PMDA notices
> things have changed and has to go reconfig itself, or make a new
> connection, or reparse a file, or ...
>
> None of this is relevant to DSO PMDAs ... if a PMDA is going to be slow
> it cannot be installed as a DSO (this precondition is checked and
> enforced in pmdaControl() below).
*nod*
> I have the bones of a proof of concept implementation, so it would be
> nice to get some feedback.
>
> === slow start PMDA ===
>
In this case, there is a problem in that pmcd currently assumes that
a PMDA which does not respond promptly (i.e. not within the number of
seconds given by the pmcd(1) "-q" option) is a version 1.x PMDA.
It may be time to retire 1.x PMDA support?
> [...]
> For this to work, pmdaConnect() has to have been called before the first
> pmdaControl() call so that the pmcd file descriptors are set up, which
> means pmdaInit() has to have been called earlier. If the PMDA is not
> sure of the available metrics and/or instance domains at this point
> (establishing this may well be the reason for the delay), then
> pmdaInit() has to be called with no metrics and/or no indoms, which is
> why I've added the helper methods pmdaSetIndoms() and pmdaSetMetrics()
> ... the C developer has always been able to do this directly, but we
> need functional interfaces for this to be available in Perl and Python.
BTW, it would suit the Perl and Python APIs if they could do this one
metric/indom at a time. They currently expose APIs that behave like
that, and the wrapper APIs perform the table building internally.
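
To illustrate the one-at-a-time style, here is a minimal sketch of the kind
of accumulator a wrapper could keep internally. TableBuilder and its methods
are hypothetical names made up for this example, not part of any PCP API,
and PMIDs are shown as plain (domain, cluster, item) tuples for readability.

```python
class TableBuilder:
    """Hypothetical wrapper-side accumulator; not part of any PCP API."""

    def __init__(self):
        self._metrics = []   # (pmid, name) pairs in registration order
        self._indoms = {}    # indom id -> list of instance names

    def add_metric(self, pmid, name):
        # Called once per metric, whenever the PMDA discovers it.
        self._metrics.append((pmid, name))

    def add_indom(self, indom, instances):
        # Called once per instance domain.
        self._indoms[indom] = list(instances)

    def tables(self):
        # Snapshot that a wrapper could hand to the proposed
        # pmdaSetMetrics()/pmdaSetIndoms() helpers in one go.
        return list(self._metrics), dict(self._indoms)


builder = TableBuilder()
builder.add_metric((42, 0, 0), "example.requests")
builder.add_indom(1, ["eth0", "lo"])
metrics, indoms = builder.tables()
```

The script author only ever sees add_metric()/add_indom(); the table
building stays hidden in the wrapper, as it is today.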
> === PMDA has to have a big think after start up ===
>
> The current request cannot be serviced before the long delay, so
> it will return PM_ERR_PMDANOTREADY to pmcd, then do what needs
> to be done, then return PM_ERR_PMDAREADY to pmcd.
> ...
> First call to pmdaControl() uses PMDA_CONTROL_NOREADY which forces the
NOTREADY? (typo?)
It's not really stated, but I guess this will mean that pmcd will now
begin to propagate PM_ERR_PMDANOTREADY back to clients, for the subset
of PMIDs that the tardy PMDA should have serviced? I think that's OK,
just wanted to be sure I have that bit understood.
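
As a rough model of that reading (an assumption about the behaviour, not
pmcd code): only the PMIDs owned by the not-ready domain get the error,
everything else is answered normally. The fetch() function and the tuple
representation of PMIDs are hypothetical; the error value is the one
`pmerr -l` reports.

```python
PM_ERR_PMDANOTREADY = -13394  # value as listed by `pmerr -l`


def fetch(pmids, not_ready_domains, values):
    """Return one result per PMID.

    PMIDs are hypothetical (domain, cluster, item) tuples; `values`
    stands in for whatever the ready PMDAs would have returned.
    """
    results = {}
    for pmid in pmids:
        domain = pmid[0]
        if domain in not_ready_domains:
            # Tardy PMDA: the client sees the error for this PMID only.
            results[pmid] = PM_ERR_PMDANOTREADY
        else:
            results[pmid] = values[pmid]
    return results


values = {(60, 0, 1): 17}
results = fetch([(60, 0, 1), (42, 0, 0)], {42}, values)
```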
We may want to tweak the wording on the error message:
$ pmerr -l | grep PMDA
-13394 PM_ERR_PMDANOTREADY PMDA is not yet ready to respond to requests
-13393 PM_ERR_PMDAREADY PMDA is now responsive to requests
As it's now going to be used for more than just PMDA startup (if my
above assumptions are correct), we should drop the "yet" from that
first message.
> PM_ERR_PMDANOTREADY error PDU to be sent to pmcd. The second call to
> pmdaControl() is a bit of a no-op, just cleans up some internal state so
> pmdaControl() can enforce valid state transitions. The PM_ERR_PMDAREADY
> error PDU is sent to pmcd from the libpcp_pmda library as a result of
> the return PM_ERR_PMDAREADY; from the function that was responsible for
> the delay.
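
A small model of that sequence, as I understand it from the description
above. This is a sketch under assumptions: PmdaControlModel, control() and
dispatch() are invented names standing in for pmdaControl() and the
libpcp_pmda dispatch loop, and "sending a PDU" is just appending to a list.

```python
PM_ERR_PMDANOTREADY = -13394  # values as listed by `pmerr -l`
PM_ERR_PMDAREADY = -13393


class PmdaControlModel:
    """Hypothetical stand-in for the proposed pmdaControl() state handling."""

    def __init__(self):
        self.ready = True
        self.sent = []  # error PDUs "sent" to pmcd, in order

    def control(self, want_ready):
        # Enforce valid transitions: ready -> not ready -> ready.
        if want_ready == self.ready:
            raise ValueError("invalid state transition")
        self.ready = want_ready
        if not want_ready:
            # First call: tell pmcd straight away.
            self.sent.append(PM_ERR_PMDANOTREADY)
        # Second call: only internal state is cleaned up, no PDU here.

    def dispatch(self, callback):
        # Library side: the READY PDU goes out when the callback returns it.
        sts = callback(self)
        if sts == PM_ERR_PMDAREADY:
            self.sent.append(PM_ERR_PMDAREADY)
        return sts


def slow_callback(pmda):
    pmda.control(False)
    # ... the long reconfiguration would happen here ...
    pmda.control(True)
    return PM_ERR_PMDAREADY


model = PmdaControlModel()
status = model.dispatch(slow_callback)
```

In this model, calling control() twice in a row with the same argument is
rejected, which is the kind of transition check described above.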
I think this will work, and is a good improvement on the way some of us
- i.e. me :) - had assumed this would have to be done within pmcd (which
is problematic in terms of PMDA recovery, since it is not done with any
kind of control/cooperation wrt PDU exchanges). Nicely done!
cheers.
--
Nathan