I have a strawman proposal for consideration.
This is long, I'm afraid, but the issues are tricky and warrant some
detailed explanation.
There are 2 use cases to consider:
1. slow start PMDA - Martins' case and
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=1073658
2. PMDA distraction - where some time after start-up the PMDA notices
that things have changed and it has to reconfigure itself, make a new
connection, reparse a file, or ...
None of this is relevant to DSO PMDAs ... if a PMDA is going to be slow
it cannot be installed as a DSO (this precondition is checked and
enforced in pmdaControl() below).
I have the bones of a proof of concept implementation, so it would be
nice to get some feedback.
=== slow start PMDA ===
/* initialize with no metrics or indoms - see note below */
pmdaInit(&dispatch, NULL, 0, NULL, 0);
pmdaConnect(&dispatch);
/*new*/ pmdaControl(&dispatch, PMDA_CONTROL_BUSY);
... start up code and long delay ...
/* optional: install the indom table once it is known */
/*new*/ pmdaSetIndoms(&dispatch, indomtab, nindoms);
/* optional: install the metric table once it is known ... */
/*new*/ pmdaSetMetrics(&dispatch, metrictab, nmetrics);
/* ... or else, for a hashed metric table */
pmdaSetFlags(&dispatch, PMDA_FLAG_EXT_HASHED);
pmdaRehash(dispatch.version.any.ext, metrictab, nmetrics);
/*new*/ pmdaControl(&dispatch, PMDA_CONTROL_READY);
pmdaMain(&dispatch);
pmdaControl() is the key new function. With the PMDA_CONTROL_BUSY
argument it launches a pthread that mimics the libpcp_pmda main loop,
blocking in select() on the file descriptor used to receive from pmcd.
If pmcd sends anything to the PMDA, this thread replies with an error
PDU carrying the PM_ERR_PMDANOTREADY code, records in local state that
this has been done, and exits.
When pmdaControl() is called with the PMDA_CONTROL_READY argument: if
the pthread has already sent the PM_ERR_PMDANOTREADY error PDU, then
pmdaControl() sends a PM_ERR_PMDAREADY error PDU to pmcd; otherwise it
terminates the pthread, which is still blocked in select().
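To make the PMDA_CONTROL_BUSY / PMDA_CONTROL_READY handshake concrete, here is a minimal, self-contained sketch using POSIX threads. It is not libpcp_pmda code: busy_state, busy_start(), busy_ready(), demo() and the FAKE_ERR_* values are all hypothetical, an int written over a socketpair stands in for a PDU, and the thread is stopped with a self-pipe rather than pthread_cancel(). demo() plays the part of pmcd.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Placeholder codes for illustration only: NOT the real PCP values. */
#define FAKE_ERR_PMDANOTREADY (-1048)
#define FAKE_ERR_PMDAREADY    (-1049)

/* Hypothetical state behind PMDA_CONTROL_BUSY / PMDA_CONTROL_READY. */
typedef struct {
    int fd;             /* socket shared with pmcd; an int stands in for a PDU */
    int wake[2];        /* self-pipe used to stop the helper thread */
    int sent_notready;  /* written by the thread, read only after join */
    pthread_t tid;
} busy_state;

static void send_error(int fd, int code)    /* stand-in for an error PDU */
{
    if (write(fd, &code, sizeof(code)) != (ssize_t)sizeof(code))
        perror("write");
}

/* Mimics the main loop: wait for pmcd (or a wake-up), answer at most once. */
static void *busy_thread(void *arg)
{
    busy_state *bs = arg;
    int maxfd = bs->fd > bs->wake[0] ? bs->fd : bs->wake[0];
    int req;
    fd_set rfds;

    FD_ZERO(&rfds);
    FD_SET(bs->fd, &rfds);
    FD_SET(bs->wake[0], &rfds);
    if (select(maxfd + 1, &rfds, NULL, NULL, NULL) > 0 &&
        FD_ISSET(bs->fd, &rfds) &&
        read(bs->fd, &req, sizeof(req)) > 0) {
        send_error(bs->fd, FAKE_ERR_PMDANOTREADY); /* consume request, refuse it */
        bs->sent_notready = 1;
    }
    return NULL;                                   /* the thread always exits */
}

/* PMDA_CONTROL_BUSY: launch the helper thread. */
static int busy_start(busy_state *bs, int fd)
{
    bs->fd = fd;
    bs->sent_notready = 0;
    if (pipe(bs->wake) < 0)
        return -1;
    if (pthread_create(&bs->tid, NULL, busy_thread, bs) != 0) {
        close(bs->wake[0]); close(bs->wake[1]);
        return -1;
    }
    return 0;
}

/* PMDA_CONTROL_READY: stop the thread; send READY only if it sent NOTREADY. */
static void busy_ready(busy_state *bs)
{
    char c = 0;

    if (write(bs->wake[1], &c, 1) != 1)   /* unblocks select() if still waiting */
        perror("write");
    pthread_join(bs->tid, NULL);
    close(bs->wake[0]); close(bs->wake[1]);
    if (bs->sent_notready)
        send_error(bs->fd, FAKE_ERR_PMDAREADY);
}

/* Plays pmcd; returns how many error codes pmcd received (stored in codes[]). */
static int demo(int pmcd_sends_request, int codes[2])
{
    int sv[2], req = 1, n = 0;
    busy_state bs;
    struct timeval tv = {0, 0};
    fd_set rfds;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (busy_start(&bs, sv[1]) < 0) {
        close(sv[0]); close(sv[1]);
        return -1;
    }
    if (pmcd_sends_request &&
        write(sv[0], &req, sizeof(req)) == (ssize_t)sizeof(req) &&
        read(sv[0], &codes[n], sizeof(int)) == (ssize_t)sizeof(int))
        n++;                          /* read blocked until NOTREADY arrived */
    /* ... the PMDA's long start-up work would happen here ... */
    busy_ready(&bs);
    FD_ZERO(&rfds);
    FD_SET(sv[0], &rfds);
    if (select(sv[0] + 1, &rfds, NULL, NULL, &tv) > 0 &&
        read(sv[0], &codes[n], sizeof(int)) == (ssize_t)sizeof(int))
        n++;
    close(sv[0]); close(sv[1]);
    return n;
}
```

With a request from "pmcd", demo() sees NOTREADY then READY; without one, it sees nothing. The self-pipe lets the thread always exit on its own, so it is never cancelled halfway through sending its reply, and pthread_join() makes sent_notready safe to read without extra locking. Older toolchains may need -pthread to build this.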
For this to work, pmdaConnect() has to have been called before the first
pmdaControl() call so that the pmcd file descriptors are set up, which
in turn means pmdaInit() has to have been called earlier still. If the
PMDA does not yet know the available metrics and/or instance domains at
this point (establishing them may well be the reason for the delay),
then pmdaInit() has to be called with no metrics and/or no indoms. That
is why I've added the helper methods pmdaSetIndoms() and
pmdaSetMetrics(). C developers have always been able to do this
directly, but we need functional interfaces to make it available in
Perl and Python.
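The "initialize empty, fill in later" pattern can be illustrated with a small sketch. The types and the sketch_set_metrics(), sketch_set_indoms() and sketch_find() functions are hypothetical stand-ins for the proposed pmdaSetMetrics()/pmdaSetIndoms(), not their real signatures; the point is only that a language binding needs to call a function rather than reach into C structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for a PMDA's metric and indom tables. */
typedef struct { unsigned int item; const char *name; } sketch_metric;
typedef struct { unsigned int serial; int ninst; } sketch_indom;

typedef struct {
    sketch_metric *metrics; int nmetrics;
    sketch_indom  *indoms;  int nindoms;
} sketch_ext;

/* Functional setters: what a Perl or Python binding could call instead of
 * assigning structure fields directly. Return 0 on success, -1 on bad args
 * (in which case the existing table is left unchanged). */
static int sketch_set_metrics(sketch_ext *ext, sketch_metric *m, int n)
{
    if (n < 0 || (n > 0 && m == NULL))
        return -1;
    ext->metrics = m;
    ext->nmetrics = n;
    return 0;
}

static int sketch_set_indoms(sketch_ext *ext, sketch_indom *d, int n)
{
    if (n < 0 || (n > 0 && d == NULL))
        return -1;
    ext->indoms = d;
    ext->nindoms = n;
    return 0;
}

/* Lookup used when servicing a request; finds nothing until a table is set. */
static const sketch_metric *sketch_find(const sketch_ext *ext, unsigned int item)
{
    for (int i = 0; i < ext->nmetrics; i++)
        if (ext->metrics[i].item == item)
            return &ext->metrics[i];
    return NULL;
}
```

Starting from an empty sketch_ext (the analogue of pmdaInit() with no metrics or indoms), lookups fail until sketch_set_metrics() installs the table discovered during the slow start-up.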
=== PMDA has to have a big think after start-up ===
At some point, one of the PMDA callback routines is called and
the PMDA realizes it needs to reconfigure itself, probe for
something, or do anything else that might take longer than
the pmcd timeout.
The current request cannot be serviced until that long-running work
is done, so the PMDA returns PM_ERR_PMDANOTREADY to pmcd, does what
needs to be done, then returns PM_ERR_PMDAREADY to pmcd.
/*
 * in one/all of the PMDA's callback routines that can return
 * a value, i.e. check() or fetchcallback(), and/or in the PMDA's
 * PDU handling methods that may be wrappers around the libpcp_pmda
 * methods, i.e. profile(), fetch(), desc(), instance(), text() and
 * store() ... basically in the places where the PMDA has
 * control and the delay might be expected
 */
/*new*/ pmdaControl(&dispatch, PMDA_CONTROL_NOREADY);
... long delay ...
/*new*/ pmdaControl(&dispatch, PMDA_CONTROL_READY);
return PM_ERR_PMDAREADY;
No pthreads are needed here, as a single thread of execution in the
PMDA is all we need.
The first call to pmdaControl() uses PMDA_CONTROL_NOREADY, which forces
the PM_ERR_PMDANOTREADY error PDU to be sent to pmcd. The second call to
pmdaControl() is almost a no-op: it just cleans up some internal state so
that pmdaControl() can enforce valid state transitions. The
PM_ERR_PMDAREADY error PDU is sent to pmcd by the libpcp_pmda library as
a result of the "return PM_ERR_PMDAREADY;" in the function that was
responsible for the delay.
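The single-threaded flow and the state checking can be sketched as a small state machine. Everything here is hypothetical: the CTL_* requests mirror PMDA_CONTROL_BUSY/NOREADY/READY, the FAKE_ERR_* codes are placeholders rather than real PCP values, "sending" a PDU just records the code, and stopping the BUSY helper thread is not modelled. It shows the DSO and pmdaConnect() preconditions, the NOREADY then READY pair inside a callback, and the library side turning the callback's negative return into the final error PDU.

```c
#include <assert.h>

/* Placeholder codes for illustration only: NOT the real PCP values. */
#define FAKE_ERR_PMDANOTREADY (-1048)
#define FAKE_ERR_PMDAREADY    (-1049)
#define FAKE_ERR_STATE        (-1)   /* hypothetical "invalid transition" */

enum ctl_req { CTL_BUSY, CTL_NOREADY, CTL_READY };   /* mirror PMDA_CONTROL_* */
enum ctl_state { ST_UNCONNECTED, ST_IDLE, ST_BUSY, ST_NOTREADY };

typedef struct {
    int is_dso;            /* DSO PMDAs are refused */
    enum ctl_state state;  /* ST_UNCONNECTED until pmdaConnect() */
    int sent[4];           /* error codes "sent" to pmcd, in order */
    int nsent;
} sketch_pmda;

static void send_error(sketch_pmda *p, int code)   /* records, does not send */
{
    if (p->nsent < 4)
        p->sent[p->nsent++] = code;
}

static int sketch_control(sketch_pmda *p, enum ctl_req req)
{
    if (p->is_dso || p->state == ST_UNCONNECTED)
        return FAKE_ERR_STATE;
    switch (req) {
    case CTL_BUSY:          /* would launch the helper thread (not modelled) */
        if (p->state != ST_IDLE)
            return FAKE_ERR_STATE;
        p->state = ST_BUSY;
        return 0;
    case CTL_NOREADY:       /* single thread: tell pmcd immediately */
        if (p->state != ST_IDLE)
            return FAKE_ERR_STATE;
        send_error(p, FAKE_ERR_PMDANOTREADY);
        p->state = ST_NOTREADY;
        return 0;
    case CTL_READY:         /* after NOREADY this only resets local state */
        if (p->state != ST_BUSY && p->state != ST_NOTREADY)
            return FAKE_ERR_STATE;
        p->state = ST_IDLE;
        return 0;
    }
    return FAKE_ERR_STATE;
}

/* A PMDA callback that discovers it has to reconfigure itself. */
static int sketch_fetch_callback(sketch_pmda *p)
{
    sketch_control(p, CTL_NOREADY);
    /* ... long delay: reconfigure, reconnect, reparse ... */
    sketch_control(p, CTL_READY);
    return FAKE_ERR_PMDAREADY;
}

/* Library side: a negative status from the callback becomes an error PDU. */
static void sketch_dispatch(sketch_pmda *p)
{
    int sts = sketch_fetch_callback(p);

    if (sts < 0)
        send_error(p, sts);
}
```

After sketch_dispatch() on a connected, non-DSO PMDA, pmcd has been "sent" FAKE_ERR_PMDANOTREADY followed by FAKE_ERR_PMDAREADY, and the state is back to idle, so a stray second READY is rejected as an invalid transition.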