Hi, Ken -
kenj wrote:
> I have a strawman proposal for consideration.
Thanks!
> [...] None of this is relevant to DSO PMDAs ... if a PMDA is going
> to be slow it cannot be installed as a DSO [...]
(Over time, we should move away from DSOs anyway, for failure
tolerance. We've seen pmda bugs crash & bring down pmcd. I think
we've seen memory leaks.)
> [...]
> === PMDA has to have a big think after start up ===
>
> At some point, one of the PMDA callback routines is called and
> the PMDA knows it need to do something to reconfigure itself or
> probe for something or ... anything that might take longer than
> the pmcd timeout. [...]
In the general case, the pmda might not know ahead of time if a
request might blow a particular timeout because of transient overloads
or lock conflicts or whatever. It seems what we'd need more is:
- a pmda-side watchdog thread, to reply with a NOTREADY indication to pmcd
  when/if the pmda callbacks are taking too long
- intelligence in pmcd to relay that NOTREADY to the pmapi client
- intelligence in pmcd to quench further requests to the pmda until it
  is ready (so reply NOTREADY to other clients without even asking the pmda)
- yet more intelligence in pmcd to NOT forget about the original pmda
  request that timed out, so that when the pmda eventually finishes,
  the late results can get dropped on the floor and the pmda can go
  back into full service
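
Roughly, the pmcd-side bookkeeping for the last three items might look
like the following. This is only a sketch, in Python for brevity (pmcd
itself is C); AgentProxy, AgentState and every method name here are made
up for illustration and are not existing pmcd code.

```python
from enum import Enum, auto


class AgentState(Enum):
    READY = auto()
    NOTREADY = auto()


class AgentProxy:
    """Hypothetical pmcd-side bookkeeping for a single pmda."""

    def __init__(self):
        self.state = AgentState.READY
        self.stale = set()   # ids of requests whose late replies get dropped
        self.next_id = 0

    def submit(self):
        """A client request arrives.

        Returns a request id if it is forwarded to the pmda, or None if it
        is quenched (the client gets NOTREADY without the pmda being asked).
        """
        if self.state is AgentState.NOTREADY:
            return None
        self.next_id += 1
        return self.next_id

    def on_notready(self, req_id):
        """The pmda's watchdog said NOTREADY for req_id: remember it."""
        self.state = AgentState.NOTREADY
        self.stale.add(req_id)

    def on_reply(self, req_id, result):
        """The pmda replied. Returns the result to relay, or None if dropped."""
        if req_id in self.stale:
            # Late answer to a request the client already gave up on.
            self.stale.discard(req_id)
            if not self.stale:
                self.state = AgentState.READY   # back into full service
            return None
        return result
```

The point of the stale set is that pmcd keeps matching late replies to the
request that timed out, rather than mistaking them for the answer to some
newer request.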
This approach could maybe apply to the pmda initialization problem
also: let these pmdas defer any possibly time-consuming initialization to
the first post-pmda-setup packet, and let the watchdog handle NOTREADY
as above.
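
On the pmda side, the watchdog plus deferred initialization could be
sketched like this. Again it is Python only for brevity, and
run_with_watchdog, LazyPmda and the send callback are hypothetical names,
not part of libpcp_pmda. The assumption is that the late real reply is
still sent once the callback finishes, and that pmcd is the one to discard
it.

```python
import threading
import time


def run_with_watchdog(callback, deadline, send):
    """Run callback(); if it is still running after `deadline` seconds,
    a timer thread sends "NOTREADY". The real result is sent when done."""
    lock = threading.Lock()
    finished = threading.Event()

    def bark():
        # The lock guarantees NOTREADY is never sent after the real reply.
        with lock:
            if not finished.is_set():
                send("NOTREADY")

    timer = threading.Timer(deadline, bark)
    timer.start()
    result = callback()
    with lock:
        finished.set()
    timer.cancel()
    send(result)


class LazyPmda:
    """Hypothetical pmda that defers slow setup to its first request."""

    def __init__(self, slow_init):
        self._slow_init = slow_init
        self._initialized = False

    def fetch(self):
        if not self._initialized:
            self._slow_init()          # may outlast the pmcd timeout
            self._initialized = True
        return "values"
```

For example, with a 0.3 second init and a 0.05 second deadline, the first
fetch sends NOTREADY followed by the late result, and later fetches send
only the result.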
- FChE