
Re: pmcd gives up on slow starting Perl PMDA

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Subject: Re: pmcd gives up on slow starting Perl PMDA
From: fche@xxxxxxxxxx (Frank Ch. Eigler)
Date: Mon, 24 Mar 2014 11:35:24 -0400
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <532F56BC.9040500@xxxxxxxxxxxxxxxx> (Ken McDonell's message of "Mon, 24 Mar 2014 08:48:44 +1100")
References: <532C975F.4020808@xxxxxxxxxxx> <532F56BC.9040500@xxxxxxxxxxxxxxxx>
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)
Hi, Ken -

kenj wrote:

> I have a strawman proposal for consideration.

Thanks!


> [...] None of this is relevant to DSO PMDAs ... if a PMDA is going
> to be slow it cannot be installed as a DSO [...]

(Over time, we should move away from DSOs anyway, for failure
tolerance.  We've seen pmda bugs crash & bring down pmcd.  I think
we've seen memory leaks too.)


> [...]
> === PMDA has to have a big think after start up ===
>
> At some point, one of the PMDA callback routines is called and
> the PMDA knows it need to do something to reconfigure itself or
> probe for something or ... anything that might take longer than
> the pmcd timeout. [...]

In the general case, the pmda might not know ahead of time if a
request might blow a particular timeout because of transient overloads
or lock conflicts or whatever.  It seems what we'd need more is:

- a pmda-side watchdog thread, to reply with NOTREADY indication to pmcd
  when/if the pmda callbacks are taking too long

- intelligence in pmcd to relay that NOTREADY to the pmapi client

- intelligence in pmcd to quench further requests to the pmda until it
  is ready (so reply NOTREADY to other clients without even asking the pmda)

- yet more intelligence in pmcd to NOT forget about the original pmda
  request that timed out, so that when the pmda eventually finishes,
  the late results can be dropped on the floor and the pmda can go
  back into full service
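
To make the pmcd-side bookkeeping in the last three items concrete, here
is a rough sketch of a per-pmda state machine.  It is a toy model in
Python, not pmcd code: the AgentProxy class, its method names, and the
plain "NOTREADY" / "SEND_TO_PMDA" strings are all hypothetical and only
illustrate the relay / quench / drop-late-reply sequence.

```python
from enum import Enum


class AgentState(Enum):
    READY = "ready"
    NOTREADY = "notready"


class AgentProxy:
    """Toy model of pmcd's bookkeeping for one pmda (hypothetical names)."""

    def __init__(self):
        self.state = AgentState.READY
        self.stale_request = None  # id of the request that timed out, if any

    def forward(self):
        """Decide what to do with a new client request for this pmda."""
        if self.state is AgentState.NOTREADY:
            # Quench: answer the client without even asking the pmda.
            return "NOTREADY"
        return "SEND_TO_PMDA"

    def on_notready(self, request_id):
        """The pmda's watchdog reported that request_id is taking too long."""
        self.state = AgentState.NOTREADY
        self.stale_request = request_id  # do NOT forget the original request
        return "NOTREADY"  # relayed to the waiting pmapi client

    def on_reply(self, request_id, payload):
        """A reply arrived from the pmda."""
        if request_id == self.stale_request:
            # Late answer to the timed-out request: drop it on the floor
            # and put the pmda back into full service.
            self.stale_request = None
            self.state = AgentState.READY
            return None
        return payload
```

The point of remembering stale_request is that the late reply doubles as
the "I am ready again" signal, so no extra message is needed in this
sketch.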


This approach could maybe apply to the pmda initialization problem
also: let these pmdas defer any possibly time-consuming initialization
to the first post-pmda-setup packet, and let the watchdog handle
NOTREADY as above.
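
A minimal sketch of the pmda side, assuming a timer-based watchdog and
lazy initialization on the first request.  Everything here is
hypothetical (run_with_watchdog, LazyAgent, the send callable, and the
"NOTREADY" string); it only shows the shape of the idea in Python and is
not how any existing pmda library works.

```python
import threading
import time


def run_with_watchdog(callback, deadline, send):
    """Run callback(); if it outlives `deadline` seconds, send "NOTREADY" first.

    `send` stands in for whatever writes a reply back to pmcd.
    """
    lock = threading.Lock()
    finished = False

    def on_timeout():
        # Runs on the timer thread once the deadline passes.
        with lock:
            if not finished:
                send("NOTREADY")

    timer = threading.Timer(deadline, on_timeout)
    timer.start()
    try:
        result = callback()
    finally:
        with lock:
            finished = True  # from here on the watchdog stays silent
        timer.cancel()
    send(result)  # the real (possibly late) answer


class LazyAgent:
    """Defers its slow setup to the first request instead of startup."""

    def __init__(self, slow_init):
        self._slow_init = slow_init
        self._initialized = False

    def handle(self, request):
        if not self._initialized:
            self._slow_init()  # the "big think" happens here
            self._initialized = True
        return f"result for {request}"
```

With a slow_init that takes longer than the deadline, the first request
produces "NOTREADY" followed by the late result; later requests are
answered directly because the setup has already been done.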


- FChE
