Re: [pcp] pmcd gives up on slow starting Perl PMDA

To: pcp@xxxxxxxxxxx
Subject: Re: [pcp] pmcd gives up on slow starting Perl PMDA
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Mon, 24 Mar 2014 08:48:44 +1100
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <532C975F.4020808@xxxxxxxxxxx>
References: <532C975F.4020808@xxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0
I have a strawman proposal for consideration.

This is long, I'm afraid, but the issues are tricky and warrant some detailed explanation.

There are 2 use cases to consider:
1. slow start PMDA - Martins' case and https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=1073658
2. PMDA distraction - where sometime after start up the PMDA notices things have changed and has to go reconfig itself, or make a new connection, or reparse a file, or ...

None of this is relevant to DSO PMDAs ... if a PMDA is going to be slow it cannot be installed as a DSO (this precondition is checked and enforced in pmdaControl() below).

I have the bones of a proof of concept implementation, so it would be nice to get some feedback.

=== slow start PMDA ===

        /* initialize with no metrics or indoms - see note below */
        pmdaInit(&dispatch, NULL, 0, NULL, 0);
        pmdaConnect(&dispatch);
/*new*/ pmdaControl(&dispatch, PMDA_CONTROL_BUSY);

        ... start up code and long delay ...

        /* optional */
/*new*/     pmdaSetIndoms(&dispatch, &indomtab, nindoms);
        /* optional */
/*new*/     pmdaSetMetrics(&dispatch, &metrictab, nmetrics);
        /* else */
            pmdaSetFlags(&dispatch, PMDA_FLAG_EXT_HASHED);
            pmdaRehash(&dispatch, &metrictab, nmetrics);
/*new*/ pmdaControl(&dispatch, PMDA_CONTROL_READY);
        pmdaMain(&dispatch);

pmdaControl() is the key new function. With the PMDA_CONTROL_BUSY argument it will launch a pthread to mimic the libpcp_pmda mainloop, blocking in select() on the file descriptor used to receive from pmcd ... if pmcd sends anything to the PMDA, this thread returns an error PDU to pmcd with the PM_ERR_PMDANOTREADY code, records in local state that this has been done, and then exits.
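To make the BUSY behaviour concrete, here is a minimal sketch of the body such a responder thread might run. It is not the proof-of-concept code: struct busy_state, busy_responder() and SKETCH_ERR_NOTREADY are hypothetical names, plain pipe-style file descriptors stand in for the pmcd connection, and a bare int stands in for the error PDU that would carry PM_ERR_PMDANOTREADY.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Stand-in value (assumption); real code would send a PCP error PDU
 * carrying PM_ERR_PMDANOTREADY instead of a bare int. */
#define SKETCH_ERR_NOTREADY (-1)

/* Hypothetical state shared between the responder and pmdaControl(). */
struct busy_state {
    int from_pmcd;      /* fd the PMDA reads pmcd requests from */
    int to_pmcd;        /* fd the PMDA writes replies to */
    int notready_sent;  /* set once the "not ready" reply has gone out */
};

/*
 * Body a responder thread might run; the signature matches what
 * pthread_create() expects. Blocks in select() until pmcd sends
 * something, answers with the "not ready" code, records that it
 * did so, then returns (ending the thread).
 */
static void *
busy_responder(void *arg)
{
    struct busy_state *st = arg;
    fd_set readable;
    char request[64];
    int code = SKETCH_ERR_NOTREADY;

    assert(st != NULL);
    FD_ZERO(&readable);
    FD_SET(st->from_pmcd, &readable);

    /* NULL timeout: wait indefinitely for the first request from pmcd */
    if (select(st->from_pmcd + 1, &readable, NULL, NULL, NULL) <= 0)
        return NULL;

    /* consume the request (a real PMDA would read a whole PDU here) */
    if (read(st->from_pmcd, request, sizeof(request)) <= 0)
        return NULL;

    if (write(st->to_pmcd, &code, sizeof(code)) == (ssize_t)sizeof(code))
        st->notready_sent = 1;
    return NULL;
}
```

The notready_sent flag is the piece of local state that the later PMDA_CONTROL_READY call would consult to decide whether pmcd has to be told the PMDA is ready again.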

When pmdaControl() is called with the PMDA_CONTROL_READY argument, it checks whether the pthread has already sent the PM_ERR_PMDANOTREADY error PDU. If so, it sends a PM_ERR_PMDAREADY error PDU to pmcd; otherwise it terminates the pthread.

For this to work, pmdaConnect() has to have been called before the first pmdaControl() call so that the pmcd file descriptors are set up, which means pmdaInit() has to have been called earlier. If the PMDA is not sure of the available metrics and/or instance domains at this point (establishing this may well be the reason for the delay), then pmdaInit() has to be called with no metrics and/or no indoms, which is why I've added the helper methods pmdaSetIndoms() and pmdaSetMetrics() ... the C developer has always been able to do this directly, but we need functional interfaces for this to be available in Perl and Python.

=== PMDA has to have a big think after start up ===

At some point, one of the PMDA callback routines is called and
the PMDA knows it needs to do something to reconfigure itself or
probe for something or ... anything that might take longer than
the pmcd timeout.

The current request cannot be serviced before the long delay, so
it will return PM_ERR_PMDANOTREADY to pmcd, then do what needs
to be done, then return PM_ERR_PMDAREADY to pmcd.

        /*
         * in one/all of the PMDA's callback routines that can return
         * a value, so check() or fetchcallback(), and/or in the PMDA's
         * PDU handling methods that may be wrappers to the libpcp_pmda
         * method, so profile(), fetch(), desc(), instance(), text(),
         * store() ... basically in the places where the PMDA has
         * control and the delay might be expected
         */
/*new*/ pmdaControl(&dispatch, PMDA_CONTROL_NOREADY);

        ... long delay ...

/*new*/ pmdaControl(&dispatch, PMDA_CONTROL_READY);
        return PM_ERR_PMDAREADY;

No pthreads needed here as a single thread of execution in the PMDA is all we need.

The first call to pmdaControl() uses PMDA_CONTROL_NOREADY, which forces the PM_ERR_PMDANOTREADY error PDU to be sent to pmcd. The second call to pmdaControl() is close to a no-op: it just cleans up some internal state so pmdaControl() can enforce valid state transitions. The PM_ERR_PMDAREADY error PDU is sent to pmcd from the libpcp_pmda library as a result of the "return PM_ERR_PMDAREADY;" from the function that was responsible for the delay.
