
pmReconnectContext

To: pcp <pcp@xxxxxxxxxxx>
Subject: pmReconnectContext
From: Mark Goodwin <mgoodwin@xxxxxxxxxx>
Date: Thu, 02 May 2013 13:13:38 +1000
This came up yesterday on IRC and it was suggested to discuss
it on the list.

What was the original rationale for pmReconnectContext? The use
case arose when pmchart users complained that their charts stopped
updating when a remote host bounced (and stayed stopped after the
host came back up or pmcd was restarted). And so
pmReconnectContext was born.

But we could have hidden the functionality inside libpcp. For example,
pmstat does this:

    if ((sts = pmFetch(nummetrics, s->pmids, s->res + s->flip)) < 0) {
        if (ctxType == PM_CONTEXT_HOST &&
            (sts == PM_ERR_IPC || sts == PM_ERR_TIMEOUT)) {
            puts (" Fetch failed. Reconnecting ...");
            if ( s->res[1-s->flip] != NULL ) {
                pmFreeResult(s->res[1-s->flip]);
                s->res[1-s->flip] = NULL;
            }
            pmReconnectContext (s->ctx);

... we could instead have checked for sts == PM_ERR_IPC || sts == PM_ERR_TIMEOUT
inside pmFetch and automatically reconnected and retried some number of
times. Or maybe that's too messy and we needed the error handling
out in the app layer? E.g. in the above case, pmstat would have seemed
to hang, with no opportunity to inform the user, whilst pmFetch reconnected
and retried. Can you remember, Ken?
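
To make the alternative concrete, here is a rough sketch of what a
"retry inside the library" wrapper might look like. It is not libpcp
code: fetch_with_retry, the fetch_fn/reconnect_fn callbacks, the
SKETCH_ERR_* codes (stand-ins for PM_ERR_IPC and PM_ERR_TIMEOUT) and the
fake_host simulation are all hypothetical names invented for
illustration, so that it compiles without the PCP headers.

```c
/* Stand-ins for PM_ERR_IPC and PM_ERR_TIMEOUT; the values are arbitrary. */
#define SKETCH_ERR_IPC      (-1)
#define SKETCH_ERR_TIMEOUT  (-2)

/* Hypothetical callbacks standing in for a fetch and a reconnect. */
typedef int (*fetch_fn)(void *state);
typedef int (*reconnect_fn)(void *state);

/*
 * Hypothetical helper: fetch once and, on a connection-type error,
 * reconnect and retry up to max_retries times.
 * Returns the status of the last fetch attempted.
 */
static int
fetch_with_retry(fetch_fn fetch, reconnect_fn reconnect,
                 void *state, int max_retries)
{
    int sts = fetch(state);
    int attempt;

    for (attempt = 0; attempt < max_retries; attempt++) {
        if (sts != SKETCH_ERR_IPC && sts != SKETCH_ERR_TIMEOUT)
            break;          /* success, or an error a reconnect cannot fix */
        if (reconnect(state) < 0)
            continue;       /* host still down: count the attempt, try again */
        sts = fetch(state);
    }
    return sts;
}

/* Simulated remote host, used only to exercise the helper. */
struct fake_host {
    int connected;          /* 1 while the connection is usable */
    int outage;             /* reconnect attempts that will still fail */
    int reconnects;         /* reconnect attempts made so far */
};

static int
fake_fetch(void *state)
{
    struct fake_host *h = state;
    return h->connected ? 0 : SKETCH_ERR_IPC;
}

static int
fake_reconnect(void *state)
{
    struct fake_host *h = state;

    h->reconnects++;
    if (h->outage > 0) {
        h->outage--;
        return SKETCH_ERR_IPC;
    }
    h->connected = 1;
    return 0;
}
```

Note that the caller gets no callback or status while the loop runs, which
is exactly the "seemed to hang" problem: the only thing the application
ever sees is the final return value.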

Cheers
-- Mark
