
Re: [pcp] pmclusterd versus other solutions

To: Jeff Hanson <jhanson@xxxxxxx>
Subject: Re: [pcp] pmclusterd versus other solutions
From: Mark Goodwin <mgoodwin@xxxxxxxxxx>
Date: Tue, 13 Sep 2016 10:28:26 +1000
Cc: PCP <pcp@xxxxxxxxxxx>
Actually, pmclusterd uses __pmEncodeResult() and the PMDA uses
__pmDecodeResult(), so arch/endian shouldn't matter.

On Tue, Sep 13, 2016 at 9:57 AM, Mark Goodwin <mgoodwin@xxxxxxxxxx> wrote:
> Hi Jeff, the cluster PMDA log 'cluster.log' on the head node is full
> of errors, e.g. :
>
> [Mon Sep 12 08:27:34] pmdacluster(14849) Error: pmdaFetch: PMID
> 65.1.13 not handled by fetch callback
>
> which means cluster_fetchCallBack() isn't finding a matching PMID and
> instance for the requested metric/instance in the cached pmResult for
> each node. Are all your cluster nodes same arch/endian as the head
> node? The code assumes this since we're sending binary pmResult
> structures from each cluster node and caching them on the head node.
> Here's a code snippet :
>
>     /*
>      * Now find the pmid and instance in the cached result.  The domain and
>      * cluster for each PMID in the result will be for the sub-PMDA that
>      * returned it, so translate the pmDesc.pmID to match before comparing.
>      */
>     idsp->domain = subdom_dom_map[idsp->subdomain];
>     idsp->subdomain = 0;
>     sts = PM_ERR_PMID;
>     for (i=0, r = tc->result; i < r->numpmid; i++) {
>         if (pmid_domain( r->vset[i]->pmid) != pmid_domain( pmda_pmid) ||
>             pmid_cluster(r->vset[i]->pmid) != pmid_cluster(pmda_pmid) ||
>             pmid_item(   r->vset[i]->pmid) != pmid_item(   pmda_pmid) )
>             continue;
>         /* found the pmid, now look for the instance */
>         sts = PM_ERR_INST;
>         for (j=0; j < r->vset[i]->numval; j++) {
>             v = &r->vset[i]->vlist[j];
>             if (indom_int->serial == CLUSTER_INDOM ||
>                 v->inst == instp->node_inst) {
>                 /* found the instance */
>                 if (r->vset[i]->valfmt == PM_VAL_INSITU)
>                     memcpy(&atom->l, &v->value.lval, sizeof(atom->l));
>                 else
>                     pmExtractValue(r->vset[i]->valfmt, v,
>                         v->value.pval->vtype, atom, v->value.pval->vtype);
>
>                 return 1;
>             }
>         }
>     }
>     return sts;
>
> Also, the other log I asked for is /var/log/pcp/pmclusterd.log on one or
> more of the cluster nodes. That log won't be present on the head node.
> Please attach it.
> Regards
> -- Mark
>
> On Mon, Sep 12, 2016 at 11:40 PM, Jeff Hanson <jhanson@xxxxxxx> wrote:
>> On 09/12/2016 12:31 AM, Mark Goodwin wrote:
>>>>
>>>> But the real problem is that although pmclusterd exposes some 100
>>>> metrics or so, only 20 of them can actually be fetched.
>>>
>>>
>>
>> 88 default metrics, 22 are fetched.  The IB ones seem to be a different
>> issue from the rest.
>>
>>> Jeff, do you ever see "cluster_node_rw: spinning" in either
>>> /var/log/pcp/pmcd/cluster.log or /var/log/pcp/pmclusterd.log ?
>>
>>
>> No.
>>
>>> Can you send me these logs after reproducing the issue where only some
>>> (20 out of 100) metrics can be fetched but the others report the
>>> instance domain issue?
>>>
>>
>> Attached.
>>
>>> Thanks
>>> -- Mark
>>>
>>> On Thu, Sep 1, 2016 at 3:59 PM, Mark Goodwin <mgoodwin@xxxxxxxxxx> wrote:
>>>>
>>>> Hi Jeff, I don't think we ever open-sourced pmclusterd since it was
>>>> (at the time) SGI ICE specific,
>>>> so it's unlikely anyone outside SGI will know much about it.
>>>>
>>>> This is the daemon that aggregates indoms for per-cluster-node CPU
>>>> data on the head node, so
>>>> the client tools just monitor the head node, right? If that's the tool
>>>> framework you're referring to,
>>>> I always thought it was a bit of an abomination of the indom concept
>>>> (even though I wrote it!),
>>>> but designed it that way to be more scalable than monitoring every
>>>> cluster node individually.
>>>> What issues are you running into?
>>>>
>>>> Regards
>>>> -- Mark
>>>>
>>>>
>>>> On Thu, Sep 1, 2016 at 2:43 AM, Jeff Hanson <jhanson@xxxxxxx> wrote:
>>>>>
>>>>> As we (SGI) work out what to do about the scaling issues with
>>>>> pmclusterd as it is currently written, I am exploring other options.
>>>>> For cluster configurations, are people generally running pmcd locally
>>>>> on the cluster nodes and logging to the node?  Running pmcd locally on
>>>>> the cluster node with another system as the logger?  Other thoughts?
>>>>>
>>>>> Thanks.
>>>>> --
>>>>> -----------------------------------------------------------------------
>>>>> Jeff Hanson - jhanson@xxxxxxx - Senior Technical Support Engineer
>>>>>
>>>>> You can choose a ready guide in some celestial voice.
>>>>> If you choose not to decide, you still have made a choice.
>>>>> You can choose from phantom fears and kindness that can kill;
>>>>> I will choose a path that's clear
>>>>> I will choose freewill. - Peart
>>>>>
>>>>> _______________________________________________
>>>>> pcp mailing list
>>>>> pcp@xxxxxxxxxxx
>>>>> http://oss.sgi.com/mailman/listinfo/pcp
