Actually, pmclusterd uses __pmEncodeResult() and the PMDA uses
__pmDecodeResult(), so arch/endian shouldn't matter: the PDU format
those routines produce and consume is byte-order neutral.
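
For reference, here's a minimal sketch of that round trip, assuming the
libpcp-internal interfaces declared in <pcp/impl.h> of that era (the fd
and buffer handling are illustrative only):

#include <pcp/pmapi.h>
#include <pcp/impl.h>

/*
 * Encode a pmResult into a PDU and decode it back. The PDU payload is
 * kept in network byte order, which is why the encode/decode pair
 * should be arch/endian neutral.
 */
int
round_trip(int fd, const pmResult *in, pmResult **out)
{
    __pmPDU *pdubuf;
    int     sts;

    /* encode into a pinned PDU buffer, as if sending on fd */
    if ((sts = __pmEncodeResult(fd, in, &pdubuf)) < 0)
        return sts;
    /* decode back to host order; the decoded result may reference
     * the PDU buffer, so release it via pmFreeResult(*out) */
    return __pmDecodeResult(pdubuf, out);
}
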
On Tue, Sep 13, 2016 at 9:57 AM, Mark Goodwin <mgoodwin@xxxxxxxxxx> wrote:
> Hi Jeff, the cluster PMDA log 'cluster.log' on the head node is full
> of errors, e.g.:
>
> [Mon Sep 12 08:27:34] pmdacluster(14849) Error: pmdaFetch: PMID
> 65.1.13 not handled by fetch callback
>
> which means cluster_fetchCallBack() isn't finding a matching PMID and
> instance for the requested metric/instance in the cached pmResult for
> each node. Are all your cluster nodes the same arch/endian as the head
> node? The code assumes this, since we're sending binary pmResult
> structures from each cluster node and caching them on the head node.
> Here's a code snippet:
>
> /*
>  * Now find the pmid and instance in the cached result. The domain and
>  * cluster for each PMID in the result will be for the sub-PMDA that
>  * returned it, so translate the pmDesc.pmID to match before comparing.
>  */
> idsp->domain = subdom_dom_map[idsp->subdomain];
> idsp->subdomain = 0;
> sts = PM_ERR_PMID;
> for (i = 0, r = tc->result; i < r->numpmid; i++) {
>     if (pmid_domain(r->vset[i]->pmid) != pmid_domain(pmda_pmid) ||
>         pmid_cluster(r->vset[i]->pmid) != pmid_cluster(pmda_pmid) ||
>         pmid_item(r->vset[i]->pmid) != pmid_item(pmda_pmid))
>         continue;
>
>     /* found the pmid, now look for the instance */
>     sts = PM_ERR_INST;
>     for (j = 0; j < r->vset[i]->numval; j++) {
>         v = &r->vset[i]->vlist[j];
>         if (indom_int->serial == CLUSTER_INDOM || v->inst == instp->node_inst) {
>             /* found */
>             if (r->vset[i]->valfmt == PM_VAL_INSITU)
>                 memcpy(&atom->l, &v->value.lval, sizeof(atom->l));
>             else
>                 pmExtractValue(r->vset[i]->valfmt, v,
>                                v->value.pval->vtype, atom, v->value.pval->vtype);
>             return 1;
>         }
>     }
> }
> return sts;
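>
> If it would help to see what's actually being compared when that
> PM_ERR_PMID path fires, a throwaway diagnostic like the following
> could be dropped in ahead of the outer loop. It's a hypothetical
> helper (not part of the PMDA), using the PMAPI's pmIDStr():
>
> /* dump the requested PMID against every PMID in the cached result;
>  * pmIDStr() returns a static buffer, hence separate fprintf calls */
> static void
> dump_pmids(pmID pmda_pmid, const pmResult *r)
> {
>     int i;
>
>     fprintf(stderr, "requested %s\n", pmIDStr(pmda_pmid));
>     for (i = 0; i < r->numpmid; i++)
>         fprintf(stderr, "  cached[%d] %s numval=%d\n",
>             i, pmIDStr(r->vset[i]->pmid), r->vset[i]->numval);
> }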
>
> Also, the other log I asked for is /var/log/pcp/pmclusterd.log on one
> or more of the cluster nodes. That log won't be present on the head
> node. Please attach it.
> Regards
> -- Mark
>
> On Mon, Sep 12, 2016 at 11:40 PM, Jeff Hanson <jhanson@xxxxxxx> wrote:
>> On 09/12/2016 12:31 AM, Mark Goodwin wrote:
>>>>
>>>> But the real problem is that although pmclusterd exposes some 100
>>>> metrics or so, only 20 of them can actually be fetched.
>>>
>>>
>>
>> 88 default metrics, 22 are fetched. The IB ones seem to be a
>> different issue from the rest.
>>
>>> Jeff, do you ever see "cluster_node_rw: spinning" in either
>>> /var/log/pcp/pmcd/cluster.log or /var/log/pcp/pmclusterd.log?
>>
>>
>> No.
>>
>>> Can you send me these logs after reproducing the issue where only some
>>> (20 out of 100) metrics can be fetched but the others report the
>>> instance domain issue?
>>>
>>
>> Attached.
>>
>>> Thanks
>>> -- Mark
>>>
>>> On Thu, Sep 1, 2016 at 3:59 PM, Mark Goodwin <mgoodwin@xxxxxxxxxx> wrote:
>>>>
>>>> Hi Jeff, I don't think we ever open-sourced pmclusterd since it
>>>> was (at the time) SGI ICE specific, so it's unlikely anyone outside
>>>> SGI will know much about it.
>>>>
>>>> This is the daemon that aggregates indoms for per-cluster-node CPU
>>>> data on the head node, so the client tools just monitor the head
>>>> node, right? If that's the tool framework you're referring to, I
>>>> always thought it was a bit of an abomination of the indom concept
>>>> (even though I wrote it!), but I designed it that way to be more
>>>> scalable than monitoring every cluster node individually.
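>>>>
>>>> To make that model concrete, here's a minimal PMAPI sketch: a
>>>> monitoring tool opens a single context on the head node and sees
>>>> each cluster node as an instance of the aggregated indom. The
>>>> metric name and host here are hypothetical placeholders:
>>>>
>>>> #include <pcp/pmapi.h>
>>>>
>>>> int
>>>> main(void)
>>>> {
>>>>     char     *name = "cluster.kernel.all.load";  /* hypothetical */
>>>>     pmID     pmid;
>>>>     pmDesc   desc;
>>>>     pmResult *rp;
>>>>     int      i, sts;
>>>>
>>>>     /* one context on the head node covers the whole cluster */
>>>>     if ((sts = pmNewContext(PM_CONTEXT_HOST, "headnode")) < 0 ||
>>>>         (sts = pmLookupName(1, &name, &pmid)) < 0 ||
>>>>         (sts = pmLookupDesc(pmid, &desc)) < 0 ||
>>>>         (sts = pmFetch(1, &pmid, &rp)) < 0) {
>>>>         fprintf(stderr, "PMAPI error: %s\n", pmErrStr(sts));
>>>>         return 1;
>>>>     }
>>>>     /* one value per cluster node, named via the aggregated indom */
>>>>     for (i = 0; i < rp->vset[0]->numval; i++) {
>>>>         char *instname;
>>>>         if (pmNameInDom(desc.indom, rp->vset[0]->vlist[i].inst,
>>>>                         &instname) >= 0) {
>>>>             printf("%s[%s]\n", name, instname);
>>>>             free(instname);
>>>>         }
>>>>     }
>>>>     pmFreeResult(rp);
>>>>     return 0;
>>>> }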
>>>> What issues are you running into?
>>>>
>>>> Regards
>>>> -- Mark
>>>>
>>>>
>>>> On Thu, Sep 1, 2016 at 2:43 AM, Jeff Hanson <jhanson@xxxxxxx> wrote:
>>>>>
>>>>> As we (SGI) explore what to do about the scaling issues with pmclusterd
>>>>> as it is currently written I am exploring other options. For cluster
>>>>> configurations are people generally running pmcd locally on the cluster
>>>>> nodes
>>>>> and logging to the node? Running pmcd locally on the cluster node with
>>>>> another system as the logger? Other thoughts?
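>>>>>
>>>>> For the second option (local pmcd everywhere, one central pmlogger
>>>>> host), the pmlogger control file on the logging host might look
>>>>> like this sketch; the node names are hypothetical and the options
>>>>> are just the stock defaults:
>>>>>
>>>>> # /etc/pcp/pmlogger/control on the central logging host
>>>>> $version=1.1
>>>>> node01  n  n  PCP_LOG_DIR/pmlogger/node01  -r -T24h10m -c config.default
>>>>> node02  n  n  PCP_LOG_DIR/pmlogger/node02  -r -T24h10m -c config.default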
>>>>>
>>>>> Thanks.
>>
>>
>>
>> --
>> -----------------------------------------------------------------------
>> Jeff Hanson - jhanson@xxxxxxx - Senior Technical Support Engineer
>>
>> You can choose a ready guide in some celestial voice.
>> If you choose not to decide, you still have made a choice.
>> You can choose from phantom fears and kindness that can kill;
>> I will choose a path that's clear
>> I will choose freewill. - Peart