Hi Jeff, the cluster PMDA log 'cluster.log' on the head node is full
of errors, e.g.:
[Mon Sep 12 08:27:34] pmdacluster(14849) Error: pmdaFetch: PMID
65.1.13 not handled by fetch callback
which means cluster_fetchCallBack() isn't finding a matching PMID and
instance for the requested metric/instance in the cached pmResult for
each node. Are all your cluster nodes the same arch/endianness as the
head node? The code assumes they are, since we're sending binary
pmResult structures from each cluster node and caching them on the
head node.
Here's a code snippet:

    /*
     * Now find the pmid and instance in the cached result. The domain and
     * cluster for each PMID in the result will be for the sub-PMDA that
     * returned it, so translate the pmDesc.pmID to match before comparing.
     */
    idsp->domain = subdom_dom_map[idsp->subdomain];
    idsp->subdomain = 0;
    sts = PM_ERR_PMID;
    for (i = 0, r = tc->result; i < r->numpmid; i++) {
        if (pmid_domain(r->vset[i]->pmid) != pmid_domain(pmda_pmid) ||
            pmid_cluster(r->vset[i]->pmid) != pmid_cluster(pmda_pmid) ||
            pmid_item(r->vset[i]->pmid) != pmid_item(pmda_pmid))
            continue;
        /* found the pmid, now look for the instance */
        sts = PM_ERR_INST;
        for (j = 0; j < r->vset[i]->numval; j++) {
            v = &r->vset[i]->vlist[j];
            if (indom_int->serial == CLUSTER_INDOM ||
                v->inst == instp->node_inst) {
                /* found it: extract the value */
                if (r->vset[i]->valfmt == PM_VAL_INSITU)
                    memcpy(&atom->l, &v->value.lval, sizeof(atom->l));
                else
                    pmExtractValue(r->vset[i]->valfmt, v,
                        v->value.pval->vtype, atom, v->value.pval->vtype);
                return 1;
            }
        }
    }
    return sts;
Also, the other log I asked for is /var/log/pcp/pmclusterd.log on one
or more of the cluster nodes. That log won't be present on the head
node. Please attach it.
Regards
-- Mark
On Mon, Sep 12, 2016 at 11:40 PM, Jeff Hanson <jhanson@xxxxxxx> wrote:
> On 09/12/2016 12:31 AM, Mark Goodwin wrote:
>>>
>>> But the real problem is that although pmclusterd exposes some 100 metrics
>>> or
>>> so but only 20 of them are actually able to be fetched.
>>
>>
>
> 88 default metrics, 22 are fetched. The IB ones seem to be a different
> issue from the rest.
>
>> Jeff, do you ever see "cluster_node_rw: spinning" in either
>> /var/log/pcp/pmcd/cluster.log or /var/log/pcp/pmclusterd.log ?
>
>
> No.
>
>> Can you send me these logs after reproducing the issue where only some
>> (20 out of 100) metrics can be fetched but the others report the
>> instance domain issue?
>>
>
> Attached.
>
>> Thanks
>> -- Mark
>>
>> On Thu, Sep 1, 2016 at 3:59 PM, Mark Goodwin <mgoodwin@xxxxxxxxxx> wrote:
>>>
>>> Hi Jeff, I don't think we ever open-sourced pmclusterd since it was
>>> (at the time) SGI ICE specific,
>>> so it's unlikely anyone outside SGI will know much about it.
>>>
>>> This is the daemon that aggregates indoms for per-cluster-node CPU
>>> data on the head node, so
>>> the client tools just monitor the head node, right? If that's the tool
>>> framework you're referring to,
>>> I always thought it was a bit of an abomination of the indom concept
>>> (even though I wrote it!),
>>> but designed it that way to be more scalable than monitoring every
>>> cluster node individually.
>>> What issues are you running into?
>>>
>>> Regards
>>> -- Mark
>>>
>>>
>>> On Thu, Sep 1, 2016 at 2:43 AM, Jeff Hanson <jhanson@xxxxxxx> wrote:
>>>>
>>>> As we (SGI) explore what to do about the scaling issues with pmclusterd
>>>> as it is currently written, I am exploring other options. For cluster
>>>> configurations, are people generally running pmcd locally on the cluster
>>>> nodes and logging to the node? Running pmcd locally on the cluster node
>>>> with another system as the logger? Other thoughts?
>>>>
>>>> Thanks.
>>>> --
>>>> -----------------------------------------------------------------------
>>>> Jeff Hanson - jhanson@xxxxxxx - Senior Technical Support Engineer
>>>>
>>>> You can choose a ready guide in some celestial voice.
>>>> If you choose not to decide, you still have made a choice.
>>>> You can choose from phantom fears and kindness that can kill;
>>>> I will choose a path that's clear
>>>> I will choose freewill. - Peart
>>>>
>>>> _______________________________________________
>>>> pcp mailing list
>>>> pcp@xxxxxxxxxxx
>>>> http://oss.sgi.com/mailman/listinfo/pcp
>
>
>