On 09/12/2016 07:57 PM, Mark Goodwin wrote:
Hi Jeff, the cluster PMDA log 'cluster.log' on the head node is full
of errors, e.g.:
[Mon Sep 12 08:27:34] pmdacluster(14849) Error: pmdaFetch: PMID 65.1.13 not handled by fetch callback
This means cluster_fetchCallBack() isn't finding a matching PMID and
instance for the requested metric/instance in the cached pmResult for
each node. Are all your cluster nodes the same architecture and endianness
as the head node? The code assumes this, since we're sending binary
pmResult structures from each cluster node and caching them on the head node.
Here's a code snippet:
/*
 * Now find the pmid and instance in the cached result. The domain and
 * cluster for each PMID in the result will be for the sub-PMDA that
 * returned it, so translate the pmDesc.pmID to match before comparing.
 */
idsp->domain = subdom_dom_map[idsp->subdomain];
idsp->subdomain = 0;
sts = PM_ERR_PMID;
for (i = 0, r = tc->result; i < r->numpmid; i++) {
    if (pmid_domain(r->vset[i]->pmid) != pmid_domain(pmda_pmid) ||
        pmid_cluster(r->vset[i]->pmid) != pmid_cluster(pmda_pmid) ||
        pmid_item(r->vset[i]->pmid) != pmid_item(pmda_pmid))
        continue;
    /* found the pmid, now look for the instance */
    sts = PM_ERR_INST;
    for (j = 0; j < r->vset[i]->numval; j++) {
        v = &r->vset[i]->vlist[j];
        if (indom_int->serial == CLUSTER_INDOM || v->inst == instp->node_inst) {
            /* found */
            if (r->vset[i]->valfmt == PM_VAL_INSITU)
                memcpy(&atom->l, &v->value.lval, sizeof(atom->l));
            else
                pmExtractValue(r->vset[i]->valfmt, v,
                               v->value.pval->vtype, atom, v->value.pval->vtype);
            return 1;
        }
    }
}
return sts;
Also, the other log I asked for is /var/log/pcp/pmclusterd.log on one or
more of the cluster nodes. That log won't be present on the head node.
There are no entries from this time period, which is why I didn't attach it.
I didn't explain that before either. Sorry.
Please attach.
Regards
-- Mark
On Mon, Sep 12, 2016 at 11:40 PM, Jeff Hanson <jhanson@xxxxxxx> wrote:
On 09/12/2016 12:31 AM, Mark Goodwin wrote:
But the real problem is that although pmclusterd exposes 100 or so
metrics, only 20 of them can actually be fetched.
There are 88 default metrics, and 22 of them are fetched. The IB ones
seem to be a different issue from the rest.
Jeff, do you ever see "cluster_node_rw: spinning" in either
/var/log/pcp/pmcd/cluster.log or /var/log/pcp/pmclusterd.log ?
No.
Can you send me these logs after reproducing the issue where only some
(20 out of 100) metrics can be fetched but the others report the
instance domain issue?
Attached.
Thanks
-- Mark
On Thu, Sep 1, 2016 at 3:59 PM, Mark Goodwin <mgoodwin@xxxxxxxxxx> wrote:
Hi Jeff, I don't think we ever open-sourced pmclusterd since it was
(at the time) SGI ICE specific, so it's unlikely anyone outside SGI
will know much about it.
This is the daemon that aggregates indoms for per-cluster-node CPU
data on the head node, so the client tools just monitor the head node,
right? If that's the tool framework you're referring to, I always
thought it was a bit of an abomination of the indom concept (even
though I wrote it!), but I designed it that way to be more scalable
than monitoring every cluster node individually.
What issues are you running into?
Regards
-- Mark
On Thu, Sep 1, 2016 at 2:43 AM, Jeff Hanson <jhanson@xxxxxxx> wrote:
As we (SGI) explore what to do about the scaling issues with pmclusterd
as it is currently written, I am looking at other options. For cluster
configurations, are people generally running pmcd locally on each cluster
node and logging on that node? Running pmcd locally on each cluster node
with another system as the logger? Other thoughts?
Thanks.
--
-----------------------------------------------------------------------
Jeff Hanson - jhanson@xxxxxxx - Senior Technical Support Engineer
You can choose a ready guide in some celestial voice.
If you choose not to decide, you still have made a choice.
You can choose from phantom fears and kindness that can kill;
I will choose a path that's clear
I will choose freewill. - Peart
_______________________________________________
pcp mailing list
pcp@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/pcp