Recently an SGI customer noted that the cluster PMDA only produced 64 entries
even though 144 nodes could have been reporting metrics.
On examination, this was because we had hard-coded a limit of 64. I made a simple
change to raise the hard-coded limit to 288 (I know, I know, I'll fix this properly later).
The customer, using the Python bindings and sampling a few metrics at
10-second intervals, gets a SIGPIPE in pmdacluster. He has 144 nodes. He also
reports the following.
On the rack leader, run:
pmval cluster.mem.physmem &
pmval cluster.mem.freemem
With the default 1-second polling interval, both metrics give values for all nodes.
After a while pmdacluster crashes, and then pmval only gives (as expected):
pmval: pmFetch: No PMCD agent for domain of request
pmval: pmFetch: No PMCD agent for domain of request
pmval: pmFetch: No PMCD agent for domain of request
pmval: pmFetch: No PMCD agent for domain of request
On a similarly sized system I have yet to reproduce this. Assuming that at some
point I can, what would be good debugging tools for a SIGPIPE in a PMDA?
Or general PMDA debugging tools?
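
For reference, what I mean by a SIGPIPE: a process is sent that signal when it
writes to a pipe or socket whose reading end has already been closed. Whether
pmdacluster hits this on its connection to pmcd or somewhere else is exactly
what I don't know yet. Below is a minimal Python sketch of the mechanism only.
It is not PCP code, and the function name is made up for illustration. CPython
ignores SIGPIPE at startup, so the failed write shows up as an EPIPE error
instead of killing the process; a C program with the default signal
disposition would simply be terminated.

```python
import errno
import os


def write_after_reader_closed(payload: bytes = b"metric data") -> int:
    """Write to a pipe whose read end is already closed; return the errno.

    Hypothetical helper, for illustration only. Returns 0 if the write
    unexpectedly succeeds.
    """
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the "reader" goes away before we write
    try:
        os.write(write_fd, payload)
    except BrokenPipeError as exc:
        # SIGPIPE is ignored by the interpreter, so the kernel reports
        # the condition as EPIPE on the write instead.
        return exc.errno
    finally:
        os.close(write_fd)
    return 0


if __name__ == "__main__":
    err = write_after_reader_closed()
    print(errno.errorcode.get(err, err))
```

If the PMDA is in the same situation, the interesting question is which peer
closed its end first, and why.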
Thanks.
--
-----------------------------------------------------------------------
Jeff Hanson - jhanson@xxxxxxx - Senior Technical Support Engineer
You can choose a ready guide in some celestial voice.
If you choose not to decide, you still have made a choice.
You can choose from phantom fears and kindness that can kill;
I will choose a path that's clear
I will choose freewill. - Peart