jhanson wrote:
> [...]
> Actually it's not pmdacluster that crashes but pmcd.
> pmcd log
> [...]
> [Tue Aug 16 13:14:56] pmcd(19253) Warning: pduread: timeout (after 5.000 sec)
> while attempting to read 12 bytes out of 12 in HDR on fd=19
> [Tue Aug 16 13:14:56] pmcd(19253) Info: CleanupAgent ...
> Cleanup "cluster" agent (dom 65): protocol failure for fd=19
OK, that's a pretty garden-variety situation: a PMDA request took more
than 5s to gather the requested info, so pmcd hung up on the agent.
> I have an strace of the pmcd process which doesn't (yet) show me anything
> interesting. So new general question - pmcd debugging hints?
An strace of the cluster-pmda process would be more informative. It
may show the need for latency-tolerance measures such as the
background worker threads we use for pmdarpm.
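To make the background-worker idea concrete, here is a rough sketch of
the general pattern. It is plain Python for brevity, it uses no PCP API,
and it is not the pmdarpm code; the names CachedMetricSource, collect
and fetch are made up for illustration. The assumption is that serving
slightly stale values is acceptable: the slow collection runs in its own
thread, and the request path only ever copies the last cached result.

```python
import threading
import time


class CachedMetricSource:
    """Serve the last known values at once; refresh them in a worker thread."""

    def __init__(self, collect, interval=1.0):
        self._collect = collect      # hypothetical slow collection callable
        self._interval = interval    # seconds to pause between refreshes
        self._lock = threading.Lock()
        self._values = {}            # last successfully collected values
        self._stamp = None           # time.monotonic() of the last refresh
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            values = self._collect()  # may block for a long time; lock not held
            with self._lock:
                self._values = values
                self._stamp = time.monotonic()
            self._stop.wait(self._interval)

    def fetch(self):
        """Request path: returns cached data and never waits on collection."""
        with self._lock:
            return dict(self._values), self._stamp
```

A fetch handler built this way answers within the timeout no matter how
long collect takes. The returned timestamp lets the caller decide whether
the cached values are too old to report.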
Come to think of it, there are few PMDAs that have NOT been hit by
this issue at some point. I wonder if it's time that a more systemic
solution be invented (not just restarting timed-out pmdas).
- FChE