Ken,
On 1/27/15 5:15 PM, Ken McDonell wrote:
> How is the proc PMDA installed (process or dso)? This suggests pmcd
> <--> client timeout, not pmcd <--> pmda timeout (which cannot happen
> for dso pmdas!).
> There are two different timeouts in play here: -t or
> pmcd.control.timeout for pmcd and the $PMCD_*_TIMEOUT family. Which
> are you using and what values are it/they set to?
Sorry for the lack of background; I had spoken to Nathan and Frank about
this previously. All timeouts are set to their defaults, and the PMDA is
installed as a daemon (a separate process, not a DSO).
The main issue I'm trying to solve is this: when one of our systems
becomes heavily loaded (this seems to correlate with high I/O) while
pmlogger is fetching metrics from the proc PMDA at regular intervals, we
get the following in pmcd.log:
[Thu Jan 15 10:29:25] pmcd(15873) Warning: pduread: timeout (after 5.000 sec) while attempting to read 12 bytes out of 12 in HDR on fd=11
[Thu Jan 15 10:29:25] pmcd(15873) Info: CleanupAgent ...
Cleanup "proc" agent (dom 3): protocol failure for fd=11
I'd rather not increase the timeout, since we would then be reporting
incorrect timestamps for the collected data, so I was planning to use
pmie to restart pmcd when the PMDA dies. Since there is no way to abandon
an in-flight request, I think we are better off getting no data and
trying again after the restart.
As it turns out, there was no problem with pmie, only with my
understanding of how it interacts with pmcd. I did, however, find a
separate issue, which I describe in my other email in this thread.
Still, this is the basic timeout problem we are trying to solve.
Thanks
Martins