
[Bug 1036] pmcd should not permanently give up on tardy pmdas

To: pcp@xxxxxxxxxxx
Subject: [Bug 1036] pmcd should not permanently give up on tardy pmdas
From: bugzilla-daemon@xxxxxxxxxxx
Date: Tue, 19 Nov 2013 00:50:53 +0000
Auto-submitted: auto-generated
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <bug-1036-835@xxxxxxxxxxxxxxxx/bugzilla/>
References: <bug-1036-835@xxxxxxxxxxxxxxxx/bugzilla/>
changed bug 1036
What    Removed    Added
CC                 nathans@debian.org

Comment # 1 on bug 1036 from
So, my 2c - as discussed on IRC, I don't really think of this as improving the
situation.  PMDA tardiness is a domain-specific problem, and to my mind
tackling it (as best one can; it's not generally solvable, and ultimately
practical tradeoffs end up being made) is thus best left to the individual
PMDAs - IMO.  It'd also add more code to pmcd for what I feel is an
error/corner case, which adds to my reluctance.

Finally, it feels like we'd be taking a stance of being more accepting of
mediocrity (these delays reduce the quality/accuracy of the data we export) -
if we are to add code, I'd prefer it to be along the lines of helping to find
root causes of those latency problems.  Perhaps timing mechanisms and new pmcd
metrics to help identify those PMDAs which are suffering latency spikes.  Even
if the timeouts (the point at which pmcd is seen to "overreact") are not
reached, those PMDAs are still contributing to an overall reduction in quality
in terms of value/timestamp accuracy.
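
To make the "timing mechanisms" idea a bit more concrete, here is a rough sketch in Python of per-agent latency tracking.  It is not pmcd code (pmcd is written in C) and does not use any PCP API; the class name, the spike rule (a response slower than some multiple of the recent median) and all parameters are assumptions made purely for illustration.

```python
import time
from collections import defaultdict, deque
from statistics import median


class AgentLatencyTracker:
    """Hypothetical helper: track per-agent round-trip times, flag spikes."""

    def __init__(self, window=100, spike_factor=5.0, min_samples=5):
        self.spike_factor = spike_factor  # assumed rule: spike if > factor * median
        self.min_samples = min_samples    # do not judge until we have some history
        self.samples = defaultdict(lambda: deque(maxlen=window))
        self.spikes = defaultdict(int)    # spike count per agent

    def record(self, agent, seconds):
        """Store one round-trip time; return True if it counts as a spike."""
        history = self.samples[agent]
        is_spike = (
            len(history) >= self.min_samples
            and seconds > self.spike_factor * median(history)
        )
        if is_spike:
            self.spikes[agent] += 1
        history.append(seconds)
        return is_spike

    def timed(self, agent, request):
        """Run request() and record how long the agent took to answer."""
        start = time.monotonic()
        try:
            return request()
        finally:
            self.record(agent, time.monotonic() - start)

    def worst(self):
        """Return agents that have spiked, most spikes first."""
        return sorted(self.spikes, key=lambda a: self.spikes[a], reverse=True)
```

Counters like the per-agent spike count are the sort of thing that could be exported as new metrics, so that slow PMDAs show up well before any timeout is hit.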

Also as discussed on IRC, we could implement a scheme where pmie is used to
trigger restarts of those PMDAs that have timed out, using pmcd.agent.status.
Counter-point: pmie runs unprivileged, so it can only SIGHUP pmcd, not
restart it (which means root/non-pcp PMDAs get no love).
Counter-counter-point: pmie could touch a file in a safe place (probably not a
world-writable, sticky-bit directory), and a trivial root cronjob could check
for that file and restart pmcd.  Pretty horrifying, but then so is
being accepting of PMDA tardiness IMO. :)
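
For what it's worth, the root-side check in that flag-file scheme could look something like the Python sketch below.  Everything here is hypothetical: the function name, the flag path and the trust rules (regular file, owned by the expected unprivileged user, directory not world-writable) are assumptions, and the actual restart command is left out because it varies by platform.

```python
import os
import stat


def restart_requested(flag_path, trusted_uid):
    """Return True if a trustworthy restart flag exists, consuming it.

    Assumed trust rules: the flag must be a regular file (not a symlink)
    owned by trusted_uid, and its directory must not be world-writable.
    """
    try:
        flag = os.lstat(flag_path)  # lstat: do not follow symlinks
        parent = os.stat(os.path.dirname(os.path.abspath(flag_path)))
    except FileNotFoundError:
        return False
    if not stat.S_ISREG(flag.st_mode) or flag.st_uid != trusted_uid:
        return False
    if parent.st_mode & stat.S_IWOTH:
        return False  # anyone could have planted the flag here
    os.unlink(flag_path)  # consume the request so it fires only once
    return True
```

The cronjob would call this with the uid that pmie runs as, and restart pmcd only when it returns True.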


In other news, I wonder if we should consider adding pmcd.agent.uid metrics to
export the user account identifier under which each PMDA is running?

