https://bugzilla.redhat.com/show_bug.cgi?id=1334815
Bug ID: 1334815
Summary: pmcd pmda auto-restart fails if failure encountered during restart
Product: Fedora
Version: rawhide
Component: pcp
Assignee: nathans@xxxxxxxxxx
Reporter: fche@xxxxxxxxxx
QA Contact: extras-qa@xxxxxxxxxxxxxxxxx
CC: brolley@xxxxxxxxxx, fche@xxxxxxxxxx, lberk@xxxxxxxxxx,
mgoodwin@xxxxxxxxxx, nathans@xxxxxxxxxx,
pcp@xxxxxxxxxxx, scox@xxxxxxxxxx
Picture this: pcp 3.11.2 is happily steaming along until one of its pmdas
(usually proc or linux) times out. The pcp 3.11.2 pmcd code responds by
restarting the pmda. Normally that's fine, but what if the restart itself
fails, for example because of another timeout right then? Then pmcd is
unaware of the failure, and because the AgentDied flag has already been
cleared, its auto-restart logic does not trigger again until some indefinite
point in the future. This has been observed in the wild.
One possible cure is this patch. It passes hand-testing (running a tight
killall -9 or -STOP loop against a target pmda), but it needs more thought
and probably proper QA:
diff --git a/src/pmcd/src/config.c b/src/pmcd/src/config.c
index 04d9db8bdb4f..ef92ce3230c0 100644
--- a/src/pmcd/src/config.c
+++ b/src/pmcd/src/config.c
@@ -1532,6 +1532,8 @@ AgentNegotiate(AgentInfo *aPtr)
else
fprintf(stderr, "pmcd: error at initial PDU exchange with "
"%s PMDA: %s\n", aPtr->pmDomainLabel, pmErrStr(sts));
+
+ AgentDied = 1; /* signal to request auto-restart */
return PM_ERR_IPC;
}