To: pcp@xxxxxxxxxxx
Subject: [Bug 1334815] New: pmcd pmda auto-restart fails if failure encountered during restart
From: bugzilla@xxxxxxxxxx
Date: Tue, 10 May 2016 15:03:44 +0000
Auto-submitted: auto-generated
Delivered-to: pcp@xxxxxxxxxxx
https://bugzilla.redhat.com/show_bug.cgi?id=1334815

            Bug ID: 1334815
           Summary: pmcd pmda auto-restart fails if failure encountered
                    during restart
           Product: Fedora
           Version: rawhide
         Component: pcp
          Assignee: nathans@xxxxxxxxxx
          Reporter: fche@xxxxxxxxxx
        QA Contact: extras-qa@xxxxxxxxxxxxxxxxx
                CC: brolley@xxxxxxxxxx, fche@xxxxxxxxxx, lberk@xxxxxxxxxx,
                    mgoodwin@xxxxxxxxxx, nathans@xxxxxxxxxx,
                    pcp@xxxxxxxxxxx, scox@xxxxxxxxxx



Picture this.  pcp 3.11.2, happily steaming along, until one of its pmdas
(usually proc or linux) times out.  The pcp 3.11.2 pmcd code responds by
restarting the pmda.  Normally that's fine, but what if the restart itself
fails, say with another timeout right then?  Then pmcd never notices the
second failure, and its auto-restart logic doesn't trigger again until the
indefinite future (since the AgentDied flag has already been cleared).  This
has been observed in the wild.

One possible cure is this patch, which passes hand-testing (running a tight
killall -9 or -STOP loop against a target pmda), but needs more thought &
probably proper QA:


diff --git a/src/pmcd/src/config.c b/src/pmcd/src/config.c
index 04d9db8bdb4f..ef92ce3230c0 100644
--- a/src/pmcd/src/config.c
+++ b/src/pmcd/src/config.c
@@ -1532,6 +1532,8 @@ AgentNegotiate(AgentInfo *aPtr)
     else
        fprintf(stderr, "pmcd: error at initial PDU exchange with "
                "%s PMDA: %s\n", aPtr->pmDomainLabel, pmErrStr(sts));
+
+    AgentDied = 1; /* signal to request auto-restart */
     return PM_ERR_IPC;
 }
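
To make the failure mode and the intent of the patch easier to see, here is a
toy model in C.  It is not pmcd source: only the names AgentDied and
AgentNegotiate come from this report, and everything else (agent_up,
fail_budget, restart_agents, simulate, the "patched" switch) is hypothetical.
It also assumes the flag is cleared before the restart attempt, which is how
the description above reads.

```c
#include <assert.h>

/* Toy model only, NOT pmcd code.  AgentDied and AgentNegotiate are names
 * taken from the report; every other identifier here is hypothetical. */

static int AgentDied;    /* set when an agent needs a restart */
static int agent_up;     /* hypothetical: 1 once the toy agent is connected */
static int fail_budget;  /* hypothetical: upcoming negotiations that time out */

/* Stand-in for the initial PDU exchange with a freshly started agent. */
static int AgentNegotiate(int patched)
{
    if (fail_budget > 0) {
        fail_budget--;
        if (patched)
            AgentDied = 1;  /* proposed fix: request another auto-restart */
        return -1;          /* stands in for PM_ERR_IPC */
    }
    agent_up = 1;
    return 0;
}

/* Hypothetical restart pass: clear the flag first (assumption), then retry. */
static void restart_agents(int patched)
{
    AgentDied = 0;
    if (!agent_up)
        AgentNegotiate(patched);
}

/* Run `passes` main-loop iterations after an initial agent timeout, with
 * the next `failures` negotiations failing.  Returns 1 if the agent is up
 * at the end, 0 if it was left dead. */
int simulate(int patched, int failures, int passes)
{
    assert(failures >= 0 && passes >= 0);
    agent_up = 0;
    AgentDied = 1;          /* the first timeout was noticed */
    fail_budget = failures;
    for (int i = 0; i < passes; i++)
        if (AgentDied)
            restart_agents(patched);
    return agent_up;
}
```

In this model, simulate(0, 1, 10) leaves the agent dead: the single failed
restart consumed the flag and nothing sets it again.  simulate(1, 1, 10)
recovers on the next pass, because the failed negotiation re-raises
AgentDied.  Whether the real main loop behaves the same way depends on where
pmcd actually clears the flag, which this sketch only assumes.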
