
Re: [pcp] pcp update: pmcd agent auto-restart

To: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: [pcp] pcp update: pmcd agent auto-restart
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Tue, 19 Apr 2016 19:32:43 -0400 (EDT)
Cc: pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <20160415215730.GG21159@xxxxxxxxxx>
References: <20160414005241.GC23044@xxxxxxxxxx> <994226805.40108604.1460601050371.JavaMail.zimbra@xxxxxxxxxx> <20160415215730.GG21159@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: wiyXaLevUCZ0sU6EoQxe+nqGbckRXQ==
Thread-topic: pcp update: pmcd agent auto-restart

----- Original Message -----
> [...]
> Tweaked the first three with a followup patch, but not how you
> suggested (by adding an auto-restart override option), but by
> asserting auto-restart operation in the tests.

Yep, that should be no problem provided existing functionality is still
tested (which indeed seems to be the case, from my reading of the qa scripts).

Test qa/244 continues to fail consistently though...

$ ./check 244
244 31s ... - output mismatch (see 244.out.bad)
30c30
< hinv.ncpu: pmLookupDesc: No PMCD agent for domain of request
---
> hinv.ncpu: pmLookupDesc: IPC protocol failure
98,99d97
< [DATE] pmcd(PID) Warning: pduread: timeout (after 2.000 sec) while attempting to read 12 bytes out of 12 in HDR on fd=FD
< Cleanup "fake_irix" agent (dom 1): protocol failure for fd=FD, exit(0)

(Same results no matter how many times I run the test, so I can't seem
to tickle any race here that might make it match your output.)

I suspect the second diff hunk above is directly related to the first and
may be specific to your system (though I don't understand why or how).
Looking at your change to the qa/244.out.* files:

@@ -27,7 +27,7 @@ disallow * : all;
 Expect "Unknown or illegal metric identifier" ...
 sampledso.control: pmLookupDesc: Unknown or illegal metric identifier
 Expect "IPC protocol failure" ...
-hinv.ncpu: pmLookupDesc: IPC protocol failure
+hinv.ncpu: pmLookupDesc: No PMCD agent for domain of request
 Expect 9 values available ...
 sample.bin 9 100 200 300 400 500 600 700 800 900


As the test states (line 30 above), we should see PM_ERR_IPC there, not
PM_ERR_NOAGENT (and this is indeed what I still see).  Odd though, the
updated output reports "timeout (after 2.000 sec)" twice here now - once
before "Auto-restarting agents." and then again after...?

I'll change that back to how it was, which will make the test pass here,
and let's see if the buildbots match up with that (I think they might,
since this test was passing there before).  I don't follow the output
you're seeing though, which is a bit of a worry.  Was this test passing
for you before the pmcd code change?

cheers.

--
Nathan
