RE: [pcp] qa/518 tweaks on pcpfans.git fche/dev

To: "'Frank Ch. Eigler'" <fche@xxxxxxxxxx>
Subject: RE: [pcp] qa/518 tweaks on pcpfans.git fche/dev
From: "Ken McDonell" <kenj@xxxxxxxxxxxxxxxx>
Date: Sun, 2 Nov 2014 07:11:15 +1100
Cc: "'pcp developers'" <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <20141101010427.GF1913@xxxxxxxxxx>
References: <20141031201304.GE1913@xxxxxxxxxx> <002c01cff56c$6febc830$4fc35890$@internode.on.net> <20141101010427.GF1913@xxxxxxxxxx>
Thread-index: AQIWIQMIOeKcOGylRqOgmw6wi16M5QIkql93Aj2/Pb+bnMRKEA==
G'day Frank.

I've had another look at this test.

1. I think the non-determinism Frank observed can be handled in a different
way that does not make the test run longer; in other places the test
is considered correct if we get the expected outcome for N, N-1 or N+1 iterations.
I'll explore this and work with Frank (off the list, as probably no one else
cares and Frank has the environment in which the test was failing).
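To illustrate the "N, N-1 or N+1" idea in 1., here is a minimal C sketch of the acceptance check. The helper name iterations_ok is hypothetical, and the real QA tests do this kind of tolerance in their shell output filters rather than in C; this only shows the rule itself.

```c
#include <stdlib.h>

/* Hypothetical helper: accept an observed iteration count if it is
 * within one of the expected count, i.e. N-1, N or N+1. */
static int iterations_ok(long observed, long expected)
{
    return labs(observed - expected) <= 1;
}
```

With expected = 10, observed counts of 9, 10 and 11 pass, while 8 or 12 would still be reported as a failure, so the test stays sensitive to real regressions without waiting for extra iterations.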

2. I am not sure why the kills are there at all (thanks for pointing out
the extra one that I was not expecting) ... it seems to me we can do a better
job of filtering the output to ignore the other pmie instances (there is one
block of output from pcp -P per pmie instance).  I'll take a look at this.

3. Frank's observations about pmie and signals are a bit concerning, although
I see evidence of "try TERM and if that does not work try KILL and repeat
until successful or timeout" in the pmie init script, so perhaps this is a
long-standing problem that has been masked by hackery.  pmie does have a
TERM signal handler and a delayed exit, but only after the nanosleep() ... so
if we are blocked somewhere else, or do not abandon expression evaluation
completely when an I/O call returns with EINTR, then we could be off in the
weeds long enough for some script to believe pmie has not died.
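For readers less familiar with the "handler sets a flag, exit happens after the sleep" pattern described in 3., here is a minimal C sketch. It is not pmie's code: the names terminate_requested, on_sigterm, install_sigterm_handler and sleep_unless_terminated are hypothetical, and it assumes the handler is installed without SA_RESTART so that blocking calls return EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Set asynchronously by the SIGTERM handler; polled by the main loop. */
static volatile sig_atomic_t terminate_requested = 0;

static void on_sigterm(int sig)
{
    (void)sig;
    terminate_requested = 1;    /* only async-signal-safe work here */
}

/* Install the handler without SA_RESTART so blocking calls see EINTR. */
static int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Sleep for req, but stop as soon as termination has been requested.
 * Returns 0 if the whole interval elapsed, 1 if termination was
 * requested, -1 on any other nanosleep() failure. */
static int sleep_unless_terminated(struct timespec req)
{
    struct timespec rem;
    while (!terminate_requested) {
        if (nanosleep(&req, &rem) == 0)
            return 0;
        if (errno != EINTR)
            return -1;
        req = rem;              /* some other signal: sleep out the rest */
    }
    return 1;
}
```

The sketch only covers the sleep. The failure mode in 3. is everything else: any other blocking call (for example a fetch waiting on I/O) needs the same "on EINTR, check terminate_requested and abandon the current evaluation" treatment, or the flag is not seen until that call completes. Note also the small race between testing the flag and entering nanosleep(); closing it needs something like pselect() or ppoll() with the signal blocked outside the wait.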

Any insight into 3. would be helpful.

> -----Original Message-----
> From: Frank Ch. Eigler [mailto:fche@xxxxxxxxxx]
> Sent: Saturday, 1 November 2014 12:04 PM
> To: Ken McDonell
> Cc: 'pcp developers'
> Subject: Re: [pcp] qa/518 tweaks on pcpfans.git fche/dev
> 
> ...
> I'll try to trace it with something like systemtap.  (Even with pmmgr I
> encountered cases where a single SIGTERM sent to pmie was blocked/ignored,
> so sudo is probably not a necessary component of the
> problem.)
