On 07/13/2014 10:00 PM, Nathan Scott wrote:
Hi Dave,
----- Original Message -----
On 07/09/2014 03:38 AM, Nathan Scott wrote:
I've pulled all your current work, but had some problems with a
few tests (see attached bad files). I found reverting the commit
below got us back into a healthy state, so pushed this for now -
when you figure it out, just "git revert" on my revert should do
a revert revert & then just apply your followup fix on top).
I just redid the change in my pcpfans brolley/dev branch. That in addition
to the fix which was to add EINPROGRESS handling to a few callers of
__pmConnect().
Ah, OK, sounds promising. From a full QA run though I have new failures
in tests 230, 775, 835, and 946 that all appear related to these changes.
835 is the new pmdamemcache test, but it was running reliably on Friday
- with the latest pull, not so much (it now times-out during the initial
pmcd connection).
OK. Sorry about that. It turns out that expecting callers of
__pmConnect() to handle EIPROGRESS with no timeout was the wrong way to
go. It turns out that callers who expect non-blocking connections always
call __pmConnectTo() which sets the FNDELAY flag and that callers who
expect blocking connections call __pmConnect() directly. So what is
really needed is for __pmConnect() and __pmConnectTo() to provide the
expected semantics. The commit below does that once and for all.
It fixes test 230 for me, but I don't seem to be set properly to run
test 835 ([not run] Noones home on the default memcached port 11211)
commit 122effcc48ba680e67311156e6ee61ae28922df1
Author: Dave Brolley <brolley@xxxxxxxxxx>
Date: Mon Jul 14 13:57:39 2014 -0400
Abstract __pmConnect() into __pmConnect and __pmConnectWithFNDELAY().
In PCP, control over whether a connection attempt made via
__pmConnect() blocks or not is controlled by setting the
FNDELAY (O_NONBLOCK) file status flag in for the file
descriptor. However, NSPR's PR_Connect ignores the FNDELAY
flags (more accurately, always sets it) and uses an explicit
timeout argument to control whether the connection attempt
blocks and, if so, for how long. The NSPR implementation of
__pmConnect(), therefore has no way of knowing what the caller's
intent was.
To solve this, we need an explicit __pmConnectWithFNDELAY(). This
new function is internal to libpcp and not exported. It is called
from within __pmConnectTo() which is the function called within
PCP when non-blocking behavior is needed.
Code within PCP that calls __pmConnect() directly does not expect
non-blocking behavior and so does not need to handle EINPROGRESS.
This commit reverts a previous change
(commit commit a705d6bb86d87fa0cdf0fb0accf5299212cef08d)
for which such handling was added.
This commit fixes regressions in qa tests 230 and 835.
I've attached all the bad files (ip addr for the host
is 192.168.122.1 - for the 775 and 946 failures).
I'm not quite sure what to do about these two failures. They are
actually a problem with the tests themselves. As part of these tests,
the -r and --resolve options to pmfind(1) are tested, however, in this
case, the address 192.168.122.1 is an address on which a legitimate
service has been discovered, but for which the DNS resolution fails.
Because of this, it is reported unresolved and the filters in the tests
report a failure. I'm not quite sure how else to test whether the -r and
--resolve options were respected.
Dave
|