On 06/01/15 02:56, Martins Innus wrote:
> Hi,
> I'm bringing up a bunch of VMs to do QA testing, and am seeing
> about 50 failures per machine. ...
Welcome to the club! I am currently running at an average failure rate of 5
per machine ... this is up a little on the long-term average as I have not
given it much attention for the last 150 (or thereabouts) runs. So, we should
be able to quickly and easily get your failure rate down to about 1/10th of
what you're seeing now.
I'd start by running qa/admin/check-vm on each VM ... this will pick up the bc
dependency, but also a whole lot of other obscure dependencies (like perl
modules you need from cpan on most distros).
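If you have a lot of VMs, something like the following may save some typing.
This is only a rough sketch ... the VM names, the src/pcp checkout location
(relative to your home directory on each VM), the log file names and the
DRY_RUN switch are all made up for illustration, and I'm assuming check-vm can
be run from the top of the source tree with no arguments.

```shell
# Hypothetical VM names and checkout location; adjust for your site.
VMS="${VMS:-vm01 vm02}"
PCP_SRC="${PCP_SRC:-src/pcp}"

# Print the command that would run check-vm on one VM.
check_vm_cmd() {
    printf 'ssh %s "cd %s && qa/admin/check-vm"\n' "$1" "$PCP_SRC"
}

# Run check-vm on every VM, or just print the commands when DRY_RUN=1
# (the default here, so nothing is contacted by accident).
run_check_vm() {
    for vm in $VMS; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            check_vm_cmd "$vm"
        else
            # One log per VM so the reports can be compared afterwards.
            ssh "$vm" "cd $PCP_SRC && qa/admin/check-vm" >"check-vm.$vm.log" 2>&1
        fi
    done
}
```

Run run_check_vm first to see the commands, then DRY_RUN=0 run_check_vm to
actually execute them.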
We certainly could add the bc package dependency, but this would be needed for
all of the packaging options (not just rpm). There are also other dependencies
that go beyond packaging, and some of the platforms don't even have real
packaging (like those using tar) ... so no objection in principle, but it will
only address a small part of the issues you're seeing.
Note that check-vm also knows about build dependencies, so you'll see (a) the
problems that would cause build failures before Makepkgs blunders into them,
and (b) the optional functionality that you're not including and testing. To
ensure the build remains robust, and the qa "notrun" controls remain robust
when optional prereqs are not installed, we don't want all the prereqs on all
platforms (this is impossible anyway because some are platform-specific), but
increasing the number of optional prereqs installed will reduce the number of
"not run" tests.
Also, check out the qa/README (this will explain $PCPQA_CLOSE_X_SERVER) and run
the chk.setup script in the qa directory ... this is much older than the
qa/admin/check-vm script, but does a different class of pre-QA checks.
And audit common.config.
> So far I've found 3 that are pretty common:
>
> 1. "bc" is required by a bunch of tests but the testsuite package
> doesn't depend on it. ...
>
> 2. Some QA tests, for instance 276, try to run X based tests if the gui
> package is installed regardless of whether an Xserver is running. Should
> these not run? The relevant code seems to be in common.check:
> ...
This is OK once $PCPQA_CLOSE_X_SERVER is set appropriately ... if you don't
have xdpyinfo installed, then we have to assume $PCPQA_CLOSE_X_SERVER is
correct and "xhost +" has been run there.
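To make that logic concrete, here is a small sketch of the decision ... it is
not the code from common.check, the function name x_display_status is made up,
and I'm assuming $PCPQA_CLOSE_X_SERVER holds an X display name that can be
handed to xdpyinfo -display.

```shell
# Report how much we know about the X display named in $1:
#   unset       - no display given
#   assumed     - xdpyinfo is not installed, so we cannot probe and
#                 have to trust the setting
#   ok          - xdpyinfo could connect to the display
#   unreachable - xdpyinfo is installed but could not connect
x_display_status() {
    display="$1"
    if [ -z "$display" ]; then
        echo unset
    elif ! command -v xdpyinfo >/dev/null 2>&1; then
        echo assumed
    elif xdpyinfo -display "$display" >/dev/null 2>&1; then
        echo ok
    else
        echo unreachable
    fi
}
```

Call it as x_display_status "$PCPQA_CLOSE_X_SERVER" on the QA machine ...
"unreachable" usually means the display name is wrong or "xhost +" has not
been run on the X server.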
> 3. Again on Ubuntu 14.04. Probably 20 tests fail with the following:
>
Hmm ... this looks like a permissions issue ... I run the QA as my user id out
of the git tree (that is force of habit, not a requirement) and what you're
doing is OK. I don't have 039 failing anywhere, nor any test failure that
matches your signature:
kenj@bozo:~/Logs/by-vm$ find . -name "*.out" | wc -l
146
kenj@bozo:~/Logs/by-vm$ grep -r __pmBind .
./vm02/qa/533.full:auxconnect.c:__pmBind(fd=5, family=10, port=6261, addr=::1)
I'd wait and see if the various check scripts uncover anything interesting that
might explain this (note there is an earlier failure on the 039 case, before
the __pmBind lines).
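If you want to hunt for a failure signature across your own logs, the find and
grep above generalize to something like this ... a sketch only, the function
name count_signature is made up and it assumes one subdirectory of logs per VM
like my by-vm layout.

```shell
# For each VM directory under $1, print the VM name and the number of
# files anywhere below it that mention the pattern $2.
count_signature() {
    logdir="$1"
    pattern="$2"
    for vmdir in "$logdir"/*/; do
        [ -d "$vmdir" ] || continue
        # grep -l lists each matching file once; wc -l counts them.
        n=$(grep -rl -- "$pattern" "$vmdir" 2>/dev/null | wc -l | tr -d ' ')
        printf '%s %s\n' "$(basename "$vmdir")" "$n"
    done
}
```

For example, count_signature ~/Logs/by-vm __pmBind prints one line per VM, so
a signature that shows up on only one or two machines stands out quickly.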
Also of interest (maybe) are the scripts in qa/admin - pcp-daily and
pcp-qa-summary - these are my cat herder controls for running lots of QA
machines.