Hi Ken,
----- Original Message -----
> Things are looking better, but not yet back to where they were 6 months ago.
>
> 1108 is a mystery ... we get 2 primary pmloggers started from pmlogger_check
> (this is not supposed to happen, ever!). The failure is non-deterministic.
> I've been unable to track it down ... most likely it will be a race
> triggered by some earlier QA test (could be a long time before 1108 I think)
> and no one else notices until 1108 stumbles along.
The only clue I've come across so far is that the second logger always seems
to be started 6 minutes after the first. The search continues though.
> 361 has gone a bit under the radar ... it is not passing _anywhere_ as it is
> not run or skipped (-) in the (new) full report on all but 4 hosts and it
> fails on the 4 hosts on which it is run. Note %fail is percentage of all
> hosts, not just percentage of hosts on which the test was run, which is why
> %fail for 361 is 11% and not 100%.
Fixed now.
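For anyone else reading the report, here is a rough sketch of the %fail
arithmetic described above. The function name is made up, and the total of 36
hosts is only an assumption chosen to be consistent with 4 failures showing
as 11%; the real host count may differ.

```python
def percent_fail(failed, hosts):
    """Failures as a whole-number percentage of the given host count."""
    return round(100 * failed / hosts)

# ASSUMPTION: 36 hosts in the full report (not stated in the thread).
ALL_HOSTS = 36
RAN_ON = 4      # hosts on which 361 was actually run
FAILED = 4      # hosts on which 361 failed

of_all_hosts = percent_fail(FAILED, ALL_HOSTS)   # what the report shows
of_hosts_run = percent_fail(FAILED, RAN_ON)      # what you might expect
```

The first figure is relative to every host in the report, the second only to
hosts where the test ran, which is why the two differ so much.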
> Apart from that, there are odd failures all over the landscape which make it
> very hard to progress any of this in a dramatic fashion ... if you really
> care about any of the failing tests below, I'd appreciate any assistance you
> could offer to smack 'em into submission.
381 is possibly due to pmlogger now being more resilient to pmcd and/or pmda
restarts ... but I'd have expected to see the same failure signature everywhere?
That 581 failure we've talked about before too, I think - it seems to be
sensitive to the number of open fds in pmcd, and I wonder if this is related to
that timeout change from a while back where we open multiple connections at
once? I think the right fix is to expect a range of fds, somewhere around 12-20
or so? (The max #fds observable would depend on network config, if that theory
is correct.)
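Something like the following is what I have in mind for the range check. It
is only a sketch: the "open fds: N" line format and the function name are
hypothetical, not what 581 actually prints, and the 12-20 bounds are just the
guess from above.

```python
import re

# ASSUMPTION: bounds come from the "12-20 or so" guess; tune as needed.
FD_MIN, FD_MAX = 12, 20

def filter_fd_count(line, lo=FD_MIN, hi=FD_MAX):
    """Replace an in-range fd count with a stable token.

    Counts outside [lo, hi] are left untouched so they still show up
    as a difference against the expected output.
    """
    def repl(match):
        count = int(match.group(2))
        if lo <= count <= hi:
            return match.group(1) + "OK"
        return match.group(0)

    # Hypothetical output format: "open fds: <number>"
    return re.sub(r"(open fds: )(\d+)", repl, line)
```

With a filter like this the expected output stays deterministic across hosts,
while a count well outside the range would still fail the test.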
823 I'm certain is also a _notrun candidate - some versions of SASL seem buggy
and a newly created user becomes oddly invisible. It may be worth collecting
"pmconfig -L sasl_version" from the failing machines and looking for a pattern
that could be squashed by _notrun? Certainly passes reliably for me on recent
SASL library versions anyway.
cheers.
--
Nathan