On 07/25/2016 04:32 PM, Frank Ch. Eigler wrote:
Take, for example, the problem reported three months ago at [1]: the
4751 test case, as committed, was failing because its last run timed
out and produced no output.
[1] http://oss.sgi.com/archives/pcp/2016-04/msg00202.html
See the current git head qa/4751{,.out} and the released 3.11.3 binaries
on Fedora 24. On my machine, the 4751 test now happens to run
successfully most of the time. But:
valgrind --tool=helgrind ./src/multithread10 [....]
(where [....] is the 157 archives/ip-addresses for the last part of the
test case) produces numerous errors:
==12842== Thread #3: lock order "0x90839D0 before 0x52DA1E0" violated
==12842==
==12842== Observed (incorrect) order is: acquisition of lock at 0x52DA1E0
==12842== at 0x4C2FE9D: mutex_lock_WRK (hg_intercepts.c:901)
==12842== by 0x4C33D01: pthread_mutex_lock (hg_intercepts.c:917)
==12842== by 0x50ABE82: __pmLock (lock.c:278)
==12842== by 0x506D87B: pmDestroyContext (context.c:1494)
==12842== by 0x5072C01: pmDestroyFetchGroup (fetchgroup.c:1653)
==12842== by 0x109059: thread_fn (multithread10.c:65)
==12842== by 0x4C32A24: mythread_wrapper (hg_intercepts.c:389)
==12842== by 0x4E455C9: start_thread (pthread_create.c:333)
==12842== by 0x53E2EAC: clone (clone.S:109)
==12842==
==12842== followed by a later acquisition of lock at 0x90839D0
==12842== at 0x4C2FE9D: mutex_lock_WRK (hg_intercepts.c:901)
==12842== by 0x4C33D01: pthread_mutex_lock (hg_intercepts.c:917)
==12842== by 0x50ABE82: __pmLock (lock.c:278)
==12842== by 0x506D8C0: pmDestroyContext (context.c:1507)
==12842== by 0x5072C01: pmDestroyFetchGroup (fetchgroup.c:1653)
==12842== by 0x109059: thread_fn (multithread10.c:65)
==12842== by 0x4C32A24: mythread_wrapper (hg_intercepts.c:389)
==12842== by 0x4E455C9: start_thread (pthread_create.c:333)
==12842== by 0x53E2EAC: clone (clone.S:109)
==12842==
[...]
Every such lock order report marks a potential deadlock site, and
several of these deadlocks have actually been observed. Every one
represents a design flaw.
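
For readers unfamiliar with this class of helgrind report, here is a
minimal sketch of the underlying issue and one common remedy. It is not
PCP code: lock_a, lock_b, lock_pair(), unlock_pair() and
run_lock_order_demo() are hypothetical names made up for illustration.
Two threads ask for the same pair of mutexes in opposite order, which
could deadlock if each simply locked them as requested. The helper
avoids that by always acquiring the pair in one fixed order (ascending
address), so no inversion can occur.

```c
#include <pthread.h>
#include <stdint.h>

#define ITERATIONS 100000

/* Hypothetical stand-ins for two library locks; not PCP's actual locks. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;

/* Acquire both mutexes in a single global order (ascending address),
 * regardless of the order in which the caller names them. */
static void lock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    if ((uintptr_t)x > (uintptr_t)y) {
        pthread_mutex_t *tmp = x;
        x = y;
        y = tmp;
    }
    pthread_mutex_lock(x);
    pthread_mutex_lock(y);
}

/* Release both mutexes; unlock order does not affect deadlock safety. */
static void unlock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    pthread_mutex_unlock(x);
    pthread_mutex_unlock(y);
}

/* Names the locks as (a, b). */
static void *worker_ab(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_pair(&lock_a, &lock_b);
        shared_counter++;
        unlock_pair(&lock_a, &lock_b);
    }
    return NULL;
}

/* Names the locks as (b, a); with naive locking this would be the
 * inverted order that helgrind flags. */
static void *worker_ba(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_pair(&lock_b, &lock_a);
        shared_counter++;
        unlock_pair(&lock_b, &lock_a);
    }
    return NULL;
}

/* Runs both workers to completion and returns the final counter,
 * or -1 if a thread could not be created. */
long run_lock_order_demo(void)
{
    pthread_t t1, t2;

    shared_counter = 0;
    if (pthread_create(&t1, NULL, worker_ab, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker_ba, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;
}
```

Calling run_lock_order_demo() from a small main() and building with
-pthread should let both threads finish. Replacing the body of
lock_pair() with two plain pthread_mutex_lock() calls in argument order
would be expected to reintroduce the kind of "lock order violated"
report shown above when run under helgrind.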
OK. I'm perhaps showing my misunderstanding of the content of your
branch. I had understood that several new specialized locks had been
introduced in an effort to improve performance, as opposed to fixing
bugs. Perhaps that is also the case (i.e. both kinds of changes exist on
the branch).
So let's start with the change referenced by [1] above. I'll have a look
at it.
Dave