Re: libpcp multithreading - next steps

To: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: libpcp multithreading - next steps
From: Dave Brolley <brolley@xxxxxxxxxx>
Date: Tue, 26 Jul 2016 15:28:55 -0400
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <20160725203257.GG5274@xxxxxxxxxx>
References: <20160603155039.GB26460@xxxxxxxxxx> <578D1AE1.6060307@xxxxxxxxxx> <y0my44xksjb.fsf@xxxxxxxx> <57965C89.40401@xxxxxxxxxx> <20160725203257.GG5274@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

On 07/25/2016 04:32 PM, Frank Ch. Eigler wrote:
Take for example, as reported back three months ago at [1]: the 4751
test case, as committed, was failing in that the last run timed out
and failed to produce output.

[1]  http://oss.sgi.com/archives/pcp/2016-04/msg00202.html

See the current git head qa/4751{,.out} and the 3.11.3 released binaries
on Fedora 24.  On my machine, the 4751 test now happens to run
successfully most of the time.  But:

     valgrind --tool=helgrind ./src/multithread10 [....]

(where [....] is the 157 archives/ip-addresses for the last part of the
test case) produces numerous errors:

==12842== Thread #3: lock order "0x90839D0 before 0x52DA1E0" violated
==12842==
==12842== Observed (incorrect) order is: acquisition of lock at 0x52DA1E0
==12842==    at 0x4C2FE9D: mutex_lock_WRK (hg_intercepts.c:901)
==12842==    by 0x4C33D01: pthread_mutex_lock (hg_intercepts.c:917)
==12842==    by 0x50ABE82: __pmLock (lock.c:278)
==12842==    by 0x506D87B: pmDestroyContext (context.c:1494)
==12842==    by 0x5072C01: pmDestroyFetchGroup (fetchgroup.c:1653)
==12842==    by 0x109059: thread_fn (multithread10.c:65)
==12842==    by 0x4C32A24: mythread_wrapper (hg_intercepts.c:389)
==12842==    by 0x4E455C9: start_thread (pthread_create.c:333)
==12842==    by 0x53E2EAC: clone (clone.S:109)
==12842==
==12842==  followed by a later acquisition of lock at 0x90839D0
==12842==    at 0x4C2FE9D: mutex_lock_WRK (hg_intercepts.c:901)
==12842==    by 0x4C33D01: pthread_mutex_lock (hg_intercepts.c:917)
==12842==    by 0x50ABE82: __pmLock (lock.c:278)
==12842==    by 0x506D8C0: pmDestroyContext (context.c:1507)
==12842==    by 0x5072C01: pmDestroyFetchGroup (fetchgroup.c:1653)
==12842==    by 0x109059: thread_fn (multithread10.c:65)
==12842==    by 0x4C32A24: mythread_wrapper (hg_intercepts.c:389)
==12842==    by 0x4E455C9: start_thread (pthread_create.c:333)
==12842==    by 0x53E2EAC: clone (clone.S:109)
==12842==
[...]

Every such lock order report is a potential deadlock site, several of
which have been actually observed to occur.  Every one represents a
design flaw.

OK, perhaps I have misunderstood the content of your branch. I had understood that several new specialized locks were introduced to improve performance rather than to fix bugs. Perhaps both kinds of changes exist on the branch.
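
For reference, the kind of problem helgrind is flagging in the quoted report (one thread taking two locks in the opposite order from the order seen earlier) can be sketched in isolation. This is only an illustration under my own assumptions, not the libpcp code and not the fix on the branch: lock_a, lock_b, lock_pair, unlock_pair, worker_ab, worker_ba and run_workers are all hypothetical names. The idea shown is one common remedy, a single global acquisition order (here, lowest address first), so that two threads naming the locks in opposite orders cannot deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two locks in the report; not libpcp names. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter = 0;

/* Acquire two distinct mutexes in one fixed global order: lowest address
 * first, regardless of the order in which the caller names them. */
static void lock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    assert(x != y);
    if ((uintptr_t)x > (uintptr_t)y) {
        pthread_mutex_t *tmp = x;
        x = y;
        y = tmp;
    }
    pthread_mutex_lock(x);
    pthread_mutex_lock(y);
}

/* Release both mutexes; release order does not affect deadlock safety. */
static void unlock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    pthread_mutex_unlock(x);
    pthread_mutex_unlock(y);
}

/* This worker names the locks as (a, b). */
static void *worker_ab(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        lock_pair(&lock_a, &lock_b);
        shared_counter++;
        unlock_pair(&lock_a, &lock_b);
    }
    return NULL;
}

/* This worker names the locks as (b, a); lock_pair still uses one order. */
static void *worker_ba(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        lock_pair(&lock_b, &lock_a);
        shared_counter++;
        unlock_pair(&lock_b, &lock_a);
    }
    return NULL;
}

/* Run both workers to completion; returns the final counter, or -1 if a
 * thread could not be created. */
static int run_workers(void)
{
    pthread_t t1, t2;

    shared_counter = 0;
    if (pthread_create(&t1, NULL, worker_ab, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker_ba, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;
}
```

Calling run_workers() from a main() and building with cc -pthread should be enough to try it. If the two pthread_mutex_lock() calls in lock_pair are instead issued in caller order, the two workers take the locks in opposite orders, which is the pattern helgrind reports as a lock order violation and which can deadlock under the wrong interleaving.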

So let's start with the change referenced by [1] above. I'll have a look at it.

Dave
