
Re: [pcp] libpcp multithreading - next steps

To: Dave Brolley <brolley@xxxxxxxxxx>
Subject: Re: [pcp] libpcp multithreading - next steps
From: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Date: Mon, 8 Aug 2016 15:26:53 -0400
Cc: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>, pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <57A8D9B6.8050800@xxxxxxxxxx>
References: <20160603155039.GB26460@xxxxxxxxxx> <578D1AE1.6060307@xxxxxxxxxx> <y0my44xksjb.fsf@xxxxxxxx> <57965C89.40401@xxxxxxxxxx> <20160725203257.GG5274@xxxxxxxxxx> <5797B9F7.2020701@xxxxxxxxxx> <71790d7d-377e-c28e-0adf-57fb221c3539@xxxxxxxxxxxxxxxx> <y0mshuwhzt1.fsf@xxxxxxxx> <57A8D9B6.8050800@xxxxxxxxxx>
User-agent: Mutt/1.4.2.2i
Hi, Dave -

> This discussion seems to have stalled again. I assume that one or
> both of you (Frank and/or Ken) is familiar with this particular
> [__pmHandleToPtr] race. Has there been a solution proposed? If so,
> was there an implementation? If so, where is it? (branch + commit).

It's in the first commit you were looking at: 3193cc1c5abf8e1 ==
b2208f06b4e on pcpfans.git.  It includes hunks for
libpcp/src/context.c that split __pmHandleToPtr in two: a function that
locks ctxp->c_lock **before** unlocking __pmLock_libpcp (thus
resolving the race), and a new function __pmHandleToPtr_unlocked
(which does no locking, so it may only be used from functions that
already hold the relevant locks).  The first part is the key fix.


> >>and would like to understand that (if it exists) before we embark on
> >>a development path that relies on helgrind to find lock inversions
> >>rather than design to avoid lock inversions.

> >It sounds as though you are suspicious that helgrind is unreliable:
> >that the lock inversion errors are mistaken.  Let me assure you that
> >in every case I've studied, it was genuine.

> It might be helpful if you could point out each of these cases and how 
> the inversion was addressed by your changes.

Well, there were many cases.  In the git commits fixing them, I
generally described the nesting errors and outlined the testing
procedure that triggered them (usually the preexisting QA test cases),
but unfortunately didn't record the actual helgrind or gdb crash message
for each.  It would be quite a bit of work to recreate them all now.
That's OK, though, if starting with just an example or two is
sufficient to show that helgrind is not just imagining the problems.
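For anyone unfamiliar with what a helgrind lock-order report means, here is a minimal, self-contained illustration. It is not taken from PCP; lock_a, lock_b, a_then_b, b_then_a and run_inversion_demo are hypothetical. The two threads take the same pair of mutexes in opposite orders but run one after the other, so this program never actually deadlocks. Run under valgrind --tool=helgrind, it would be expected to report a lock-order violation anyway, because helgrind checks the order in which locks are acquired rather than whether a deadlock happened during that particular run.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static int counter;                 /* protected by both locks */

/* Path 1: acquires A, then B. */
static void *a_then_b(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    counter++;
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

/* Path 2: acquires B, then A -- the reverse order (the inversion). */
static void *b_then_a(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_b);
    pthread_mutex_lock(&lock_a);
    counter++;
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}

/* Run the two paths in separate threads, strictly one after the other.
 * They never overlap, so no deadlock occurs here, but the inconsistent
 * A/B ordering is still there for a lock-order checker to see.
 * Returns the final counter value, or -1 if a thread cannot be created. */
static int run_inversion_demo(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, a_then_b, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    if (pthread_create(&t, NULL, b_then_a, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return counter;
}
```

Such a report therefore points at a real hazard rather than a phantom: with different thread timing, the two paths can each hold one lock while waiting for the other.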


- FChE
