pcp
[Top] [All Lists]

Re: libpcp multithreading - next steps

To: Dave Brolley <brolley@xxxxxxxxxx>
Subject: Re: libpcp multithreading - next steps
From: fche@xxxxxxxxxx (Frank Ch. Eigler)
Date: Tue, 19 Jul 2016 21:27:36 -0400
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <578D1AE1.6060307@xxxxxxxxxx> (Dave Brolley's message of "Mon, 18 Jul 2016 14:07:29 -0400")
References: <20160603155039.GB26460@xxxxxxxxxx> <578D1AE1.6060307@xxxxxxxxxx>
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)
brolley wrote:

> [...]  I think that initial step of breaking up the global holding
> of __pmLock_libpcp during pmNewContext(3) was a good one, with a
> measurable goal. It was low risk, since all it was doing was
> releasing the lock for blocks for code for which it was not
> necessary. I think that what happened was that additional, more
> ambitious work was added to the branch before the initial work was
> fully reviewed and merged. [...]

Which initial work are you referring to?


> What I proposed is to back up a bit. Let's isolate the original
> loosening of the __pmLock_libpcp lock, measure the performance
> improvement for the scenario which inspired it and create some qa to
> verify it. If some of this has already been done, the let's re-review
> those results.

If you're referring to commit efc0173ad era stuff from three months
ago, which were merged into 3.11.3, the performance effects are
obvious by inspection, and borne out by QA (e.g., qa/4751) and of
course hand testing.  (Concurrent pmNewContext PM_CONTEXT_HOST's no
longer serialized each other when contacting absent hosts.)

As previously stated on this list, that same test case indicated
instances where libpcp locking logic was incorrect, even before the
3.11.3 era changes, resulting in deadlocks.  As previously stated,
4751 can hang/crash 3.11.2, 3.11.3, and git master pcp, depending on
invocation.  In return, it was called "flakey" and commented out of
qa/group.  (I hadn't found an invocation that could hang/crash the
fche/multithread branch libpcp though.)


> Once we get that bit taken care of, let's tackle the remaining
> pieces, one at a time from the same point of view: proposed benefit,
> correctness and verification of both. 

The post-3.11.3 changes' proposed benefits were entirely the
correction of race conditions and/or elimination of deadlocks.  There
was no performance-tweaking argument proposed.  Absolute correctness
is tricky to guarantee / verify.  Incorrectness is clearer: "valgrind
--tool=helgrind" has been helpful in identifying at least some of the
preexisting problems.  It was by that tool's reports that many of the
subsequent small-lock patches were driven.


> I suggest starting with a new branch so as to reduce the chance of
> errors.

Can you explain what you mean?  What would you like to see on this new
branch?


- FChE

<Prev in Thread] Current Thread [Next in Thread>