Re: [pcp] libpcp multithreading - next steps

To: pcp@xxxxxxxxxxx
Subject: Re: [pcp] libpcp multithreading - next steps
From: Dave Brolley <brolley@xxxxxxxxxx>
Date: Mon, 18 Jul 2016 14:07:29 -0400
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <20160603155039.GB26460@xxxxxxxxxx>
References: <20160603155039.GB26460@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
On 06/03/2016 11:50 AM, Frank Ch. Eigler wrote:
> Hi -
>
> I need some advice about how to proceed with my recent work fixing
> various multithreading problems within libpcp.
> [ ... ]
> It seems to me that our options are:
>
> 0) status quo as of v3.11.2; tolerate hangs etc.
>
> 1) roll back even v3.11.2 context.c changes to v3.11.1; tolerate hangs
>     and show-stopper pmNewContext performance
>
> 2) merge fche/multithread and stop there, handling future bugs as/when
>     they appear
>
> 3) merge or rework libpcp parts of fche/multithread, and continue work
>     piecemeal; agree now on docs/testing/merging criteria in order to
>     liberate from constraints of preserving idiosyncrasies of current
>     code base (e.g., move toward much less sharing of data between
>     contexts; simpler locking model; conceivable deprecation of some
>     functionality in multithreaded apps)
>
> 4) declare that libpcp is not multithread safe; rearchitect our
>     various programs without multithreading
>
>
> Option 3 makes most sense to me: in time, we can have both
> thread-safety & high performance.  Are y'all ready to discuss further?

Let's see if we can get this moving again.

I think that the initial step of breaking up the global holding of __pmLock_libpcp during pmNewContext(3) was a good one, with a measurable goal. It was low risk, since all it did was release the lock around blocks of code for which holding it was not necessary. I think that what happened was that additional, more ambitious work was added to the branch before the initial work was fully reviewed and merged. The branch is now in a state where no one feels comfortable with the correctness and benefit of that additional work.
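For illustration only, here is a minimal C sketch of the kind of change described above: narrowing a lock's scope so that a slow step runs outside it. It is not the actual libpcp code. `big_lock` merely stands in for __pmLock_libpcp, and `slow_connect()`, `ctx_table`, `new_context_coarse()` and `new_context_narrow()` are hypothetical names invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins, not libpcp code: big_lock plays the role of
 * a library-wide lock such as __pmLock_libpcp, and ctx_table is the
 * shared state it protects. */
#define MAX_CTX 64

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int ctx_table[MAX_CTX];   /* connection handle per slot */
static int ctx_count = 0;        /* number of slots in use */

/* Simulates a slow, blocking step (e.g. a network handshake) that
 * reads and writes no shared state. */
static int slow_connect(int host)
{
    struct timespec ts = { 0, 50L * 1000 * 1000 };   /* 50 ms */
    nanosleep(&ts, NULL);
    return host + 1000;                              /* fake handle */
}

/* Coarse locking: the lock is held across the slow step, so
 * concurrent callers are serialized behind one another. */
int new_context_coarse(int host)
{
    int slot = -1;
    pthread_mutex_lock(&big_lock);
    int handle = slow_connect(host);
    if (ctx_count < MAX_CTX) {
        slot = ctx_count++;
        ctx_table[slot] = handle;
    }
    pthread_mutex_unlock(&big_lock);
    return slot;
}

/* Narrowed locking: the slow, private step runs unlocked and the
 * lock is taken only to publish the result into the shared table. */
int new_context_narrow(int host)
{
    int slot = -1;
    int handle = slow_connect(host);   /* no shared state touched */
    pthread_mutex_lock(&big_lock);
    if (ctx_count < MAX_CTX) {
        slot = ctx_count++;
        ctx_table[slot] = handle;
    }
    /* If the table was full, a real implementation would have to
     * release `handle` here; this sketch simply drops it. */
    pthread_mutex_unlock(&big_lock);
    return slot;
}
```

The narrowed version is only correct because slow_connect() touches no shared state; confirming that property for each block of code moved out from under the lock is the part that needs careful review.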

What I propose is to back up a bit. Let's isolate the original loosening of the __pmLock_libpcp lock, measure the performance improvement for the scenario which inspired it, and create some QA tests to verify it. If some of this has already been done, then let's re-review those results.
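One hypothetical way to measure such a change, sketched in C under the assumption that the scenario can be reduced to several threads each performing a slow step: time N threads with the slow step inside a shared mutex versus outside it. `elapsed_concurrent()`, `serialized_worker()` and `parallel_worker()` are illustrative names, not part of libpcp or its QA suite, and the 50 ms pause is an arbitrary stand-in for real connection latency.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical workloads standing in for "create a context" calls:
 * both wait 50 ms, but one does so while holding a shared mutex. */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

static void pause_50ms(void)
{
    struct timespec ts = { 0, 50L * 1000 * 1000 };
    nanosleep(&ts, NULL);
}

static void *serialized_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&shared_lock);
    pause_50ms();                          /* slow step inside the lock */
    pthread_mutex_unlock(&shared_lock);
    return NULL;
}

static void *parallel_worker(void *arg)
{
    (void)arg;
    pause_50ms();                          /* slow step outside the lock */
    pthread_mutex_lock(&shared_lock);
    pthread_mutex_unlock(&shared_lock);    /* brief critical section */
    return NULL;
}

/* Runs `worker` on `nthreads` threads (1 to 64) and returns the
 * wall-clock seconds until all have finished, or -1.0 on error. */
double elapsed_concurrent(void *(*worker)(void *), int nthreads)
{
    pthread_t tids[64];
    struct timespec start, end;
    int started = 0;

    if (nthreads < 1 || nthreads > 64)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)      /* join whatever was started */
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);
    if (started != nthreads)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

With 4 threads, the serialized case cannot finish in less than about 200 ms, while the parallel case should finish in roughly 50 ms; a QA test could assert a similar ordering against the real pmNewContext(3) scenario.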

Once we get that bit taken care of, let's tackle the remaining pieces one at a time, from the same point of view: the proposed benefit, its correctness, and verification of both. I suggest starting with a new branch so as to reduce the chance of errors.

I suppose that this sounds a lot like option 3 above. I think that there is some good work on that branch. We just need to get the ball rolling again.

Dave
