On 06/03/2016 11:50 AM, Frank Ch. Eigler wrote:
Hi -
I need some advice about how to proceed with my recent work fixing
various multithreading problems within libpcp.
[ ... ]
It seems to me that our options are:
0) status quo as of v3.11.2; tolerate hangs etc.
1) roll back even v3.11.2 context.c changes to v3.11.1; tolerate hangs
and show-stopper pmNewContext performance
2) merge fche/multithread and stop there, handling future bugs as/when
they appear
3) merge or rework libpcp parts of fche/multithread, and continue work
piecemeal; agree now on docs/testing/merging criteria in order to
liberate from constraints of preserving idiosyncrasies of current
code base (e.g., move toward much less sharing of data between
contexts; simpler locking model; conceivable deprecation of some
functionality in multithreaded apps)
4) declare that libpcp is not multithread safe; rearchitect our
various programs without multithreading
Option 3 makes most sense to me: in time, we can have both
thread-safety & high performance. Are y'all ready to discuss further?
Let's see if we can get this moving again.
I think that initial step of breaking up the global holding of
__pmLock_libpcp during pmNewContext(3) was a good one, with a measurable
goal. It was low risk, since all it was doing was releasing the lock for
blocks of code for which it was not necessary. I think that what
happened was that additional, more ambitious work was added to the
branch before the initial work was fully reviewed and merged. The branch
is now in a state where no one feels comfortable with the correctness
and benefit of that additional work.
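
To make the low-risk part concrete, here is a minimal sketch of what
"releasing the lock for blocks of code for which it was not necessary"
looks like in general. It is not libpcp code: big_lock, next_handle,
struct ctx and both functions are hypothetical stand-ins, and the name
copy stands in for whatever slow, context-private setup happens during
context creation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from libpcp. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_handle = 0;            /* shared state, guarded by big_lock */

struct ctx { int handle; char *name; };

/* Private work: touches only memory owned by the caller. */
static char *copy_name(const char *name) {
    size_t n = strlen(name) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, name, n);
    return p;
}

/* Before: the global lock is held across the whole call. */
struct ctx *new_ctx_coarse(const char *name) {
    assert(name != NULL);
    pthread_mutex_lock(&big_lock);
    struct ctx *c = malloc(sizeof *c);
    if (c != NULL && (c->name = copy_name(name)) == NULL) {
        free(c);
        c = NULL;
    }
    if (c != NULL)
        c->handle = next_handle++;     /* the only step that needs the lock */
    pthread_mutex_unlock(&big_lock);
    return c;
}

/* After: private work runs unlocked; the lock covers only shared state. */
struct ctx *new_ctx_narrow(const char *name) {
    assert(name != NULL);
    struct ctx *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    if ((c->name = copy_name(name)) == NULL) {
        free(c);
        return NULL;
    }
    pthread_mutex_lock(&big_lock);
    c->handle = next_handle++;
    pthread_mutex_unlock(&big_lock);
    return c;
}

void free_ctx(struct ctx *c) {
    if (c != NULL) {
        free(c->name);
        free(c);
    }
}
```

Both variants hand out the same handles; the only difference is how
long other threads are kept waiting, which is what makes this kind of
change easy to review on its own.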
What I propose is to back up a bit. Let's isolate the original
loosening of the __pmLock_libpcp lock, measure the performance
improvement for the scenario which inspired it and create some qa to
verify it. If some of this has already been done, then let's re-review
those results.
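
As a rough illustration of the kind of measurement I mean, something
like the following could time concurrent context creation with the lock
held coarsely versus narrowly. This is only a sketch under assumptions:
create_coarse, create_narrow, private_setup and time_creates are
hypothetical names, and the short sleep stands in for the real
per-context setup cost. A real test would call pmNewContext(3) from
each thread instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define MAX_THREADS 64

/* Hypothetical stand-ins; none of these names come from libpcp. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static long created = 0;               /* shared counter, guarded by big_lock */

/* Stand-in for slow per-context setup that touches no shared state. */
static void private_setup(void) {
    struct timespec d = { 0, 200000 }; /* 0.2 ms */
    nanosleep(&d, NULL);
}

/* Lock held across the slow setup: threads serialize on it. */
static void create_coarse(void) {
    pthread_mutex_lock(&big_lock);
    private_setup();
    created++;
    pthread_mutex_unlock(&big_lock);
}

/* Lock held only for the shared update: setup overlaps across threads. */
static void create_narrow(void) {
    private_setup();
    pthread_mutex_lock(&big_lock);
    created++;
    pthread_mutex_unlock(&big_lock);
}

struct job { void (*create)(void); int iterations; };

static void *worker(void *arg) {
    struct job *j = arg;
    for (int i = 0; i < j->iterations; i++)
        j->create();
    return NULL;
}

/* Returns elapsed wall-clock seconds, or -1.0 if a thread failed to start. */
double time_creates(void (*create)(void), int nthreads, int iterations) {
    pthread_t tids[MAX_THREADS];
    struct job j = { create, iterations };
    struct timespec t0, t1;
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, &j) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (started != nthreads)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Safe to read once all workers have been joined. */
long contexts_created(void) { return created; }
```

Comparing time_creates(create_coarse, n, k) against
time_creates(create_narrow, n, k) for a few thread counts would give
the before/after numbers, and checking contexts_created() afterwards
doubles as a simple correctness test.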
Once we get that bit taken care of, let's tackle the remaining pieces,
one at a time, from the same point of view: proposed benefit,
correctness, and verification of both. I suggest starting with a new
branch so as to reduce the chance of errors.
I suppose that this sounds a lot like option 3 above. I think that there
is some good work on that branch. We just need to get the ball rolling
again.
Dave