libpcp multithreading - next steps

To: pcp developers <pcp@xxxxxxxxxxx>
Subject: libpcp multithreading - next steps
From: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Date: Fri, 3 Jun 2016 11:50:39 -0400
Hi -

I need some advice about how to proceed with my recent work fixing
various multithreading problems within libpcp.

We have been presenting libpcp as multithread-safe, but it just isn't.
A variety of problems (lock ordering -> deadlocks, race conditions)
exist in the source code, some of them acknowledged in black & white
in the code, and some reported as bugs for years.  Some of them stay
hidden only because the pcpqa tests pass, but that is false comfort:
those tests barely scratch the surface.  pmmgr / pmwebd / qa-4751 can
be much more thorough.

I've dived into the code somewhat deeply in the last few weeks, and
started some cleanup - see the pcpfans.git fche/multithread branch.
Despite a good showing on pcpqa, the code is being held up.  I
appreciate the review comments from the few folks who looked over it,
but those conversations have ground to a halt.  Admittedly, the work
is incomplete, but we need to agree on a path in order to justify
expending further effort.


It seems to me that our options are:

0) status quo as of v3.11.2; tolerate hangs etc.

1) roll back even the v3.11.2 context.c changes to v3.11.1; tolerate
   hangs and show-stopper pmNewContext performance

2) merge fche/multithread and stop there, handling future bugs as/when
   they appear

3) merge or rework libpcp parts of fche/multithread, and continue work
   piecemeal; agree now on docs/testing/merging criteria in order to
   free us from the constraints of preserving idiosyncrasies of the
   current code base (e.g., move toward much less sharing of data
   between contexts; simpler locking model; conceivable deprecation
   of some functionality in multithreaded apps)

4) declare that libpcp is not multithread safe; rearchitect our
   various programs without multithreading


Option 3 makes most sense to me: in time, we can have both
thread-safety & high performance.  Are y'all ready to discuss further?


- FChE
