Re: [pcp] pcp updates

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>, pcp@xxxxxxxxxxx
Subject: Re: [pcp] pcp updates
From: Dave Brolley <brolley@xxxxxxxxxx>
Date: Thu, 05 May 2016 11:00:21 -0400

See my email re: qa test 449. Is that test working ok for you?

Dave

On 05/05/2016 03:35 AM, Ken McDonell wrote:
Some of Frank's changes that I've been testing ...

Changes committed to git://git.pcp.io/kenj/pcp master

Frank Ch. Eigler (5):
       multithreading qa/4751
       PR1055: handle some multithreaded deadlocks & race conditions
       multithreaded testing: ipc debugging messages
       qa/4751 multithread: create new PCP_DEBUG subtest
       libpcp multithreading: un-nest tz_lock

  qa/4751                      |   38 +++++-
  qa/4751.out                  |  258 ++++++++++++++++++++++++++++++++++++-------
  src/include/pcp/impl.h       |    2
  src/libpcp/src/check-statics |   43 ++++---
  src/libpcp/src/context.c     |   54 ++++++---
  src/libpcp/src/ipc.c         |   75 ++++++------
  src/libpcp/src/logutil.c     |   43 ++++---
  src/libpcp/src/pdu.c         |   28 +++-
  src/libpcp/src/pdubuf.c      |   47 ++++---
  src/libpcp/src/tz.c          |   61 +++++-----
  10 files changed, 462 insertions(+), 187 deletions(-)

Details ...

commit 3b142c27d87d1eb1077ba890f657a598491bbd6d
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Mon Apr 25 09:44:45 2016 -0400

     libpcp multithreading: un-nest tz_lock

     libpcp's historical use of a recursive libpcp lock has allowed patterns
     of carefree intercalling of lock-taking functions.  With normal
     non-recursive locks, that's instant deadlock.  Remove nested locking
     in purely unnecessary cases.

commit f268647abab2417b240982536da054962b39ae28
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun Apr 24 19:23:26 2016 -0400

     qa/4751 multithread: create new PCP_DEBUG subtest

     Running the big final test with PCP_DEBUG=-1 can slow it down
     enough to occasionally fail.  Add an intermediate length test
     that runs quicker but still covers a swath of context types.

     Some higher values of PCP_DEBUG invoke taking locks in a nested,
     order-violating fashion.  This patch brings local lock goodness to
     libpcp/src/tz.c, moves dumping outside locking in pdubuf.c, and
     extends qa/4751 to test two sets of PCP_DEBUG runs.  DBG_TRACE_PDU
     is particularly vulnerable because it does (locky) PMNS ops.
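
Moving diagnostic dumping outside a lock can be sketched as below. This is only an illustration of the general pattern, assuming a hypothetical `demo_buf_stats` structure and `stats_lock`; it is not the actual pdubuf.c code. The state is copied while the lock is held, and the formatting, which in real code might call other lock-taking functions, happens after the lock is released.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical bookkeeping that a debug dump wants to report. */
struct demo_buf_stats {
    int in_use;
    int bytes;
};

static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_buf_stats stats;

/* Update the shared state under the lock. */
static void stats_add(int bytes)
{
    pthread_mutex_lock(&stats_lock);
    stats.in_use++;
    stats.bytes += bytes;
    pthread_mutex_unlock(&stats_lock);
}

/* Dump: take a snapshot under the lock, then format it unlocked.
 * Returns the snprintf() result for the formatted line. */
static int stats_dump(char *out, size_t len)
{
    struct demo_buf_stats snap;

    pthread_mutex_lock(&stats_lock);
    snap = stats;                       /* short critical section */
    pthread_mutex_unlock(&stats_lock);

    /* No lock is held here, so slow or lock-taking output code is safe. */
    return snprintf(out, len, "buffers=%d bytes=%d", snap.in_use, snap.bytes);
}
```

The trade-off is that the dump reflects a snapshot that may be slightly stale by the time it is printed, which is usually acceptable for debug tracing.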

commit f2d79d62b38572970ea05cb0ee4d7787c8a03b4e
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun Apr 24 18:35:31 2016 -0400

     multithreaded testing: ipc debugging messages

     Similar to commit c7e9299f6a03, the ipc.c tracing operations also need
     to be moved outside the new non-recursive locks.  qa/4751 runs the
     last test with PCP_DEBUG=-1 to try to stress this aspect.

commit e7ff0fa0c00de729388bf9e15ff085bec7ffeddf
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun Apr 24 15:17:00 2016 -0400

     PR1055: handle some multithreaded deadlocks & race conditions

     While running the qa/4751 test case at full scale, deadlocks reliably
     occur.  (In fact, the 4751.out file was initially checked in truncated
     due to an alarm() catching the deadlocked run, producing no output.)
     The same type of deadlock is also easily demonstrated on stock
     previous-version libpcp, so it exculpates the recent pmNewContext
     multithreading changes.

     The valgrind "helgrind" tool is good at identifying problems of this
     nature, and should be routinely used for verifying code that deals
     with PM_*LOCK.

     The gist of one problem is inconsistent lock ordering. The libpcp
     lock is sometimes taken nested within a context c_lock; and sometimes
     vice versa.  Two threads can easily lock each other out.  helgrind
     showed multiple different scenarios where the libpcp lock was taken
     unnecessarily by lower level code - where a smaller lock was
     sufficient.  This patchset adds a handful of small, non-recursive
     locks for these.
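
The lock-ordering problem and the "smaller lock" remedy can be sketched as follows. All names here (`big_lock`, `ctx_lock`, `helper_lock`, and the functions) are hypothetical stand-ins rather than libpcp symbols. One path takes the big lock and then the context lock. The other path holds the context lock and calls a low-level helper. If that helper took `big_lock`, the two paths would acquire the same pair of locks in opposite orders and could deadlock; giving the helper its own small leaf lock removes the cycle.

```c
#include <pthread.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;    /* stand-in for the library-wide lock */
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;    /* stand-in for a per-context lock */
static pthread_mutex_t helper_lock = PTHREAD_MUTEX_INITIALIZER; /* small lock for helper state only */

static int ctx_updates;   /* protected by ctx_lock */
static int helper_calls;  /* protected by helper_lock */

/* Low-level helper: uses its own leaf lock instead of big_lock. */
static void low_level_helper(void)
{
    pthread_mutex_lock(&helper_lock);
    helper_calls++;
    pthread_mutex_unlock(&helper_lock);
}

/* Path A: big_lock first, then ctx_lock. */
static void *path_a(void *arg)
{
    int i, n = *(int *)arg;

    for (i = 0; i < n; i++) {
        pthread_mutex_lock(&big_lock);
        pthread_mutex_lock(&ctx_lock);
        ctx_updates++;
        pthread_mutex_unlock(&ctx_lock);
        pthread_mutex_unlock(&big_lock);
    }
    return NULL;
}

/* Path B: ctx_lock first, then the helper's leaf lock (never big_lock). */
static void *path_b(void *arg)
{
    int i, n = *(int *)arg;

    for (i = 0; i < n; i++) {
        pthread_mutex_lock(&ctx_lock);
        ctx_updates++;
        low_level_helper();
        pthread_mutex_unlock(&ctx_lock);
    }
    return NULL;
}

/* Run both paths concurrently; returns the total number of updates, or -1. */
static int run_both_paths(int iterations)
{
    pthread_t a, b;

    ctx_updates = 0;
    helper_calls = 0;
    if (pthread_create(&a, NULL, path_a, &iterations) != 0)
        return -1;
    if (pthread_create(&b, NULL, path_b, &iterations) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return ctx_updates + helper_calls;
}
```

With these locks the acquisition orders are big_lock then ctx_lock, and ctx_lock then helper_lock, so no thread ever waits for a lock that sits earlier in another thread's chain.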

     This patch also includes a fix to a nastier race condition in
     __pmHandleToPtr(), whereby a context-destruction could race against
     context-structure lookup.  Some work remains in the multi-archive code
     and elsewhere to avoid two mildly racy functions (__pmPtrToHandle and
     the new __pmHandleToPtr_unlocked).
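
One common way to close a lookup-versus-destruction race of this kind is sketched below. It is an assumption about the general technique, not a description of how `__pmHandleToPtr()` is implemented: the table, `demo_ctx`, and every function name are hypothetical. The lookup acquires the per-entry lock before releasing the table lock, so a concurrent destroy cannot retire the entry between the lookup and its first use.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_CTX 8

/* Hypothetical context entry with its own lock. */
struct demo_ctx {
    pthread_mutex_t lock;
    int in_use;
    int value;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_ctx table[DEMO_MAX_CTX];

/* One-time setup of the per-entry locks. */
static void demo_table_init(void)
{
    int i;

    for (i = 0; i < DEMO_MAX_CTX; i++) {
        pthread_mutex_init(&table[i].lock, NULL);
        table[i].in_use = 0;
    }
}

/* Mark a slot as live. Returns 0 on success, -1 if invalid or already live. */
static int demo_create(int handle, int value)
{
    int sts = -1;

    if (handle < 0 || handle >= DEMO_MAX_CTX)
        return -1;
    pthread_mutex_lock(&table_lock);
    if (!table[handle].in_use) {
        table[handle].in_use = 1;
        table[handle].value = value;
        sts = 0;
    }
    pthread_mutex_unlock(&table_lock);
    return sts;
}

/* Look up a live entry and return it with its own lock HELD.
 * The entry lock is taken before table_lock is dropped. */
static struct demo_ctx *demo_lookup(int handle)
{
    struct demo_ctx *ctx = NULL;

    if (handle < 0 || handle >= DEMO_MAX_CTX)
        return NULL;
    pthread_mutex_lock(&table_lock);
    if (table[handle].in_use) {
        ctx = &table[handle];
        pthread_mutex_lock(&ctx->lock);
    }
    pthread_mutex_unlock(&table_lock);
    return ctx;
}

/* Release an entry obtained from demo_lookup(). */
static void demo_release(struct demo_ctx *ctx)
{
    pthread_mutex_unlock(&ctx->lock);
}

/* Retire an entry; waits for any current holder of the entry lock. */
static int demo_destroy(int handle)
{
    int sts = -1;

    if (handle < 0 || handle >= DEMO_MAX_CTX)
        return -1;
    pthread_mutex_lock(&table_lock);
    if (table[handle].in_use) {
        pthread_mutex_lock(&table[handle].lock);
        table[handle].in_use = 0;
        pthread_mutex_unlock(&table[handle].lock);
        sts = 0;
    }
    pthread_mutex_unlock(&table_lock);
    return sts;
}
```

In this sketch a thread must not call `demo_lookup()` or `demo_destroy()` while it still holds an entry lock, since that would reverse the table-then-entry order used everywhere else.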

     qa/4751 and all other preexisting thread-group test cases look good
     now, no more deadlocks or lock-ordering-error reports there at least.
     (There are likely more hiding in the code: the libpcp lock is way
     overused.)

commit d2821e10df47c721aeea24cfd274d8494cd34026
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun Apr 24 14:55:25 2016 -0400

     multithreading qa/4751

     Tweak the qa/4751 test case so that different unreachable-host type
     error codes are mapped to a uniform one.  Generate an actual proper
     output for the last test (the one with some 156 contexts/threads).

