
Re: [pcp] libpcp multithreading - next steps

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Subject: Re: [pcp] libpcp multithreading - next steps
From: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Date: Tue, 9 Aug 2016 20:18:05 -0400
Cc: Dave Brolley <brolley@xxxxxxxxxx>, pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1abb6fe2-434e-1a3d-34c2-39a37eeabdba@xxxxxxxxxxxxxxxx>
References: <20160603155039.GB26460@xxxxxxxxxx> <578D1AE1.6060307@xxxxxxxxxx> <y0my44xksjb.fsf@xxxxxxxx> <57965C89.40401@xxxxxxxxxx> <20160725203257.GG5274@xxxxxxxxxx> <5797B9F7.2020701@xxxxxxxxxx> <71790d7d-377e-c28e-0adf-57fb221c3539@xxxxxxxxxxxxxxxx> <y0mshuwhzt1.fsf@xxxxxxxx> <57A8D9B6.8050800@xxxxxxxxxx> <1abb6fe2-434e-1a3d-34c2-39a37eeabdba@xxxxxxxxxxxxxxxx>
User-agent: Mutt/1.4.2.2i

Hi, Ken -

Thanks for digging in.

> [...]
> There is a solution in Frank's tree, but I don't like the solution 
> because __pmHandleToPtr_unlocked() seems like a problem waiting to 
> explode, and indeed there appear to be several subsequent commits 
> dealing with fallout that seems attributable to this change.

FWIW, the _unlocked() variant is used almost nowhere in the code.


> I think there is a simpler solution.
> 
> The original lock design was that the big global lock __pmLock_libpcp 
> had to be recursive (the structure of nested calls in libpcp mandates 
> this) and that a context was only locked when __pmLock_libpcp was NOT 
> locked (this avoided deadlocks [...]

Right; it accepted a race condition in exchange for avoiding one deadlock.


> To remove the context create-destroy race, which is the root-cause 
> trigger for these locking changes, I think we can ...

Not quite: Deadlock (lock ordering) bugs were found in the code even
before this race condition was corrected.  It just seemed like a good
time to fix it.


> 1. check the assertion that each context struct is NOT relocated or 
> free'd, so once you have a __pmContext pointer it will remain valid 
> forever

Validity in the sense of non-freedness is probably not enough.  The
delete/create race was only one possibility.  We'd also have to ensure
the absence of *other* data races, e.g. precluding concurrent
pmNewContext calls from assigning the same context slot / c_lock.
Proving the absence of bugs like this could be more difficult than
accepting the proposed (libpcp->c_lock) lock nesting and dealing with
the fallout.
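
To make that concrete, here is a hypothetical, simplified fragment (not
the real pmNewContext() code; the names are made up for illustration)
showing how an unlocked scan of contexts[] would let two threads claim
the same slot:

    #include <stddef.h>

    struct __pmContext;                      /* opaque for this sketch */

    /*
     * Hypothetical, simplified slot assignment -- NOT the real pmNewContext().
     * Without a lock around the scan-and-store, two threads can both observe
     * the same free slot and both claim it, so two contexts end up sharing
     * one handle (and hence one c_lock).
     */
    static int
    claim_slot_unsafe(struct __pmContext **contexts, int contexts_len,
                      struct __pmContext *ctxp)
    {
        int i;

        for (i = 0; i < contexts_len; i++) {
            if (contexts[i] == NULL) {       /* threads A and B may both see   */
                contexts[i] = ctxp;          /* the same NULL at the same i... */
                return i;                    /* ...and return the same handle  */
            }
        }
        return -1;                           /* no free slot in this sketch */
    }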


> 2. if the c_lock is only acquired when __pmLock_libpcp is NOT locked, 
> we're OK ... this happens in only 2 places in context.c: 
> pmDestroyContext() (OK) and __pmHandleToPtr() (there are 35 calls from 
> within libpcp, so that's going to take some auditing)

Yes, those 35 call sites represent a problem.


> 3. the array of pointers contexts[] (and the associated contexts_len) 
> are local to context.c ... so we can implement a new and local lock to 
> protect these [...]

FWIW, this part is roughly what I started on the pcpfans.git
fche/multithread2 branch.
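
For what a context.c-local lock might look like, here is a minimal
sketch (the variable and function names are assumptions for
illustration, not the actual libpcp code) protecting contexts[] and
contexts_len with a dedicated mutex:

    #include <pthread.h>
    #include <stdlib.h>

    struct __pmContext;                     /* opaque here; defined in libpcp */

    /* hypothetical local state, patterned after context.c */
    static struct __pmContext **contexts;
    static int contexts_len;

    /* new context.c-local lock, protecting only contexts[] and contexts_len */
    static pthread_mutex_t contexts_lock = PTHREAD_MUTEX_INITIALIZER;

    /* illustrative helper: append a context pointer under the local lock */
    static int
    contexts_append(struct __pmContext *ctxp)
    {
        int slot = -1;
        struct __pmContext **tmp;

        pthread_mutex_lock(&contexts_lock);
        tmp = realloc(contexts, (contexts_len + 1) * sizeof(contexts[0]));
        if (tmp != NULL) {
            contexts = tmp;
            slot = contexts_len++;
            contexts[slot] = ctxp;          /* the struct itself never moves */
        }
        pthread_mutex_unlock(&contexts_lock);
        return slot;                        /* the handle, or -1 on failure */
    }

Note that only the pointer array is realloc'd; the __pmContext structs
themselves stay put, consistent with point 1 above.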


> [...]
> As an aside, it would be nice if we could add lock assertions in the 
> code, but I don't know of a way to do this for pthread mutexes ... in 
> particular for 2. above ... any hints in this area would be most 
> gratefully received.

I'd do it with pthread_mutex_trylock.  To assert non-lockedness,
assert that the trylock succeeds (and then release the lock again to
restore the state).  To assert lockedness, assert that the trylock
fails with EBUSY.

If we're to keep libpcp_lock recursive, then we won't be able to have
as strong an assertion about non-lockedness as otherwise, since a
trylock by a thread that already holds a recursive mutex still
succeeds.
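
A minimal sketch of both assertions, assuming a non-recursive (or
error-checking) mutex -- the helper names are just for illustration:

    #include <assert.h>
    #include <errno.h>
    #include <pthread.h>

    /*
     * Assert that nobody holds the mutex: trylock must succeed, and we
     * unlock again immediately so the state is left unchanged.  With a
     * recursive mutex this is weaker: a trylock by the current holder
     * also succeeds.
     */
    static void
    assert_unlocked(pthread_mutex_t *lock)
    {
        int sts = pthread_mutex_trylock(lock);
        assert(sts == 0);
        if (sts == 0)
            pthread_mutex_unlock(lock);
    }

    /* Assert that the mutex is currently held: trylock must fail with EBUSY. */
    static void
    assert_locked(pthread_mutex_t *lock)
    {
        int sts = pthread_mutex_trylock(lock);
        assert(sts == EBUSY);
        if (sts == 0)
            pthread_mutex_unlock(lock);     /* undo an accidental acquisition */
    }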


- FChE
