Hi, Ken -
Thanks for digging in.
> [...]
> There is a solution in Frank's tree, but I don't like the solution
> because __pmHandleToPtr_unlocked() seems like a problem waiting to
> explode and indeed there appear to be several subsequent commits dealing
> with fallout attributable to this change.
FWIW, the _unlocked() variant is used almost nowhere in the code.
> I think there is a simpler solution.
>
> The original lock design was that the big global lock __pmLock_libpcp
> had to be recursive (the structure of nested calls in libpcp mandates
> this) and that a context was only locked when __pmLock_libpcp was NOT
> locked (this avoided deadlocks [...]
Right; it traded a race condition for avoiding one deadlock.
> To remove the context create-destroy race which is the root cause
> trigger for these locking changes I think we can ...
Not quite: Deadlock (lock ordering) bugs were found in the code even
before this race condition was corrected. It just seemed like a good
time to fix it.
> 1. check the assertion that each context struct is NOT relocated or
> free'd, so once you have a __pmContext pointer that will remain valid
> forever
Validity in the sense of non-freedness is probably not enough. The
delete/create race was only one possibility. We'd also have to ensure
the absence of *other* data races, e.g. precluding concurrent
calls to pmNewContext from assigning the same context / c_lock. Proving the
absence of bugs like this could be more difficult than accepting the
proposed (libpcp->c_lock) lock nesting and dealing with the fallout.
> 2. if the c_lock is only acquired when __pmLock_libpcp is NOT locked,
> we're OK ... this happens in only 2 places in context.c
> pmDestroyContext() (OK) and __pmHandleToPtr() (there are 35 calls from
> within libpcp, so that's going to take some auditing)
Yes, those 35 represent a problem.
> 3. the array of pointers contexts[] (and the associated contexts_len)
> are local to context.c ... so we can implement a new and local lock to
> protect these [...]
FWIW, this part is roughly what I started on the pcpfans.git
fche/multithread2 branch.
> [...]
> As an aside, it would be nice if we could add lock assertions in the
> code, but I don't know of a way to do this for pthread mutexes ... in
> particular for 2. above ... any hints in this area would be most
> gratefully received.
I'd do it with pthread_mutex_trylock. To assert non-lockedness, assert
that the trylock succeeds (and then release the lock). To assert
lockedness, assert that the trylock fails with EBUSY.
If we're to keep libpcp_lock recursive, then we won't be able to make
assertions about non-lockedness that are as strong as we otherwise could:
trylock on a recursive mutex succeeds even when the calling thread
already holds it.
- FChE