On 09/08/16 05:12, Dave Brolley wrote:
...
This discussion seems to have stalled again. I assume that one or both
of you (Frank and/or Ken) is familiar with this particular race. Has
there been a solution proposed? If so, was there an implementation? If
so, where is it? (branch + commit).
From my end it has been blocked by QA wild fire duties ... thankfully
that is settling down.
There is a solution in Frank's tree, but I don't like the solution
because __pmHandleToPtr_unlocked() seems like a problem waiting to
explode abd indeed there appear to be several subsequent commits dealing
with fallout that seem to be implicated to this change.
I think there is a simpler solution.
The original lock design was that the big global lock __pmLock_libpcp
had to be recursive (the structure of nested calls in libpcp mandates
this) and that a context was only locked when __pmLock_libpcp was NOT
locked (this avoided deadlocks ... I've skipped some details relating to
other locks, but I believe they are not relevant in this discussion).
To remove the context create-destroy race which is the root cause
trigger for these locking changes I think we can ...
1. check the assertion that each context struct is NOT relocated or
free'd, so once you have a __pmContext pointer that will remain valid
forever
2. if the c_lock is only acquired when __pmLock_libpcp is NOT locked,
we're OK ... this happens in only 2 places in context.c
pmDestroyContext() (OK) and __pmHandleToPtr() (there are 35 calls from
within libpcp, so that's going to take some auditing)
3. the array of pointers contexts[] (and the associated contexts_len)
are local to context.c ... so we can implement a new and local lock to
protect these ... this separates the protection of the map (contexts[])
from the protection of the contexts themselves (2. above), but may
require additional c_lock locking in context.c (as we won't have the big
lock protection any more)
I've gone through all of the code in context.c and at first blush this
seems like it will work, will be thread-safe and involves relatively
little disturbance to the rest of libpcp.
I'm willing to try an implementation, but only after we've had some
discussion here to refine/confirm the general plan of attack.
As an aside, it would be nice if we could add lock assertions in the
code, but I don't know of a way to do this for pthread mutexes ... in
particular for 2. above ... any hints in this area would be most
gratefully received.
|