Sorry Frank, this is taking so long; I'm kinda busy at the moment with
lots of projects on the go.
On 10/08/16 10:18, Frank Ch. Eigler wrote:
...
FWIW, the _unlocked() variant is used almost nowhere in the code.
OK, so perhaps the knock-on effect requiring additional changes was due
to something else.
> ...
Right; it traded a race condition for avoiding one deadlock.
It was not really a trade ... it was an oversight when I did the
original implementation.
Not quite: Deadlock (lock ordering) bugs were found in the code even
before this race condition was corrected. It just seemed like a good
time to fix it.
OK, I'd like to drill down on this. Do you recall what the lock ordering
bugs were? Do you have a recipe or evidence from experiments that
demonstrates them? I would like to understand if these are (a) flaws in
the original design, (b) errors in the original implementation, or (c)
problems introduced by changes since the original implementation, e.g.
derived metrics and connection logic and fetchgroups and ...
There have been a lot of changes in libpcp since the time the locks were
first introduced, but I think we should be able to avoid all deadlock
situations related to lock ordering.
Validity in the sense of non-freedness is probably not enough. The
delete/create race was only one possibility. We'd also have to ensure
the absence of *other* data races, e.g. precluding concurrent
pmNewContext's from assigning the same context / c_lock. Proving the
absence of bugs like this could be more difficult than accepting the
proposed (libpcp->c_lock) lock nesting and dealing with the fallout.
Sorry, I don't follow this. The c_lock protocol is simple enough and
local locking in context.c would guard against any multiple assignment
of context / c_lock.
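
Just to make the shape of that concrete, here is a minimal sketch of the
sort of local locking I have in mind (the names contexts_lock and
new_handle, and the struct layout, are illustrative only, not the actual
libpcp code). With the scan-and-assign done under a single lock private
to context.c, two concurrent pmNewContext() calls can never be handed
the same slot, so the same context / c_lock cannot be assigned twice:

#include <pthread.h>
#include <stdlib.h>

/* illustrative sketch only, not the real libpcp declarations */
typedef struct {
    pthread_mutex_t c_lock;        /* per-context lock */
    /* ... rest of __pmContext ... */
} ctx_t;

static pthread_mutex_t contexts_lock = PTHREAD_MUTEX_INITIALIZER;
static ctx_t           **contexts;       /* table of context pointers */
static int             contexts_len;     /* number of slots allocated */

/*
 * Allocate a slot (handle) in contexts[] for a new context.
 * contexts_lock is held across the scan-and-assign, so no two
 * callers can ever be given the same slot.
 */
static int
new_handle(ctx_t *ctxp)
{
    int     i;
    ctx_t   **tmp;

    pthread_mutex_lock(&contexts_lock);
    for (i = 0; i < contexts_len; i++) {
        if (contexts[i] == NULL) {       /* reuse a free slot */
            contexts[i] = ctxp;
            pthread_mutex_unlock(&contexts_lock);
            return i;
        }
    }
    /* no free slot, grow the table by one */
    tmp = realloc(contexts, (contexts_len + 1) * sizeof(ctx_t *));
    if (tmp == NULL) {
        pthread_mutex_unlock(&contexts_lock);
        return -1;
    }
    contexts = tmp;
    contexts[contexts_len] = ctxp;
    i = contexts_len++;
    pthread_mutex_unlock(&contexts_lock);
    return i;
}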
...
pmDestroyContext() (OK) and __pmHandleToPtr() (there are 35 calls from
within libpcp, so that's going to take some auditing)
Yes, those 35 represent a problem.
They are not a problem; they just require some auditing.
3. the array of pointers contexts[] (and the associated contexts_len)
are local to context.c ... so we can implement a new and local lock to
protect these [...]
FWIW, this part is roughly what I started on the pcpfans.git
fche/multithread2 branch.
OK, is it worth trying to work through the issues that thwarted this
approach? I see it is using __pmHandleToPtr_unlocked(), which I'd like
to avoid if possible, and from the comment in __pmPtrToHandle() it seems
like it would be worth changing the __pmContext struct to provide a
self-referencing "handle" and so avoid the race and locking issues in
__pmPtrToHandle() ...
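
To be explicit about what I mean by a self-referencing handle (the field
and function names below are hypothetical, just to sketch the shape of
it): if each __pmContext carried its own slot number, assigned once when
the context is created and never changed, then mapping a pointer back to
a handle becomes a simple field access, with no walk over contexts[] and
no locking needed for the lookup.

/* hypothetical sketch, not the current __pmContext layout */
typedef struct {
    pthread_mutex_t c_lock;       /* per-context lock */
    int             c_handle;     /* this context's own slot in contexts[] */
    /* ... rest of the context ... */
} ctx_t;

/*
 * With the handle stored in the context itself there is no reverse
 * lookup to race against; the value is set once at creation time
 * (under the local contexts lock) and is read-only thereafter.
 */
static int
ptr_to_handle(ctx_t *ctxp)
{
    return ctxp->c_handle;
}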
I'd do it with pthread_mutex_trylock. To assert non-lockedness, assert
that the trylock succeeds (and then release the lock). To assert
lockedness, assert that trylock fails.
If we're to keep libpcp_lock recursive, then we won't be able to have
as strong assertions about non-lockedness as otherwise.
For recursive locks, pthread_mutex_trylock() won't help establish
whether or not the lock is already held by the calling thread.
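
To make that concrete, here is the sort of trylock-based assertion I
understand you to be describing (macro names are illustrative, and this
assumes a plain, non-recursive mutex):

#include <assert.h>
#include <pthread.h>

/*
 * Illustrative assertion helpers, not actual libpcp macros.
 * These only behave as intended for a NON-recursive mutex.
 */
#define ASSERT_IS_UNLOCKED(lk)                                  \
    do {                                                        \
        /* trylock must succeed if nobody holds the lock */     \
        assert(pthread_mutex_trylock(lk) == 0);                 \
        pthread_mutex_unlock(lk);                               \
    } while (0)

#define ASSERT_IS_LOCKED(lk)                                    \
    do {                                                        \
        /* trylock must fail if the lock is already held */     \
        assert(pthread_mutex_trylock(lk) != 0);                 \
    } while (0)

With a PTHREAD_MUTEX_RECURSIVE mutex the owning thread's trylock
succeeds even when it already holds the lock (the count is just bumped),
so both assertions lose their meaning, which is exactly why the
recursive __pmLock_libpcp defeats these checks.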
Getting rid of the recursive nature of __pmLock_libpcp is probably a
goal for another (follow-on) project, as that will likely involve
extensive re-factoring in the places where we call pm* routines
recursively from within libpcp.