pcp
[Top] [All Lists]

Re: [pcp] libpcp multithreading - next steps

To: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: [pcp] libpcp multithreading - next steps
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Fri, 12 Aug 2016 06:56:58 +1000
Cc: Dave Brolley <brolley@xxxxxxxxxx>, pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <20160810001805.GD13748@xxxxxxxxxx>
References: <20160603155039.GB26460@xxxxxxxxxx> <578D1AE1.6060307@xxxxxxxxxx> <y0my44xksjb.fsf@xxxxxxxx> <57965C89.40401@xxxxxxxxxx> <20160725203257.GG5274@xxxxxxxxxx> <5797B9F7.2020701@xxxxxxxxxx> <71790d7d-377e-c28e-0adf-57fb221c3539@xxxxxxxxxxxxxxxx> <y0mshuwhzt1.fsf@xxxxxxxx> <57A8D9B6.8050800@xxxxxxxxxx> <1abb6fe2-434e-1a3d-34c2-39a37eeabdba@xxxxxxxxxxxxxxxx> <20160810001805.GD13748@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0
Sorry Frank this is taking so long I'm kinda busy at the moment with lots of projects on the go.

On 10/08/16 10:18, Frank Ch. Eigler wrote:
...
FWIW, the _unlocked() variant is used almost nowhere in the code.

OK, so perhaps the knock-on effect requiring additional changes was due to something else.

> ...
Right; it traded a race condition for avoiding one deadlock.

It was not really a trade ... it was an oversight when I did the original implementation.

Not quite: Deadlock (lock ordering) bugs were found in the code even
before this race condition was corrected.  It just seemed like a good
time to fix it.

OK, I'd like to drill down on this. Do you recall what lock ordering bugs? Do you have the recipe or evidence from experiments that demonstrate this? I would like to understand if these are (a) flaws in the original design, (b) errors in the original implementation, or (c) problems introduced by changes since the original implementation, e.g. derived metrics and connection logic and fetchgroups and ...

There have been a lot of changes in libpcp since the time the locks were first introduced, but I think we should be able to avoid all deadlock situations related to lock ordering.

Validity in the sense of non-freedness is probably not enough.  The
delete/create race was only one possibility.  We'd also have to ensure
the absence of *other* data races, e.g. precluding concurrent
pmNewContext's from assigning the same context / c_lock.  Proving the
absence of bugs like this could be more difficult than accepting the
proposed (libpcp->c_lock) lock nesting and dealing with the fallout.

Sorry, I don't follow this. The c_lock protocol is simple enough and local locking in context.c would guard against any multiple assignment of context / c_lock.

...
pmDestroyContext() (OK) and __pmHandleToPtr() (there are 35 calls from
within libpcp, so that's going to take some auditing)

Yes, those 35 represent a problem.

They are not a problem, just require some auditing.

3. the array of pointers contexts[] (and the associated contexts_len)
are local to context.c ... so we can implement a new and local lock to
protect these [...]

FWIW, this part is roughly what I started on the pcpfans.git
fche/multithread2 branch.

OK, I is it worth trying to work through the issues that thwarted this approach. I see it is using __pmHandleToPtr_unlocked() which I'd like to avoid if possible, and the comment in __pmPtrToHandle() seems like it would be worth changing the __pmContext struct to provide a self-referencing "handle" and avoid the race and locking issues in __pmPtrToHandle() ...


I'd do it with pthread_mutex_trylock.  To assert non-lockedness,
assert that the trylock fails.  To assert lockedness, assert that
trylock succeeds (and then release the lock).

If we're to keep libpcp_lock recursive, then we won't be able to have
as strong assertions about non-lockedness as otherwise.

For recursive locks, pthread_mutex_trylock() won't help establish if the lock is, or is not, already held.

Getting rid of the recursive nature of __pmLock_libpcp is probably a goal for another (follow-on) project, as that will likely involve extensive re-factoring in the places where we call pm* routines recursively from within libpcp.

<Prev in Thread] Current Thread [Next in Thread>