Hi, Dave -
> This discussion seems to have stalled again. I assume that one or
> both of you (Frank and/or Ken) is familiar with this particular
> [__pmHandleToPtr] race. Has there been a solution proposed? If so,
> was there an implementation? If so, where is it? (branch + commit).
It's at the first commit you were looking at: 3193cc1c5abf8e1 ==
b2208f06b4e on pcpfans.git. It includes hunks for
libpcp/src/context.c that split __pmHandleToPtr into two functions:
one that locks the ctxp->c_lock **before** unlocking the
__pmLock_libpcp (thus resolving the race), and another,
__pmHandleToPtr_unlocked, which eschews locking and so may only be
used from functions that already hold the relevant locks. The first
part is key.
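
To illustrate the ordering, here is a minimal sketch of the pattern. It
is not the code from the commit: table_lock stands in for
__pmLock_libpcp, and ctx_t, contexts[], handle_to_ptr and
handle_to_ptr_unlocked are made-up names. It also assumes plain
(non-recursive) pthread mutexes, which may differ from what libpcp uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_CTX 2

/* Hypothetical stand-in for a libpcp context. */
typedef struct {
    pthread_mutex_t c_lock;   /* per-context lock */
    int             in_use;   /* nonzero if this slot holds a live context */
} ctx_t;

/* Hypothetical stand-in for the global __pmLock_libpcp. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Slot 0 is live, slot 1 is free. */
static ctx_t contexts[MAX_CTX] = {
    { PTHREAD_MUTEX_INITIALIZER, 1 },
    { PTHREAD_MUTEX_INITIALIZER, 0 },
};

/* Lookup only; takes no locks. The caller must already hold table_lock. */
static ctx_t *handle_to_ptr_unlocked(int handle)
{
    if (handle < 0 || handle >= MAX_CTX || !contexts[handle].in_use)
        return NULL;
    return &contexts[handle];
}

/* Returns the context with its c_lock held, or NULL for a bad handle. */
static ctx_t *handle_to_ptr(int handle)
{
    ctx_t *ctxp;
    int rc;

    rc = pthread_mutex_lock(&table_lock);
    assert(rc == 0);
    ctxp = handle_to_ptr_unlocked(handle);
    if (ctxp != NULL) {
        /*
         * Take c_lock BEFORE dropping table_lock. If the order were
         * reversed, another thread could destroy or reuse the context
         * in the window between the unlock and the lock.
         */
        rc = pthread_mutex_lock(&ctxp->c_lock);
        assert(rc == 0);
    }
    pthread_mutex_unlock(&table_lock);
    return ctxp;
}
```

A caller of handle_to_ptr() gets the context back locked and is
responsible for unlocking c_lock when done.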
> >>and would like to understand that (if it exists) before we embark on
> >>a development path that relies on helgrind to find lock inversions
> >>rather than design to avoid lock inversions.
> >It sounds as though you are suspicious that helgrind is unreliable:
> >that the lock inversion errors are mistaken. Let me assure you that
> >every case I've studied, it was genuine.
> It might be helpful if you could point out each of these cases and how
> the inversion was addressed by your changes.
Well, there were many cases. In the git commits fixing them, I
generally described the nesting errors, outlined the testing procedure
that triggered them (usually the preexisting qa test cases), but
unfortunately didn't record the actual helgrind or gdb-crash message
for each. It would be quite a bit of work to recreate them all now.
That's OK, if starting with just an example or two is sufficient to
prove that helgrind is not just imagining the problems.
- FChE