kenj wrote:
> [...]
>> ... (Please note that the robustness of
>> multithreading support in libpcp* may not be as good as its sole user
>> (pcpqa) may lead one to believe; I recall seeing some problems in an
>> earlier review. These bugs might crawl out from under the rocks if we
>> kick them over.)
>
> Can you point me to that review?
Sorry, I only noted some in passing (on IRC). Just going for a random
drive-by in libpcp/src ...
- access.c, getClientIds(), a mismatched PM_UNLOCK in the loop around
line 1295. (By the way, the 'myhostid' variable is probably a
symptom of more inappropriate FQDN assumptions that will need to be
fixed.)
- context.c, pmNewContext(), contexts*, old_*context* used unprotected
in FAILED: path.
- context.c, pmDupContext(), quite possibly unsafe if another thread
is manipulating the oldcon at the same time; by the way, is that
condition (same pmUseContext by different threads) detected /
forbidden / permitted? The __pmHandleToPtr() ctxp->c_lock design
sounds fine, but it's not used in all context-structure users
- context.c __pmHandleToPtr race condition between the libpcp unlock
and the context lock (what if another thread deletes the context)
during this time?) (Flipping the locks around could maybe cause
deadlocks OTOH; __pmLogFetchInterp nests the libpcp lock within the
ctxp lock already.)
- interp.c cache_read: race condition between PM_UNLOCK and use of
lfup pointer; another thread may have nuked that cache[] slot in the
mean time; this general pattern recurs several places in the code
(look for UNLOCK followed by a return FOO->BAR, e.g. logmeta.c)
- util.c: using libpcp lock for all kinds of printing, possibly
nesting within context locks -> possible deadlock
- loop.c: no locking at all
... anyway, this was just about 30 mins' random browsing, not a
systematic audit. Even with that, a couple of non-trivial problems
showed up.
As difficult as it is to prove the absence of bugs in general, when
you add concurrency to it, it's many times worse. We should accept
the idea that if functionality is only used by small synthetic pcpqa
test cases, we should not trust it too much (see also pmDupContext).
After a more thorough audit & bug-fixing pass, we could seriously
multithread something like pmchart or pmie or pmwebd to get some
confidence. But even then, latent concurrency bugs won't make
themselves obvious.
- FChE
|