pcp
[Top] [All Lists]

Re: [pcp] pcp updates: more multithreaded fixes and then some

To: Dave Brolley <brolley@xxxxxxxxxx>
Subject: Re: [pcp] pcp updates: more multithreaded fixes and then some
From: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Date: Tue, 10 May 2016 16:15:06 -0400
Cc: pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <57323473.4090604@xxxxxxxxxx>
References: <20160508205432.GA7399@xxxxxxxxxx> <57323473.4090604@xxxxxxxxxx>
User-agent: Mutt/1.4.2.2i
Hi -


> I've a more detailed look at this now. A couple of questions [...]

Thanks!


> In derive.c, you release the lock around several calls to PMAPI 
> functions. Can you confirm that none of the static state protected by 
> the lock needs to be preserved across those calls? i.e. that this is not 
> an opportunity for another thread to take control and change that static 
> state destructively.

It's not obvious from looking at the code: this is what I was
referring to as "trades the possibility of data races for the
elimination of deadlocks" in the commit message.  I don't think those
particular cases are harmful, but I would appreciate mgoodwin / kenj
thinking about it, specifically about the smallest possible critical
sections for the registered.mutex.

Unfortunately, the locks in multiple parts of libpcp were not
rigorously thought out in the sense of representing ownership of a
particular piece of data.  Planning lock nesting was eschewed in
favour of recursive locks, but that could not protect against
deadlocks.  Some race conditions were tolerated, but since older
multithreading tests didn't stress those corners, we have had a false
sense of confidence.

I think things are getting better, and the smaller-grained locking is
a sound approach, but a few bits of "technical debt" have come due,
and we should just keep muddling through.


> In logutil.c: __pmLogLoadLabel, according to their man pages,
> dirname(3) and basename(3) are not thread safe since they may return
> pointers to reusable static memory.  [...]

Good catch.  It appears implementation-dependent whether these
functions modify their input strings (which is what the code
anticipates, in the form of copying the strings into auto/heap vars),
or whether they copy into a static buffer internally.  Linux glibc
does the former, so is safe.  For other platforms, a lock would be
nice.  Not the big libpcp lock, just a little local one; just added to
the branch.


> qa/449 is still behaving erratically for me. I'm sporadically getting an 
> output mismatch:
> 
> 449 - output mismatch (see 449.out.bad)
> 93a95
> > traverse: found 1052 metrics, sts PMNS not accessible

Can you describe your test machine more (e.g., # CPUs)?
env PCP_DEBUG=65535 ..../qa/src/multithread4 ?


- FChE

<Prev in Thread] Current Thread [Next in Thread>