First round of changes ... the lock-related ones are probably serious
enough to be considered as potential 3.6.4 release fodder ... but there
will be a second round to come for pmdumplog and pmlogextract (at least)
for some really ugly problems (seen on OpenIndiana) associated with
arithmetic errors caused by using a double to represent a timestamp ...
this has to be excised.
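For anyone wondering why a double is a hazard here: a microsecond timestamp is a decimal fraction that generally has no exact binary representation, so converting to a double and back can come out a microsecond off. The sketch below is not the planned pmdumplog/pmlogextract fix (that is still to come), just an illustration of keeping the arithmetic in the integer fields of a struct timeval. The helper names tv_sub() and tv_cmp() are made up for this example.

```c
#include <assert.h>
#include <sys/time.h>

/*
 * Hypothetical helper: a - b using only the integer fields, so there
 * is no rounding.  Both inputs must be normalized, i.e.
 * 0 <= tv_usec < 1000000.
 */
struct timeval
tv_sub(struct timeval a, struct timeval b)
{
    struct timeval r;

    assert(a.tv_usec >= 0 && a.tv_usec < 1000000);
    assert(b.tv_usec >= 0 && b.tv_usec < 1000000);

    r.tv_sec = a.tv_sec - b.tv_sec;
    r.tv_usec = a.tv_usec - b.tv_usec;
    if (r.tv_usec < 0) {
        /* borrow one second */
        r.tv_sec--;
        r.tv_usec += 1000000;
    }
    return r;
}

/*
 * Hypothetical helper: three-way compare, returns -1, 0 or 1 for
 * a < b, a == b, a > b.
 */
int
tv_cmp(struct timeval a, struct timeval b)
{
    if (a.tv_sec != b.tv_sec)
        return a.tv_sec < b.tv_sec ? -1 : 1;
    if (a.tv_usec != b.tv_usec)
        return a.tv_usec < b.tv_usec ? -1 : 1;
    return 0;
}
```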
Changes committed to git://oss.sgi.com/kenj/pcp.git dev
build/sun/pcp.xml | 58 +++++++++++++++
src/dbpmda/src/util.c | 13 +++
src/include/pcp/impl.h | 12 +++
src/libpcp/src/check-statics | 1
src/libpcp/src/context.c | 79 +++++++++++++++++++--
src/libpcp/src/derive.c | 46 +++++++++++-
src/libpcp/src/interp.c | 1
src/libpcp/src/lock.c | 142 +++++++++++++++++++++++++++++++++++----
src/libpcp/src/util.c | 19 ++++-
src/libpcp_fault/README | 25 ++++++
src/libpcp_fault/src/GNUmakefile | 9 +-
src/pmdas/linux/cgroups.c | 31 +++++++-
src/pmdas/linux/pmda.c | 5 -
src/pmdas/trace/stub.c | 2
src/pmdbg/pmdbg.c | 2
src/pmdumplog/pmdumplog.c | 8 ++
src/pmie/src/show.c | 2
src/pmlogextract/pmlogextract.c | 7 +
src/pmlogger/fetch.c | 9 +-
src/pmlogger/pmlogger.c | 38 ++++++++--
20 files changed, 460 insertions(+), 49 deletions(-)
commit 3ea8062065e212dcdc3a1dcd993b6b208554f7f6
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Sun May 20 08:49:38 2012 +1000
libpcp_fault.so - make symlink to libpcp.so
Only in workarea, not installed. Needed for QA (see QA 512 for bizarre
explanation).
commit 12679bfc7e18aeca8c34be37e5b4a17206cf222f
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Sat May 19 22:24:36 2012 +1000
Add checks for all pthread_*() calls
Check return values for all pthread_*() routines ... mostly
related to mutexes.
If the call fails report and exit(4) ... it is better to drop
dead than blunder on believing we're thread-safe when one of the
pthread_*() routines fails.
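A minimal sketch of the check-and-exit pattern described above, not the code that was committed. The helper names check_pthread() and locked_increment() are invented for illustration; the only things taken from the commit are the idea of checking every return value and the exit(4). Note that the pthread_*() routines return the error number directly rather than setting errno.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: if a pthread_*() call failed, report it and
 * exit(4) rather than carry on pretending to be thread-safe.
 */
void
check_pthread(int sts, const char *what)
{
    if (sts != 0) {
        fprintf(stderr, "%s failed: %s\n", what, strerror(sts));
        exit(4);
    }
}

/*
 * Hypothetical caller: bump a counter under a mutex, checking both
 * the lock and the unlock.
 */
int
locked_increment(pthread_mutex_t *lock, int *counter)
{
    int value;

    assert(lock != NULL && counter != NULL);
    check_pthread(pthread_mutex_lock(lock), "pthread_mutex_lock");
    value = ++(*counter);
    check_pthread(pthread_mutex_unlock(lock), "pthread_mutex_unlock");
    return value;
}
```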
commit 29dbf1a2fc7bd89014732a47c491508d304c0303
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Sat May 19 21:35:04 2012 +1000
pmlogextract - bad lock recursion
Similar to the pmlogger case, this is a 3.6 regression as fallout
from making libpcp thread-safe on most platforms. I checked
all of the similar cases in libpcp, but missed this one when an
application (pmlogextract) is using an internal libpcp API that
has locking side-effects.
We were calling __pmHandleToPtr() (which returns with the context
locked) and not being careful about releasing the context lock
(c_lock) ... in the worst case, this was happening once per input
log record (data or metadata).
The fix is to release the context lock as soon as we've finished
poking around with the context.
commit 205af1e345a5cac21d58b7acdaf315f2040404ea
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Sat May 19 21:32:33 2012 +1000
dbpmda - add comment to explain why context remains locked throughout
commit 326ab916b583af358ff6a034793b5a07334a4fe9
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Sat May 19 18:11:03 2012 +1000
pmdumplog - add comment to explain why context remains locked throughout
commit 09278bb2b198e47ef333ac4a456069d8c1208aa5
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Sat May 19 17:56:45 2012 +1000
Enable lock debug tracing for libpcp_fault.
This variant of libpcp for QA includes both the fault injection
infrastructure _and_ the new lock debug tracing features.
commit a084372a4f7de89838b1d536dbd39f6d49408a31
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Sat May 19 17:42:17 2012 +1000
Add lock trace debugging for libpcp.
This is _not_ enabled by default. Need to #define PM_MULTI_THREAD_DEBUG
and rebuild libpcp.
When this is done, all PM_LOCK() and PM_UNLOCK() calls are
intercepted and optional diagnostics produced to report who is
locking/unlocking what and report unusual lock counts (except
for recursive locking we expect the lock count to be 0 before
PM_LOCK() and 1 before PM_UNLOCK()) ... reporting is enabled by
the [new] -Dlock debug option in concert with the -Dappl? options
to selectively enable diagnostics for particular lock classes -
appl0 for the libpcp global lock, appl1 for the per-context locks
and appl2 for everything else (including the ipc channel lock).
PCP QA is the only likely consumer of these features, unless we
have problems that _really_ look like they are lock-related.
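To give a feel for the lock count checking, here is a rough sketch and not the libpcp implementation. Everything in it (traced_lock_t, traced_init(), TRACE_LOCK(), TRACE_UNLOCK()) is a made-up stand-in for the real PM_LOCK()/PM_UNLOCK() interception, and it ignores the -Dlock and -Dappl? selection entirely. It only shows the "expect 0 before lock, 1 before unlock" test on a recursive mutex.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_RECURSIVE */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical wrapper: a mutex plus how deeply we think it is held */
typedef struct {
    pthread_mutex_t mutex;
    int             count;      /* current lock depth */
    int             unusual;    /* number of unusual counts reported */
} traced_lock_t;

/* Set up a recursive mutex so nested locking does not deadlock */
int
traced_init(traced_lock_t *lp)
{
    pthread_mutexattr_t attr;
    int                 sts;

    assert(lp != NULL);
    lp->count = lp->unusual = 0;
    if ((sts = pthread_mutexattr_init(&attr)) != 0)
        return sts;
    sts = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (sts == 0)
        sts = pthread_mutex_init(&lp->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return sts;
}

/* Lock, then complain if the lock was already held */
int
traced_lock(traced_lock_t *lp, const char *file, int line)
{
    int sts = pthread_mutex_lock(&lp->mutex);

    if (sts != 0)
        return sts;
    if (lp->count != 0) {
        lp->unusual++;
        fprintf(stderr, "%s:%d: lock count %d before lock, expected 0\n",
                file, line, lp->count);
    }
    lp->count++;
    return 0;
}

/* Complain if the depth is not exactly 1, then unlock */
int
traced_unlock(traced_lock_t *lp, const char *file, int line)
{
    if (lp->count != 1) {
        lp->unusual++;
        fprintf(stderr, "%s:%d: lock count %d before unlock, expected 1\n",
                file, line, lp->count);
    }
    lp->count--;
    return pthread_mutex_unlock(&lp->mutex);
}

#define TRACE_LOCK(lp)      traced_lock((lp), __FILE__, __LINE__)
#define TRACE_UNLOCK(lp)    traced_unlock((lp), __FILE__, __LINE__)
```

The count is only touched while the mutex is held, so the bookkeeping itself needs no extra locking.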
commit 9c6c2d5dc430020fc2956aa36d6e14132b29d91a
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Sat May 19 17:21:04 2012 +1000
pmlogger - bad lock recursion
This is a 3.6 regression as fallout from making libpcp thread-safe on
most platforms. I checked all of the similar cases in libpcp, but missed
this one when an application (pmlogger) is using an internal libpcp API
that has locking side-effects.
We were calling __pmHandleToPtr() (which returns with the context locked)
and not being careful about releasing the context lock (c_lock) ... in the
worst case, this was in pmlogger's special fetch routine, so deeper
locking recursion every time we fetch metrics to be logged.
For many platforms, recursive locking seems to be allowed without obvious
limit (probably 2^32 or 2^64 depending on how the mutex is implemented),
but on Solaris (more specifically OpenIndiana) there appears to be a limit
of 500 or so ... which we can hit easily in QA.
The fix is to release the context lock as soon as we've finished poking
around with the context, and if necessary re-acquire the context pointer
and context lock later, then release it again ... repeat until bored.
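The shape of the bug and the fix can be sketched as follows. This is a toy model, not pmlogger code: ctx_t, handle_to_ptr() and ctx_release() are hypothetical stand-ins for the real context structure, __pmHandleToPtr() and the release of c_lock, and the depth field exists only so the leak is visible.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_RECURSIVE */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical context: a recursive lock, its depth and some payload */
typedef struct {
    pthread_mutex_t c_lock;
    int             depth;  /* how many times c_lock is currently held */
    int             value;
} ctx_t;

int
ctx_init(ctx_t *ctx, int value)
{
    pthread_mutexattr_t attr;
    int                 sts;

    assert(ctx != NULL);
    ctx->depth = 0;
    ctx->value = value;
    if ((sts = pthread_mutexattr_init(&attr)) != 0)
        return sts;
    sts = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (sts == 0)
        sts = pthread_mutex_init(&ctx->c_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return sts;
}

/* Stand-in for the lookup: returns with the context lock HELD */
ctx_t *
handle_to_ptr(ctx_t *ctx)
{
    if (pthread_mutex_lock(&ctx->c_lock) != 0)
        return NULL;
    ctx->depth++;
    return ctx;
}

/* Release the lock taken by handle_to_ptr() */
void
ctx_release(ctx_t *ctx)
{
    ctx->depth--;
    pthread_mutex_unlock(&ctx->c_lock);
}

/* Buggy pattern: the lock is never released, so depth grows per call */
int
fetch_buggy(ctx_t *ctx)
{
    ctx_t *p = handle_to_ptr(ctx);

    return p == NULL ? -1 : p->value;
}

/* Fixed pattern: release as soon as we are done with the context */
int
fetch_fixed(ctx_t *ctx)
{
    ctx_t *p = handle_to_ptr(ctx);
    int    v;

    if (p == NULL)
        return -1;
    v = p->value;
    ctx_release(p);
    return v;
}
```

Calling fetch_buggy() in a loop leaves the recursion depth one higher each time, which is harmless until the platform's limit is reached; fetch_fixed() always returns with the depth back where it started.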
commit 324673d77526f113e61af53233c5317e9efdeb1e
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Fri May 18 06:51:48 2012 +1000
Solaris init scripts rework.
The pcp.xml manifest was missed in the pcp -> pmcd and pmlogger
changes for the init scripts in PCP 3.6.
commit 71fbef22283e13c008ce4acc524dde4a203a1c83
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Fri May 18 06:51:16 2012 +1000
interp.c - fix small mem leak on unlikely error path
commit 686b96525ab3d26a6389ad7da821d4306a98c0ad
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Thu May 17 17:22:18 2012 +1000
trace demo stub.c - sys_nerr is not all that portable!
commit cd45147c3f60648b1898a785debf74360a93e6dd
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Tue May 15 08:32:37 2012 +1000
pmlogger/pmlc/libpcp - fix segv in diag dark corner
pmlc and pmlogger use pmResults in perverse ways to encode
logging state ... this was causing pmlogger to dump core when
it was running with full diagnostics enabled and pmlc enquired
about the logging state for event record metrics!
Found by accident when investigating a QA 139 failure on one host,
a new QA 510 now checks explicitly for this unusual combination.
commit c51adc15849173f65cfba9f1abca6ea7f0d39a2a
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Tue May 15 08:31:38 2012 +1000
pmie - NaN fixup
Small error in the last round of NaN changes ... conditional code when
fpclassify() is not available was wrong.
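For reference, one common way to write that kind of conditional is sketched below. It is not the pmie code, and the guard macro HAVE_FPCLASSIFY and the function name is_nan() are assumptions made for this example. The fallback relies on NaN being the only value that compares unequal to itself, which holds unless the compiler is told to relax IEEE semantics (e.g. -ffast-math).

```c
#include <assert.h>
#include <math.h>

/*
 * Hypothetical helper: non-zero if x is a NaN.
 * HAVE_FPCLASSIFY is assumed to come from a configure-style check.
 */
int
is_nan(double x)
{
#ifdef HAVE_FPCLASSIFY
    return fpclassify(x) == FP_NAN;
#else
    /* no fpclassify(): a NaN never compares equal to itself */
    return x != x;
#endif
}
```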
commit 2fc49a724119505d80121fe004b4455f7a63937f
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Mon May 14 06:03:43 2012 +1000
linux pmda - fix unaligned memory access for cgroup.subsys.hierarchy
Code was incorrectly using opaque pointer to private data with
the pmdaCache*() routines causing unaligned memory accesses on
ia64 platforms.
Although this problem is fixed, this code still seems a bit
dodgy and would appear to do the wrong thing if /proc/cgroups
ever listed the same subsystem name in more than one hierarchy
(not sure if that is possible).