Re: [pcp] PCP JMX PMDA

To: Marko Myllynen <myllynen@xxxxxxxxxx>, Paul Smith <psmith@xxxxxxxxxx>
Subject: Re: [pcp] PCP JMX PMDA
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Fri, 1 Apr 2016 20:32:52 -0400 (EDT)
Cc: pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <56FC0E5B.3040708@xxxxxxxxxx>
References: <56D8858A.3020407@xxxxxxxxxx> <56E05862.7040707@xxxxxxxxxx> <282702840.33546644.1458721199633.JavaMail.zimbra@xxxxxxxxxx> <56F940C7.2080909@xxxxxxxxxx> <CF911C98-E061-46AF-A8DB-10A5361C413B@xxxxxxxxxx> <56FC0E5B.3040708@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: ux0rN1UkZ8tqWovotRY61OGfHX1esA==
Thread-topic: PCP JMX PMDA
Hi guys,

----- Original Message -----
> >> [...]
> I think for the sake of those following the thread but not knowing the
> history it could be mentioned that the earlier PCP/Java PMDA was the
> pmdajstat [1] which was basically a parser for jstat(1) [2] output:

Oh yep, +1 - thanks Marko.  It was a parser for multiple jstats for
multiple monitored apps, with various life-cycles relative to the
life-cycle of the PMDA.  And modeling all metrics using the apps as
instance names.

Similar in those respects, but far less grand than pmdajmx.  Also, with
complexities such that we tossed it after a few months of production
use.  Direct Parfait use inside java apps took off at that site so the
jstat PMDA ended up redundant and on the scrapheap.

> >> Yes, I think none of the other PMDAs come close to JSON/JMX PMDAs
> >> in terms of their flexibility and generic nature.

Hmm, well, I would consider the MMV PMDA to go beyond both in terms of
being flexible and generic.  It's light years ahead on the performance
curve and has the many-years-of-actual-production-use factor too.

> > - Threads are inherently more complex than not using threads.  :)
> >   Threads in both pmdajmx and its java helper is ... a lot of threads.
>
> I don't think it's fair to criticize pmdajmx for using one extra
> thread when we all know it was to workaround the Perl PMDA API bug I
> discovered [5]. Now that it's fixed [6] we could perhaps eliminate that
> thread but OTOH I'm not sure would it be worth it after all, going thru
> libpcp/Perl PMDA API instead of having everything in-pmda is not
> necessarily that much simpler. But either way suits me, no biggie.

Ah, crossed wires there - I wasn't referring to that issue.

The point was, there are threads *everywhere* in one approach - pmdajmx
perl code is multi-threaded, the java helper is multi-threaded, and the
java apps themselves are always multi-threaded (for anything beyond the
basic "hello world" kind of java app).  In the other approach, we piggy
back in via an additional thread in the (already-threaded) java app but
nothing more than that (no new PCP code, and MMV doesn't need threads).

> >> For comparison, how would the corresponding
> >> diagram look like with the Parfait agent approach you suggest
> >> below? Would it be something like:
> >> 
> >> pmdammv <-> parfait{-agent,} <-...-> multiple-java-apps
> > 
> > Not quite, [...] I would probably write it as:
> > 
> > pmdammv <-> { parfait-agent<->java app},  {parfait-agent<->java app},
> > ...
> 
> Ok, thanks for the clarification. Do we already know how this would look
> like with multiple Java apps?

Yes, the ASCII art from Paul is showing multiple Java apps.  If we consider
the "<->" to be IPC (shared memory for pmdammv, sockets/pipes for pmdajmx),
and "+" to show things being in-one-process, it's:

pmcd+pmdammv.so <-> {java-app+parfait-agent},{java-app+parfait-agent},...

and

pmcd <-> perl+pmdajmx <-> java+jmx-helper <-> {java-app},{java-app},...

> 
> PCP User/Admin guides don't give any examples how to actually use MMV
> 

(The Programmers Guide is where this is hiding, and I'm sure it could be
further improved.  That would help all programmers, for all languages,
not only Java.)

> > 
> > I'm not sure in the days of even VM's with minimum RAM sizes in GB
> > that this memory discussion goes very far, however...
> > 
> > It's significantly less than the overhead of another JVM.  My
> > WetFinger is Parfait is several MB's compared with the overhead of
> > the full JVM of at least 10's of MBs.  While I personally don't think
> > the overhead of another JVM is that big a deal, I think the
> > _perception_ of having a second JVM for monitoring being 'heavy' is
> > definitely real.
> 
> Ok, I guess we can do some more precise measurements later once the code
> matures (if anyone cares that much, I agree that we're not seeing
> anything anywhere near the show-stopper category here).
> 

FWIW, I don't agree ... neither the perceived impact (and that's not just
from my POV as a PCP developer trying to design/write efficient code, but
also from sysadmins/prodops folk I've observed administering systems) nor
the actual impact are problems we should dismiss as non-critical IMO.

> [...] the mere installation of tools.jar would
> be somehow problematic (as in getting the jar in place)

This is more an issue about not forcing people to install the full JDK, when
the JRE would do & perhaps is what they use today - telling prodops folk they
must now install the JDK as well - and e.g. into a minimal app container - so
that they can monitor an app?  It's a problem, not to be lightly dismissed IMO.

OTOH, providing a package with a standalone parfait-agent jar in it ... works
for both JRE and JDK installs, has no dependency chain (not even on PCP), and
is easy to fit into that hypothetical app's container build.

> even more of a problem for Parfait since Parfait isn't available for
> some/most major distributions (e.g., Fedora / CentOS/EPEL / RHEL)?

Hmm, this logic doesn't really make sense to me - we will be packaging either
one or the other or both - pcp-pmda-jmx (pulling in a tools.jar & jdk rpm dep
& perl dep & pcp-libs dep) and/or a pcp-parfait noarch Java-only package (with
no deps).

pmdajmx is not available in some/most/any major distributions either, so we
can't suggest that Parfait not being there is a reason to favour one over
the other.

> As a
> Fedora user I expect that if PCP is to support Java monitoring, then all
> I need to do is something along the lines "yum install ... ; cd ... ;
> ./Install".

(this is two steps more than I would want to do as a user - see below).

> Or is the plan to embed Parfait in PCP so that the
> parfait-agent.jar possibly coming with PCP would contain all of Parfait
> and its dependencies?

Right, "embed" in the same way we embed Vector and all of its dependencies
I'd expect.  I think we've crossed-wires here in terms of what "supported"
means to each of us...

> > - eventually allow PCP maintainers to focus on the core PCP components,
> >   and Java gurus to focus on the Java components in a real Java project
>
> Again, this is of course a tempting approach if you merely look at
> pcp.git but who's going to fix the issues users will eventually find
> out? Users don't care whether it's a Spring issue or an MMV PMDA issue
> or something in between, they either get the metrics from PCP or not.

I was thinking of "supported" more in the PCP upstream developer sense not
the Linux-distribution / Red-Hat-as-a-company / any-other-end-user sense.

That involves pcp/qa test writing, running, maintaining for a rather vast
matrix of java versions & having PCP maintainers fixing QA failures there.
It involves different build/install toolchains, different static analysis
tools, different ... well, different everything really.  Different people
with different backgrounds and skill sets, too, for the most part.

This model has been shown to work extremely well with Vector.  The PCP /
Vector relationship makes for a good analogy - there are different languages
(javascript), different target platforms (browsers), different developers
(working together when needed though), different build/test/release model
and so on.

> >> No code changes is definitely good. Having to configure Parfait
> >> agent for each JVM is perhaps not ideal but still reasonable.

By "configure Parfait agent" we're talking about "-javaagent:parfait-agent.jar"
being added to the java command line, right?  Or to a java properties
file - both are widely-used well-understood ways of doing things.

OK, so, good - it sounds like we're all agreeing there's no reason not
to make use of a -javaagent based approach at some level.
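
For anyone following along who hasn't written one, here is a minimal sketch
of what a java agent entry point looks like, assuming the standard
java.lang.instrument premain convention (the JVM option is spelled
-javaagent:<jar>[=<options>]).  The class name SketchAgent, the jar name and
the name=value option format are all made up for illustration; this is not
the parfait-agent code.

```java
import java.lang.instrument.Instrumentation;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical agent entry point, for illustration only.
class SketchAgent {
    // Parses "key=value,key=value" agent options into an ordered map.
    // The option format is an assumption of this sketch.
    static Map<String, String> parseOptions(String agentArgs) {
        Map<String, String> options = new LinkedHashMap<>();
        if (agentArgs == null || agentArgs.trim().isEmpty()) {
            return options;
        }
        for (String pair : agentArgs.split(",")) {
            String[] kv = pair.split("=", 2);
            options.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
        }
        return options;
    }

    // The JVM calls premain before the application's main() when started as
    //   java -javaagent:sketch-agent.jar=name=myapp,interval=1000 -jar app.jar
    // provided the agent jar's manifest has "Premain-Class: SketchAgent".
    public static void premain(String agentArgs, Instrumentation inst) {
        Map<String, String> options = parseOptions(agentArgs);
        Thread exporter = new Thread(() -> {
            // A real agent would register and update its metrics here.
            System.out.println("agent started with options " + options);
        }, "sketch-agent");
        exporter.setDaemon(true);   // never keep the application alive
        exporter.start();
    }

    public static void main(String[] args) {
        System.out.println(parseOptions("name=myapp,interval=1000"));
    }
}
```

The one daemon thread is the "additional thread in the already-threaded java
app" mentioned earlier; nothing else in the application has to change.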

> >> However, could you please describe the configuration scheme [...]
> >> how does the agent deal with "unknown" components, e.g.,

Oh, I need to make this clearer I think.  IMO pmdajmx doesn't deal with
"unknown" components in a viable way.  There is no viable approach for
automating this at PMDA runtime with no involvement from a human, in
practice.  The information needed simply doesn't exist from JMX, and
it requires domain knowledge to provide it.

We cannot simply make up PCP metadata about JMX values for which we have
no idea what the metric is.  Saying "great, we have 15000 new metrics!"
but having no idea which are counters, which ones are measures of time,
what the time and/or size units are - this is a big maintenance problem
(from historical experience, not just IMO).

Both PCP developers and PCP users end up picking up the broken pieces
of that approach (corrective tools like pmlogrewrite then become needed;
and tools other than pmlogger - like pmie, pmchart, etc - also behave
incorrectly when they get correct metadata from one host but conflicting
metadata from another - think about how/when that will happen, because it
will) ... there are a whole host of issues we've seen from attempting to
fudge metadata (or just getting it wrong, then fixing it) in the past.

But there are things we can do to tackle this problem.  We can provide
tooling to say "I know about a, b and c managed beans, and will export
those now - however x, y, and z are new to me and we need to classify
them before I can export them".  And then help with the classification
of x, y and z ... helping to get those definitions built in (or run-time
loadable via a config file which gets added to the built-in set over
time) for everyone to enjoy.
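
To make that concrete, a rough sketch of the "known a, b, c versus new x, y,
z" split, assuming JMX values are identified by a plain string and that each
classified value carries a PCP name, semantics and units.  The BeanClassifier
class, the example.* metric names and the com.example bean are hypothetical;
none of this is existing parfait-agent code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: splits discovered JMX values into those with
// known PCP metadata and those still needing human classification.
class BeanClassifier {
    // Minimal metadata a human must supply before a value is exported.
    static final class Metadata {
        final String pcpName;
        final String semantics;   // e.g. "counter" or "instant"
        final String units;       // e.g. "count", "byte", "millisec"

        Metadata(String pcpName, String semantics, String units) {
            this.pcpName = pcpName;
            this.semantics = semantics;
            this.units = units;
        }
    }

    private final Map<String, Metadata> known = new LinkedHashMap<>();

    void define(String jmxValue, String pcpName, String semantics, String units) {
        known.put(jmxValue, new Metadata(pcpName, semantics, units));
    }

    // Values we can export right now, with correct metadata.
    Set<String> exportable(Set<String> discovered) {
        Set<String> result = new TreeSet<>(discovered);
        result.retainAll(known.keySet());
        return result;
    }

    // Values that must be classified before they can be exported.
    Set<String> unclassified(Set<String> discovered) {
        Set<String> result = new TreeSet<>(discovered);
        result.removeAll(known.keySet());
        return result;
    }

    public static void main(String[] args) {
        BeanClassifier classifier = new BeanClassifier();
        classifier.define("java.lang:type=Threading/ThreadCount",
                          "example.threads.count", "instant", "count");
        classifier.define("java.lang:type=ClassLoading/TotalLoadedClassCount",
                          "example.classes.loaded", "counter", "count");

        Set<String> discovered = new TreeSet<>(Arrays.asList(
            "java.lang:type=Threading/ThreadCount",
            "com.example:type=Cache/Hits"));   // hypothetical application bean

        System.out.println("exporting: " + classifier.exportable(discovered));
        System.out.println("please classify: " + classifier.unclassified(discovered));
    }
}
```

The unclassified set is the useful output: it is the to-do list that drives
growth of the built-in definitions, rather than a set of guessed metadata.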

Having said all of that, a java-agent approach could (but won't!) do the
same thing the pmdajmx java helper does and make it up - so this is not
a reason to favour one approach over the other really.

> So far I've tested OpenJDK JVM 1.6/1.7/1.8 and IBM JVM 1.7.1/1.8 and I
> didn't see any differences there. Do you have concrete examples or was
> this a hypothetical scenario?

This referred to the case in the existing metrics in the parfait-agent
code where different (mutually exclusive) garbage collection algorithms
present the same metric(s) with differently named JMX values - these can
be reduced to a single metric in the Parfait+MMV world.
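
A small sketch of that reduction, assuming the eden pool shows up under one
of several collector-specific names.  The names listed are the ones I believe
HotSpot's serial, parallel, CMS and G1 collectors use, so treat them as
assumptions, and the EdenMetric class is hypothetical rather than lifted from
parfait-agent.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: one logical metric, several possible JMX names.
class EdenMetric {
    // Candidate pool names (assumed); only one exists in any given JVM
    // because the collectors are mutually exclusive.
    static final List<String> EDEN_POOLS = Arrays.asList(
        "Eden Space", "PS Eden Space", "Par Eden Space", "G1 Eden Space");

    // Returns the committed size of whichever eden pool is present.
    static Optional<Long> edenCommitted(Map<String, Long> committedByPool) {
        for (String name : EDEN_POOLS) {
            Long value = committedByPool.get(name);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Long> pools = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();   // null if the pool is invalid
            if (usage != null) {
                pools.put(pool.getName(), usage.getCommitted());
            }
        }
        System.out.println("eden committed bytes: " + edenCommitted(pools));
    }
}
```

Whichever collector is in use, the consumer sees one metric with one set of
metadata, instead of four names of which three are always missing.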

It's one of many possible examples though - scanning through the first few
hundred lines of that Cassandra sample JMX data showed this pattern to be
pervasive - many individual values there would more ideally be modeled as
instances of one metric for example.
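
As a sketch of the instances idea, assume a set of beans that differ only in
one key property of their JMX object name.  That key can become the PCP
instance name, with domain plus type forming the metric name.  The
com.example.cache bean names below are hypothetical (not taken from the
Cassandra sample), and the naming rule is just one possible convention.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper: folds per-bean values into one metric with instances.
class InstanceGrouping {
    // Maps "domain.type" -> (instance name -> value), where the instance
    // name is taken from the given key property of each object name.
    static Map<String, Map<String, Long>> group(Map<String, Long> valuesByBean,
                                                String instanceKey) {
        Map<String, Map<String, Long>> metrics = new TreeMap<>();
        for (Map.Entry<String, Long> entry : valuesByBean.entrySet()) {
            ObjectName name;
            try {
                name = new ObjectName(entry.getKey());
            } catch (MalformedObjectNameException e) {
                throw new IllegalArgumentException("bad object name: " + entry.getKey(), e);
            }
            String instance = name.getKeyProperty(instanceKey);
            if (instance == null) {
                continue;   // no such key: not an instance of this metric
            }
            String metric = name.getDomain() + "." + name.getKeyProperty("type");
            metrics.computeIfAbsent(metric, k -> new TreeMap<>())
                   .put(instance, entry.getValue());
        }
        return metrics;
    }

    public static void main(String[] args) {
        Map<String, Long> values = new LinkedHashMap<>();
        values.put("com.example.cache:type=HitCount,cache=users", 10L);
        values.put("com.example.cache:type=HitCount,cache=orders", 20L);
        System.out.println(group(values, "cache"));
    }
}
```

Two separate JMX values become one metric with two instances, which is the
shape PCP tools already know how to chart, log and apply rules to.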

> Mandating so would just limit the usability of PCP on this front and
> drive potential users away.

Providing correct PCP metric metadata is very important, and it is not
something we'll be discarding because it's difficult or inconvenient.  I
think we can solve this perceived usability limit though.

> So the best we can do is to provide such mappings for few ubiquitous
> components (like the JVM) and then provide reasonable/working defaults

That's not the best we can do - we can (and must) do better than that.

> > I think Nathan has in place a plan to automatically scan standard
> > Java metrics.  If the model of following what other agents like
> > NewRelic do, I would bet there's a 'scanner' module that is looking
> > for known patterns of standard frameworks/containers like JBoss, Tomcat
> > etc, they all emit standard JMX namespaces, so I can envisage doing
> > something similar, scanning for JMX patterns and auto-registering
> > ones you find.  This is similar to the pattern Parfait is already
> > doing with the Java memory space, looking for a few patterns of JMX
> > with some optionality for some differences in JVM memory
> > configurations.

Right.  For an example of the kind of patterns Paul is referring to here
see the java.memory.eden.committed metric definition in parfait-agent.

> automation / dynamic approach is a must.

Yep  (though, those terms are both open to interpretation and I believe
we've been thinking of approaching them quite differently so far).

> However, I can't comment much
> more as the current agent code in git looks pretty static.

The end solution has to be at least partially static, because all metrics
must be correctly defined and the metric metadata must persist.

> Makes me actually wonder could we re-use some pmdajmx code snippets
> for this..

*nod*, I suspect so too Marko.

Next week I'll begin working on the code to do the "I know about a,b,c
but not x,y,z metrics"... in parfait-agent - that will then be used to
drive expansion of the known-and-classified metric set.

> > If PCP is looking to create something that behaves like modern Java
> > tracing facilities, the JVMTI mechanism is generally the pattern [...]

*nod*.

> Do we have any estimates on the needed effort for this, are we talking
> about weeks or months?

I'm 100% convinced parfait-agent is the right approach for us to take now,
so it's my main development priority atm and I'm actively seeking helpers.

I expect we'll have something available within weeks that would suit use in
production environments.  I'm aiming to get the packaging side of things
(i.e. RPMs via the Vector model) done for the next PCP release.

> Thanks, much appreciated. I think I need to clarify that I'm not in
> principle against the Parfait agent approach (would be less work for
> me!) but I'm concerned that it will take a very long time [...]

That's promising, and I think there's a similar amount of work in getting
either approach here to a releasable state.  To my mind, the no-new-code
of a parfait-agent+MMV approach, and the way it handles both JMX and also
non-JMX values swings the pendulum significantly in its favour... so that
is where I'm spending my available time and effort.

> even more importantly, since there is nothing concrete available about
> the planned user interface and usage in general, 

Sorry, here's what I intended the parfait-agent code to express so far:

This initial parfait-agent uses Spring configuration (XML) to tackle the
"specify-metrics-correctly-yet-dynamically-too" aspect.  We'll ship a jar
with the pre-classified set, and add more metrics and configs over time as
the metric classification process proceeds.  There are also ways to pass in
command line properties and/or system definitions we can use to manually
configure parts of the agent, should that be needed.  The early code shows
the use of both JMX and non-JMX metrics as well.  Finally, the code builds
a single standalone jar with no external runtime dependencies.

We could certainly add a way for people to pass configs to the -javaagent
to add in more metrics without building the agent themselves ... there are
lots of options and possibilities.

> I failed to build it  [...]

Oh OK - what was the build failure?  Which platform?  Which version of Java?
And how was the build invoked?  Could you fpaste all of that, or open a new
parfait github issue & I'll take a look (though Paul will probably spot the
problem more quickly than me).

> [...] Whatever the end result is, I think the user
> interface would at least need to match the ease of pmdajmx configuration.

From an end user point of view, I'd like to see this work "out of the box".
i.e. install PCP, install parfait-agent - start a java app with the -javaagent
option.  Then metrics are immediately visible live, and logging auto-starts
if pmlogger has been enabled.

No new PMDA, no ./Install, no perl/jdk-installation-needed-for-java-metrics,
no external jar dependencies, & all the classified JMX (and non-JMX) metrics
available as soon as the java process starts.

So, no shortage of Big Hairy Audacious Goals there :) -- but achievable, and
we'll have a compelling Java instrumentation story for PCP in the end.

cheers.

--
Nathan
