Hi,
On 2016-03-23 10:19, Nathan Scott wrote:
> ----- Original Message -----
>> On 2016-03-03 20:42, Marko Myllynen wrote:
>>>
>>> PCP has very complete coverage for system and supporting applications /
>>> infrastructure metrics (like containers, 389 Directory Server, KVM,
>>> Oracle, PostgreSQL, etc.) but there are lots of places where Java
>>> performance metrics would be essential to have in the mix as well.
>
> +1 ... it's high time we tackled this area, so thanks for kick-starting
> a new effort in this direction, Marko. Apologies up-front that my reply
> has turned into an essay too, like your earlier mail, and taken awhile.
It's good to hear that I'm not the only one interested in this area. It
perhaps looked that way given that there's nothing Java-related on the
PCP roadmap (http://pcp.io/roadmap/).
>>> https://myllynen.fedorapeople.org/pcp-jmx/
>
> The approach pmdajmx has taken has some drawbacks to my eye (that eye
> being jaded^Wfiltered based on experience doing several years of Java
> analysis in a previous life) ...
I think for the sake of those following the thread but not knowing the
history it could be mentioned that the earlier PCP/Java PMDA was the
pmdajstat [1] which was basically a parser for jstat(1) [2] output:
1) http://oss.sgi.com/cgi-bin/gitweb.cgi?p=pcp/pcp.git;a=commit;h=58f3940
2) http://docs.oracle.com/javase/6/docs/technotes/tools/share/jstat.html
> - Running a separate Java process (in addition to a separate perl PMDA)
> is a relatively complex architecture.
>
> pmdajmx.perl <-> PCPJMXConnector.java <-...-> multiple-java-apps
>
> It causes a need for some fancy footwork on the part of pmdajmx, to
> dodge the intermittent high latencies in socket-based communication
> between the java processes, PCPJMXConnector.java, and the perl PMDA.
> (to put in context, this is all more complex than any other PMDA we
> have in PCP today - except perhaps for pmdajson).
Yes, I think none of the other PMDAs come close to JSON/JMX PMDAs in
terms of their flexibility and generic nature. But at least with pmdajmx
this is all pretty well hidden at the most important level, the user
interface. For comparison, what would the corresponding diagram look like
with the Parfait agent approach you suggest below? Would it be something
like:
pmdammv <-> parfait{-agent,} <-...-> multiple-java-apps
> - Stop-the-world GC activity at unfortunate times anywhere to the right
> of ".perl", above, is a major latency problem that has to be handled.
> i.e. all processes in this design are using threads to try to hide
> that latency, or rather the potential for latency once in a while.
To put this in context, JMX is the standard Java monitoring method
[3] and for over a decade this hasn't been a show-stopper there. But as
Paul pointed out [4], Parfait also provides the last values that were
set and some polling would be delayed since not much is happening in the
JVM anyway. And wrt his main point, "PMCD is not blocked from reporting
_some_ values back to the caller," there is no difference between
pmdajmx vs Parfait.
Also, the Java helper threads are not there trying to "hide" anything
(see below for more).
3) http://docs.oracle.com/javase/8/docs/technotes/guides/management/
4) http://oss.sgi.com/pipermail/pcp/2016-March/010016.html
> - Threads are inherently more complex than not using threads. :)
> Threads in both pmdajmx and its java helper is ... alot of threads.
I don't think it's fair to criticize pmdajmx for using one extra
thread when we all know it was there to work around the Perl PMDA API
bug I discovered [5]. Now that it's fixed [6] we could perhaps eliminate
that thread, but OTOH I'm not sure it would be worth it after all: going
through the libpcp/Perl PMDA API instead of having everything in-PMDA is
not necessarily that much simpler. But either way suits me, no biggie.
Wrt the Java helper's threads: when there are N targets (JVMs) we need to
do N similar tasks (queries) to get their metrics. That's pretty much a
textbook example of when to apply threading in an application; I can't
think of any benefits of not doing this in parallel.
5) http://oss.sgi.com/pipermail/pcp/2016-March/009771.html
6) http://oss.sgi.com/cgi-bin/gitweb.cgi?p=pcp/pcp.git;a=commitdiff;h=22fdcd7
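
To illustrate the N-targets-in-parallel point, here is a minimal sketch.
It is not the actual PCPJMXConnector code: the class name, pollAll() and
queryThreadCount() are hypothetical, and the latter is only a stand-in
that returns a dummy value instead of doing a real JMX query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ParallelPoll {
    // Hypothetical stand-in for one JMX query against one target JVM;
    // it returns a dummy value instead of contacting anything.
    public static int queryThreadCount(String pid) {
        return pid.length();
    }

    // Query every target concurrently. A target that stalls (e.g. in a
    // stop-the-world GC) only loses its own value once the timeout hits.
    public static Map<String, Integer> pollAll(List<String> pids, long timeoutMs) {
        Map<String, Integer> results = new LinkedHashMap<>();
        if (pids.isEmpty()) {
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(pids.size());
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String pid : pids) {
                tasks.add(() -> queryThreadCount(pid));
            }
            // invokeAll() returns the futures in task order.
            List<Future<Integer>> futures =
                pool.invokeAll(tasks, timeoutMs, TimeUnit.MILLISECONDS);
            for (int i = 0; i < futures.size(); i++) {
                try {
                    results.put(pids.get(i), futures.get(i).get());
                } catch (Exception e) {
                    // Timed out or failed: report no value for this target.
                }
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and return what we have so far.
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> pids = Arrays.asList("1234", "1235", "1236", "1237");
        System.out.println(pollAll(pids, 1000));
    }
}
```

With a sequential loop, one stalled JVM would delay the values of all
the targets queried after it; with the pool, each target is bounded by
the same timeout independently.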
> - That separate multiplexing java process has a fairly large footprint
> in terms of memory utilisation (in Java 8, approx 80-100MB is steady
> state RSS). We can improve that via non-default command line options
> and/or properties file settings, etc.
Great to hear that you didn't find any memory leaks when testing! ;) But
yes, Java applications tend to consume more memory than C programs. Have
you already measured how much extra memory the Parfait agent + Parfait
(+ perhaps Spring, if it's also used?) will consume?
> However that introduces java implementation and version dependencies
> in pmdajmx/PCP, and its always likely to consume more memory than the
> rest of a PCP collector no matter how much its tweaked, unfortunately.
> That is bad, and reflects poorly on PCP (i.e. gives something for the
> nay-sayers to point to and say - "see PCP eats all your memory").
This might indeed be an interesting difference in the approaches we have
on the table if I've understood the Parfait agent approach correctly: we
already pretty much know how much memory pmdajmx uses and since we only
have one instance of pmdajmx running, it's not going to change much no
matter how many JVMs you're monitoring. But with the Parfait agent
approach, since it's per-JVM, wouldn't it mean that the more JVMs you're
monitoring on a host, the more memory is needed by PCP?
> - The model used, mapping one metric to one JMX value is not ideal --
> if modelled more "ideally", many of these values would be in just one
> metric (since all the metadata is the same) using the PCP instances
> to represent set-values. But, the instance domain is "taken up" by
> the target java processes. This burned pmdajstat many years ago when
> it took the same model.
Sorry, here I don't understand what you're proposing. Currently, pmdajmx
treats different JVMs much like other PMDAs treat processes
(proc/hotproc), so I don't think it can be all wrong. Perhaps you could
illustrate with a concrete example what you're after? Let's say that I'm
running 3 instances of JavaTestClient and 1 instance of MyJavaEEApp (on
top of JBoss/WildFly). For the sake of keeping it simple, we're
monitoring only two metrics. With pmdajmx today we'd get something like
this:
$ pminfo -f jmx
jmx.java.lang.runtime.systemproperties.sun_java_command
    inst [0 or "1234"] value "JavaTestClient -clientID 1"
    inst [1 or "1235"] value "JavaTestClient -clientID 2"
    inst [2 or "1236"] value "JavaTestClient -clientID 3"
    inst [3 or "1237"] value "MyJavaEEApp -serverParam"
jmx.java.lang.threading.threadcount
    inst [0 or "1234"] value 12
    inst [1 or "1235"] value 10
    inst [2 or "1236"] value 11
    inst [3 or "1237"] value 47
So how would you present these metrics instead? And what would the output
look like with the Parfait agent approach? Since I'm not sure what
you're after I'm not yet convinced any changes would be helpful but
since this is basically a matter of how to populate the Perl metrics
hash I don't think this would be a show-stopper even if some changes
would indeed be needed.
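
To make the current one-JMX-attribute-to-one-PCP-metric model concrete,
here is a minimal sketch that reads ThreadCount from the standard
java.lang:type=Threading MBean of the local JVM and derives a PCP-style
name for it. It is for illustration only: the class name, pcpName() and
the naming rule itself (domain + type + attribute, lower-cased) are
assumptions inferred from the output above, not the PMDA code.

```java
import java.lang.management.ManagementFactory;
import java.util.Locale;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class JmxNameSketch {
    // Assumed naming rule: "jmx." + domain + "." + type + "." + attribute,
    // all lower-cased. Hypothetical helper, not part of any PCP API.
    public static String pcpName(String objectName, String attribute) {
        try {
            ObjectName bean = new ObjectName(objectName);
            String name = "jmx." + bean.getDomain() + "."
                + bean.getKeyProperty("type") + "." + attribute;
            return name.toLowerCase(Locale.ROOT);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(objectName, e);
        }
    }

    public static void main(String[] args) throws Exception {
        String beanName = "java.lang:type=Threading";
        // Read the attribute from this JVM's own platform MBean server.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Object value = server.getAttribute(new ObjectName(beanName), "ThreadCount");
        System.out.println(pcpName(beanName, "ThreadCount") + " = " + value);
    }
}
```

In the model shown above, each monitored JVM contributes one such value
as one instance (identified by its PID) of the same metric.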
> Falls down a bit when different JVM versions
> have different semantics for similar metrics. :(
So far I've tested OpenJDK JVM 1.6/1.7/1.8 and IBM JVM 1.7.1/1.8 and I
didn't see any differences there. Do you have concrete examples or was
this a hypothetical scenario?
> - In the original pmdajmx code there was basically no PCP metadata at
> all (since JMX provides only pmDesc.type for us). Since then, I see
> you took on that issue by starting to add PCP metadata for individual
> JMX values in %semantics, %encoding, perl hashes (needs %helptext?)
>
> The plan here being we'd have Java programmers adding JMX values into
> their Java code, then updating the Perl code in PCP (or more likely,
> someone else who knows about PCP doing it on their behalf).
> A solution where Java programmers update Java code in a real Java
> project would be more likely to succeed I think (but dunno for sure).
I'm not sure whether you're talking about the pmdajmx or the Parfait
agent approach above, so let me clarify how I see things working with
pmdajmx:
As mentioned [7] after your reply, it's unreasonable to expect Java
developers or PCP users (or anyone else, for that matter) to evaluate
and create mappings for PCP for all the literally tens or hundreds of
thousands of Java/JMX metrics [8] already available today. Whether it'd
be in Java code or an XML description file or a Perl configuration is
not going to change the fact that such mappings will never be complete.
Mandating so would just limit the usability of PCP on this front and
drive potential users away.
So the best we can do is to provide such mappings for a few ubiquitous
components (like the JVM) and then provide reasonable/working defaults
for the rest and, in case volunteers surface or a developer/user really
thinks it's worth the time, allow them to provide more detailed/refined
metrics information on per-application basis (e.g., there could be such
information available for the most crucial JBoss/WildFly metrics but the
less crucial and all non-JVM/JBoss metrics would still use the
defaults). This is what pmdajmx does today (OK, the hash is currently in
the Perl code, but it's a five-line change to, e.g., have the PMDA scan a
directory for the mapping files). (%helptext could indeed be taken
automatically from the description that is available over JMX for (some)
metrics.)
7) http://oss.sgi.com/pipermail/pcp/2016-March/010036.html
8) Just JVM/JBoss/Cassandra take us over 20k available metrics
> - That external tools.jar dependency is unfortunate for some users; all
> external dependencies cause pain for users and pain for PCP developers
> getting requests for help when bits aren't installed. (minor issue,
> we like talking to users really - but not everyone will followup).
This one I certainly don't see as a problem at all. First of all,
tools.jar is part of the standard JDK every Java developer on the planet
already has installed on their system. Also, since it's part of the
standard JDK it's easier to have it accepted into use in some
environments/organizations than an alien component that hasn't been used
ever before (e.g., Parfait).
But it's true that tools.jar is only part of the JDK, not the JRE. Then
again, how large do you think is the set of people who are able to read
and analyze something like JVM garbage collection information but, at
the same time, are unable to locate the tools.jar? Also, why do you see
this as an issue for users when we have package managers taking care of
these kinds of dependencies automatically?
Lastly, if you check in more detail, you'll see that tools.jar enables
the use of the Attach API [9] so, as a convenience for the users, JMX
does not have to be separately enabled. Of course it's already possible
to use an already-enabled JMX with pmdajmx without the Attach API, but I
think the key is the flexibility we need: we simply can't imagine all the
possible use cases and scenarios users might have, so the best we can do
is to be as flexible as possible. Thus, ripping out the tools.jar
dependency would not make pmdajmx unusable at all but, given the
explanations above, I think it'd be a bad idea.
9) https://docs.oracle.com/javase/8/docs/technotes/guides/attach/
> - pmdajmx makes all the add_metrics() calls in its mainline - there is
> no capability to add metrics on the fly. This is not pmdajmx's fault
> as there are assumptions being made in the perl PMDA API that prevent
> this (the perl wrapper predates dynamic metrics!). When David Smith
> wrote pmdajson, this also wasn't foreseen, and a ton of extra effort
> had to be made to extend the python PMDA wrapper. (major issue)
Ok, so one thing that should be fixed is the Perl PMDA API [10] - it's a
bit of a shame that both the Perl and Python PMDA APIs have their own
sets of limitations [11] so selecting either one is always a trade-off.
10) https://bugzilla.redhat.com/show_bug.cgi?id=1321587
11) https://bugzilla.redhat.com/show_bug.cgi?id=1316179
I've tried to keep things as dynamic as possible since, as you also
point out, the world today is dynamic and static configuration files are
largely a thing of the past. However, here we could actually use a
static configuration to overcome this issue (nevertheless, could you
describe in what kind of scenarios you see this as a "major" issue with
Java monitoring and how the Parfait agent deals with this?).
Since one pretty much knows in advance what components are in play
(something like JBoss simply doesn't appear in your stack out of the
blue), the user could create a metrics CSV with the Java helper and
then, instead of querying the Java helper when the PMDA is starting,
just cat the CSV file - it's a one-line configuration change and later if
a JVM using other components that were not running earlier appears,
pmdajmx would start happily reporting them.
> - Also similar to early pmdajson, nothing is done to provide stability
> of PMIDs, so logging jmx.* metrics is going to explode (pmlogrewrite).
True, this is still work in progress. But since it was solved for
pmdajson already, it shouldn't be impossible for pmdajmx either.
> - Current PCPJMXConnector.java code is >1000 lines of code already, and
> its only two weeks old. ;)
But that's all there is to it, there are no external dependencies so
it's very contained. No software is perfect so if there's an issue with
Spring or Parfait or Parfait agent, who's going to debug and fix it
then? Given that even the mailing list for Parfait is defunct [12] it
doesn't give the impression of a super active community. Also, at ~1100
lines of code (of which half is logging/config/etc) we're already
dealing with tens of thousands of metrics so IMHO it's not that bad.
12) http://oss.sgi.com/pipermail/pcp/2016-March/010015.html
> To explore what I think will help tackle these problem areas while still
> meeting all our Java needs, I started hacking on Parfait a bit. This is
> now showing signs of life, so I'm keen to share the early results. This
> uses a -java-agent .jar approach, where the JMX (and other) values are
> accessed directly by the -java-agent (which is runtime-loaded into each
> application
>
> - no Java source code modifications
No code changes is definitely good. Having to configure Parfait agent
for each JVM is perhaps not ideal but still reasonable. However, could
you please describe the configuration scheme you've envisioned? Let's
say I'm interested in java.lang and jboss.org metrics, with pmdajmx I'd do:
pcpjmxconnector.attrfilter = java.lang:*!jboss.org:*
Also, how does the agent deal with "unknown" components, e.g., something
I've developed in-house and it provides a thousand or two metrics over
JMX, will the agent be able to pick them all up without any/much
configuration?
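
For reference, a filter like the one above can be evaluated with the
standard javax.management.ObjectName pattern matching. The following is
only a sketch, assuming that '!' separates the patterns and that a bean
is accepted when any pattern matches; the class and method names are
hypothetical and this is not the PCPJMXConnector implementation.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class AttrFilterSketch {
    // Assumption: '!' separates ObjectName patterns in the filter string.
    public static List<ObjectName> parse(String filter) {
        List<ObjectName> patterns = new ArrayList<>();
        for (String part : filter.split("!")) {
            patterns.add(toName(part.trim()));
        }
        return patterns;
    }

    // A bean is accepted if at least one pattern matches its name.
    public static boolean accepts(String filter, String beanName) {
        ObjectName bean = toName(beanName);
        for (ObjectName pattern : parse(filter)) {
            if (pattern.apply(bean)) {
                return true;
            }
        }
        return false;
    }

    // Wrap the checked exception so callers get an unchecked one.
    private static ObjectName toName(String s) {
        try {
            return new ObjectName(s);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(s, e);
        }
    }

    public static void main(String[] args) {
        String filter = "java.lang:*!jboss.org:*";
        System.out.println(accepts(filter, "java.lang:type=Threading"));
        System.out.println(accepts(filter, "java.nio:type=BufferPool,name=direct"));
    }
}
```

With this kind of matching, an in-house component is picked up simply
by adding its JMX domain as one more pattern.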
> - use the existing proven code in both PCP and Parfait, rather than
> starting from scratch with a Java-specific addition to PCP source.
Although the rules are different for PCP upstream, at least vendors like
Red Hat can't simply wash their hands saying "it's a Parfait issue, talk
to them."
Also, when I tried to build parfait-agent, I had to first install maven
(68 packages got installed on my laptop) and then the build system
started to download countless packages from the internets and even after
that the build failed. It's most probably a local issue but I admit I've
already lost track of which dependencies are needed for the agent.
> - leverage the existing Parfait JMX extraction code instead of writing
> it again from scratch and putting it in PCP :P
Let us continue this discussion once you've reached the 10k metrics limit :P
> - eventually allow PCP maintainers to focus on the core PCP components,
> and Java gurus to focus on the Java components in a real Java project
> (i.e. maven, not autoconf/make)
Again, this is of course a tempting approach if you merely look at
pcp.git but who's going to fix the issues users will eventually run
into? Users don't care whether it's a Spring issue or an MMV PMDA issue
or something in between, they either get the metrics from PCP or not.
(Not sure what you mean by maven/make; there's neither for pmdajmx, and
adding one (either maven or make) is of course trivial.)
> - allow PCP maintainers to improve *one* core PCP component (pmdammv),
> which benefits multiple languages (i.e. not just for Java).
See above. We all know that both approaches have their pros and cons so
not much point repeating them over and over again.
> - allow arbitrary modelling of metric names, instances, and allow for
> correct PCP metric metadata. All via configuration files, no code.
pmdajmx also allows defining correct PCP metric metadata. No hardcoding.
> - allow for more than just JMX as the source of our metric values
Here's one key difference between the Parfait agent and pmdajmx then.
pmdajmx is a generic JMX bridge, it's not even trying to be more. But
it's intended to be generic so whatever you get from JMX you'll get with
pmdajmx. So I think this is somewhat of an apples vs oranges comparison.
And I think here we come to the point I mentioned in my first email:
different approaches have different use cases and it's fine since they
are not mutually exclusive.
> - allow for better PCP metric modelling - using instances for set-values
> and not forcing a transformation based on JMX names.
(See my earlier reply above wrt concrete examples.)
> So, it aims to tackle all of the pmdajmx areas-for-improvement I listed
> above, without adding any new code in PCP.
As seen above, after a review of the feedback we're down to very few
real areas-for-improvement.
> And, it turns out, with very
> little new code in Parfait too (~90 lines of Java code so far).
But it's still at HelloWorld level. I think a good indication of how
things will look in reality is when you have, for example, every metric
from JVM and/or JBoss and/or those non-JMX metrics you've mentioned.
> As I'd hoped, it turned out Parfait does 99% of what we need and quite
> efficiently. pmcd timeouts simply cannot happen, and the applications
> continue to export correct values under stop-the-world GC conditions
See above, no difference with pmdajmx here.
> Have a browse through my Parfait tree - I'll send a pointer to it out
> shortly. Keep in mind it's early days, there's plenty more to be done
It certainly looks thin so far but, as said, there are pros and cons with
both approaches. I think the most important thing to acknowledge upfront
is that there will be no one-size-fits-all type solution - as you say,
it's an inherently difficult topic.
But before I draw any real conclusions about the Parfait agent
approach I'd like to have some hands-on experience with it; right now it
feels a bit too early to start testing. Perhaps most importantly, even
if it takes some time to get all bits in place, I think it would be very
helpful to have a bit of description of the planned user interface since
for the vast majority of people that's going to be the most important
element of it regardless of internal implementation details.
Thanks,
--
Marko Myllynen