Re: [pcp] PCP JMX PMDA

To: Marko Myllynen <myllynen@xxxxxxxxxx>
Subject: Re: [pcp] PCP JMX PMDA
From: Paul Smith <psmith@xxxxxxxxxx>
Date: Wed, 30 Mar 2016 10:03:52 +1100
Cc: Nathan Scott <nathans@xxxxxxxxxx>, pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <56F940C7.2080909@xxxxxxxxxx>
References: <56D8858A.3020407@xxxxxxxxxx> <56E05862.7040707@xxxxxxxxxx> <282702840.33546644.1458721199633.JavaMail.zimbra@xxxxxxxxxx> <56F940C7.2080909@xxxxxxxxxx>
Just adding a couple of items I can see from my perspective. 

> 
> Yes, I think none of the other PMDAs come close to JSON/JMX PMDAs in
> terms of their flexibility and generic nature. But at least with pmdajmx
> this is all pretty well hidden at the most important level, the user
> interface. For comparison, how would the corresponding diagram look like
> with the Parfait agent approach you suggest below? Would it be something
> like:
> 
> pmdammv <-> parfait{-agent,} <-...-> multiple-java-apps

Not quite: the Parfait agent and the Java app are a pair.  The app uses Parfait
inside its own process to write to the MMV file, and pmdammv then incorporates
those values into the PCP metric space.  Since it's a memory-mapped file,
PMCD is never blocked from getting metrics.

I would probably write it as:

pmdammv <-> { parfait-agent<->java app},  {parfait-agent<->java app},  ...
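
To make the "never blocked" point concrete, here is a minimal sketch of the idea, assuming nothing more than a single 8-byte slot in a temp file.  It is NOT the real MMV layout and not Parfait code; the class and method names are made up for illustration.  The writer and the reader each hold their own mapping of the same file, so a sample is just a memory read with nobody on the other side who has to answer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only: one 8-byte slot, NOT the real MMV layout or Parfait code.
class MappedSlotSketch {

    // Map the first 8 bytes of the file; the mapping outlives the channel.
    static MappedByteBuffer map(Path file, boolean writable) {
        try (FileChannel ch = writable
                ? FileChannel.open(file, StandardOpenOption.CREATE,
                        StandardOpenOption.READ, StandardOpenOption.WRITE)
                : FileChannel.open(file, StandardOpenOption.READ)) {
            FileChannel.MapMode mode = writable
                    ? FileChannel.MapMode.READ_WRITE
                    : FileChannel.MapMode.READ_ONLY;
            return ch.map(mode, 0, Long.BYTES);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The "application" stores a value; the "agent" samples it via its own mapping.
    static long roundTrip(long value) {
        try {
            Path file = Files.createTempFile("mapped-slot", ".bin");
            file.toFile().deleteOnExit();
            MappedByteBuffer writer = map(file, true);   // grows the file to 8 bytes
            MappedByteBuffer reader = map(file, false);
            writer.putLong(0, value);   // a memory store: no socket, no reply needed
            return reader.getLong(0);   // a memory load: nothing to wait on
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L));
    }
}
```

If the writing side stalls, the reader in this sketch simply keeps seeing whatever was stored last, which is the property I'm describing for PMCD->pmdammv.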

> 
>> - Stop-the-world GC activity at unfortunate times anywhere to the right
>>  of ".perl", above, is a major latency problem that has to be handled.
>>  i.e. all processes in this design are using threads to try to hide
>>  that latency, or rather the potential for latency once in a while.
> 
> To put this in context, JMX is the standard for Java monitoring method
> [3] and for over a decade this hasn't been a show-stopper there. But as
> Paul pointed out [4], also Parfait provides the last values that were
> set and some polling would be delayed since not much is happening in the
> JVM anyway. And wrt his main point, "PMCD is not blocked from reporting
> _some_ values back to the caller," there is no difference between
> pmdajmx vs Parfait.

While the Parfait agent may be stalled, the PMCD->pmdammv path is not.  If I
understand pmdajmx correctly, it connects to the application process via
the JMX socket communication protocol.  Inside the JVM, the JMX socket connector
has a matching set of threads that respond to these requests.

In the case of a stalled JVM (a heavy GC pause) these threads are also blocked,
so the socket communication stalls, which I believe would flow back up
to pmdajmx and stall it too.  That is, of course, unless pmdajmx already
decouples this, so that PMCD can always get the last values that pmdajmx has
received (even if a receiving thread within pmdajmx is blocked waiting on
values over the wire).
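
By "decouples" I mean something along these lines.  This is only a sketch of the general pattern, not a description of how pmdajmx actually works; LastValueCache and its method names are hypothetical.  A background thread does the slow, possibly blocking fetch, and callers only ever read the last value that arrived.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: callers read the last known value and never wait on the source.
class LastValueCache<T> {
    private final AtomicReference<T> last;
    private final ScheduledExecutorService poller =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "last-value-poller");
                t.setDaemon(true);          // do not keep the process alive
                return t;
            });

    LastValueCache(T initial, Supplier<T> slowSource, long periodMillis) {
        last = new AtomicReference<>(initial);
        poller.scheduleWithFixedDelay(() -> {
            try {
                last.set(slowSource.get()); // may block for as long as the source stalls
            } catch (RuntimeException e) {
                // keep the previous value; an uncaught exception would cancel the schedule
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Returns immediately, even while the poller thread is stuck in slowSource.get().
    T latest() {
        return last.get();
    }

    void stop() {
        poller.shutdownNow();
    }
}
```

The trade-off is that a caller can be handed a stale value during a stall, which is the same "last values that were set" behaviour discussed for Parfait.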


> 
>> - That separate multiplexing java process has a fairly large footprint
>>  in terms of memory utilisation (in Java 8, approx 80-100MB is steady
>>  state RSS).  We can improve that via non-default command line options
>>  and/or properties file settings, etc.
> 
> Great to hear that you didn't find any memory leaks when testing! ;) But
> yes, Java applications tend to consume more memory than C programs. Have
> you already measured how much the Parfait agent + Parfait (+ perhaps
> Spring, if it's also used?) will consume extra memory?

I'm not sure this memory discussion goes very far in the days when even VMs
have minimum RAM sizes in GB, however... I don't believe anyone has measured
it empirically, but from my understanding of the code the overhead is:

* The size of the MMV.  It's memory mapped, so it falls outside the Java
heap but does count towards the process memory space.  It's pretty
tiny (and the size would be well understood by people familiar with MMV).
Does Linux effectively count the MMV file only once from a memory point of
view?  (pmdammv and Parfait share the same memory; that's how mmap
works, right?)
* The MetricRegistry, a glorified Java map, mostly dominated by the metric
names and descriptors as strings.
* The Parfait Monitored Timer scheduler thread (a thread consumes 1MB of stack
by default, but that is OS dependent), and a QuiescentRegistryListener
(listening for more metrics being registered, to auto-reconfigure the MMV
space).

It's significantly less than the overhead of another JVM.  My wet-finger
estimate is that Parfait costs several MB, compared with at least tens of MB
for a full JVM.  While I personally don't think the overhead of another JVM is
that big a deal, I think the _perception_ of a second JVM for
monitoring being 'heavy' is definitely real.

> 
> 
>> - That external tools.jar dependency is unfortunate for some users; all
>>  external dependencies cause pain for users and pain for PCP developers
>>  getting requests for help when bits aren't installed.  (minor issue,
>>  we like talking to users really - but not everyone will followup).
> 
> This one I certainly don't see as a problem at all. First of all,
> tools.jar is part of the standard JDK every Java developer on the planet
> already have installed on their system. Also, since it's part of the
> standard JDK it's easier to have it accepted into use in some
> environments/organizations than an alien component that hasn't been used
> ever before (e.g., Parfait).
> 

The issue with tools.jar is that it contains, among other things, a way to
compile Java code.  I believe one of the reasons for separating the JDK (for
developing and compiling the Java application) from the JRE (the runtime, to
execute the code) is to reduce the security footprint of the running Java
process.  "Evil processes" could detect the availability of tools.jar and use
it as leverage for code execution.  The JDK is also a heftier install than a JRE.

I'm not sure I understood the Parfait-never-been-used-before bit.  People will
use a tool like NewRelic purely because someone else has used it before.  Of
course NewRelic has a broader market than Parfait, which has been used by us
here for over 10 years now, so we're a sample of one.  But someone had to start
using, say, NewRelic at some point.


> 
> 
> I've tried to keep things as dynamic as possible since, as you also
> point out, the world today is dynamic and static configuration files are
> largely a thing of the past. However, here we could actually use a
> static configuration to overcome this issue (nevertheless, could you
> describe in which kind of scenarios you see this as a "major" issue with
> Java monitoring and how does the Parfait agent deal with this?).
> 

If an application registers new metrics at runtime, the
QuiescentRegistryListener dynamically reconfigures the MMV (it waits a certain
amount of time for a quiet period with no new metrics being registered before
triggering it).
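
For anyone unfamiliar with the quiet-period idea, a minimal sketch of it looks like the following.  This is not Parfait's actual listener; QuietPeriodTrigger and metricRegistered() are hypothetical names, and the real code may differ in how it schedules the work.  Every new registration pushes the deadline back, so a burst of registrations results in a single rebuild once things settle.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a quiet-period (debounce) trigger, not Parfait's implementation.
class QuietPeriodTrigger {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "quiet-period-trigger");
                t.setDaemon(true);
                return t;
            });
    private final long quietMillis;
    private final Runnable action;      // e.g. "rebuild the mapped file"
    private ScheduledFuture<?> pending; // the currently scheduled rebuild, if any

    QuietPeriodTrigger(long quietMillis, Runnable action) {
        this.quietMillis = quietMillis;
        this.action = action;
    }

    // Call on every new registration: cancels the pending run and restarts the timer.
    synchronized void metricRegistered() {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = scheduler.schedule(action, quietMillis, TimeUnit.MILLISECONDS);
    }
}
```

Calling metricRegistered() five times in quick succession runs the action once, quietMillis after the last call.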


> 
> But that's all there is to it, there are no external dependencies so
> it's very contained. No software is perfect so if there's an issue with
> Spring or Parfait or Parfait agent, who's going to debug and fix it
> then? Given that even the mailing list for Parfait is defunct [12] it
> doesn't give impression of a super active community. Also, at ~1100
> lines of code (of which half is logging/config/etc) we're already
> dealing with tens of thousands of metrics so IMHO it's not that bad.
> 

The old Google Code mailing list is 'defunct' and wasn't replaced, because
GitHub is a more modern place for the community.  I don't really see that as an
issue.  TBH, actually having a mailing list looks a bit 'old school'...

> 
> No code changes is definitely good. Having to configure Parfait agent
> for each JVM is perhaps not ideal but still reasonable. However, could
> you please describe the configuration scheme you've envisioned? Let's
> say I'm interested in java.lang and jboss.org metrics, with pmdajmx I'd do:
> 
> pcpjmxconnector.attrfilter = java.lang:*!jboss.org:*
> 
> Also, how does the agent deal with "unknown" components, e.g., something
> I've developed in-house and it provides a thousand or two metrics over
> JMX, will the agent be able to pick them all up without any/much

I think Nathan has a plan in place to automatically scan standard Java metrics.
If we follow the model of other agents like NewRelic, I would bet
there's a 'scanner' module that looks for known patterns of standard
frameworks/containers like JBoss, Tomcat etc.  They all emit standard JMX
namespaces, so I can envisage doing something similar: scanning for JMX
patterns and auto-registering the ones you find.  This is similar to what
Parfait already does with the Java memory space, looking for a few JMX patterns
with some allowance for differences in JVM memory configurations.
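
As a rough sketch of what such a scan could look like from inside the JVM (this is my illustration, not Nathan's plan and not existing Parfait code; JmxPatternScan is a made-up name), the standard javax.management API can already answer "which MBeans match this pattern?":

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical scanner sketch: list the MBeans in this JVM matching known patterns.
class JmxPatternScan {

    // Patterns are ordinary JMX ObjectName patterns, e.g. "java.lang:*".
    static Set<String> scan(String... patterns) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<String> found = new TreeSet<>();
        for (String pattern : patterns) {
            try {
                for (ObjectName name : server.queryNames(new ObjectName(pattern), null)) {
                    found.add(name.getCanonicalName());
                }
            } catch (MalformedObjectNameException e) {
                throw new IllegalArgumentException("bad pattern: " + pattern, e);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Each name found would be a candidate for auto-registration as a metric source.
        scan("java.lang:*").forEach(System.out::println);
    }
}
```

A pattern whose domain isn't present in the JVM simply matches nothing, so a fixed list of well-known patterns can be tried without knowing in advance which container is running.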

> 

If PCP is looking to create something that behaves like modern Java tracing 
facilities, the JVMTI mechanism is generally the pattern that NewRelic, 
AppDynamics etc follow.  Using Socket-based JMX communication will work, but 
after a lot of experience with Java, I still believe the MMV approach is what I 
would use.

Some other things to consider:

* I believe, but could be wrong, that Java processes do not expose a JMX
connector unless configured to (this may have changed, possibly due to the new
way the JVM Attach API works: it relies on writing details to a known temporary
directory per Java process so that things can 'discover' running JVMs).
* Exposing JMX is a potential security risk, as once the connection is there, 
any JMX value can be queried _as well_ as changed (if the JMX object exposes 
that). It's kinda a back door. It's a pretty darn useful backdoor if you want 
to control your application, but not everyone will want to do this.
* pmdajmx relies on polling, so the sampling is limited to the granularity of
the polling frequency.  While Parfait supports some JMX polling if
you need it, the main way of exposing metrics is explicit metric creation,
and any metric value update is _immediately_ present in the MMV, so if you
need to sample some metrics through the local PMCD at a higher frequency
you can.  This may be an advanced use case, but there are certainly
circumstances where we've valued being able to look at metric values more
frequently.  Just something to consider.
* The ActiveMQ PMDA we contributed effectively works a lot like pmdajmx.  It
happens to rely on the fact that ActiveMQ exposes the Jolokia JMX REST
interface, and so polls the metrics over a socket via HTTP.  Also it's Perl,
and that hurt Andy's and my brains a lot.  The pattern definitely works, but we
tried to see if we could inject Parfait into a running Java process and
couldn't find a good way without going the JVMTI route, and we didn't really
want to commit to trying that.  We thought Perl->Jolokia REST would be
easy, but it was more annoying than we expected.  That experience has
made me favour the JVMTI-style agent model as the preferred one.  When ActiveMQ
hangs due to things like GC pauses, the ActiveMQ PMDA also hangs, for the same
reason as outlined above: the use of sockets to get the values.  It's a bit icky.

I'll try to follow this conversation as much as I can, and contribute my
thoughts when time permits.

Paul
