Just adding a couple of items I can see from my perspective.
>
> Yes, I think none of the other PMDAs come close to JSON/JMX PMDAs in
> terms of their flexibility and generic nature. But at least with pmdajmx
> this is all pretty well hidden at the most important level, the user
> interface. For comparison, how would the corresponding diagram look like
> with the Parfait agent approach you suggest below? Would it be something
> like:
>
> pmdammv <-> parfait{-agent,} <-...-> multiple-java-apps
Not quite, the Parfait Agent+Java app is a pair. The app uses Parfait
inside the process to write to the MMV file, and pmdammv then incorporates
it into the PCP metric space. Since it's a memory-mapped file, PMCD is never
blocked from getting metrics.
I would probably write it as:
pmdammv <-> { parfait-agent<->java app}, {parfait-agent<->java app}, ...
>
>> - Stop-the-world GC activity at unfortunate times anywhere to the right
>> of ".perl", above, is a major latency problem that has to be handled.
>> i.e. all processes in this design are using threads to try to hide
>> that latency, or rather the potential for latency once in a while.
>
> To put this in context, JMX is the standard for Java monitoring method
> [3] and for over a decade this hasn't been a show-stopper there. But as
> Paul pointed out [4], also Parfait provides the last values that were
> set and some polling would be delayed since not much is happening in the
> JVM anyway. And wrt his main point, "PMCD is not blocked from reporting
> _some_ values back to the caller," there is no difference between
> pmdajmx vs Parfait.
While the Parfait agent may be stalled, the PMCD->pmdammv is not. If I
understand the pmdajmx correctly, it connects to the application process via
the JMX socket communication protocol. Inside the JVM the JMX socket connector
has a matching set of threads that provide the mechanism to respond to these
requests.
In the case of a stalled JVM (heavy GC pause) these threads are also blocked,
and so the socket communication stalls, which I believe would flow back up
to the pmdajmx and stall that too. That is, of course, unless the pmdajmx
already decouples this, so that PMCD can always get the last values that
pmdajmx has received (even if a receiving thread within pmdajmx is blocked
waiting on values over the wire).
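To illustrate the kind of decoupling I mean, here is a minimal sketch. It is
not taken from pmdajmx or Parfait; the class name LastValueCache and its
methods are hypothetical. A background thread does the (possibly stalling)
fetch, while the caller always gets the last value received without waiting.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical helper, for illustration only; not part of pmdajmx or Parfait. */
public class LastValueCache<T> implements AutoCloseable {
    private final AtomicReference<T> last;
    private final ScheduledExecutorService poller =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "poller");
                t.setDaemon(true);
                return t;
            });

    public LastValueCache(T initial, Supplier<T> slowSource, long periodMillis) {
        this.last = new AtomicReference<>(initial);
        // Only this background thread ever waits on the (possibly stalled) source.
        poller.scheduleWithFixedDelay(() -> {
            try {
                last.set(slowSource.get());
            } catch (RuntimeException e) {
                // Keep the previous value and keep polling if a fetch fails.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Never blocks: returns whatever was fetched most recently. */
    public T get() {
        return last.get();
    }

    @Override
    public void close() {
        poller.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        try (LastValueCache<Long> cache = new LastValueCache<>(0L, System::nanoTime, 100)) {
            Thread.sleep(250);
            System.out.println("last polled value: " + cache.get());
        }
    }
}
```

The trade-off is that a reader may see a stale value while the source is
stalled, which is the same behaviour the MMV approach gives you for free.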
>
>> - That separate multiplexing java process has a fairly large footprint
>> in terms of memory utilisation (in Java 8, approx 80-100MB is steady
>> state RSS). We can improve that via non-default command line options
>> and/or properties file settings, etc.
>
> Great to hear that you didn't find any memory leaks when testing! ;) But
> yes, Java applications tend to consume more memory than C programs. Have
> you already measured how much the Parfait agent + Parfait (+ perhaps
> Spring, if it's also used?) will consume extra memory?
I'm not sure this memory discussion goes very far in the days when even VMs
have minimum RAM sizes in GB, however... I don't believe anyone has measured
it empirically, but from my understanding of the code the overhead is:
* The size of the MMV. It's memory mapped, so it falls outside the Java heap
but does count towards the process memory space. It's pretty tiny, though (and
the size would be well understood by the people familiar with MMV). Does Linux
effectively count the MMV file 'once' from a memory point of view? (Both the
pmdammv and Parfait share the same memory; that's how mmap works, right?)
* The MetricRegistry, a glorified Java map, mostly dominated by the metric
names & descriptors as strings.
* The Parfait Monitored Timer scheduler thread (a thread consumes 1MB of stack
by default, but that is OS dependent), and a QuiescentRegistryListener
(listening for more metrics being registered so it can auto-reconfigure the
MMV space).
It's significantly less than the overhead of another JVM. My wet-finger
estimate is that Parfait costs several MB, compared with at least tens of MB
for a full JVM. While I personally don't think the overhead of another JVM is
that big a deal, I think the _perception_ of a second JVM for monitoring being
'heavy' is definitely real.
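On the mmap question above, a small sketch of what I understand to be going
on. It uses only standard java.nio calls; the class name SharedMappingDemo and
the file layout (a single long at offset 0) are made up for illustration and
are not the MMV format. Two independent mappings of the same file see each
other's writes, because shared file mappings are backed by the same page-cache
pages. Strictly speaking Java leaves the propagation OS dependent, but this is
how it behaves on Linux as far as I know.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustration only: the one-long layout is hypothetical, not the MMV format. */
public class SharedMappingDemo {

    /** Writes a value through one mapping and reads it back through a second, independent one. */
    public static long writeThenReadViaSecondMapping(Path file, long value) throws IOException {
        try (FileChannel writerCh = FileChannel.open(file,
                     StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileChannel readerCh = FileChannel.open(file, StandardOpenOption.READ)) {
            // Plays the role of the instrumented application (the writer).
            // Mapping READ_WRITE also grows the file to the mapped size if needed.
            MappedByteBuffer writer = writerCh.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
            // Plays the role of the reader (pmdammv in the real setup).
            MappedByteBuffer reader = readerCh.map(FileChannel.MapMode.READ_ONLY, 0, Long.BYTES);
            writer.putLong(0, value);   // no write() call, no socket round trip
            return reader.getLong(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmv-demo", ".bin");
        file.toFile().deleteOnExit();
        System.out.println("reader sees: " + writeThenReadViaSecondMapping(file, 12345L));
    }
}
```

In the real setup the two mappings live in different processes, but the
mechanism is the same, and the reader never has to wait for the writer.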
>
>
>> - That external tools.jar dependency is unfortunate for some users; all
>> external dependencies cause pain for users and pain for PCP developers
>> getting requests for help when bits aren't installed. (minor issue,
>> we like talking to users really - but not everyone will followup).
>
> This one I certainly don't see as a problem at all. First of all,
> tools.jar is part of the standard JDK every Java developer on the planet
> already have installed on their system. Also, since it's part of the
> standard JDK it's easier to have it accepted into use in some
> environments/organizations than an alien component that hasn't been used
> ever before (e.g., Parfait).
>
The issue with tools.jar is that it contains, among other things, a way to
compile Java code, so I believe one of the reasons to separate the JDK (for
developing and compiling the Java application) and the JRE (runtime, to execute
the code) is to reduce the security footprint of the running Java process.
"Evil processes" could detect the availability of tools.jar and use it as
leverage for code execution. The JDK is also a heftier install than a JRE.
I'm not sure I understood the Parfait-never-been-used-before bit. People would
use tools like NewRelic purely because someone else has used them before. Of
course, they have a broader market than Parfait, which has been used by us here
for over 10 years now. So we're a sample of one. Someone had to start using,
say, NewRelic at some point.
>
>
> I've tried to keep things as dynamic as possible since, as you also
> point out, the world today is dynamic and static configuration files are
> largely a thing of the past. However, here we could actually use a
> static configuration to overcome this issue (nevertheless, could you
> describe in which kind of scenarios you see this as a "major" issue with
> Java monitoring and how does the Parfait agent deal with this?).
>
If an application registers new metrics at runtime, the
QuiescentRegistryListener dynamically reconfigures the MMV (it waits a certain
amount of time for a quiet period with no new metrics being registered before
triggering it).
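The quiet-period behaviour described above is essentially a debounce. A
minimal sketch of the idea follows; this is not Parfait's actual code, and the
class QuietPeriodTrigger and its method names are hypothetical. Every new
registration pushes the deadline back, and the (expensive) reconfiguration
runs once after registrations have stopped for the configured time.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical debouncer illustrating the quiet-period idea; not Parfait's class. */
public class QuietPeriodTrigger implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "quiet-period");
                t.setDaemon(true);
                return t;
            });
    private final Runnable action;
    private final long quietMillis;
    private ScheduledFuture<?> pending;

    public QuietPeriodTrigger(Runnable action, long quietMillis) {
        this.action = action;
        this.quietMillis = quietMillis;
    }

    /** Call on every new metric registration; restarts the quiet-period timer. */
    public synchronized void metricRegistered() {
        if (pending != null) {
            pending.cancel(false);   // drop the previously scheduled run, if not started
        }
        pending = scheduler.schedule(action, quietMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        try (QuietPeriodTrigger trigger = new QuietPeriodTrigger(
                () -> System.out.println("reconfiguring MMV (stand-in action)"), 200)) {
            for (int i = 0; i < 5; i++) {
                trigger.metricRegistered();   // a burst of registrations
            }
            Thread.sleep(500);                // the action should run once after the burst
        }
    }
}
```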
>
> But that's all there is to it, there are no external dependencies so
> it's very contained. No software is perfect so if there's an issue with
> Spring or Parfait or Parfait agent, who's going to debug and fix it
> then? Given that even the mailing list for Parfait is defunct [12] it
> doesn't give impression of a super active community. Also, at ~1100
> lines of code (of which half is logging/config/etc) we're already
> dealing with tens of thousands of metrics so IMHO it's not that bad.
>
The old Google Code mailing list is 'defunct' and wasn't replaced, because
GitHub is a more modern place for the community. I don't really see that as an
issue. TBH, actually having a mailing list looks a bit 'old school'...
>
> No code changes is definitely good. Having to configure Parfait agent
> for each JVM is perhaps not ideal but still reasonable. However, could
> you please describe the configuration scheme you've envisioned? Let's
> say I'm interested in java.lang and jboss.org metrics, with pmdajmx I'd do:
>
> pcpjmxconnector.attrfilter = java.lang:*!jboss.org:*
>
> Also, how does the agent deal with "unknown" components, e.g., something
> I've developed in-house and it provides a thousand or two metrics over
> JMX, will the agent be able to pick them all up without any/much
I think Nathan has a plan in place to automatically scan standard Java metrics.
If the model is to follow what other agents like NewRelic do, I would bet
there's a 'scanner' module looking for known patterns of standard
frameworks/containers like JBoss, Tomcat etc. They all emit standard JMX
namespaces, so I can envisage doing something similar: scanning for JMX
patterns and auto-registering the ones you find. This is similar to what
Parfait already does with the Java memory space, looking for a few JMX
patterns with some allowance for differences in JVM memory configurations.
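As a rough sketch of what such a scan could look like (this is not the Parfait
agent's implementation; the class JmxPatternScanner is hypothetical), the
standard javax.management API can already enumerate everything matching an
ObjectName pattern. The patterns below are the ones from your example.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical scanner, for illustration; not the Parfait agent's code. */
public class JmxPatternScanner {

    /** Returns "objectName/attribute" for every readable attribute matching the patterns. */
    public static List<String> scan(MBeanServer server, String... patterns) throws Exception {
        List<String> found = new ArrayList<>();
        for (String pattern : patterns) {
            // queryNames accepts wildcard patterns such as "java.lang:*".
            for (ObjectName name : server.queryNames(new ObjectName(pattern), null)) {
                for (MBeanAttributeInfo attr : server.getMBeanInfo(name).getAttributes()) {
                    if (attr.isReadable()) {
                        found.add(name + "/" + attr.getName());
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // "jboss.org:*" simply matches nothing when no such MBeans are registered.
        scan(server, "java.lang:*", "jboss.org:*").forEach(System.out::println);
    }
}
```

An in-house component with a thousand metrics would be picked up the same way,
provided its JMX domain is included in the patterns.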
>
If PCP is looking to create something that behaves like modern Java tracing
facilities, the JVMTI mechanism is generally the pattern that NewRelic,
AppDynamics etc follow. Using Socket-based JMX communication will work, but
after a lot of experience with Java, I still believe the MMV approach is what I
would use.
Some other things to consider:
* I believe, but could be wrong, that Java processes do not expose a JMX
connector unless configured to (this may have changed, and could be due to the
new way the JVM Attach API works: it relies on writing details to a known
temporary directory per Java process so things can 'discover' running JVMs).
* Exposing JMX is a potential security risk, as once the connection is there,
any JMX value can be queried _as well_ as changed (if the JMX object exposes
that). It's kinda a back door. It's a pretty darn useful backdoor if you want
to control your application, but not everyone will want to do this.
* The pmdajmx relies on polling, so the sampling is limited to the granularity
of the polling frequency. While Parfait supports some JMX polling if you need
it, the main way of exposing metrics is explicit metric creation, and any
metric value update is _immediately_ present in the MMV, and so if you need to
sample through the local PMCD at a higher frequency for some metrics you can.
This may be an advanced use case, but there are certainly circumstances where
we've valued being able to look into metric values more frequently. Just
something to consider.
* The ActiveMQ pmda we contributed effectively works a lot like pmdajmx. It
happens to rely on the fact that ActiveMQ exposes the Jolokia JMX REST
interface, and so polls the metrics over a socket via HTTP. Also it's Perl.
And that hurt Andy's & my brains a lot... The pattern definitely works, but we
tried to see if we could inject Parfait into a running Java process and
couldn't find a good way without going the JVMTI route, and we didn't really
want to commit to trying that. We thought the Perl->Jolokia REST would be
easy, but it was more annoying than we expected. That experience has made me
favour the JVMTI-style agent model. When ActiveMQ hangs due to things like GC
pauses, the ActiveMQ pmda also hangs, for the same reason as outlined above
about the use of sockets to get the values. It's a bit icky.
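To make the security point in the list above concrete: the standard MBean
metadata tells you which attributes can be changed, not just read. A small
sketch using only javax.management calls follows; the class name
JmxWritableAttributes is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Illustration only: lists the attributes a JMX client could change on one MBean. */
public class JmxWritableAttributes {

    public static List<String> writableAttributes(MBeanServer server, ObjectName name)
            throws Exception {
        List<String> writable = new ArrayList<>();
        for (MBeanAttributeInfo attr : server.getMBeanInfo(name).getAttributes()) {
            if (attr.isWritable()) {
                writable.add(attr.getName());
            }
        }
        return writable;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        // Anything listed here can be set by whoever can reach the connector.
        System.out.println(memory + " writable: " + writableAttributes(server, memory));
    }
}
```

Operations (as opposed to attributes) are exposed the same way, so what a
remote client can do depends entirely on what each MBean chooses to publish.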
I'll try to follow this conversation as much as I can, and contribute my
thoughts when time permits.
Paul