Hi Marko,
----- Original Message -----
> [...]
> > I wonder if the best we can do here is something like:
> > - disable these two clusters by default
> > - add oracle.control metrics for each
> > - add pmstore support to allow people to opt-in to these clusters.
>
> But if opting in for these means that the timeout is hit pretty much
> guaranteed, not sure what's the point then?
The point was to give you a working agent (with all the other metrics).
The agent is working fine for "everyone" else (although that's a small
set at this stage, I suspect).
The not-yet-understood root cause of this particular problematic platform
or Oracle version combination is the reason we're contemplating these
(quite horrible) workarounds.
On my system (Fedora 23, Oracle 12.1), the Red Hat perf folks' system and
the systems of the Intel folks who have been hacking on this PMDA too, we
haven't seen the problems you are seeing, so we'd just continue on with
everything enabled. But it would give you (and anyone else who hits this)
a way to get up and running with all the other Oracle metrics, even with
this problematic Oracle version (or platform, or host setup, or whatever it is).
> Ok, initially oracle.file
> fetch might be possible but with both it seems to be guaranteed that it
> won't work.
For your system, yep, it's guaranteed. So unless we get to the bottom of
it and fix the real cause, we'll need a workaround, hence the earlier
suggestions.
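For what it's worth, the opt-in idea could look roughly like this. It's a
minimal Python sketch under assumptions: the class, the cluster names and the
oracle.control.<cluster> naming are hypothetical, and it is not the actual
Oracle PMDA code or the PCP PMDA API. Slow clusters start disabled, a
pmstore-style write of 0 or 1 flips a cluster's flag, and a disabled cluster
reports no values instead of holding up the whole fetch.

```python
# Hypothetical sketch: not the real Oracle PMDA or the PCP PMDA API.
from typing import Callable, Dict, List

# Clusters assumed to be slow on the problem host; off unless opted in.
DEFAULT_DISABLED = {"file", "object_cache"}


class ClusterControl:
    def __init__(self, clusters: List[str]) -> None:
        # One on/off flag per cluster, standing in for a hypothetical
        # oracle.control.<cluster> metric.
        self.enabled: Dict[str, bool] = {
            c: c not in DEFAULT_DISABLED for c in clusters
        }

    def store(self, cluster: str, value: int) -> None:
        """Handle a pmstore-style write to the cluster's control metric."""
        if cluster not in self.enabled:
            raise KeyError(f"unknown cluster: {cluster}")
        if value not in (0, 1):
            raise ValueError("control value must be 0 or 1")
        self.enabled[cluster] = bool(value)

    def fetch(self, cluster: str, query: Callable[[], dict]) -> dict:
        """Run the (possibly slow) query only if the cluster is enabled."""
        if not self.enabled.get(cluster, False):
            return {}  # disabled: no values, and no time spent in Oracle
        return query()
```

With this shape, everyone else keeps the default behaviour for the fast
clusters, and someone on a host like yours can opt in to the slow ones
explicitly, accepting the timeout risk.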
> > It's not ideal but I don't think there's much else we're going to be able to
> > do to improve things on our end of the connection, and this would stabilize
> > things for you at least. Thoughts?
>
> I checked with some local DB folks - they haven't used the object_cache
> metrics anywhere so for them it's nice-to-have category. But the file
> metrics are important.
>
> The above timings are with almost completely unloaded DB instance so not
> sure what they would look like under extreme load; I wouldn't be
> surprised if they'd be higher then. But that'd be the time when the
> metrics are needed the most to see what was going on.
>
> So we're back to the initial question of the thread, can we for example
> adjust the 5 second timer for the Oracle PMDA to be more forgiving or
> come up with some other approach here? It seems that we can't affect how
> long it takes for Oracle to respond, and on the PMDA side the actual
> select query seems to be as efficient as it can be.
Adjusting the timeout isn't great; that introduces other, nasty problems.
What we'd need is a background thread that fetches these metrics on a timer
and serves up cached values (but that's also quite a horrible solution).
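To make the trade-off concrete, here's a minimal Python sketch of that
background-refresh idea. It's an illustration only: CachedCollector, the
query callable and the 30-second interval are hypothetical, not anything in
the PMDA today. The slow query runs on its own thread, and fetch() returns
the last snapshot immediately, together with its age, so it never waits on
Oracle. The horrible part is visible in that age: under load the values
served can be one or more intervals old.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class CachedCollector:
    """Refresh a slow query on a background timer; fetch() reads the cache."""

    def __init__(self, query: Callable[[], Dict[str, float]],
                 interval: float = 30.0) -> None:
        self._query = query
        self._interval = interval
        self._lock = threading.Lock()
        self._values: Dict[str, float] = {}
        self._stamp: Optional[float] = None  # time of last successful refresh
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                values = self._query()  # may take many seconds
            except Exception:
                values = None  # keep serving the previous snapshot
            if values is not None:
                with self._lock:
                    self._values = dict(values)
                    self._stamp = time.monotonic()
            # Sleep until the next refresh; returns early if stop() is called.
            self._stop.wait(self._interval)

    def fetch(self) -> Tuple[Dict[str, float], Optional[float]]:
        """Return (values, age in seconds) without touching the database."""
        with self._lock:
            if self._stamp is None:
                return {}, None  # no successful refresh yet
            return dict(self._values), time.monotonic() - self._stamp
```

The age value is there so a client can at least tell stale data from fresh
data; without it, cached values silently hide exactly the slow periods you
want to see.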
I'd really prefer to understand what it is about your system/setup that
causes this pathologically slow query behaviour, and fix that instead of
doing any of these workarounds, TBH. Could you try different hosts, operating
systems and/or Oracle versions? (so we can try to isolate which might be causing it).
cheers.
--
Nathan