Hi David,
----- Original Message -----
> On 07/22/2015 02:16 AM, Nathan Scott wrote:
> > - Domain isolation
> > We have inadvertently circumvented the checks-and-balances pmcd has
> > for keeping different *domains* of performance data at arm's length.
> >
> > What this means, in practice, is that a blocking refresh from one
> > domain can (ultimately) cause loss of data from other domains, i.e.
> > a problem on the Ceph socket might cause all systemtap metrics to
> > stop refreshing when pmcd terminates the tardy PMDA. Multiply this
> > out by more and more data sources within this one 'json' domain, and it
> > could become quite a problem. Worse, it's probably not going to be
> > a trivial debugging exercise to figure out which sources are at the
> > root of such a problem, and which are the innocent bystanders.
>
> The blocking refresh problem sounds like a generic pcp problem that the
> json pmda just happens to exercise. Any pmda that runs a command to get
> some/all of its metrics has the exact same problem.
Hmm, it's a bit different here. Any PMDA (irrespective of running a
command, or using syscalls, or whatever) can potentially block - even
the kernel PMDAs, under extremely adverse conditions.
I was involved in a production incident once where a single process had
traversed a code path where it took a spinlock but failed to release it,
and the read(2) path through /proc for some of the proc.* metrics needed
to take
that lock. This blocked pmdaproc indefinitely, but the critical kernel
(pmdalinux) and application (pmdammv) metrics remained available, so we
could continue analysis and monitoring until a scheduled downtime.
Anyway, the difference here is we have no isolation, so the problem will
compound the more JSON sources we add, and severe blocking in one domain
causes immediate failure in all the other JSON domains, unfortunately.
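To make the failure mode concrete - in pmda.py terms, a fetch callback
that shells out looks roughly like this (an untested sketch; "ceph_probe"
is an invented command standing in for any external source):

    import subprocess
    import cpmapi as c_api

    # a fetch callback on a pcp.pmda.PMDA subclass
    def fetch_callback(self, cluster, item, inst):
        try:
            # blocks the whole single-threaded PMDA: if the command
            # wedges, every metric this PMDA serves - i.e. all of the
            # JSON sources sharing the one domain - stops refreshing
            out = subprocess.check_output(['ceph_probe'], timeout=5)
        except subprocess.TimeoutExpired:
            return [c_api.PM_ERR_AGAIN, 0]
        return [len(out), 1]    # illustrative value only

With separate PMDAs (separate domains), pmcd would fence that off and
the remaining domains would keep answering.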
> As far as fixing this from within the JSON pmda goes, the thing that
> pops into my head would be to poll for the data at a user-specified
> interval, then when a request comes in give the data from the last poll.
Yeah, everything involves tradeoffs - there are downsides to taking
that approach too, FWIW (though some existing PMDAs do indeed choose a
similar route for their domain).
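The shape I imagine you mean is something like the following - a rough,
untested sketch (class and command names invented), where a background
thread refreshes a cache and fetch requests only ever read the result
of the last completed poll:

    import json, subprocess, threading, time

    class PolledSource:
        # poll a JSON-producing command in the background; fetch
        # requests read the last good result, so a wedged command
        # costs freshness rather than availability
        def __init__(self, argv, interval=5.0):
            self.argv, self.interval = argv, interval
            self.lock = threading.Lock()
            self.latest = {}     # last successfully parsed document
            poller = threading.Thread(target=self._poll)
            poller.daemon = True
            poller.start()

        def _poll(self):
            while True:
                try:
                    doc = json.loads(subprocess.check_output(self.argv))
                    with self.lock:
                        self.latest = doc
                except Exception:
                    pass         # keep serving the stale document
                time.sleep(self.interval)

        def fetch(self):
            with self.lock:
                return self.latest

The obvious cost being that fetches can return values up to 'interval'
seconds old, and sampling is no longer synchronised with the clients'
requests. But let's set this issue aside for a while...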
> > - Refresh script complexity
> > "generate_ceph_metadata" script is approaching the complexity of other
> > script PMDAs now - in the back of my mind this is a bit of a worry, as
>
> Let's start here with making sure you understand how
> "generate_ceph_metadata" works.
*nod* - thanks for the explanation, definitely makes things clearer.
> Note that "generate_ceph_metadata" is probably an outlier in being a bit
> tricky. The JSON schema/metadata produced by ceph is *quite* odd,
> especially when it comes to types. The biggest issue is that ceph uses
> very non-JSON-like type specifiers.
>
> As to your question of "how much does someone wanting to support a new
> JSON data source have to know?", the answer is "just enough". This
> person would need to understand how to get his data source to produce
> JSON, understand the JSON format, and understand JSON pointers. He
> really wouldn't need to understand too much about PCP.
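True - and for the archive's benefit, a JSON pointer is just an RFC 6901
path into the document, e.g. with the python jsonpointer module (data
made up here):

    from jsonpointer import resolve_pointer

    doc = {"osd": {"op_w": {"avgcount": 42, "sum": 1.5}}}
    resolve_pointer(doc, "/osd/op_w/avgcount")    # -> 42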
I guess my concern is that if people don't understand the PCP concepts
relating to representing counter vs instant vs discrete metrics, and
bytes vs kbytes vs msec vs counts vs bytes-per-sec, etc, we may end up
with a bunch of metrics with poor/incorrect metadata associated with
'em.
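To illustrate the distinction, here's the metadata PCP wants for, say,
a bytes-written counter versus an instantaneous queue depth - a sketch
using the python PMDA bindings (metric names and pmids invented):

    import cpmapi as c_api
    from pcp.pmapi import pmUnits
    from pcp.pmda import pmdaMetric

    # monotonically-increasing byte count: counter semantics, byte units
    # (the leading 0/1 stand in for pmids built via self.pmid())
    bytes_written = pmdaMetric(0, c_api.PM_TYPE_U64, c_api.PM_INDOM_NULL,
                               c_api.PM_SEM_COUNTER,
                               pmUnits(1, 0, 0, c_api.PM_SPACE_BYTE, 0, 0))

    # point-in-time queue depth: instantaneous semantics, a plain count
    queue_depth = pmdaMetric(1, c_api.PM_TYPE_U32, c_api.PM_INDOM_NULL,
                             c_api.PM_SEM_INSTANT,
                             pmUnits(0, 0, 1, 0, 0, c_api.PM_COUNT_ONE))

If none of that arrives with the JSON, everything tends to default to a
dimensionless instantaneous value - exactly the poor metadata I mean.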
But again, maybe not an immediate concern we've got to address - let's
give it time & see how it goes. I agree the Ceph instrumentation has
some really obscure stuff (e.g. it seems to be producing time metrics
without any time units? and things like "bytes_wb" have the same metadata
as "ios_wb" - no byte units represented for the former?).
> > ... some of these areas we can tackle via continued hacking on pmdajson
> and extending its schema, its config file, interfaces to data-exec'd
> > scripts and so on. But, I wanted to step back and think about whether
> > effort in core PMDA libraries might make some sense at this stage, for
> > some of the above items (which? -- all of the above can/are inherently
> > handled by separate PMDAs using JSON instead of data-exec'd scripts, of
> course, it's just extra effort - perhaps making that easier is a better
> > way to solve some of 'em, however).
>
> It sounds like what this boils down to is a problem with one of the
> basic features of the JSON pmda - the fact that it uses JSON pointers to
> generically identify where to find the JSON data. Therefore the JSON
> pmda can support multiple data sources at the same time.
Yes, it's an interesting approach; it just has some unforeseen side-effects
that maybe we can tackle in other ways, while keeping the core ideas.
> If this is now seen as a problem,
Well, what are your thoughts there? It seems to me there are some
potential issues (nothing immediately urgent, but things worth thinking
about anyway, for the mid->long term).
> one idea would be to "break up" the
> JSON pmda a bit, and move a good bit of its functionality into a python
> library.
*nod*
> Then several pmdas could use the python library to export data
> for their particular source.
+1
> This would solve several of your worries,
> like domain isolation and wanting different top level domains. At a
> first cut you'd have 2 new pmdas, a systemtap one and a ceph one that
> were both thin wrappers around the python library. I'm not sure what the
> level of effort would be there.
Nor I - hence all my questions. I've been thinking about jsonpointers in
C a bit, and it doesn't seem too bad (in theory). I might hack on this
soon-ish using the pmdaroot code as a bit of a test case, as that would
make the Docker code there a lot cleaner anyway, I think. I'll report
back in a few weeks with some code & will ping you for any thoughts you
have there, if you don't mind.
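To sketch the shape I'm picturing for the split (module, class and
domain number all invented here, purely to show how thin the per-source
wrapper could be):

    from pcp.pmda import PMDA
    # "pcpjson"/"JSONSource" are invented names for the library that
    # pmdajson's pointer & metadata machinery might be split out into
    from pcpjson import JSONSource

    class CephPMDA(PMDA):
        def __init__(self):
            PMDA.__init__(self, 'ceph', 250)   # its own domain
            # one source per PMDA - a stalled ceph socket can then
            # only ever take out ceph metrics, nobody else's
            self.source = JSONSource(self, command=['ceph_probe'],
                                     metadata='ceph-metadata.json')

    if __name__ == '__main__':
        CephPMDA().run()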
> If later you wanted to rewrite bits of the python library into C to
> support C clients (and then the python library would just wrap around
> the C layer), that might be doable.
Yep, I think we're on roughly the same page here.
cheers.
--
Nathan