Hi, Ken -
> > ... but according to normal practice, a "restarted pmda" means
> > "restarted pmcd", which means active pcp contexts are dropped,
> > which means clients are to reconnect & recalculate name->pmid
> > mappings.
>
> I think we need to be a bit more careful.
>
> Clients may or may not re-explore the PMNS in this situation ... I just
> experimented with pmie and it does NOT do this.
That's unfortunate. (OTOH, in src/pmie/eval.c, enable() / reconnect()
call reinitMetric(), which redoes all the pmLookup* business; in what
scenario does pmie not do this, or is unable to?)
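
To make the expected client behaviour concrete, here is a minimal
sketch of "redo the name->pmid lookups after a reconnect". It is not
pmie's code and uses no PCP API: MetricBindings, rebind() and the
lookup callable are hypothetical names, and pmids are plain integers.

```python
# Hypothetical sketch: none of these names are PCP APIs.
class MetricBindings:
    """Keeps name -> pmid bindings and redoes them after a reconnect."""

    def __init__(self, names, lookup):
        self.names = list(names)
        self.lookup = lookup      # callable name -> pmid, stands in for a PMNS lookup
        self.pmids = {}
        self.rebind()

    def rebind(self):
        """Re-resolve every name; return the names whose pmid changed."""
        changed = []
        for name in self.names:
            new = self.lookup(name)
            if name in self.pmids and self.pmids[name] != new:
                changed.append(name)
            self.pmids[name] = new
        return changed


# Simulate a PMDA that renumbers one metric across a restart.
before = {"foo.a": 1, "foo.b": 2}
after = {"foo.a": 1, "foo.b": 7}
state = {"pmns": before}
bindings = MetricBindings(["foo.a", "foo.b"], lambda n: state["pmns"][n])

state["pmns"] = after             # "pmcd restarted" with a renumbered metric
changed = bindings.rebind()       # client reconnects and redoes its lookups
```

A client that only ever resolves names at startup would keep using the
stale pmid for foo.b; one that rebinds on reconnect picks up the change.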
> Most clients that keep running when a PMDA or PMCD is missing have
> been written with some assumptions about the consistency of metric
> names and metadata across invocations of a PMDA. [...]
Can you point at another example by any chance?
> But there is another class of issues around archives. pmlogextract
> (and indeed anything that processes a set of archives that claim to
> be from the same host) must assume that the PMNS and the metadata
> for the metrics are consistent from one archive to the next.
This sounds like that single tool's limitation. (Certainly pmwebd is
not affected, and it looks like pmchart isn't either.) It would not be
a big deal for pmlogextract to canonicalize the output PMNS as it goes
along. (Similar canonicalization logic could come in handy for making
brolley's multi-archive-transparency project even more powerful.)
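
A rough sketch of what such canonicalization could look like, under
loud assumptions: each archive's PMNS is modelled as a plain
{name: pmid} dict, pmids are bare integers (real pmids carry domain,
cluster and item fields), and canonicalize() is a hypothetical helper,
not anything in pmlogextract. Metadata reconciliation is not shown.

```python
def canonicalize(archives):
    """archives: list of {name: pmid} dicts, one per archive, in order.

    Returns (canonical, remaps): canonical is one merged {name: pmid}
    map, and remaps[i] is the {old_pmid: canonical_pmid} table to apply
    to records read from archive i.
    """
    canonical = {}      # name -> canonical pmid
    used = set()        # canonical pmids already handed out
    remaps = []
    for pmns in archives:
        remap = {}
        for name, pmid in sorted(pmns.items()):
            if name not in canonical:
                # Keep the archive's own pmid if it is free; otherwise
                # take the next unused integer (placeholder policy).
                new = pmid
                while new in used:
                    new += 1
                canonical[name] = new
                used.add(new)
            remap[pmid] = canonical[name]
        remaps.append(remap)
    return canonical, remaps


# Second archive comes from a PMDA run that numbered its metrics differently.
first = {"foo.a": 1, "foo.b": 2}
second = {"foo.a": 2, "foo.c": 1}
canonical, remaps = canonicalize([first, second])
```

The point is that names, not pmids, are the join key: the output PMNS
stays consistent even when the inputs disagree about numbering.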
> [...] And in this context, the sort of extension to the pmdaCache
> services being discussed yesterday might provide a means to maintain
> a persistent "name" to "id" map where the "id" values are
> constrained to the range that would work for a pmid's cluster
> component. [...]
(For what it's worth, I'm not suggesting that more persistent
numbering would be harmful.) But even such a cache is by nature
temporary & lossy, so sooner or later those pmids will be mixed up
again, and the tooling will have to deal with it. Why not make the
consumer tooling robust now?
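
To illustrate both the idea and its failure mode, a small sketch of a
persistent name->id map kept in a file. This is not the pmdaCache API:
id_for(), load_map(), the JSON file format and the MAX_ID bound (chosen
on the assumption that ids must fit a 12-bit cluster field) are all
hypothetical.

```python
import json
import os

MAX_ID = 4095   # assumption: ids must fit a 12-bit cluster field


def load_map(path):
    """Read the saved {name: id} map, or start empty if the file is gone."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def id_for(name, path):
    """Return the stable id for name, allocating and saving one if needed."""
    mapping = load_map(path)
    if name not in mapping:
        taken = set(mapping.values())
        free = next((i for i in range(MAX_ID + 1) if i not in taken), None)
        if free is None:
            raise RuntimeError("id space exhausted")
        mapping[name] = free
        with open(path, "w") as f:
            json.dump(mapping, f)
    return mapping[name]
```

As long as the file survives, a name keeps its id across restarts. Once
the file is lost or rebuilt, ids are handed out afresh in whatever order
names happen to show up, which is the renumbering case consumers still
have to cope with.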
- FChE