On 04/11/14 01:26, Frank Ch. Eigler wrote:
...
> One glitch with that could be PMDAs whose PMNS is dynamic from run to
> run (like the papi pmda, which is almost able to be used as a logged
> data source). The name-to-PMID mapping may vary there (as new PAPI
> versions come, or host CPU changes, new counters(=metrics) may appear
> in some odd sequence, so get scrambled PMIDs), but the name & pmDesc
> (semantics etc.) would remain the same and previous values comparable.
>
> I don't know how pervasive this situation would be, but if it's not
> too hard to support, we should. (e.g., we could track metrics across
> archives by name rather than pmid.)
This is not pervasive at all ... this is the only PMDA I am aware of
that behaves like this.
And I think I would be lobbying for the PMDA to be different, not the
archive handling to be different.
There are already services available to a PMDA that allow the instance
domain mapping (name <--> id) to be consistent across PMDA restarts. It
would only be a small perversion of the pmdaCache* routine usage to
allow the papi pmda to maintain a consistent and persistent mapping from
metric name to the low-order bits of the PMID.
Note this does not need to be consistent between hosts, just consistent
for a single host across repeated PMDA invocations (and possible changes
in version and configuration of the PMDA).
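To make the idea concrete, here is a rough sketch (Python, and NOT the
pmdaCache* API) of a name to item-number table that is saved to disk and
reloaded on each PMDA start, so a name keeps its item number no matter
what order the counters are discovered in. The class, the JSON file, the
metric names and the domain number are all hypothetical, and the bit
layout in build_pmid() (item in the low-order 10 bits) is stated here as
an assumption.

```python
import json
import os
import tempfile

ITEM_BITS = 10  # assumed width of the item field in a PMID


class PersistentItemMap:
    """Hypothetical name -> item-number table that survives restarts."""

    def __init__(self, path):
        self.path = path
        self.items = {}
        if os.path.exists(path):
            with open(path) as f:
                self.items = json.load(f)

    def lookup(self, name):
        """Return the item number for name, allocating one on first sight."""
        if name not in self.items:
            item = len(self.items)  # entries are never removed
            if item >= (1 << ITEM_BITS):
                raise OverflowError("no free item numbers left")
            self.items[name] = item
            self._save()
        return self.items[name]

    def _save(self):
        with open(self.path, "w") as f:
            json.dump(self.items, f)


def build_pmid(domain, cluster, item):
    # Assumed layout: 9-bit domain, 12-bit cluster, 10-bit item.
    return (domain << 22) | (cluster << ITEM_BITS) | item


def demo():
    path = os.path.join(tempfile.mkdtemp(), "papi-items.json")
    first_run = PersistentItemMap(path)
    a = first_run.lookup("papi.system.TOT_INS")
    b = first_run.lookup("papi.system.TOT_CYC")
    # Simulated restart: counters turn up in a different order.
    second_run = PersistentItemMap(path)
    c = second_run.lookup("papi.system.L1_DCM")
    b2 = second_run.lookup("papi.system.TOT_CYC")
    a2 = second_run.lookup("papi.system.TOT_INS")
    return (a, b), (a2, b2), c
```

The table is per host, which is all that is needed here: a new counter
gets the next free item number and the existing ones never move.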
Returning to the multi-archive work that triggered all of this, where
there is a contradiction in the name <---> PMID mapping between
archives, we have several cases and options:
Case 1
Multiple names map to the same PMID.
Options:
1a. Assume this is the same metric and the PMID is correct, add both names
to the PMNS (the data structure and libpcp routines support this, so for
example, given the PMID, pmNameAll() will return all the corresponding
names). No rewriting of pmDesc or pmResult data structures is required.
1b. Assume these are different metrics. Invent a new (and unique) PMID
for the subsequent ones, add all the names and their unique PMIDs to the
PMNS. Synthesize a new pmDesc for each metric that gets an invented
PMID. Rewrite pmResult data structures to map from the old PMID to an
invented PMID, but in the context of the archive in which the name was
mapped (need to choose the PMID based on which archive the pmResult came
from).
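A rough sketch of the two Case 1 options, using plain Python dicts in
place of the real PMNS and pmResult structures. Each archive is modelled
as a {name: pmid} dict, and first_free_pmid is a hypothetical starting
point for invented PMIDs that is assumed not to collide with any real
one. None of these function names exist in libpcp, and the sketch
assumes the only conflicts present are Case 1 conflicts.

```python
def merge_1a(archives):
    """Option 1a: keep the PMID, record every name seen for it."""
    names_by_pmid = {}
    for archive in archives:
        for name, pmid in archive.items():
            names = names_by_pmid.setdefault(pmid, [])
            if name not in names:
                names.append(name)
    return names_by_pmid


def merge_1b(archives, first_free_pmid):
    """Option 1b: later names for a taken PMID get an invented PMID."""
    owner = {}    # pmid -> name that first claimed it
    pmns = {}     # name -> PMID in the merged namespace
    rewrite = {}  # (archive index, old pmid) -> invented pmid
    next_pmid = first_free_pmid
    for idx, archive in enumerate(archives):
        for name, pmid in archive.items():
            if name in pmns:
                new = pmns[name]          # decided by an earlier archive
            elif pmid not in owner:
                owner[pmid] = name        # first claimant keeps the PMID
                new = pmid
            else:
                new = next_pmid           # invent a unique PMID
                next_pmid += 1
            pmns[name] = new
            if new != pmid:
                rewrite[(idx, pmid)] = new
    return pmns, rewrite


def rewrite_result(idx, pmids, rewrite):
    """Map the PMIDs of one pmResult, given the archive it came from."""
    return [rewrite.get((idx, p), p) for p in pmids]
```

The rewrite table is keyed by archive index as well as PMID, which is
the "need to choose the PMID based on which archive the pmResult came
from" part of 1b; 1a needs no such table.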
Case 2
Multiple PMIDs map to the same name.
Options:
2a. Assume these are the same metric. Pick one PMID (probably the first
one encountered ... Dave, this is the "first one is the winner" part I
was cryptically referring to in the earlier mail) and add mappings from
all the names to the same PMID into the PMNS. There is only one pmDesc
as a result, and in each pmResult any loser PMID needs to be mapped to
the winner PMID.
2b. Assume these are different metrics. Invent a new (and unique) name
for the subsequent ones, add all the names and their associated PMIDs to
the PMNS. No rewriting of pmDesc or pmResult data structures is required.
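The Case 2 options can be sketched the same way, again with each archive
modelled as a {name: pmid} dict and with hypothetical function names.
The "_2", "_3" suffix used to invent names in 2b is purely an
illustrative choice, and the sketch assumes the only conflicts present
are Case 2 conflicts.

```python
def merge_2a(archives):
    """Option 2a: first PMID seen for a name wins, the rest are remapped."""
    pmns = {}   # name -> winning PMID
    remap = {}  # losing PMID -> winning PMID, applied to every pmResult
    for archive in archives:
        for name, pmid in archive.items():
            winner = pmns.setdefault(name, pmid)
            if pmid != winner:
                remap[pmid] = winner
    return pmns, remap


def merge_2b(archives):
    """Option 2b: keep every PMID, invent a name for the later ones."""
    pmns = {}     # name (possibly invented) -> PMID
    name_of = {}  # pmid -> name it was filed under
    for archive in archives:
        for name, pmid in archive.items():
            if pmid in name_of:
                continue                  # this PMID is already placed
            candidate, n = name, 1
            while candidate in pmns:      # name taken by another PMID
                n += 1
                candidate = f"{name}_{n}"
            pmns[candidate] = pmid
            name_of[pmid] = candidate
    return pmns
```

In 2a the remap table has to be applied to the pmResults from the
archives, while 2b only changes what goes into the PMNS.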
Case 3
N:M mapping between PMIDs and names.
Forget it, don't even think about this one.
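Telling the cases apart across a set of archives is mechanical. A
sketch, with each archive modelled as a {name: pmid} dict and a
hypothetical classify() function; treating "a name with several PMIDs,
one of which also has several names" as the N:M case is my reading of
Case 3, not a formal definition.

```python
def classify(archives):
    """Return the list of conflict cases present in a set of archives."""
    pmids_by_name = {}
    names_by_pmid = {}
    for archive in archives:
        for name, pmid in archive.items():
            pmids_by_name.setdefault(name, set()).add(pmid)
            names_by_pmid.setdefault(pmid, set()).add(name)
    # Case 1: PMIDs that carry more than one name.
    case1 = {p for p, names in names_by_pmid.items() if len(names) > 1}
    # Case 2: names that carry more than one PMID.
    case2 = {n for n, pmids in pmids_by_name.items() if len(pmids) > 1}
    # Case 3: the two kinds of conflict are tangled together.
    if any(pmids_by_name[n] & case1 for n in case2):
        return ["case 3"]
    labels = []
    if case1:
        labels.append("case 1")
    if case2:
        labels.append("case 2")
    return labels
```

An empty list means the archives agree, which is the Stage 0 situation.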
I was suggesting that initially none of this is supported, i.e. the name
<--> PMID mapping must be consistent across all archives in the set
(Stage 0).
Then options 1a and 2b might be suitable for a Stage 1.
And Stage 2 might add support for all 4 options above (Frank's
suggestion is 1b I think).
Note that all of the options above can be supported OUTSIDE libpcp today
using an (admittedly manual) pmlogrewrite step on one or more archives to
remove the inconsistencies before the archive set is processed as a unit.