
Re: [pcp] Proposal for handling dynamic metric names (and hence dynamic

To: kenj@xxxxxxxxxxxxxxxx
Subject: Re: [pcp] Proposal for handling dynamic metric names (and hence dynamic metrics)
From: Mark Goodwin <goodwinos@xxxxxxxxx>
Date: Thu, 09 Jul 2009 11:03:58 +1000
Cc: pcp@xxxxxxxxxxx
In-reply-to: <1247041911.7833.4.camel@bozo>
References: <1247041911.7833.4.camel@bozo>
User-agent: Thunderbird 2.0.0.21 (X11/20090320)
Looks like a decent, well-thought-out proposal. I'm just wondering
whether you've captured all the scenarios that might be broken by the
introduction of non-leaf nodes that have a pmID?

Since we'll be introducing a pmns syntax change for the new
dynamic non-leaf nodes, do we know of any scripts that interpret
the ascii pmns syntax directly, or apps that will not be expecting
'*' from pmIDStr()?

And having a 'pad' field in the middle of a structure seems kind
of funky, but I can see the reasoning for wanting it there. Maybe
just use up the two existing pad bits and call it 'flags'? (with
room for three more flag values in the future, one of which could
be to flag an extended range of domain values).

Cheers
-- Mark


Ken McDonell wrote:
I've been threatening to get this out for some time now.

There is no code to back any of this up (yet); it really is a
proposal ... so please let me know if you think this is a good or bad
idea. Holes picked in the issues covered would be most welcome, as
would better ideas.

The motivation here is to get pmcd out of the way for cases where it is
the PMDA that knows what metrics are available, not the PMNS loaded by
pmcd, e.g. an mmv-like PMDA where the available metrics are discovered
from mmap'd files when the PMDA starts.


------------------------------------------------------------------------


    Proposal for Supporting Dynamic PCP Performance Metric Namespaces

Ken McDonell
kenj@xxxxxxxxxxxxxxxx

Initially in PCP, the Performance Metrics Namespace (PMNS) was local to each machine where PCP was being used. This made it difficult to keep the PMNS on multiple monitoring machines consistent with the PMDAs installed on the collector machines. The weakness was quickly identified, and the local PMNS was replaced by the "Distributed PMNS" we have today, where the PMNS is maintained on the PCP collector machine or within the PCP archive. Monitoring applications ship their namespace requests to the relevant source of metrics, namely a /pmcd/ or a PCP archive.

Aside from the rare use of a local PMNS (with the *-n* option) by PCP monitoring applications, the principal use of the PMNS is to be loaded (or reloaded) by /pmcd/ and then used by /pmcd/ to respond directly to remote requests from PCP monitoring applications using:

    * /pmLookupName()/ (or the asynchronous equivalent pair
      /pmRequestNames()/ and /pmReceiveNames()/)
    * /pmNameID()/ (or the asynchronous equivalent pair
      /pmRequestNameID()/ and /pmReceiveNameID()/)
    * /pmNameAll()/ (or the asynchronous equivalent pair
      /pmRequestNameAll()/ and /pmReceiveNameAll()/, although the former
      is defined and _documented but not implemented_!)
    * /pmGetChildren()/ and /pmGetChildrenStatus()/ (or the asynchronous
      equivalent pair /pmRequestNamesOfChildren()/ and
      /pmReceiveNamesOfChildren()/)
    * /pmTraversePMNS()/ (or the asynchronous equivalent pair
      /pmRequestTraversePMNS()/ and /pmReceiveTraversePMNS()/)

The PMNS on a collector machine is maintained as a single file with entries added and deleted as a part of the installation and removal of a PMDA.

While this regime has served PCP well for most PMDAs, there have been a small number of cases where the static nature of the PMNS has not been appropriate, e.g.

    * PCP's one-dimensional instance domain data model imposes the need
      for remapping when data naturally occurs across more than one
      dimension, e.g. histogram bins of service times for a set of
      similar operation types. This is usually addressed by either
      linearizing the instance domain and constructing composite
      instance names, or mapping one of the dimensions onto the PMNS.
      When both dimensions are variable this can become quite messy.

      All of the DBMS PMDAs had issues in this area.

    * Some PMDAs are data-driven, e.g. the memory mapped class of PMDAs,
      and for these the names of available metrics may be "discovered"
      when the memory mapped files are opened and so cannot be part of
      the PMDA's PMNS at the time the PMDA is installed.

Existing methods for handling a dynamic aspect of the PMNS are all *ugly* and error-prone, e.g. make a new PMNS, update the global PMNS and send /pmcd/ a SIGHUP signal.


      Proposal Overview

The existing PMNS will be extended (no backwards compatibility issues) to introduce a new "non-terminal" node that will be used to indicate that the PMNS below this point is dynamic and defined by the associated PMDA.

As an example to be used throughout this proposal, the *foo* PMDA (domain 44) supports dynamic names below the *foo.count* node in the PMNS. The relevant fragment of the ASCII PMNS would be as follows:

root {
        ...
        foo
        ...
}
...
foo {
        version         44:0:1
        count           44:*:*
        memory          44:0:2
}

The *foo* PMDA is willing to export metadata and metric values for the following additional (dynamic) metrics:
       *foo.count.ops* (PMID 44:1:0)
       *foo.count.errs* (PMID 44:1:1)
       *foo.count.numcount* (PMID 44:0:27)

Changes to /pmcd/ and new interactions with the *foo* PMDA would mean that attempts to look up the PMIDs for metrics with names beginning *foo.count.* would be passed from /pmcd/ to the *foo* PMDA, and similarly requests to find the names of metrics given their PMID would also be passed from /pmcd/ to the *foo* PMDA if they are not resolved in the PMNS loaded into /pmcd/.
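The routing decision above depends on recognising when a name lies beneath a dynamic node such as *foo.count*. The following is a minimal C sketch of that test, not code from /pmcd/; the function name below_dynamic_root() is hypothetical. It assumes that the root itself (*foo.count*) and look-alike siblings (*foo.counter*) should not match, which is consistent with *foo.count* being a non-leaf node.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true when metric `name` lies strictly below the
 * dynamic subtree rooted at `root` (e.g. "foo.count").  The root itself
 * is not a match (it is a non-leaf node), and a name that merely shares
 * a string prefix (e.g. "foo.counter") is not a match either.
 */
bool below_dynamic_root(const char *name, const char *root)
{
    size_t len = strlen(root);

    if (strncmp(name, root, len) != 0)
        return false;                   /* a different path altogether */
    if (name[len] != '.')
        return false;                   /* the root itself, or "foo.counter" */
    return name[len + 1] != '\0';       /* reject a dangling "foo.count." */
}
```

Comparing on a component boundary (the '.') rather than a raw string prefix is what keeps *foo.counter* out of the *foo.count* subtree.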


      Detailed Changes Required


        Changes to the ASCII PMNS Format

As foreshadowed, the syntax :*:* after a domain number (e.g. 44:*:*) would flag a PMNS node as the root of a subtree of names to be resolved in the associated PMDA.

The only place where the ASCII PMNS format is known at this level of detail is in the internal routine /loadascii()/ of /libpcp/ which is called from /pmLoadNameSpace()/, /pmLoadASCIINameSpace()/ and /pmGetPMNSLocation()/. So extending the parser here is simple.
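As a rough illustration of how small the parser extension is, here is a C sketch that accepts either the existing "domain:cluster:item" form or the proposed "domain:*:*" form. It is not the /loadascii()/ code; struct pmid_spec and parse_pmid_spec() are hypothetical names. Giving a dynamic node all ones in cluster and item follows the internal encoding proposed later for dynamic PMIDs.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parsed form of the PMID text that follows a name in the ASCII PMNS. */
struct pmid_spec {
    unsigned int domain, cluster, item;
    bool dynamic;       /* true for the proposed "domain:*:*" form */
};

/*
 * Hypothetical sketch, not libpcp's loadascii(): accept either
 * "domain:cluster:item" or "domain:*:*".  A dynamic node gets all ones
 * in cluster and item, matching the proposed internal encoding.
 * Returns 0 on success, -1 on a syntax or range error.
 */
int parse_pmid_spec(const char *text, struct pmid_spec *spec)
{
    unsigned int d, c, i;
    int n = 0;

    memset(spec, 0, sizeof(*spec));

    /* %n is only reached if the literal ":*:*" matched in full */
    if (sscanf(text, "%u:*:*%n", &d, &n) == 1 && n > 0 && text[n] == '\0') {
        if (d > 255)                    /* domain is an 8-bit field */
            return -1;
        spec->domain = d;
        spec->cluster = 4095;
        spec->item = 1023;
        spec->dynamic = true;
        return 0;
    }

    n = 0;
    if (sscanf(text, "%u:%u:%u%n", &d, &c, &i, &n) == 3 && n > 0 &&
        text[n] == '\0') {
        if (d > 255 || c > 4095 || i > 1023)    /* 8/12/10-bit fields */
            return -1;
        spec->domain = d;
        spec->cluster = c;
        spec->item = i;
        return 0;
    }
    return -1;
}
```

Mixed forms such as 44:*:1 are rejected, so a node is either fully static or fully dynamic.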


        Changes to the Binary PMNS Format

The binary format of the PMNS is what is loaded into the address space after the ASCII PMNS has been parsed (it is also the format generated by /pmnscomp/ and read by the /libpcp/ routines, but this is just a performance short cut; /pmnscomp/ will need almost no change as it simply writes out the binary PMNS after it has been loaded).

The relevant data structure is __pmnsNode (defined in <pcp/impl.h>). Now this structure is sufficiently public that we cannot change it in any way that would break binary compatibility, and the only field available to encode *both* the PMDA's domain number and the dynamic nature of the node in the PMNS is the pmid field. Internally a pmid is structured thus (ignoring the endian alternative form):

typedef struct {
        int                     pad : 2;
        unsigned int    domain : 8;
        unsigned int    cluster : 12;
        unsigned int    item : 10;
} __pmID_int;

So the domain field must be used to encode the domain of the PMDA providing the dynamic names, but unfortunately there are no values for cluster and/or item that could be used to mark the node as the root of a subtree of dynamic names. By good fortune we have spare bits hiding in the pad field, so the proposal is to extend the __pmID_int struct to allocate one of the bits from pad to the new field dynamic, as follows:

typedef struct {
        unsigned int    dynamic : 1;
        int                     pad : 1;
        unsigned int    domain : 8;
        unsigned int    cluster : 12;
        unsigned int    item : 10;
} __pmID_int;

A value of 1 for dynamic encodes the fact that this PMNS node is the root of a dynamic subtree. Leaving pad between domain and dynamic would allow the domain field of a PMID to expand to 9 bits if that becomes necessary at some point in the future. This change does make a dynamic PMID negative when treated as a 32-bit integer, but this should not be a problem, as code of the form:

        if (pmid < 0) ...

is just plain wrong, and should probably be

        if (pmid == PM_ID_NULL) ...

Internally, the PMID for a node at the root of a dynamic subtree would be encoded as 1:<domain>:4095:1023, i.e. the dynamic field set to 1 and all ones in the cluster and item fields to minimize the chance of any "false" matching.
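To make the encoding concrete, the C sketch below packs and unpacks PMIDs with explicit shifts rather than bit-fields, so the result does not depend on compiler bit-field ordering. It assumes dynamic occupies the most significant bit (bit 31), pad bit 30, domain bits 22-29, cluster bits 10-21 and item bits 0-9, which is the reading under which a dynamic PMID is negative as a 32-bit integer. The helper names are hypothetical, not /libpcp/ functions.

```c
#include <stdint.h>

/*
 * Assumed layout (most significant bit first):
 *   dynamic:1 | pad:1 | domain:8 | cluster:12 | item:10
 */
#define DYN_SHIFT      31
#define DOMAIN_SHIFT   22
#define CLUSTER_SHIFT  10

/* Pack the four fields into a 32-bit PMID; pad is always left at 0. */
uint32_t make_pmid(unsigned dynamic, unsigned domain,
                   unsigned cluster, unsigned item)
{
    return ((uint32_t)(dynamic & 0x1u) << DYN_SHIFT) |
           ((uint32_t)(domain & 0xffu) << DOMAIN_SHIFT) |
           ((uint32_t)(cluster & 0xfffu) << CLUSTER_SHIFT) |
           (uint32_t)(item & 0x3ffu);
}

/* PMID for the root of a dynamic subtree: 1:<domain>:4095:1023. */
uint32_t make_dynamic_pmid(unsigned domain)
{
    return make_pmid(1, domain, 0xfffu, 0x3ffu);
}

unsigned pmid_is_dynamic(uint32_t pmid) { return (pmid >> DYN_SHIFT) & 0x1u; }
unsigned pmid_domain(uint32_t pmid)     { return (pmid >> DOMAIN_SHIFT) & 0xffu; }
```

For the *foo* PMDA, make_dynamic_pmid(44) yields 0x8B3FFFFF. With bit 31 set, reinterpreting that value as a signed 32-bit integer gives a negative number, which is exactly why a test such as pmid < 0 cannot stand in for a comparison against PM_ID_NULL.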


        Changes for /pmlogger/ and PCP Archives

No changes are needed here as /pmlogger/ tolerates missing metrics and only adds PMNS and metadata information into the PCP archives for those metrics that can be found, so the PMID for the root of a subtree of dynamic names will never appear in an archive, although the descendant nodes (with their associated names and PMIDs) may appear in an archive.


        Changes for /libpcp_pmda/

To support the additional interactions between /pmcd/ and the PMDAs the pmdaInterface structure needs to be extended. This will be PMDA_INTERFACE_4, and involves adding struct { } three; to the union, with all of the fields from struct { } two; plus the following:

        int     (*pmns_pmid)(char *, pmID *);
        int     (*pmns_name)(pmID, char **);
        int     (*pmns_children)(char *, char ***, int **);

The standard implementation of these routines should suffice for the majority of cases, but they are exposed in the interface to allow an overriding implementation should that be necessary (this also makes them consistent with all other PDU handling routines in the PMDA library).
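As a hedged illustration of what the *foo* PMDA might supply for the first two of these methods, the C sketch below resolves names and PMIDs from a table of its dynamic metrics (in practice that table would be built from the mmap'd files). It matches the proposed signatures, but foo_pmns_pmid(), foo_pmns_name(), FOO_PMID() and the FOO_ERR_* codes are hypothetical stand-ins; the real PM_ERR_NAME and PM_ERR_PMID values are not reproduced here. The assumption that the caller frees the returned name is also ours.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned int pmID;              /* as in <pcp/pmapi.h> */

/* Hypothetical error codes standing in for PM_ERR_NAME etc. */
#define FOO_ERR_NAME   (-1)
#define FOO_ERR_PMID   (-2)
#define FOO_ERR_NOMEM  (-3)

/* Pack domain:cluster:item with dynamic = 0 (see __pmID_int). */
#define FOO_PMID(d, c, i) (((pmID)(d) << 22) | ((pmID)(c) << 10) | (pmID)(i))

/* The foo PMDA's dynamic names, e.g. discovered from mmap'd files. */
static const struct { const char *name; pmID pmid; } foo_dyn[] = {
    { "foo.count.ops",      FOO_PMID(44, 1, 0)  },
    { "foo.count.errs",     FOO_PMID(44, 1, 1)  },
    { "foo.count.numcount", FOO_PMID(44, 0, 27) },
};
#define FOO_NDYN (sizeof(foo_dyn) / sizeof(foo_dyn[0]))

/* Candidate pmns_pmid method: full metric name -> PMID. */
int foo_pmns_pmid(char *name, pmID *pmid)
{
    for (size_t k = 0; k < FOO_NDYN; k++) {
        if (strcmp(name, foo_dyn[k].name) == 0) {
            *pmid = foo_dyn[k].pmid;
            return 0;
        }
    }
    return FOO_ERR_NAME;
}

/* Candidate pmns_name method: PMID -> malloc'd name, freed by the caller. */
int foo_pmns_name(pmID pmid, char **name)
{
    for (size_t k = 0; k < FOO_NDYN; k++) {
        if (foo_dyn[k].pmid == pmid) {
            size_t len = strlen(foo_dyn[k].name) + 1;

            if ((*name = malloc(len)) == NULL)
                return FOO_ERR_NOMEM;
            memcpy(*name, foo_dyn[k].name, len);
            return 0;
        }
    }
    return FOO_ERR_PMID;
}
```

A /pmns_children()/ method would walk the same table, returning the next name component after the requested prefix; it is left out here to keep the sketch short.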


        Changes for /libpcp/

The list below describes the changes that are needed in various /libpcp/ routines that are used once a PMNS is loaded, for both a locally loaded PMNS (PMNS_LOCAL) and a PMNS accessed via /pmcd/ (PMNS_REMOTE). For simplicity we've omitted the asynchronous versions of these synchronous routines, but the same semantics would apply to the asynchronous versions.

    * /pmLookupName()/

      PMNS_LOCAL: If namelist[i] has as a prefix the path to a dynamic node in the PMNS, then pmidlist[i] is set to a dynamic PMID (the domain field set to the domain of the PMDA from the root of the dynamic subtree, the cluster and item fields set to all ones, and the dynamic field set to 1) and /pmLookupName()/ returns success. So *foo.count* would return PM_ERR_NONLEAF (this is a special case that needs to be checked for), while *foo.count.ops* and *foo.count.anything.else* would both return the PMID 1:44:4095:1023.

      PMNS_REMOTE: Ship to /pmcd/, where /pmLookupName()/ is called as in the PMNS_LOCAL case. Scan the resulting pmidlist[]; if a dynamic PMID is found and the associated PMDA is using PMDA_INTERFACE_4, ship the original metric name to the matching PMDA and use the PMID and status returned from the /pmns_pmid()/ method, otherwise do nothing. So *foo.count* would return PM_ERR_NONLEAF, *foo.count.ops* would return PMID 0:44:1:0 and *foo.count.anything.else* would return PM_ERR_NAME.

    * /pmNameID()/

      PMNS_LOCAL: Nothing really special is required. Any PMID that could be associated with dynamic names will fail to match in the PMNS and PM_ERR_PMID will be returned. If the PMID has the dynamic field set to 1, then it might be possible to find and return the name at the root of a dynamic subtree (e.g. *foo.count* for PMID 1:44:4095:1023), but since this is potentially ambiguous and of no apparent use, PM_ERR_PMID will be returned in this case also.

      PMNS_REMOTE: Ship to /pmcd/, where /pmNameID()/ is called as in the PMNS_LOCAL case. If PM_ERR_PMID is returned, then some extra processing is required. If the PMID has the dynamic field set to 1, or the associated PMDA (from the domain field of the PMID) is not using PMDA_INTERFACE_4, then return PM_ERR_PMID, as this cannot match any valid metric. Otherwise, if the domain field of the PMID matches *any* node in the PMNS that is the root of a dynamic subtree, ship the original PMID to the matching PMDA, where the /pmns_name()/ method is called, and use the metric name and status that are returned.

    * /pmNameAll()/

      PMNS_LOCAL: Same handling as for /pmNameID()/.

      PMNS_REMOTE: Same handling as for /pmNameID()/.

    * /pmGetChildren()/

      PMNS_LOCAL: If name matches the path to a dynamic node in the PMNS then return zero (no descendants), otherwise there is no change in behaviour. So the descendants of *foo* will be *version*, *count* and *memory*, but there are no descendants of *foo.count*.

      PMNS_REMOTE: Ship to /pmcd/, where /pmGetChildren()/ is called as in the PMNS_LOCAL case. If name has no descendants, is the root of a dynamic subtree in the PMNS, and the associated PMDA (from the domain field of the matching PMID) is using PMDA_INTERFACE_4, then ship the original name to the matching PMDA, where the /pmns_children()/ method is called, and use the list of offspring and status that are returned. Otherwise (not PMDA_INTERFACE_4 or no matching PMDA), a name that is the root of a dynamic subtree in the PMNS is skipped. So, for example, calling /pmGetChildren()/ with *foo.count* as the name argument will return *ops*, *errs* and *numcount*.

    * /pmGetChildrenStatus()/

      PMNS_LOCAL: Similar handling as for /pmGetChildren()/, except that the status for a descendant node that is itself the root of a dynamic subtree in the PMNS is set to PMNS_NONLEAF_STATUS.

      PMNS_REMOTE: Similar handling as for /pmGetChildren()/, except that the status may also be returned from the PMDA if /pmcd/ ships the request to a PMDA.

    * /pmTraversePMNS()/

      PMNS_LOCAL: Any node that is at the root of a dynamic subtree of the PMNS is never passed to the dometric() callback.

      PMNS_REMOTE: Ship to /pmcd/, where /pmTraversePMNS()/ is called as in the PMNS_LOCAL case. However, whenever a node is encountered that is the root of a dynamic subtree in the PMNS (i.e. the matching PMID has the dynamic field set to 1 and the associated PMDA, from the domain field of the PMID, is using PMDA_INTERFACE_4), the path is shipped to the associated PMDA, where the /pmns_children()/ method is called, and all of the descendant names are returned to /pmcd/, which inserts them into the list of names to be returned to the caller for subsequent callbacks to the dometric() method.

    * /pmLoadNameSpace()/

      PMNS_LOCAL: Parser changes as noted above, e.g. for 44:*:*. Do not check for duplicate PMIDs in the case of a node that is the root of a dynamic subtree, as there could be more than one dynamic subtree associated with the same PMDA.

      PMNS_REMOTE: Not relevant.
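The PMNS_REMOTE step for /pmLookupName()/ can be sketched in C as a post-processing pass inside /pmcd/. This is a sketch under stated assumptions, not /pmcd/ code: struct agent, resolve_dynamic(), the per-name status array and demo_foo_pmns_pmid() are all hypothetical, the bit layout assumes dynamic is bit 31 and domain is bits 22-29, and failed lookups are set to all ones to mirror PM_ID_NULL.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned int pmID;                  /* as in <pcp/pmapi.h> */

/* Assumed layout: bit 31 dynamic, bits 22-29 domain (see __pmID_int). */
#define SKETCH_IS_DYNAMIC(p)  (((p) >> 31) & 1u)
#define SKETCH_DOMAIN(p)      (((p) >> 22) & 0xffu)
#define SKETCH_ID_NULL        0xffffffffu   /* mirrors PM_ID_NULL */

/* Hypothetical view of an installed PMDA from inside pmcd. */
struct agent {
    unsigned int domain;
    int interface;                              /* PMDA_INTERFACE_n */
    int (*pmns_pmid)(char *name, pmID *pmid);   /* NULL before version 4 */
};

/*
 * For each dynamic PMID left by the local lookup, ask the owning PMDA
 * (if it uses version 4) for the real PMID and per-name status.
 * Entries for older or missing PMDAs are left unchanged ("do nothing").
 */
void resolve_dynamic(int n, char **names, pmID *pmids, int *status,
                     const struct agent *agents, int nagents)
{
    for (int i = 0; i < n; i++) {
        if (status[i] < 0 || !SKETCH_IS_DYNAMIC(pmids[i]))
            continue;
        for (int a = 0; a < nagents; a++) {
            const struct agent *ap = &agents[a];
            pmID resolved = SKETCH_ID_NULL;

            if (ap->domain != SKETCH_DOMAIN(pmids[i]))
                continue;
            if (ap->interface >= 4 && ap->pmns_pmid != NULL) {
                status[i] = ap->pmns_pmid(names[i], &resolved);
                pmids[i] = status[i] < 0 ? SKETCH_ID_NULL : resolved;
            }
            break;                              /* one PMDA per domain */
        }
    }
}

/* Demo pmns_pmid for the foo PMDA (domain 44): knows only foo.count.ops. */
int demo_foo_pmns_pmid(char *name, pmID *pmid)
{
    if (strcmp(name, "foo.count.ops") != 0)
        return -1;                              /* stands in for PM_ERR_NAME */
    *pmid = (44u << 22) | (1u << 10);           /* 0:44:1:0 */
    return 0;
}
```

With the demo PMDA registered at version 4, *foo.count.ops* resolves to 0:44:1:0 and *foo.count.anything.else* gets an error status, matching the PMNS_REMOTE behaviour described for /pmLookupName()/.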

A small change is needed in /pmIDStr()/ to detect a dynamic PMID (checking if the dynamic field is 1), and then output an asterisk in place of the numeric cluster and item fields.
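A C sketch of that change follows. id_str_sketch() is a hypothetical stand-in for /pmIDStr()/: like /pmIDStr()/ it formats the PMID as domain.cluster.item into a static buffer (so it is not thread-safe), and it assumes dynamic is bit 31, domain bits 22-29, cluster bits 10-21 and item bits 0-9. Because PM_ID_NULL has every bit set, it would otherwise look like a dynamic PMID for domain 255, so the sketch checks for it first.

```c
#include <stdio.h>

typedef unsigned int pmID;      /* as in <pcp/pmapi.h> */

/*
 * Hypothetical pmIDStr() variant: print "*" for cluster and item when
 * the dynamic bit is set.  Returns a pointer to a static buffer.
 */
const char *id_str_sketch(pmID pmid)
{
    static char buf[32];
    unsigned int domain = (pmid >> 22) & 0xffu;

    if (pmid == 0xffffffffu)    /* PM_ID_NULL: all bits set, not dynamic */
        return "PM_ID_NULL";
    if ((pmid >> 31) & 1u)
        snprintf(buf, sizeof(buf), "%u.*.*", domain);
    else
        snprintf(buf, sizeof(buf), "%u.%u.%u", domain,
                 (pmid >> 10) & 0xfffu, pmid & 0x3ffu);
    return buf;
}
```

For the *foo* example this prints 44.0.1 for *foo.version* and 44.*.* for the dynamic root *foo.count*, which is the output monitoring tools would need to tolerate.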


      Some Limitations

The "dynamic" nature of the PMNS only applies to the PMNS at the time it is explored. For most monitoring tools this is at start up (typically after a configuration file has been read), so any changes to the PMNS after that point in time will not be noticed. Specifically this means:

    * /pmchart/ explores the PMNS each time a new plot is added, so
      metrics missing at the time a view is populated are dropped with a
      warning, but the metric selector is re-populated each time it is
      launched
    * /pmlogger/ will only log those metrics defined at the time it is
      started
    * /pmie/ will return "unknown" as the value of any expression
      involving metrics not defined at the time it is started
    * /pmdumptext/ will only report on those metrics defined at the time
      it is started
    * /pmval/ reports an error and exits if the metric name is not
      defined at the time it is started

This behaviour is no different to other existing interactions, e.g. when a PMDA is installed or removed, so is not a new issue.


------------------------------------------------------------------------

_______________________________________________
pcp mailing list
pcp@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/pcp

<Prev in Thread] Current Thread [Next in Thread>