pcp
[Top] [All Lists]

Re: pcp-doc updates: PG draft, UAG review feedback

To: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: pcp-doc updates: PG draft, UAG review feedback
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Mon, 7 Oct 2013 23:36:09 -0400 (EDT)
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1285989644.16216362.1378636215997.JavaMail.root@xxxxxxxxxx>
References: <1428186571.5255560.1377416000569.JavaMail.root@xxxxxxxxxx> <21889907.5255563.1377416062936.JavaMail.root@xxxxxxxxxx> <y0mioymkgz2.fsf@xxxxxxxx> <1285989644.16216362.1378636215997.JavaMail.root@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: ZE8D2aj1PXhzTpX1wnZCdkbnP80V1iDKYdMa
Thread-topic: pcp-doc updates: PG draft, UAG review feedback

----- Original Message -----
> ----- Original Message -----
> > 
> > Some notes from the PG draft:
> >                                                                             
> >    
> > ...
> 
> As always, quality review - thanks Frank!  I've tackled the big
> item of Chapter 4 - adding several new sections about pmdammv -
> and a small number of the other low-hanging fruit here.  I also
> added in new sections describing event metrics in a fair bit of
> detail - a new version of the PG is on oss now.
> 
> Alot of work remains to get through updates from the rest of
> this review ... will get there over time, thanks.  If there's

Finally, we're over the line with a vastly improved Programmers
Guide ... followup review comment notes below.  There's other
areas where we can add more text (perl/python APIs come to mind,
PMIMPORT APIs, and one or two others) - but this gets us a good
initial stake in the ground.

Revised version is online on oss.sgi.com now.

Thanks!!

> 
> Some notes from the PG draft:
>                                                                               
> - page ix, "Reader Comments", should include the email address

Done.

> - page 1, "several ways to extend PCP", the MMV stuff should also
>   end up being documented in as much detail as the trace stuff
> 

Done (new sections, pictures-n-all, in Chapter 4), as mentioned
earlier.

> - page 6, "warning" how about a "note" or suggestion instead?

Done.

> 
> - page 6, "calls to exit(1) ... have grave implications" is too
>   vague.  Instead, mention that the pmcd process would exit.
>   (It's elaborated well on page 12, maybe xref?)

Done and done.

> 
> - page 9, step 5, add "assign unique domain numbers, xref section
>   2.3.2 (page 14).

Done.

> - page 10, step 11, pmgadgets doesn't exist in OSS pcp yet

Working on that now, so will leave as-is in optimistic expectation
of something arriving on that front in the not-too-distant future.

> - page 16, metrictab[] trivial PMDA, the PMID sentence (cluster 0 item 1)
>   should spell out that this is not a complete PMID, in that the pmda's
>   own number will be inserted at some point.  (This bit me once or twice
>   earlier.)

*nod*, done.

> - page 17, the example 2.6 counter/instantaneous rows suggests
>   rate-conversion is being done (at the pmda end) for the "counter"
>   value.

No - perhaps the "at the pmda end" bit is a clue to confusion here.

My understanding of the section is that its trying to explain the
differences between the three metric semantics - so, for a given
hypothetical metric, if we modify the metric semantics but keep
everything else the same (i.e. export the same value for each of the
three semantics), the table shows what the effects of this would be.

The effects of rate conversion (which is done on the client side for
counter type metrics) are that the difference in value between samples
is calculated, divided by the difference in time (two time units).  If
that same value was of "instantaneous" semantics, no rate conversion
is performed and the value is reported "as-is", as long as *some*
value was available.  In the case of "discrete" semantics, this last
condition need not hold.

>  Is this conversion to be done per-client-context, so that
>   the rate is well-defined ("average rate since the last time this
>   same client asked") as opposed to "average rate since ... when?".
>   In the former case, each client can accumulate the instantaneous
>   values from the rate deltas.  In the latter, then it seems like many
>   rate-"counter" metrics should also be exported as instantaneous
>   values.

The explanation is not sufficiently clear it would seem, but I'm not
doing a better job here either I suspect!  Perhaps a new statement to
the effect that all interpretations of semantics are performed on the
client side, within the monitor tools?  (have added that)

> - page 19, pmda instances, need names be unique, or only numbers?

There is a note on page 17 stating they need to be unique and that
its the responsibility of the PMDA to ensure this.  There are other
subtle rules too, hmm, not covered AFAICS.  (have added a new sub-
section on "Instance Identification" near the start here).

> - page 20, "dp->version.two ..." that .two. bit should be introduced,
>   and choice of .two. vs. others be outlined.

Bullet dodged by using .any. and defered further discussion of the
version stuff until the following sections (as we're discussing the
simple PMDA, we probably want to avoid getting bogged down in this
detail at this point of the story, methinks).

> - page 21, the caveat under the warning relates to the page 17 issue
>   above, maybe xref?

Done.  (/me notes that refers only to "caching" PMDAs, which is not
most PMDAs)

> - page 25, was expecting something about the version.* bits early on
>   here;

I think deep discussion here may muddy the water and distract
from the simple aspect of pmdasimple.  Added a one-liner and an
xref to the detailed versioning discussion later.

> why use pmdaSet*CallBack here instead of changing the
>   ->version.*.* variables?

Yeah, they're subtely different concepts (confusingly, both are
"callbacks" just of different kinds) - the version.*.* functions
are each handling one specific PDU.  The pmdaSet*CallBack calls
are optional helpers for those PDU handlers.
Its confusing because they are both "callbacks", really.  Added
a sentence attempting to explain.

> - page 26, for the cluster/item conditions, might want to note that
>   we're not also checking the idp->domain number, because it's not
>   necessary; or perhaps add an assertion instead of a wordy
>   explanation

Good point, done.  (words added, couldn't assert as we don't have
the pmdaDispatch structure here to compare its domain number to
the one stamped in the pmdaMetric)

> - page 26, should expand on the choice of "appropriate field in
>   the pmAtomValue", ie., tabulate the fields as a function of the TYPE

*nod* - that mapping is covered later, added xrefs.

> - page 28, store, should note implications of security, reliable-logging,
>   credentials

Following from IRC discussion, taking "reliable-logging" to be related
to pmlogger - good points, all - added several sentences here.

> - page 28, simple.color store example; should explain the meaning of the
>   loop over j (vsp->numval) at the for loop as a teaser for the info
>   on page 29, like for (j=...) /* loop over instances */

Done.

> - page 30, PMDA_INTERFACE_* are mentioned here for the first time;
>   earlier & more details please; but maybe we don't need to
>   document older interfaces at all

Its worth doing I think - different versions used throughout PCP,
and people might find any individual PMDA as a suitable starting
point.  Its easy to cover this anyway, since the interfaces all
need to be covered and are additive.

> - page 30 PMDA structures, aha, here is where the .one/.two,
>   PMDA_INTERFACE_* bits are described.  Perhaps move this section way
>   ahead, ahead of 2.4 perhaps

Didn't move it forward, but added earlier xrefs, as mentioned earlier.

> - page 32 ... maybe move the whole PMDA setup section up ahead too; it
>   just makes a little more sense to me to cover the PMDA operation
>   from initialization onward (in a temporal sequence)

I think the original author(s) structured it as they did so that it
follows more or less the procedure outlined in section 2.1 - focus
is first and foremost on "how to extract from target domain" (which
maps to pmFetch, as the most critical thing, hence they tackled that
first I suspect), and that then drives the need for further coverage
of instances, descriptors, data structures, etc.  And initialisation
is a trivial thing at the end then, which ties it all together.

That seems OK to me, if we start explaining initialisation, we need
to introduce lots of topics all at once that might seem irrelevent
to someone who just wants to get stuck into extracting their data.

So, in the end, left that ordering as is.

> - page 34, mind a small elaboration about why the pmda might want to
>   handle both simple_fetch and simple_fetchCallBack?

Covered that earlier (simple_fetch -> PDU level, simple_fetchCallBack
-> individual metric/instance within a PDU) - but its non-obvious, so
added another clarification here too.

> - page 35, should mention when pmdaMain might return -> exit

Done.

> - page 36, pmstore pmcd.control.debug, is that credential-controlled?

Nope.  Should be done, now that we default to local: in pmstore.
May be non-trivial though cos pmdapmcd predates libpcp_pmda.

> - page 37, looking forward to scox's sys/sdt.h macros being mentioned
>   before too long, as an additional pmda debugging technique, with
>   some systemtap samples;

*nod*, that'd be good.  Python API coverage in general is lacking also.

> maybe in general, stub placeholders for
>   sections we know are needed/coming (like the python/etc. bindings,
>   pmwebapi, etc.) would be helpful

*shakes head*, not a fan of doing that, since these things may never
arrive and its then distracting/annoying to the reader IMO.

> - page 39, .pmchart files; I didn't realize we did that.  Is there
>   analogous handling for pmieconf / pmlogconf fragments?

Not sure off the top of my head, that could be handy though.
/me checks - nope, we don't do those other things (PMDA Install
can of course do it itself too, but none do - for logconf, we
just install the files directly & use the logconf probe function
to test for the metrics needed).

> - page 45, Current PMAPI Context, should mention pmUseContext to switch
>   (and maybe pmReconnectContext, since we haven't automated that yet)

Done.

> - page 46, typo "unpackaing"

Done.

> - page 50, programming style topics should include memory management,
>   specifically the styles of handing over ownership of this or that
>   snippet of data structure to libpcp vs. the app; this is particularly
>   tricky for those AGGREGATE / EVENT / _STATIC values;

Events need lotsa special decode action/help from libpcp - this is
now documented in a section of its own, recently added (post-review).

I need to get my head around the AGGREGATE / _STATIC differences,
TBH I'm not sure exactly how they affect PMAPI users.

>   the error handling
>   section should mention general policies involving partial failures
>   (like pmLookupNames that partly fail, partly succeed; or other cases)

Done.

> - page 51.., general re. section 3.8, dunno how much value a direct
>   man-page-extract type reference dump for all these functions has;
>   for a tutorial, one might want to see these functions being used,
>   (like the pmda/simple examples were) rather than simply enumerated

I wouldn't have thought so either, but some people seem to like it -
the guy who hacked the initial python bindings seemed to have a large
preference for these over the man pages... *shrug*.

Agreed re examples being better still.

> - page 54, if the PMNS operations are rarely required in modern PCP
>   code, consider dropping their details from these docs; maybe just
>   mention and refer to the man pages

Done.

> - page 56, PMAPI context services, now we're talking ... this kind of
>   tabular overview is totally apropos for the manual
> 
> - page 57, 3.8.1 ... back to man-page-extracts, hmm.
> 
> - page 60, sample code, yummy
> 
> - page 61, pmGetContextHostName ... I guess thread-unsafe?

Ayup.  Note added.

> - page 65, record-mode is newly introduced here; maybe elaborate before
>   (or instead of) lunging into the API

Done.

> - page 69, the ...InDomArchive functions, it'd be good to get a motivation
>   for using these functions

*nod* - one for Ken perhaps, I don't really follow what the exact
circumstances are for which one would need these special variants.

> - page 70, "pmview" mystery superhero

Exits, stage left.  Vows to return someday.

> - page 76..77, some sample code, yummy
> 
> - page 78, programming issues & examples, now we're talking!

Yippee, nearly there; there's light at the end of the tunnel...

> - page 83, it'd be good to get a sense as to why the trace library is
>   considered heavyweight, some relative-to-printf costs or something?
>   can't some of our own code use them?

[... hah, its a train!]

Weeell, on every event it allocates memory & makes blocking system
calls.  Also isn't thread safe.  Since coding it, the world moved on
and many alternative, efficient, SMP-aware tracers arrived that make
alot more sense for people (and even PCP) to use than this.  I think
there's a strong case for a PCP static tracing API which operates at
a higher level (end-to-end, distributed tracing), but this isn't the
right API.

So, I'd not advocate use of it within PCP either - someday I hope we
can delete this code and replace it with something better (preferably
with a small core that isn't in PCP, like UST?) that we can build on.

cheers.

--
Nathan

<Prev in Thread] Current Thread [Next in Thread>
  • Re: pcp-doc updates: PG draft, UAG review feedback, Nathan Scott <=