Notes from todays PCP planning meeting

To: pcp@xxxxxxxxxxx
Subject: Notes from todays PCP planning meeting
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Thu, 1 Mar 2012 21:59:31 +1100 (EST)
In-reply-to: <1144310562.260957.1330599156755.JavaMail.root@xxxxxxxxxxxxxxxxxxxxxx>
Hi all,

Here's my braindump from this morning's meeting.  Please feel free
to reply with anything I've missed or glossed over, or
what-have-you ... I'm sure to have missed bits.

Thanks again to TallPaul for breakfast, and Chatz for organising the
conference call, 'twas great to have Frank and Jeff online this time!


== Summary of PCP Planning Meeting
== 1 March 2012, @Aconex Melbourne

Attendees:
Frank Che, Josef "Jeff" Sipek, Ken McDonell, Mark Goodwin,
Nathan Scott, Paul Smith.

Topics:
1. Coverity
- First pass at analysing the PCP source scan output is nearly done.
- Discussion around how to move forward; a second pass with
  the updated code will be needed, which will form the basis
  of management going forward as the code base changes.
- Committing the Coverity output into the trees in a common
  location (build/coverity or something like that), so that
  we can compare new vs old easily.
- Some discussion around how Coverity works to see if we can
  coerce it to check the other platforms for us - Windows,
  Solaris, Mac, FreeBSD, etc - from a single build.  No real
  conclusion there; it may prove more difficult than it's worth.
- Mark planning to run tools over pcp-gui source as well for
  next run (actually, would be good to check pcp-graph too -
  that'd cover everything I think).

2. pcpqa packaging
- Integrating with the Redhat testing process would be helped
  along by generating RPMs for pcpqa
- Issues around hanging tests also need resolution - running
  just the "local" group should simplify things greatly, but
  some issues remain.
- Ken to follow up with Frank.

3. Python bindings
- Status - original author has moved on, but was tackling the
  entire client side API.
- Expertise and many potential uses within Redhat are driving a
  need to revitalise this project, client side more than
  PMDA interfaces.

4. Splitting process metrics from Linux kernel PMDA
- Ken has a development branch somewhere...
- Some discussion around where cgroup metrics belong; probably
  they should go with the process PMDA, as they are likely to share code.
- pmlogrewrite can be used to manage (automate) the domain#
  transition for log rotation and existing archives.

5. Fedora/OpenSUSE spec file consolidation
- Some large differences in package naming convention, but the
  contents are very similar/same.
- Mark to follow up with DavidD re a shared spec with conditional
  components.

6. Running pmcd and pmdas as unprivileged user
- Long discussion, many many angles discussed
- Log files in system directories are one of the easiest parts,
  via introduction of a pcp user, and appropriate permissions in
  logfile locations such that
- Addressing pmcd running as root is tougher; options:
  - Federation of pmcd servers each with their own ports and a
    meta-pmcd - could one pmcd become a special-case, historical
    wart, with root access, the rest are more pure-bred and run
    unprivileged?
  - Encourage users to run local PMDAs, with a pmcd to collect them up
    and provide a unified view of these fiefdoms.
  - Point made that this is not far removed from existing setup,
    where PMDAs can listen on their own sockets, and pmcd will
    rendezvous with them once it's running (which can be after the
    PMDAs start).
  - Another issue to address will be how pmcd becomes aware of
    non-root user agents (pmcd.conf is the current mechanism, and it's
    only writable by root) and also their namespaces (PMNS rebuild
    must be done by root today).  A mechanism could be built though;
    it doesn't seem insurmountable.
  - Moved on to discussing pmproxy as a more formal front-end to
    all data for a host.  pmproxy runs non-privileged already, so
    we could consider making its port the default target of live
    monitoring (-h) rather than pmcd, and then disallowing any
    remote access to pmcd (via pmcd.conf).  While not solving all
    problems, seems a relatively pain-free path architecturally,
    but probably fairly painful for existing users.
- Encryption and secure remote access are requirements for the
  enterprise solution - there are sites where this is strictly
  mandated, and we will get turned away at the door without this
  kind of feature.

7. pmlaunch relaunch
- pmlaunch is a mechanism from the olden days of PCP for allowing
  arbitrary clients to launch arbitrary other clients (typically
  in a drill-down style) with context of where they are starting
  from (which metrics are selected in a scene/visualisation, and
  which contexts/archives are in use, etc, etc).  Implementation
  is via shell scripts, largely.  It was used by pmview and oview
  most extensively.
- Some initial discussion around where this code is currently at;
  it's currently in a development branch off the pcp-gui tree and
  is not used by anything  (pmchart has a home-brew model
  for starting up a new pmchart / pminfo / pmval).
- Discussion also around where these scripts should live, pcp-gui
  may not be the right place as these are just scripts and don't
  have anything specific to pcp-gui.  A core package seems like
  a safe spot (pcp libs maybe, for pcp-gui clients with no local
  pcp package, just libs).

8. Event visualisation
- Follow-on from a recent discussion surrounding whether pmchart
  is the right place to house this stuff or whether a separate
  tool might be better suited.
- Time-axis alignment (of both existing sampled charts, and new
  event visualisations) is desirable, but for very detailed view
  of event contents, it's generally expected a separate window is
  going to be needed - probably this latter part should be in a
  separate tool, but the former needs to live in pmchart.
- This tool would not only be useful for searching event records,
  but also the regular sampled metrics.
 
9. Web frontends and JSON-via-HTTP PMAPI bridge
- Much discussion, everyone wants this.
- Some learnings from initial prototype within pmproxy
- Nathan has an action item to revisit and make notes about any
  protocol issues that arose that would cause additional state
  to be required on the server side, per-connection/context.
- Frank pointed out his earlier idea to help resolve this which
  revolved around long-lived connections between server process
  and pmcd, plus a mechanism for sending cookies to clients for
  identification of their existing connection state, and a way
  to retire inactive connections.
- Sounds like the only viable way to attack this - fundamentally
  a mismatch between pmcd protocol (connection oriented) and the
  web client model (connection-less, state-less vastly preferred).
- Pluggable protocols would be nice, with a default to JSON still
  highly recommended by Paul.
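To make Frank's idea a bit more concrete, here's a minimal sketch of
a cookie-keyed session table: long-lived server-side contexts, an
opaque cookie handed to each web client, and a reaper that retires
idle sessions.  It's Python purely for illustration and is not any
existing PCP code; `open_context` is a hypothetical stand-in for
whatever would actually open a connection to pmcd, and the 300s idle
timeout is an arbitrary assumption.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class SessionTable:
    """Map opaque client cookies to long-lived server-side contexts.

    Hypothetical sketch: `open_context` stands in for whatever creates a
    connection to pmcd; it is not a real PCP API.
    """

    def __init__(self, open_context: Callable[[str], object],
                 idle_timeout: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._open = open_context
        self._idle = idle_timeout
        self._clock = clock
        # cookie -> (context, time of last use)
        self._sessions: Dict[str, Tuple[object, float]] = {}

    def create(self, host: str) -> str:
        """Open a new context and hand the client a random cookie for it."""
        cookie = secrets.token_hex(16)
        self._sessions[cookie] = (self._open(host), self._clock())
        return cookie

    def lookup(self, cookie: str) -> Optional[object]:
        """Return the context for a cookie, refreshing its idle timer."""
        entry = self._sessions.get(cookie)
        if entry is None:
            return None
        ctx, _ = entry
        self._sessions[cookie] = (ctx, self._clock())
        return ctx

    def reap(self) -> int:
        """Retire sessions idle for longer than the timeout; return count."""
        now = self._clock()
        stale = [c for c, (_, last) in self._sessions.items()
                 if now - last > self._idle]
        for c in stale:
            del self._sessions[c]
        return len(stale)
```

Each HTTP request would carry the cookie, the server would `lookup()`
the existing context instead of reconnecting, and something would
call `reap()` periodically.  That keeps the web side state-less from
the client's point of view, at the cost of per-client state on the
server.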

10. Export/import integration with rrdtool and rrdgraph
- Installed base of tools using these interfaces, in particular
  the round-robin database format, is large and ideally we could
  leverage this.
- A tool like pmlogger that could generate rrdtool output would
  be helpful.
- Some pmimport tools exist in this space, more would help.

11. Feature planning
- As a general rule, agreed the oss.sgi.com bugzilla is a fine place
  to house upstream bug reports and feature requests.
- Anyone should feel welcome to make use of it for this sort of
  thing.  Far better there than lost in private conversation.

12. PCP4 scheduling
- Ken gave a roundup of where the PCP4 todo list was at.
- Because backward-compat has been so well covered in all these
  changes to date, discussed (and agreed) that a merge into the
  PCP3 code base was both feasible and desirable ... so plan of
  record is for Ken to Make It So, and a 3.6 is on the cards.
- Plan to save PCP v4 for *even bigger* stuff - esp. breaking
  change in terms of protocol revisions, etc (which is expected
  to be needed for security extensions at the very least).

13. SNMP
- Brief review of yesterday's meeting and on-list mail
- Derived metrics and local context mode considered as yet more
  ways to skin this cat, all with various upsides and downsides.
- Long response times from some SNMP devices are throwing a spanner
  in many works; this would play havoc with PCP clients talking to
  the devices directly... really need to cache values somewhere.
- MIB installation also makes multi-protocol aware clients more
  problematic - getting back to the bad old days of PCP1 where we
  had to distribute namespace files to clients
- General mirth around "secret agent" mode in pmie, pmdasummary
  ("the agent with its head up its...") and possible use of the
  two to help with tightening up SNMP metric semantics
- Overloading derived metrics bounced around as another model,
  possibly, cos the hooks are in the right places in libpcp - but
  maybe not too well suited in the end (IIRC?).
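On the "cache values somewhere" point, one simple shape for it is a
time-to-live cache sitting in front of the slow device query, so
clients get a recent value without waiting on the device each time.
This is a generic sketch only, in Python for illustration; `fetch` is
a hypothetical stand-in for a real SNMP GET, and the 30s TTL is an
arbitrary assumption, not anything we settled on.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class TTLCache:
    """Serve recently fetched values without re-querying a slow device.

    Hypothetical sketch: `fetch` stands in for an SNMP GET (or any slow
    lookup); values younger than `ttl` seconds are returned from the cache.
    """

    def __init__(self, fetch: Callable[[Hashable], object], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        # key -> (value, time the fetch started)
        self._store: Dict[Hashable, Tuple[object, float]] = {}

    def get(self, key: Hashable) -> object:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # fresh enough: no device round trip
        value = self._fetch(key)  # slow path: ask the device
        self._store[key] = (value, now)
        return value
```

A real version would also want to refresh in the background so that
no client ever blocks on a slow device, but that's beyond a sketch.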


Place-holders for discussions next time:
- Discuss MMV scalability issues, approaches (Jeff/Nathan)

We planned to meet more regularly, clearly there's always plenty
to discuss (3 and a half hours' worth in the last two days!) ... every four
weeks has been suggested, which sounds good to me.  If anyone
else is keen to come along or call in, just let me know and I will
tee it up.

cheers.

--
Nathan