| To: | pcp@xxxxxxxxxxx |
|---|---|
| Subject: | Notes from todays PCP planning meeting |
| From: | Nathan Scott <nathans@xxxxxxxxxx> |
| Date: | Thu, 1 Mar 2012 21:59:31 +1100 (EST) |
| In-reply-to: | <1144310562.260957.1330599156755.JavaMail.root@xxxxxxxxxxxxxxxxxxxxxx> |
|
Hi all,
Here's my braindump from this morning's meeting. Please feel free to
reply with bits I've missed, anything important I've glossed over, or
what-have-you ... I'm sure to have missed bits.

Thanks again to TallPaul for breakfast, and Chatz for organising the
conference call, 'twas great to have Frank and Jeff online this time!

== Summary of PCP Planning Meeting ==

1 March 2012, @Aconex Melbourne

Attendees: Frank Che, Josef "Jeff" Sipek, Ken McDonell, Mark Goodwin,
Nathan Scott, Paul Smith.

Topics:

1. Coverity
- First pass analysing PCP source scan output nearly done.
- Discussion around how to move forward; a second pass with the
  updated code will be needed, which will form the basis of management
  going forward as the code base changes.
- Committing the Coverity output into the trees in a common location
  (build/coverity or something like that), so that we can compare new
  vs old easily.
- Some discussion around how Coverity works to see if we can coerce it
  to check the other platforms for us - Windows, Solaris, Mac, FreeBSD,
  etc - from a single build. No real conclusion there, may prove more
  difficult than it's worth.
- Mark planning to run tools over pcp-gui source as well for next run
  (actually, would be good to check pcp-graph too - that'd cover
  everything I think).

2. pcpqa packaging
- Integrating with the Red Hat testing process would be helped along
  by generating RPMs for pcpqa.
- Issues around hanging tests also need resolution - running just the
  "local" group should simplify things greatly, but some issues remain.
- Ken to follow up with Frank.

3. Python bindings
- Status - original author has moved on, but was tackling the entire
  client side API.
- Expertise and many potential uses within Red Hat driving a need for
  revitalising this project, client side more than PMDA interfaces.

4. Splitting process metrics from Linux kernel PMDA
- Ken has a development branch somewhere...
- Some discussion around where cgroup metrics belong; they should
  probably go with the process PMDA, which is likely to share code.
- pmlogrewrite can be used to manage (automate) the domain# transition
  for log rotation and existing archives.

5. Fedora/OpenSUSE spec file consolidation
- Some large differences in package naming convention, but the
  contents are very similar/same.
- Mark to follow up with DavidD re shared spec with conditional
  components.

6. Running pmcd and pmdas as unprivileged user
- Long discussion, many many angles discussed.
- Log files in system directories are one of the easiest parts, via
  introduction of a pcp user, and appropriate permissions in logfile
  locations such that
- How to address pmcd running as root is tougher, options:
- Federation of pmcd servers each with their own ports and a
  meta-pmcd - could one pmcd become a special-case, historical wart,
  with root access, while the rest are more pure-bred and run
  unprivileged?
- Encourage users with local PMDAs, and a pmcd to collect up and
  provide a unified view to these fiefdoms.
- Point made that this is not far removed from the existing setup,
  where PMDAs can listen on their own sockets, and pmcd will rendezvous
  with them once it's running (which can be after the PMDAs start).
- Another issue to address will be how pmcd becomes aware of a
  non-root user's agents (pmcd.conf is the current mechanism, it's only
  writable by root) and also their namespaces (PMNS rebuild must be
  done by root today). A mechanism could be built though, doesn't seem
  insurmountable.
- Moved on to discussing pmproxy as a more formal front-end to all
  data for a host. pmproxy runs non-privileged already, so we could
  consider making its port the default target of live monitoring (-h)
  rather than pmcd, and then disallowing any remote access to pmcd
  (via pmcd.conf). While not solving all problems, this seems a
  relatively pain-free path architecturally, but probably fairly
  painful for existing users.
- Encryption and secure remote access are requirements for the
  enterprise solution - there are sites where this is strictly
  mandated, and PCP will get turned away at the door without this kind
  of feature.

7. pmlaunch relaunch
- pmlaunch is a mechanism from the olden days of PCP for allowing
  arbitrary clients to launch arbitrary other clients (typically in a
  drill-down style) with context of where they are starting from
  (which metrics are selected in a scene/visualisation, and which
  contexts/archives are in use, etc, etc). Implementation is via shell
  scripts, largely. It was used by pmview and oview most extensively.
- Some initial discussion around where this code is at; it's currently
  in a development branch off the pcp-gui tree and is not used by
  anything (pmchart has a home-brew model for starting up a new
  pmchart / pminfo / pmval).
- Discussion also around where these scripts should live; pcp-gui may
  not be the right place as these are just scripts and don't have
  anything specific to pcp-gui. A core package seems like a safe spot
  (pcp libs maybe, for pcp-gui clients with no local pcp package, just
  libs).

8. Event visualisation
- Follow-on from a recent discussion surrounding whether pmchart is
  the right place to house this stuff or whether a separate tool might
  be better suited.
- Time-axis alignment (of both existing sampled charts, and new event
  visualisations) is desirable, but for a very detailed view of event
  contents, it's generally expected a separate window is going to be
  needed - probably this latter part should be in a separate tool, but
  the former needs to live in pmchart.
- This tool would not only be useful for searching event records, but
  also the regular sampled metrics.

9. Web frontends and JSON-via-HTTP PMAPI bridge
- Much discussion, everyone wants this.
- Some learnings from initial prototype within pmproxy - Nathan has an
  action item to revisit and make notes about any protocol issues that
  arose that would cause additional state to be required on the server
  side, per-connection/context.
- Frank pointed out his earlier idea to help resolve this, which
  revolved around long-lived connections between server process and
  pmcd, plus a mechanism for sending cookies to clients for
  identification of their existing connection state, and a way to
  retire inactive connections.
- Sounds like the only viable way to attack this - fundamentally a
  mismatch between pmcd protocol (connection oriented) and the web
  client model (connection-less, state-less vastly preferred).
- Pluggable protocols would be nice, with a default to JSON still
  highly recommended by Paul.

10. Export/import integration with rrdtool and rrdgraph
- Installed base of tools using these interfaces, in particular the
  round-robin database format, is large and ideally we could leverage
  this.
- A tool like pmlogger that could generate rrdtool output would be
  helpful.
- Some pmimport tools exist in this space, more would help.

11. Feature planning
- As a general rule, agreed oss.sgi.com bugzilla is a fine place to
  house upstream bug reports and feature requests.
- Anyone should feel welcome to make use of it for this sort of thing.
  Far better there than lost in private conversation.

12. PCP4 scheduling
- Ken gave a roundup of where the PCP4 todo list was at.
- Because backward-compat has been so well covered in all these
  changes to date, discussed (and agreed) that a merge into the PCP3
  code base was both feasible and desirable ... so plan of record is
  for Ken to Make It So, and a 3.6 is on the cards.
- Plan to save PCP v4 for *even bigger* stuff - esp. breaking change
  in terms of protocol revisions, etc (which is expected to be needed
  for security extensions at the very least).

13. SNMP
- Brief review of yesterday's meeting and on-list mail.
- Derived metrics and local context mode considered as yet more ways
  to skin this cat, all with various upsides and downsides.
- Long response times from some SNMP devices throwing a spanner in
  many works, this would play havoc with PCP clients talking to the
  devices directly... really need to cache values somewhere.
- MIB installation also makes multi-protocol aware clients more
  problematic - getting back to the bad old days of PCP1 where we had
  to distribute namespace files to clients.
- General mirth around "secret agent" mode in pmie, pmdasummary ("the
  agent with its head up its...") and possible use of the two to help
  with tightening up SNMP metric semantics.
- Overloading derived metrics bounced around as another model,
  possibly, cos the hooks are in the right places in libpcp - but
  maybe not too well suited in the end (IIRC?).

Place-holders for discussions next time:
- Discuss MMV scalability issues, approaches (Jeff/Nathan)

We planned to meet more regularly, clearly there's always plenty to
discuss (3 & 1/2 hours' worth in the last two days!)... every four
weeks has been suggested, which sounds good to me. If anyone else is
keen to come along or call in, just let me know and I will tee it up.

cheers.

-- Nathan |
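
As a rough sketch of the cookie scheme Frank described under topic 9
(long-lived server-side state, an opaque cookie handed to the client,
and retirement of idle entries), something like the following Python
could serve as a starting point. The class and method names are
hypothetical, nothing here is existing PCP code, and the stored
"context" is just a stand-in for whatever handle the server process
holds toward pmcd.

```python
import secrets
import time


class ContextTable:
    """Hypothetical cookie -> server-side context map with idle expiry."""

    def __init__(self, idle_timeout=30.0, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds before an entry is stale
        self.clock = clock                # injectable so expiry can be tested
        self._entries = {}                # cookie -> [context, last_used]

    def open(self, context):
        """Register a context and return the opaque cookie for the client."""
        cookie = secrets.token_hex(16)
        self._entries[cookie] = [context, self.clock()]
        return cookie

    def lookup(self, cookie):
        """Return the context for a cookie (refreshing it), or None."""
        entry = self._entries.get(cookie)
        if entry is None:
            return None
        entry[1] = self.clock()
        return entry[0]

    def retire_idle(self):
        """Drop entries unused for longer than idle_timeout; return count."""
        now = self.clock()
        stale = [cookie for cookie, (_, used) in self._entries.items()
                 if now - used > self.idle_timeout]
        for cookie in stale:
            del self._entries[cookie]
        return len(stale)
```

Each stateless web request would carry the cookie, the server would
call lookup() to find the matching long-lived connection state, and a
periodic retire_idle() would bound the state held per client.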