
PCP developers meeting minutes - 21st March

To: PCP Mailing List <pcp@xxxxxxxxxxx>
Subject: PCP developers meeting minutes - 21st March
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Mon, 17 Mar 2014 03:16:57 -0400 (EDT)
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <181660187.318596.1395039778979.JavaMail.zimbra@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: cU73TF+QfdyteVSrQIS4/JGs77Y/5w==
Thread-topic: PCP developers meeting minutes - 21st March
Hi all,

Below are all of my notes and everything else I could remember
after last week's call.  I've also merged in all the notes that
Ken sent through too.  If there's anything I've missed or have
misunderstood from the call, please send follow-up - thanks!

At the end of the call we discussed frequency of the call.  It
is generally thought we should hold them a bit more often, but
not too much, given the timezone challenges - perhaps every 2,
3 or 4 months.  So I'll tee the next one up around June-ish.

cheers.


== Attendees

Dave Brolley <brolley@xxxxxxxxxx>
Frank Ch. Eigler <fche@xxxxxxxxxx>
Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Lukas Berk <lberk@xxxxxxxxxx>
Martins Innus <minnus@xxxxxxxxxxx>
Nathan Scott <nathans@xxxxxxxxxx>
Peter Evans <pevans@xxxxxxxxxx>


== General build/release topics

- build processes
  - git-tar vs make src-link
    o  planning to tackle this issue in next point release (nathans)
       which will remove the double handling in configure/build that
       packaging builds via Makepkgs induce
    o  no objections provided modified files are included (new files
       are likely to need an add - that's OK)

  - qa .out files vs pcp releases
    o  issue of qa tests requiring multiple output files raised, and
       all agreed that nowadays we can do away with the multiple out
       files (for pcp-version-difference .out files only)
    o  still need the general mechanism for other situations, and
       the notrun mechanism still needs to exist too.

  - pcp-gui + pcp-doc + pcp -> tree merge planning
    o  planning to tackle this issue in next point release (nathans)
       which will allow more aggressive enabling of new features in
       pmchart & other pcp-gui tools (eg common option handling). It
       also will reduce build/release time.
    o  merge pcp and pcp-gui trees - no pushback, needs conditional
       building for platforms without any Qt / too-old-Qt.

- release schedule
  - once-a-month useful?  sustainable?
    o  [nathans] seems to be working well, good balance between the
       frequency of release, amount of code change, and time spent
       doing releases & QA.
    o  [fche] advocating fewer releases - systemtap releases less often
       for example, and it does weekly Fedora rawhide snapshots.
    o  [kenj] confidence from QA coverage, and also seems to think the
       current frequency is working OK.
    o  general vibe was to continue doing releases on current schedule
       (monthly, mid-month), with flexibility to extend the cycle
       where needed.
    o  nathans mentions next release will slip half a week to give a
       bit more time for his own dev work, more reviews/merges, more QA
    o  kenj suggests that nathans send out mail mid-cycle as to whether
       we're still tracking for the initially planned dates - email to
       the list should do the trick, try it for next couple of releases.

  - code reviews
    o  nathans seeking more people to be doing code reviews, this is a
       big time sink and would like to see this task (which is at times
       very time consuming & not-fun-at-all) shared around more.


== Archive logging topics

- pmlogger writes and logical record buffers
    o  fche has documented the archive format - see pcp-archive(5)
    o  fche pointed out existing unbuffered-writes option to pmlogger
    o  writes aligned to logical record structure - does the -u option
       already do this?  and if so, how does this get included in the
       pmlogger_check/pmlogger_daily circus? [kenj]
    o  nathans mentions that there are few guarantees re atomicity of
       write(2) between processes, we just need to deal with the fact
       that reader tools will see incomplete appending of archives.
    o  needs to be dealt with below the PMAPI [kenj & nathans]

- truncated archives
    o  there is no real error here, everything is OK up to the end of
       the previous record, and there is no missing data that can be
       recovered for the truncated (last) record [kenj]
    o  when reading forward - return PM_ERR_EOL instead of PM_ERR_LOGREC
       (which still makes sense when posn+header len is inside file size,
       but trailer len not found to be correct) [kenj]
    o  when moving to the end of the file to read backwards, need to read
       serially to find the logical EOF if the physical EOF does not smell
       right [kenj]
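
The forward-scan idea above can be sketched roughly as follows. This is
not libpcp code: it assumes each record is framed by a 4-byte big-endian
length at both ends, with that length counting the whole record, and the
function name and status strings are made up for illustration.  The
"truncated" case corresponds to where PM_ERR_EOL is proposed, and the
"corrupt" case to where PM_ERR_LOGREC still makes sense.

```python
import struct


def find_logical_eof(buf: bytes):
    """Scan records forward; return (offset, status).

    offset is the end of the last complete record (the logical EOF).
    status is a hypothetical label: "clean", "truncated" or "corrupt".
    Assumes each record is <len><payload><len>, len = 4-byte big-endian
    total record size including both length words.
    """
    pos = 0
    size = len(buf)
    while pos < size:
        if size - pos < 4:
            return pos, "truncated"    # not even a whole header length
        (hlen,) = struct.unpack_from(">I", buf, pos)
        if hlen < 8:
            return pos, "corrupt"      # too small to hold both lengths
        if pos + hlen > size:
            return pos, "truncated"    # record runs past physical EOF
        (tlen,) = struct.unpack_from(">I", buf, pos + hlen - 4)
        if tlen != hlen:
            return pos, "corrupt"      # inside the file, trailer wrong
        pos += hlen
    return pos, "clean"
```

A reader wanting to start from the end and read backwards could run a
scan like this first whenever the last 4 bytes of the file do not look
like a plausible trailer, then seek to the returned offset.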

- pcp_daily pending items [action kenj]
    o  martins' "don't merge" option (for easier rsync)
    o  single-archive "don't run pmlogextract" optimization
    o  host@host fix (really pmnewlog)

- libpcp pending items [action kenj]
    o  work ongoing on the mark-record performance issue seen with
       fche's archive [kenj]
    o  believed to be a libpcp issue, pmlogextract/merge not to blame,
       the archive appears to have been simply created through a great
       many (valid) log merges.

- pmlc access control changes
    o  kenj to distribute a revised access control matrix for all pmlc
       commands, and then implement this
    o  further discussion around af_unix sockets, nathans pointed brolley
       toward existing libpcp interface (__pmServerSetLocalCreds) taking a
       file descriptor and a hash, can be used here too.

- path toward grand-unified archive+live capability
  -  tail -f like functionality
    o  -u solves part of it, but more thought needed on how clients
       would (a) synchronize with pmlogger @ EOF, and (b) deal with time
       semantics change from client-driven to pmlogger-defined (remember
       interp @ 5sec intervals may be very bursty if pmlogger is logging
       the metric at 5min intervals)  [kenj]

  -  virtually-glue-archives-together
    o  automated support for a family of related archives - suggestion
       extending pmNewContext to allow the second argument when using
       PM_CONTEXT_ARCHIVE to be a directory (as well as file)  [fche]
       *  only if all archives in dir for the same host
       *  only if all archives for disjoint time intervals
       *  only if metadata consistent across all archives
       *  need to merge metadata, rework for temporal index, handle
          archive switching similar to volume switching (.0 etc files
          are really virtual volumes of the one [concatenated] archive)
       *  need to consider how one handles big temporal gaps so clients
          can be smarter [nathans concern] - maybe some sort of variant to
          the <mark> record that would allow a client to skip forward or
          backward over the gap and resume
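
As a rough illustration of the three "only if" preconditions above
(same host, disjoint time intervals, consistent metadata), here is a
small sketch.  Nothing in it is a PCP interface: ArchiveInfo and
check_family are hypothetical names, and the per-archive summary is
assumed to have been extracted already by some other means.

```python
from dataclasses import dataclass


@dataclass
class ArchiveInfo:
    """Hypothetical per-archive summary (not a libpcp structure)."""
    name: str
    host: str
    start: float      # first timestamp, seconds
    end: float        # last timestamp, seconds
    metadata: dict    # metric name -> descriptor (any comparable value)


def check_family(archives):
    """Return the archives in time order, or raise ValueError if they
    cannot be treated as one concatenated archive."""
    if not archives:
        raise ValueError("no archives in directory")
    ordered = sorted(archives, key=lambda a: a.start)
    host = ordered[0].host
    merged = {}       # merged metadata across the whole family
    prev = None
    for a in ordered:
        if a.host != host:
            raise ValueError(f"{a.name}: host {a.host} != {host}")
        if prev is not None and a.start < prev.end:
            raise ValueError(f"{a.name}: overlaps {prev.name}")
        for metric, desc in a.metadata.items():
            # first descriptor seen wins; later ones must match it
            if merged.setdefault(metric, desc) != desc:
                raise ValueError(f"{a.name}: {metric} metadata differs")
        prev = a
    return ordered
```

Sorting by start time first means only neighbouring archives need to be
compared for overlap; the gaps between neighbours are where a <mark>-like
record would be wanted.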

  - would "it" require new context type?
    o  yes, to solve both the live/archive transition and multiple log
       aspects well [nathans]
    o  general concerns expressed about overloading / shoe-horning
       functionality into existing context types when we clearly will
       need a new context type eventually [nathans]
    o  fche suggests continual addition of code until nathans complains
       during code review that it's going too far is a Just Fine option.
    o  not surprisingly, nathans remains non-committal.  ;)
    o  all in agreement that the multiple-archive experimentation via
       directory-to-pmNewContext concept is a good way to move that
       aspect forward, and is highly likely to be generally useful.

  - acceptability of server/proxy process for full capability?
    o  deferred discussion until after further analysis/experimentation


== Data integrity

  - whither the fche/fsync patches?
    o  revisited review comments - nathans wants a well-thought-out API
       that tackles the meatier part of the problem (interaction between
       sync and rename); not content with a static inline in impl.h nor
       anything that unconditionally adds fsync calls to client code paths
    o  perhaps move attention from library to tools that need to be very
       careful, e.g. pmlogger_daily before culling inputs  [kenj]
    o  pmlogrewrite -i seems particularly exposed [kenj to investigate]
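
For anyone not familiar with the sync/rename interaction mentioned
above, the usual POSIX pattern for replacing a file in place looks
something like the sketch below.  This is a generic illustration, not
the fche patches and not a proposed PCP API; the function name and the
".tmp" suffix are invented, and fsync on a directory descriptor is
assumed to be supported (true on Linux, not on Windows).

```python
import os


def durable_replace(path: str, data: bytes) -> None:
    """Replace path with data so that a crash leaves either the old or
    the new contents, never a partial file (hypothetical helper)."""
    tmp = path + ".tmp"               # invented temp-file naming
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                   # write(2) may be partial
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                  # data on disk before the rename
    finally:
        os.close(fd)
    os.replace(tmp, path)             # atomic rename over the old file
    dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dirfd)               # make the rename itself durable
    finally:
        os.close(dirfd)
```

The ordering is the point: removing or overwriting the inputs before the
new file and its directory entry are synced is what leaves a window for
data loss.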

  - streamable archive format?
    o  discussion deferred until further experience is gained with
       unified context concepts (via experimentation and smaller chunks)


== Supported platforms

  - is windows status quo (mingw) going anywhere?
  - worth trying cygwin (posix) builds again?
    o  in general, native code and MinGW have proven to be important
       in the Windows journey so far [nathans & kenj]
    o  fche wonders if revisiting Cygwin will help keep the Windows
       port up to date.
    o  nathans states that from experience so far (Aconex, SGI CXFS) a
       Cygwin PCP is not what users want, so that doesn't really help.
    o  nathans suggests the Fedora-mingw cross compilation project is
       well suited to our needs; but a lack of time and helpers to
       keep the existing port up-to-date holds us back.
    o  kenj points out as long as new code is conditionally enabled (as
       tends to be required, for older *nix platforms anyway), keeping
       it going should not be that difficult - just need to get it to
       build once more.
    o  fche mentioned it also needs to be tested, not just built (wrt
       cross-compilation)

  - which unixes/distros are of interest?  release binaries for them?
    o  reviewed each of the binary platforms from oss.sgi.com Downloads.

  - how do we share build/testing load?
  - how to gather evidence about compatibility assumptions
  - how to entice distro reps into presence in pcp community?

    o  no good answers to any of these questions.
    o  everyone writing code should be doing testing [nathans & kenj]
    o  the main load-sharing needed is code-review-helpers [nathans]


== Anything else?

  - pevans mentioned a PCP demo (pmchart) would be given by Red Hat
    folks at an upcoming SAS conference, which is great to hear!
