To: Paul Evans <pevans@xxxxxxxxxx>
Subject: Re: [pcp] Patch: gfs2 PMDA additional metrics for review/inclusion
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Tue, 30 Apr 2013 21:57:32 -0400 (EDT)
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <517FBD63.3010804@xxxxxxxxxx>
References: <517FBD63.3010804@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: 3Lg8zmaQYN4fuITiXKTJ1cBfUIB73g==
Thread-topic: Patch: gfs2 PMDA additional metrics for review/inclusion
Hi Paul,

----- Original Message -----
> 
> Please consider the patch adding additional metrics for the gfs2 PMDA in
> pcp.git.
> ...
> Available at git://github.com/pauljevans/pcp.git dev
> 

Good stuff!  Have been reviewing it this morning, and made a few minor
tweaks along the way.  For now I've created a merge branch while we go
through a few things - it's now at "git://oss.sgi.com/pcp/pcp gfs2".

Following are all my review notes.  Don't get disheartened by the length
of "stuff" here - I think overall it's looking good, and you've been
thrown in at the deep end with a relatively difficult agent!  None of
this is set in stone either, just my thoughts - everything is open to
discussion, of course.  Anyway, the notes are in no particular order,
just file-by-file as git-diff showed 'em to me...


qa/654

- 'gsf2' typo in comment (fixed)

- the line below assumes the test is running as root; we need to
  explicitly request root, as automated QA runs unprivileged:

  echo 1 > /sys/kernel/debug/tracing/events/gfs2/gfs2_glock_lock_time/enable
  (fixed)

- also, should use "_notrun" instead of echo "Cannot enable
  the gfs2 glock trace..." - that echo will cause a test failure
  (which causes QA and dev folks to investigate), whereas if
  it's simply kernel support that's lacking, the test should just
  be marked as not run on that machine.

- the above two points make me think this test should be split
  into two now - I notice on a debian unstable machine here,
  for example, that there is no support for GFS2 trace points
  but the other metrics still work fine.  So, I would suggest
  the following:
  - create a common.gfs2 script in pcp/qa which can be sourced
    by both tests (and all new gfs2 tests later) to provide the
    common functions like _setup_mounts and perhaps some of the
    initial _notrun checks, etc.
  - move the parts you've added into a new test (655 is free)
    and add that additional "notrun" check on the trace file.
 
- further, I think I was being a bit optimistic simply assuming
  the tester has set up gfs2 support on their machine (this was
  my assumption, not yours of course).  But we could make sure
  the test is run on most machines with something like this:

      if ! grep gfs2 /proc/filesystems >/dev/null
      then
          # no gfs2 yet - try loading the module before giving up
          modprobe gfs2 2>/dev/null || _notrun "no gfs2 module & not builtin"
          grep gfs2 /proc/filesystems >/dev/null || \
              _notrun "failed to load gfs2 module successfully, sorry"
      fi

  This would go into that common.gfs2 script for all gfs2 tests,
  and would give us a lot more confidence at release time that we
  (both PCP community devs and Red Hat QE folks) are giving gfs2
  a decent workout.

  
src/pmdas/gfs2/Install

- Adds explicit enabling of the glock_lock trace point if the file
  exists.  This should be moved into the PMDA - the Install is only
  run once, and so this (kernel) state will be lost across reboots,
  for example.

- It also means that merely having the agent installed incurs the
  cost of enabling the tracing at all times.  We can do this in such
  a way that this isn't necessary - there are two options:
  1.  enable tracing when metrics values are requested.  this would
      probably be too late, however.
  2.  use the pmStore(3) facility to dynamically request tracing be
      enabled when needed, and disabled when done.  This typically
      would be achieved through the addition of a new metric - e.g.
      gfs2.control.glock_time_tracing.
      A fetch of that value would return the trace state (a read on
      the trace enabling file hopefully tells us that?) and a store
      would be used to enable/disable.  See linux_store() from the
      Linux kernel PMDA (src/pmdas/linux/pmda.c) for an example.

  I recommend option #2.  The pmstore(1) command can be used to test
  this (e.g. in QA tests) and when you move on to phase 2 (warm
  liquid goo phase) of your work, where you write the cluster-aware
  client-side analysis tool, that tool can do the pmStore(3) at the
  appropriate time.
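
  Something along these lines for the store method, maybe - note
  this is a completely untested sketch, and the cluster/item/helper
  names here are invented, not taken from your patch:

      /* sketch only - CLUSTER_CONTROL, CONTROL_GLOCK_TRACING and
       * glock_tracing_enable() are invented names; model the real
       * thing on linux_store() */
      static int
      gfs2_store(pmResult *result, pmdaExt *pmda)
      {
          int         i, sts = 0;

          for (i = 0; i < result->numpmid && sts == 0; i++) {
              pmValueSet  *vsp = result->vset[i];
              __pmID_int  *idp = (__pmID_int *)&vsp->pmid;
              pmAtomValue av;

              if (idp->cluster != CLUSTER_CONTROL ||
                  idp->item != CONTROL_GLOCK_TRACING)
                  sts = PM_ERR_PERMISSION;    /* not a writable metric */
              else if ((sts = pmExtractValue(vsp->valfmt, &vsp->vlist[0],
                                  PM_TYPE_U32, &av, PM_TYPE_U32)) >= 0)
                  /* writes "1" or "0" to the tracepoint enable file */
                  sts = glock_tracing_enable(av.ul != 0);
          }
          return sts;
      }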


src/pmdas/gfs2/lock_time.c

- Hmmm, wondering if a linked list is the right data structure here -
  let's come back to that though.

- Coding style seems inconsistent with recent PCP code, and with the
  gfs2 kernel code for that matter ;) ... I'd recommend picking one
  of those instead of free-styling it.

- gfs2_refresh_lock_time()

  - Memory leak on "buffer" at line 108

  - temp.lock_type == 2 || temp.lock_type == 3 - can we use macros
    instead of 2/3?  (LOCKTIME_* already exist, maybe those?)

  - How big can this trace pipe get?  I suspect "quite big", as it
    appears to be a function of the client tool's sampling interval
    (does the kernel tracing drop traces after some time?  I think
    it has to, otherwise kernel memory requirements are unbounded -
    at what buffer size does that happen?) and also of the rate at
    which the glock tracepoint is triggered, which is probably a
    function of filesystem activity level?

  - Assuming it can get "quite big", this may equate to a significant
    memory allocation in the PMDA?  And it seems a bit unnecessary -
    could the read-allocate loop (line 111) not be combined with the
    parse-and-list-insert loop (line 129)?  Then only one buffer (or
    possibly two at most) is needed and the memory requirements of
    pmdagfs2 won't balloon out as much.
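
    An untested sketch of what I mean - one fixed buffer, reused,
    with parse_glock_line() standing in for your existing
    parse-and-insert code:

        /* drain the trace pipe, parsing each complete line as it
         * arrives, so no large allocation is ever needed */
        static void
        gfs2_drain_trace_pipe(int fd)
        {
            static char   buffer[8192];   /* one fixed-size buffer */
            static size_t residual;       /* partial trailing line */
            char          *p, *nl;
            ssize_t       count;

            while ((count = read(fd, buffer + residual,
                                 sizeof(buffer) - residual - 1)) > 0) {
                buffer[residual + count] = '\0';
                for (p = buffer; (nl = strchr(p, '\n')) != NULL; p = nl + 1) {
                    *nl = '\0';
                    parse_glock_line(p);
                }
                residual = strlen(p);    /* keep for the next read */
                memmove(buffer, p, residual);
            }
        }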

  - We appear to have a single list for all glocks across all mounted
    filesystems ... is that wise?  Consider a hash or series of hash
    tables?

  - list_remove_duplicates() ... /* get rid of dups */  - again, I'd
    think a hash table seems a better approach here - inserting over
    the top of existing entries comes "for free" with a well-chosen
    hash key, and a potentially expensive list iteration goes away.
    A suitable key would seem to be a composite of dev/type/number,
    as suggested by the list_remove_duplicates code:
      if(iterator->dev_id == currentNode->next->dev_id &&
         iterator->data.lock_type == currentNode->next->data.lock_type
         && iterator->data.number == currentNode->next->data.number){
    ... a hash would mean less memory, and no list_remove_duplicates
    step needed, I think.

  - You could avoid writing new hash table code by using the existing
    pmdaCache(3) methods here - if you create an indom (which doesn't
    need to be exposed outside of pmdagfs2), a pmdaCacheLookupKey on
    that composite key (as a string) should work fine.  There's a
    hash-iterator mechanism too which you can use (actually it's used
    already elsewhere in pmdagfs2, so you could use that as example
    code, or one of the other PMDAs - it's used quite extensively).
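
    Roughly like this, perhaps - note "glock_t" and the field names
    are placeholders for your existing struct, and since the key here
    is the instance name string I've used pmdaCacheLookupName rather
    than the binary-key pmdaCacheLookupKey variant:

        /* sketch: one cache entry per glock, keyed by dev/type/number */
        static void
        glock_cache_update(pmInDom indom, const glock_t *values)
        {
            glock_t   *entry;
            char      key[64];
            int       inst;

            snprintf(key, sizeof(key), "%u:%u:%d:%llu",
                     major(values->dev_id), minor(values->dev_id),
                     values->lock_type,
                     (unsigned long long)values->number);
            if (pmdaCacheLookupName(indom, key, &inst,
                                    (void **)&entry) < 0 || entry == NULL) {
                /* first sighting of this glock */
                if ((entry = malloc(sizeof(*entry))) == NULL)
                    return;
            }
            *entry = *values;   /* overwrite - no duplicates possible */
            pmdaCacheStore(indom, PMDA_CACHE_ADD, key, (void *)entry);
        }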

  - IIRC, you originally wanted a top-10 worst glocks?  That's doable
    here too - instead of just a single "worstGlock", you could keep
    a sorted array and export those as separate metrics, if you still
    want to do that?
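
    e.g. something like this (WORST_N and glock_worse() are invented
    names - glock_worse() being whatever comparison you're already
    using to pick the single worst one):

        #define WORST_N 10
        static glock_t worst[WORST_N];      /* sorted, worst first */

        /* insertion-sort one new observation into the worst-N table */
        static void
        worst_glocks_update(const glock_t *g)
        {
            int   i, j;

            for (i = 0; i < WORST_N; i++) {
                if (glock_worse(g, &worst[i])) {
                    for (j = WORST_N - 1; j > i; j--)
                        worst[j] = worst[j-1];  /* shuffle down */
                    worst[i] = *g;
                    break;
                }
            }
        }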

  - Digging deeper, it looks like gfs2_refresh_lock_time() is called
    in a loop, once per device_id (mounted filesystem) being queried.
    However, the refresh code drains values from the event trace pipe,
    does it not?  Hence, it would seem the first refresh_lock_time()
    call will consume all the pending events, and subsequent calls for
    other filesystems will get no trace data?
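
    One way out (again just a sketch - next_glock_event() and
    fs_lookup() are invented names): drain the pipe once per fetch,
    routing each parsed event to its filesystem by dev_id, instead of
    draining once per filesystem:

        /* called once per fetch, not once per mounted filesystem */
        static void
        gfs2_refresh_all_lock_times(int fd)
        {
            glock_t   event;

            while (next_glock_event(fd, &event) > 0) {
                struct gfs2_fs *fs = fs_lookup(event.dev_id);
                if (fs != NULL)
                    fs_update_lock_time(fs, &event);
            }
        }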

  - typo -> "comparision" (fixed)


src/pmdas/gfs2/pmda.c

- gfs2_instance_refresh
  - consider extracting the code that opens /sys/fs/gfs2/BDEV/id
    and figures out the dev_id into a separate function that returns
    the dev_id ... would be more readable I think.
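
  Something like this maybe (untested, and assuming the id file
  holds "major:minor" - check that against what the current code
  expects):

      /* extract the device ID for one gfs2 filesystem from sysfs */
      static dev_t
      gfs2_device_id(const char *fsname)
      {
          unsigned int  devmajor, devminor;
          char          path[MAXPATHLEN];
          FILE          *fp;
          int           n;

          snprintf(path, sizeof(path), "/sys/fs/gfs2/%s/id", fsname);
          if ((fp = fopen(path, "r")) == NULL)
              return (dev_t)(-1);
          n = fscanf(fp, "%u:%u", &devmajor, &devminor);
          fclose(fp);
          return (n == 2) ? makedev(devmajor, devminor) : (dev_t)(-1);
      }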

- gfs2_fetch_refresh
  - makedev(253, 0) -> does that mean device-mapper only?  Either way,
    253 should have a human-readable macro associated with it.  And can
    it really only be device mapper?  Hmm - how does our QA on loopback
    devices work for this?  Or are these metrics not yet being tested?
    (This may go away with the changes suggested above relating to the
    way events can only be consumed once.)

src/pmdas/gfs2/pmdagfs2.h

  - I'd put the dev_id at the start of the structure, but that's mainly
    just cosmetic (it being an important identifier kind of field that
    is accessed quite a bit).


