To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] Patch: gfs2 PMDA additional metrics for review/inclusion
From: Paul Evans <pevans@xxxxxxxxxx>
Date: Wed, 01 May 2013 09:58:23 +0100
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1117296374.7864618.1367373452615.JavaMail.root@xxxxxxxxxx>
References: <517FBD63.3010804@xxxxxxxxxx> <1117296374.7864618.1367373452615.JavaMail.root@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130311 Thunderbird/17.0.4

Hi Nathan,

No worries, I shall have a look and correct the things you have pointed out, some of them oversights on my part. Once the changes have been made I shall make some noise again.

On 05/01/2013 02:57 AM, Nathan Scott wrote:
Hi Paul,

----- Original Message -----
Please consider the patch adding additional metrics for the gfs2 PMDA in
pcp.git.
...
Available at git://github.com/pauljevans/pcp.git dev

Good stuff!  Have been reviewing it this morning, and made a few minor
tweaks along the way.  For now I've created a merge branch while we go
through a few things - it's now at "git://oss.sgi.com/pcp/pcp gfs2".

Following are all my review notes.  Don't get disheartened by the length
of "stuff" here, I think overall its looking good and you've been thrown
in the deep-end with a relatively difficult agent!  None of this is set
in stone either, just my thoughts - everything is open to discussion, of
course.  Anyway, the notes are in no particular order, just file-by-file
as git-diff showed 'em to me...


qa/654

- 'gsf2' typo in comment (fixed)

- the line below assumes the test is running as root; we need to
   explicitly request root, as automated QA runs unprivileged:

   echo 1 > /sys/kernel/debug/tracing/events/gfs2/gfs2_glock_lock_time/enable
   (fixed)

- also, should use "_notrun" instead of echo "Cannot enable
   the gfs2 glock trace..." - that echo will cause a test failure
   (which causes QA and dev folks to investigate) whereas if
   it's simply kernel support that's lacking, the test should just
   be marked as not run on that machine.

- the above two points make me think this test should now be split
   in two - I notice on a Debian unstable machine here, for example,
   that there is no support for GFS2 trace points but the other
   metrics still work fine.  So, I would suggest the following:
   - create a common.gfs2 script in pcp/qa which can be sourced
     by both tests (and all new gfs2 tests later) to provide the
     common functions like _setup_mounts and perhaps some of the
     initial _notrun checks, etc.
   - move the parts you've added into a new test (655 is free)
     and add that additional "notrun" check on the trace file.
- further, I think I was being a bit optimistic simply assuming
   the tester has set up gfs2 support on their machine (this was
   my assumption, not yours of course).  But, we could make sure
   the test is run on most machines like this perhaps:
   grep -q gfs2 /proc/filesystems
   if [ $? -ne 0 ]; then    # not found, try loading the module
       modprobe gfs2 2>/dev/null ||
           _notrun "no gfs2 module & not builtin"
       grep -q gfs2 /proc/filesystems ||
           _notrun "failed to load gfs2 module successfully, sorry"
   fi
   This would go into that common.gfs2 script for all gfs2 tests,
   and would give us a lot more confidence at release time that we
   (both PCP community devs and Red Hat QE folks) are giving gfs2
   a decent workout.
These are good points that I shall fold into the QA tests.

src/pmdas/gfs2/Install

- Adds explicit enabling of the glock_lock trace point if the file
   exists.  This should be moved into the PMDA - the Install is only
   run once, and so this (kernel) state will be lost across reboots,
   for example.

- It also means that merely having the agent installed incurs the
   cost of enabling the tracing at all times.  We can do this in such
   a way that this isn't necessary - there are two options:
   1.  Enable tracing when metric values are requested.  This would
       probably be too late, however.
   2.  Use the pmStore(3) facility to dynamically request tracing be
       enabled when needed, and disabled when done.  This typically
       would be achieved through the addition of a new metric - e.g.
       gfs2.control.glock_time_tracing.
       A fetch of that value would return the trace state (a read on
       the trace enabling file hopefully tells us that?) and a store
       would be used to enable/disable.  See linux_store() from the
       Linux kernel PMDA (src/pmdas/linux/pmda.c) for an example.

   I recommend option #2.  The pmstore(1) command can be used to test
   this (e.g. in QA tests) and when you move on to phase 2 (warm
   liquid goo phase) of your work, where you write the cluster-aware
   client-side analysis tool, that tool can do the pmStore(3) at the
   appropriate time.
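
   To make option #2 concrete, here's a rough sketch of what the store
   callback might look like, in the style of linux_store().  The
   CLUSTER_CONTROL/GLOCK_TIME_TRACING numbering and the
   gfs2_tracing_set() helper are hypothetical, just to show the shape:

      /* sketch only - writes the tracepoint enable file on store */
      static int
      gfs2_store(pmResult *result, pmdaExt *pmda)
      {
          int i, sts = 0;

          for (i = 0; i < result->numpmid && sts == 0; i++) {
              pmValueSet *vsp = result->vset[i];
              __pmID_int *idp = (__pmID_int *)&vsp->pmid;
              pmAtomValue av;

              if (idp->cluster != CLUSTER_CONTROL ||
                  idp->item != GLOCK_TIME_TRACING)
                  return PM_ERR_PERMISSION;  /* all else read-only */
              if ((sts = pmExtractValue(vsp->valfmt, &vsp->vlist[0],
                                  PM_TYPE_U32, &av, PM_TYPE_U32)) < 0)
                  break;
              /* helper writes "1" or "0" to
               * .../events/gfs2/gfs2_glock_lock_time/enable */
              sts = gfs2_tracing_set(av.ul != 0);
          }
          return sts;
      }

   The callback gets wired in via the store entry point of the
   pmdaInterface structure before pmdaInit, and a fetch of the same
   metric would simply read the enable file back.
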
Yes, suggestion two seems to be a good way to move forward, considering
that Debian unstable does not currently have the GFS2 trace-points.

src/pmdas/gfs2/lock_time.c

- Hmmm, wondering if a linked-list is the right data structure here,
   let's come back to that though.

- Coding style seems inconsistent with recent PCP code, and gfs2 code
   for that matter ;) ... I'd recommend picking one of those instead
   of free-styling it.

- gfs2_refresh_lock_time()

   - Memory leak on "buffer" at line 108

   - temp.lock_type == 2 || temp.lock_type == 3 - can we use macros
     instead of 2/3?  (LOCKTIME_* already exist, maybe those?)

   - How big can this trace pipe get?  I suspect "quite big" as it
     appears to be a function of the client tools' sampling interval
     (does the kernel tracing drop traces after some time?  I think
     it has to otherwise kernel memory requirements are unbounded -
     at what buffer size does that happen?) and also of the rate at
     which the glock tracepoint is triggered, which is probably a
     function of filesystem activity level?

   - Assuming it can get "quite big", this may equate to a significant
     memory allocation in the PMDA?  And it seems a bit unnecessary -
     could the read-allocate-loop (line 111) not be combined with the
     parse-and-list-insert-loop (line 129)?  Then only one buffer (or
     possibly two at most) is needed and the memory requirements of
     pmdagfs2 won't balloon out as much.

   - We appear to have a single list for all glocks across all mounted
     filesystems ... is that wise?  Consider a hash or series of hash
     tables?

   - list_remove_duplicates() ... /* get rid of dups */  - again, I'd
     think a hash table seems a better approach here - inserting over
     the top of existing entries comes "for free" with a well-chosen
     hash key, and a potentially expensive list-iteration goes away.
     A suitable key would seem to be a composite of dev/type/number,
     as suggested by list_remove_duplicates code:
       if(iterator->dev_id == currentNode->next->dev_id &&
          iterator->data.lock_type == currentNode->next->data.lock_type
          && iterator->data.number == currentNode->next->data.number){
     ... a hash would mean less memory, and no list_remove_duplicates
     step needed, I think.

   - You could avoid writing new hash table code by using the existing
     pmdaCache(3) methods here - if you create an indom (which doesn't
     need to be exposed outside of pmdagfs2), a pmdaCacheLookupKey
     on that composite key (as a string) should work fine.  There's a
     hash-iterator mechanism too which you can use (actually it's used
     already elsewhere in pmdagfs2, could use that as example code, or
     one of the other PMDAs - it's used quite extensively).  A rough
     sketch follows at the end of these notes.

   - IIRC, you originally wanted a top-10 worst glocks?  That's doable
     here too - instead of just a single "worstGlock", you could keep
     a sorted array and export those as separate metrics if you still
     want to do that?

   - Digging deeper, it looks like gfs2_refresh_lock_time() is called
     in a loop, once per device_id (mounted filesystem) being queried.
     However, the refresh code drains values from the event trace pipe,
     does it not?  Hence, it would seem the first refresh_lock_time()
     call will consume all the pending events and subsequent calls for
     other filesystems will get no trace data?

   - typo -> "comparision" (fixed)
Some more good points; I will clean up the coding style, and will look into both hashes and pmdaCache(3) today and decide which is best for the job here.

gfs2_refresh_lock_time is called for each file-system separately, and the device_id is used to map the results, as they come from the trace_pipe, to each file-system that has been mounted.

Yes, it is true that the trace_pipe is emptied on each call, but unfortunately the events are fired in no particular order.  The current method is therefore to save any data found for other file-systems in the list (or any other data-structure) between calls, using the dev_id to distinguish which file-system the information belongs to.  When the next file-system calls refresh_lock_time, we check the list for any data stored by previous runs; it seemed the best idea at the time for mitigating the chances of losing data.  The trace-points use the dev_id as their way to represent the different file-systems - I believe this is true for all gfs2 trace points at least.

I was hoping, for now, to have just the worst glock for each call (and to collect a top 10 client-side), but this can be changed to give the worst 10 for that fetch.
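
Something like this, perhaps - a rough sketch of a bounded array kept
sorted worst-first, where glock_compare() is a placeholder for whatever
ordering we settle on:

    #define WORST_GLOCKS 10

    static struct lock_time worst[WORST_GLOCKS];
    static int nworst;

    /* insert lt if it ranks among the WORST_GLOCKS worst seen so far */
    static void
    worst_glock_insert(const struct lock_time *lt)
    {
        int i, j;

        for (i = 0; i < nworst; i++)
            if (glock_compare(lt, &worst[i]) < 0)   /* worse than [i] */
                break;
        if (i == WORST_GLOCKS)
            return;                         /* not bad enough to keep */
        if (nworst < WORST_GLOCKS)
            nworst++;
        for (j = nworst - 1; j > i; j--)    /* shift the tail down */
            worst[j] = worst[j-1];
        worst[i] = *lt;
    }
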
src/pmdas/gfs2/pmda.c

- gfs2_instance_refresh
   - consider extracting the code that opens /sys/fs/gfs2/BDEV/id
     and figures out the dev_id into a separate function that
     returns the dev_id ... would be more readable I think (rough
     sketch at the end of these notes).

- gfs2_fetch_refresh
   - makedev(253, 0) -> does that mean device-mapper only?  Either way
     253 should have a human-readable macro associated.  And it really
     can only be device mapper?  Hmm - how does our QA on loopback
     devices work for this?  Or are these metrics not yet being tested?
     (this may go away with the changes suggested above relating to the
     way events can only be consumed once?)
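
   For the extraction, something like this perhaps - assuming the
   sysfs id file holds "major:minor" (worth double-checking), with
   the name gfs2_device_id just a suggestion:

      /* needs <stdio.h>, <sys/param.h> and <sys/sysmacros.h>;
       * parses /sys/fs/gfs2/<fsname>/id ("major:minor") to a dev_t */
      static dev_t
      gfs2_device_id(const char *fsname)
      {
          unsigned int maj = 0, min = 0;
          char path[MAXPATHLEN];
          FILE *fp;

          snprintf(path, sizeof(path), "/sys/fs/gfs2/%s/id", fsname);
          if ((fp = fopen(path, "r")) == NULL)
              return (dev_t)(-1);
          if (fscanf(fp, "%u:%u", &maj, &min) != 2)
              maj = min = 0;
          fclose(fp);
          return makedev(maj, min);
      }
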
Shall move the dev_id extraction out into a separate function.

You've found a forgotten part of previous testing that I should have deleted; thankfully it currently isn't being used anywhere in that method and is just hanging around.  I shall swiftly delete it.

src/pmdas/gfs2/pmdagfs2.h

   - I'd put the dev_id at the start of the structure, but that's mainly
     just cosmetic (it being an important identifier kind of field that
     is accessed quite a bit).
No worries, shall move it up to the top.

Cheers,

Paul.
