Hi Nathan,
No worries, I shall have a look and correct most of these things you have
pointed out, some of them oversights on my part. Once the changes have
been made I shall make some noise again.
On 05/01/2013 02:57 AM, Nathan Scott wrote:
Hi Paul,
----- Original Message -----
Please consider the patch adding additional metrics for the gfs2 PMDA in
pcp.git.
...
Available at git://github.com/pauljevans/pcp.git dev
Good stuff! Have been reviewing it this morning, and made a few minor
tweaks along the way. For now I've created a merge branch while we go
through a few things - it's now at "git://oss.sgi.com/pcp/pcp gfs2".
Following are all my review notes. Don't get disheartened by the length
of "stuff" here, I think overall its looking good and you've been thrown
in the deep-end with a relatively difficult agent! None of this is set
in stone either, just my thoughts - everything is open to discussion, of
course. Anyway, the notes are in no particular order, just file-by-file
as git-diff showed 'em to me...
qa/654
- 'gsf2' typo in comment (fixed)
- the line below assumes the test is running as root; we need to
explicitly request root, as automated QA runs unprivileged:
echo 1 > /sys/kernel/debug/tracing/events/gfs2/gfs2_glock_lock_time/enable
(fixed)
- also, should use "_notrun" instead of echo "Cannot enable
the gfs2 glock trace..." - that echo will cause a test failure
(which causes QA and dev folks to investigate) whereas if
it's simply kernel support that's lacking, it should just be
marked as not run on that test machine.
- the above two points make me think this test should be split
into two now - I notice on a Debian unstable machine here,
for example, that there is no support for GFS2 trace points
but the other metrics still work fine. So, I would suggest
the following:
- create a common.gfs2 script in pcp/qa which can be sourced
by both tests (and all new gfs2 tests later) to provide the
common functions like _setup_mounts and perhaps some of the
initial _notrun checks, etc.
- move the parts you've added into a new test (655 is free)
and add that additional "notrun" check on the trace file.
- further, I think I was being a bit optimistic simply assuming
the tester has set up gfs2 support on their machine (this was
my assumption, not yours of course). But, we could make sure
the test is run on most machines like this perhaps:
grep gfs2 /proc/filesystems >/dev/null
if [ $? -ne 0 ]
then
    modprobe gfs2 || _notrun "no gfs2 module & not builtin"
    grep gfs2 /proc/filesystems >/dev/null || \
        _notrun "failed to load gfs2 module successfully, sorry"
fi
This would go into that common.gfs2 script for all gfs2 tests,
and would give us a lot more confidence at release time that we
(both PCP community devs and Red Hat QE folks) are giving gfs2
a decent workout.
These are good points; I shall fold them into the QA tests.
src/pmdas/gfs2/Install
- Adds explicit enabling of the glock_lock trace point if the file
exists. This should be moved into the PMDA - the Install is only
run once, and so this (kernel) state will be lost across reboots,
for example.
- It also means that merely having the agent installed would impose
the cost of enabling the tracing at all times. We can do this in
such a way that this isn't necessary - there are two options:
1. Enable tracing when metric values are requested. This would
probably be too late, however.
2. use the pmStore(3) facility to dynamically request tracing be
enabled when needed, and disabled when done. This typically
would be achieved through the addition of a new metric - e.g.
gfs2.control.glock_time_tracing.
A fetch of that value would return the trace state (a read on
the trace enabling file hopefully tells us that?) and a store
would be used to enable/disable. See linux_store() from the
Linux kernel PMDA (src/pmdas/linux/pmda.c) for an example.
I recommend option #2. The pmstore(1) command can be used to test
this (e.g. in QA tests), and when you move on to phase 2 (the warm
liquid goo phase) of your work, where you write the cluster-aware
client-side analysis tool, this tool can do the pmStore(3) at the
appropriate time.
Yes, suggestion two seems a good way to move forward, considering
that Debian unstable does not currently have the GFS2 trace-points.
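Roughly what I'm picturing for that control metric - just a sketch at
this stage, hung off the PMDA store callback; the names gfs2_store,
glock_tracing_set and the glock_tracing_pmid variable are all
placeholders of mine:

#include <pcp/pmapi.h>
#include <pcp/impl.h>
#include <pcp/pmda.h>

#define GLOCK_TRACE_ENABLE \
    "/sys/kernel/debug/tracing/events/gfs2/gfs2_glock_lock_time/enable"

/* placeholder: pmID for gfs2.control.glock_time_tracing, set up with
 * the rest of the metric table */
static pmID glock_tracing_pmid;

/* write "1" or "0" into the tracepoint enable file */
static int
glock_tracing_set(int state)
{
    FILE *fp = fopen(GLOCK_TRACE_ENABLE, "w");

    if (fp == NULL)
        return -oserror();
    fprintf(fp, "%d\n", state ? 1 : 0);
    fclose(fp);
    return 0;
}

/* store callback: pmstore gfs2.control.glock_time_tracing 0|1 */
static int
gfs2_store(pmResult *result, pmdaExt *pmda)
{
    pmAtomValue av;
    int i, sts;

    for (i = 0; i < result->numpmid; i++) {
        pmValueSet *vsp = result->vset[i];

        if (vsp->pmid != glock_tracing_pmid)
            return PM_ERR_PERMISSION;   /* every other metric is read-only */
        if (vsp->numval != 1)
            return PM_ERR_CONV;
        sts = pmExtractValue(vsp->valfmt, &vsp->vlist[0],
                             PM_TYPE_32, &av, PM_TYPE_32);
        if (sts < 0)
            return sts;
        if ((sts = glock_tracing_set(av.l)) < 0)
            return sts;
    }
    return 0;
}

A pmstore(1) of 1 or 0 on that metric would then switch tracing on and
off at run time, so the Install script wouldn't need to touch the
enable file at all.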
src/pmdas/gfs2/lock_time.c
- Hmmm, wondering if a linked-list is the right data structure here,
let's come back to that though.
- Coding style seems inconsistent with recent PCP code, and gfs2 code
for that matter ;) ... I'd recommend picking one of those instead
of free-styling it.
- gfs2_refresh_lock_time()
- Memory leak on "buffer" at line 108
- temp.lock_type == 2 || temp.lock_type == 3 - can we use macros
instead of 2/3? (LOCKTIME_* already exist, maybe those?)
- How big can this trace pipe get? I suspect "quite big" as it
appears to be a function of the client tool's sampling interval
(does the kernel tracing drop traces after some time? I think
it has to, otherwise kernel memory requirements are unbounded -
at what buffer size does that happen?) and also of the rate at
which the glock tracepoint is triggered, which is probably a
function of filesystem activity level?
- Assuming it can get "quite big", this may equate to a significant
memory allocation in the PMDA? And it seems a bit unnecessary -
could the read-allocate-loop (line 111) not be combined with the
parse-and-list-insert-loop (line 129)? Then only one buffer (or
possibly two at most) is needed and the memory requirements of
pmdagfs2 won't balloon out as much.
- We appear to have a single list for all glocks across all mounted
filesystems ... is that wise? Consider a hash or series of hash
tables?
- list_remove_duplicates() ... /* get rid of dups */ - again, I'd
think a hash table seems a better approach here - inserting over
the top of existing entries comes "for free" with a well-chosen
hash key, and a potentially expensive list-iteration goes away.
A suitable key would seem to be a composite of dev/type/number,
as suggested by list_remove_duplicates code:
if (iterator->dev_id == currentNode->next->dev_id &&
    iterator->data.lock_type == currentNode->next->data.lock_type &&
    iterator->data.number == currentNode->next->data.number) {
... a hash would mean less memory, and no list_remove_duplicates
step needed, I think.
- You could avoid writing new hash table code by using the existing
pmdaCache(3) methods here - if you create an indom (which doesn't
need to be exposed outside of pmdagfs2), a pmdaCacheLookupKey on
that composite key (as a string) should work fine. There's a
hash-iterator mechanism too which you can use (actually it's used
already elsewhere in pmdagfs2, so you could use that as example code,
or one of the other PMDAs - it's used quite extensively)
- IIRC, you originally wanted a top-10 worst glocks? That's doable
here too - instead of just a single "worstGlock", you could keep
a sorted array and export those as separate metrics if you still
want to do that?
- Digging deeper, it looks like gfs2_refresh_lock_time() is called
in a loop, once per device_id (mounted filesystem) being queried.
However, the refresh code drains values from the event trace pipe,
does it not? Hence, it would seem the first refresh_lock_time()
call will consume all the pending events and subsequent calls for
other filesystems will get no trace data?
- typo -> "comparision" (fixed)
Some more good points; I will clean up the coding style and will look
into both hashes and pmdaCache(3) today and decide which is best for
the job here.
gfs2_refresh_lock_time is called for each file-system separately, and
the device_id is used to map the results coming out of the trace_pipe
back to whichever mounted file-system they belong to.
Yes, it is true that the trace_pipe is emptied on each call, but
unfortunately the events are fired in no particular order, so the
current approach is to save any data found for other file-systems in
the list (or any other data structure) between calls, using the dev_id
to distinguish which file-system the information belongs to. When the
next file-system calls refresh_lock_time we check the list for any
data stored from previous runs; it seemed the best idea at the time
for reducing the chances of losing data. The trace-points use the
dev_id as their way of representing the different file-systems - I
believe this is true for all gfs2 trace points at least.
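For the replacement of the list, something along these lines is what
I'll experiment with - one drain of the trace_pipe per fetch, with
every event stored against a pmdaCache(3) key built from dev_id, lock
type and lock number. A rough sketch only; the struct layout and the
parse_glock_event() parser are placeholders of mine:

#include <stdio.h>
#include <stdlib.h>
#include <pcp/pmapi.h>
#include <pcp/impl.h>
#include <pcp/pmda.h>

/* placeholder per-glock record - the real one keeps all lock_time fields */
struct glock_stats {
    unsigned int dev_major, dev_minor;  /* dev_id of the filesystem */
    unsigned int lock_type;
    unsigned long long number;
    long long srtt;                     /* ... plus the other lock_time data */
};

/* placeholder parser for one gfs2_glock_lock_time event line */
static int parse_glock_event(const char *line, struct glock_stats *gs);

/* placeholder internal-only instance domain backing the cache */
static pmInDom glock_indom;

/*
 * Drain the trace_pipe once per fetch, storing every event against a
 * "major:minor:type:number" key; entries for all mounted filesystems
 * live in the one cache, so per-device refreshes just walk the cache
 * instead of re-reading (and emptying) the pipe.
 */
static int
glock_trace_refresh(FILE *tracepipe)
{
    char line[1024], key[128];
    struct glock_stats parsed, *entry;
    int sts, inst;

    while (fgets(line, sizeof(line), tracepipe) != NULL) {
        if (parse_glock_event(line, &parsed) < 0)
            continue;
        snprintf(key, sizeof(key), "%u:%u:%u:%llu",
                 parsed.dev_major, parsed.dev_minor,
                 parsed.lock_type, parsed.number);
        entry = NULL;
        sts = pmdaCacheLookupName(glock_indom, key, &inst, (void **)&entry);
        if (sts < 0 || entry == NULL) {
            if ((entry = calloc(1, sizeof(*entry))) == NULL)
                return -oserror();
            pmdaCacheStore(glock_indom, PMDA_CACHE_ADD, key, entry);
        }
        *entry = parsed;    /* a newer event overwrites the old "for free" */
    }
    return 0;
}

That should remove the list_remove_duplicates() pass entirely and also
mean the later per-filesystem calls no longer find the pipe already
drained.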
I was hoping to have just the worst glock for each call for now (and
to collect a top 10 on the client side), but this can be changed to
give the worst 10 for that fetch.
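If we do go with the worst 10 per fetch, I imagine keeping a small
fixed array while walking the cache, roughly like this - again only a
sketch, building on the glock_stats structure above, and with a
made-up "badness" measure:

#define WORST_COUNT 10

static struct glock_stats worst[WORST_COUNT];  /* kept sorted, worst first */
static int worst_seen;                         /* reset at each refresh */

/* placeholder ranking - e.g. the smoothed lock time from the event */
static long long
glock_badness(const struct glock_stats *gs)
{
    return gs->srtt;
}

/* insertion-sort each parsed glock into the small fixed "worst" table */
static void
worst_glocks_insert(const struct glock_stats *gs)
{
    int i, j;

    for (i = 0; i < worst_seen && i < WORST_COUNT; i++)
        if (glock_badness(gs) > glock_badness(&worst[i]))
            break;
    if (i >= WORST_COUNT)
        return;                        /* not bad enough to make the list */
    for (j = WORST_COUNT - 1; j > i; j--)
        worst[j] = worst[j - 1];       /* shift the tail down one slot */
    worst[i] = *gs;
    if (worst_seen < WORST_COUNT)
        worst_seen++;
}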
src/pmdas/gfs2/pmda.c
- gfs2_instance_refresh
- consider extracting the code that opens
/sys/fs/gfs2/BDEV/id and figures out the dev_id into a separate
function that returns dev_id ... would be more readable I think.
- gfs2_fetch_refresh
- makedev(253, 0) -> does that mean device-mapper only? Either way
253 should have a human-readable macro associated. And it really
can only be device mapper? Hmm - how does our QA on loopback
devices work for this? Or are these metrics not yet being tested?
(this may go away with the changes suggested above relating to the
way events can only be consumed once?)
Shall move the dev_id extraction out into a separate function.
You've found a forgotten piece of earlier testing that I should have
deleted; thankfully it currently isn't being used anywhere in that
method and is just hanging around, so I shall swiftly delete it.
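For the dev_id extraction I'm thinking of a small helper along these
lines (a sketch only - the gfs2_device_id name is just a placeholder,
and it assumes the sysfs id file holds the device numbers in
major:minor form, as the current code reads it):

#include <stdio.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/*
 * Read /sys/fs/gfs2/<fsname>/id and return the corresponding device
 * identifier, or (dev_t)-1 on any error.  Assumes the id file holds
 * the device numbers as "major:minor".
 */
static dev_t
gfs2_device_id(const char *fsname)
{
    char path[256];
    unsigned int major_id, minor_id;
    FILE *fp;

    snprintf(path, sizeof(path), "/sys/fs/gfs2/%s/id", fsname);
    if ((fp = fopen(path, "r")) == NULL)
        return (dev_t)-1;
    if (fscanf(fp, "%u:%u", &major_id, &minor_id) != 2) {
        fclose(fp);
        return (dev_t)-1;
    }
    fclose(fp);
    return makedev(major_id, minor_id);
}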
src/pmdas/gfs2/pmdagfs2.h
- I'd put the dev_id at the start of the structure, but that's mainly
just cosmetic (it being an important identifier kind of field that
is accessed quite a bit).
No worries, shall move it up to the top.
Cheers,
Paul.