On 05/16/2013 10:36 AM, Nathan Scott wrote:
Looking good. Some small stuff from reviewing...
thanks for that
man/man1/
- man page doesn't do collectl(1) referencing right - should it be:
.BR collectl (1)
- add reference to collectl(1) at end of man page.
- just for grins, maybe add mention of pmcollectl(1) as well
ok will do
- the man page explicitly says "supports XYZ metrics only currently"
well the intent was to add to the list as new metrics handlers are added.
Didn't want to set any expectations that everything in a collectl
archive can be imported in the current implementation - collectl raw
archives consist of a header followed by a repeating sequence of
timestamp and a bunch of raw data copied from /proc/whatever .. repeat.
There are something like 75 different line patterns to parse, which
can be factored down to roughly 20 or so different handlers (much like
the linux PMDA I guess).
which makes it brittle to future code updates to increase coverage -
the code does a good job of handling the feedback about skipped data
so perhaps the man page could just have words describing that output
(which will hopefully lessen over time).
ok I'll make that man page change then
- loving all the packaging work that has been done, thanks!!!
haven't tried to build the deb packages yet - should work!!
src/pmimport/collectl2pcp
- the -a option seems a bit wierd in collectl2pcp.c - a mandatory
option? I was expecting syntax more like:
$ collect2pcp infile outfile
how about: collectl2pcp [-v] [-D N] output input [input ...]
or perhaps collectl2pcp [-v] [-D N] input [input ...] output
- allowing multiple infiles on the cmd line opens up the problem
of having data from different hosts - is there any advantage to
supporting multiple files, rather than just doing em individually
and then running pmlogmerge? I guess its a usability thing for
when you have many days worth of data right? Fair enough then but
it does obfuscate the code a little, so...
definitely wanted this feature since the most common use case
(for me anyway) is: collectl2pcp pcparchive /var/log/collectl/*.gz
and then run pmchart -a pcparchive ... without needing a wrapper
script to gather up all the archives and merge'em all together.
Of course it'd help if pmchart supported multiple archives on the
command line and knew how to internally sort by host and stitch them
together cronologically ;) but that's a separate RFE (and also a
usability issue). And yep the code checks each collectl header
is for the same host.
- code could do with a bit of refactoring - everything in main(),
perhaps the inner loops (from "parse collectl header" and the code
below that) could become a new function, passed in a file pointer
and one or two other things
ok will do that.
- the two snippets of code that begins "if (filenum == 0)" - looks
like they could be moved out of all of the loops entirely (after
pmiStart)
could be done too
- the handlers seem a little brittle in the face of odd data, eg.
the cpu_handler assumes the passed buffer will be of length >3 -
might be a good idea to pass the length of the line just read in
to these callbacks so they're more inclined to validate the input
they are given?
yes ok good point - I'll do that. For one thing - collectl archives
seem to be flushed out at odd intervals .. often resulting in
partial records near EOF for the current log at least.
- could we #include pmdas/linux/indom.h instead of duplicating it?
from a quick look, looks like it doesn't pull in many odd headers
or define too many things that'd make that difficult.
yes can do that - I wanted to keep these indom and pmid numbers
common with the linux and proc PMDAs so we can at least entertain
merging collectl archives with pcp archives .. but internal instance
IDs could be a problem with without some pmlogrewrite magic.
- re the TODOs in util.c - those seem like micro-optimisations -
if you implement 'em, great, but i wouldn't leave a TODO comment
in the code forever if not.
ok fair enough. Basically just the linear searches need some
indexing of some sort (this thing chews on collectl archives from large
machines for quite some time .. most of the time is spent looking up
metrics and handlers etc).
I'll send out another update with these changes, probably late tonight.
Cheers
|