pcp
[Top] [All Lists]

Re: [pcp] plans for a stable release

To: Nathan Scott <nscott@xxxxxxxxxx>
Subject: Re: [pcp] plans for a stable release
From: Scott Emery <emery@xxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 26 Feb 2009 19:04:24 -0600
Cc: Martin Hicks <mort@xxxxxxx>, Scott Emery <emery@xxxxxxx>, pcp@xxxxxxxxxxx
In-reply-to: Your message of "Fri, 27 Feb 2009 11:29:15 +1100." <1235694555.4166.48.camel@xxxxxxxxxxxxxxxxxx>
        Thank you for your detailed notes.  I am new to creating
PMDAs and really appreciate the benefit of your experience.  I was
concerned about supporting this as a DSO since instability in the
PDMA would hurt pmcd.  Once your patch is applied I will look at the
other issues you raised.

Scott Emery
emery@xxxxxxx

In message <1235694555.4166.48.camel@xxxxxxxxxxxxxxxxxx>, Nathan Scott writes:
>On Thu, 2009-02-26 at 15:17 -0600, Martin Hicks wrote:
>I've pushed changes to the "master" branch in my git repository on oss.
>> This is my first crack at issuing a stable 2.7.9 release.
>> 
>
>Great!
>
>> I'd still like to merge in the lustre PMDA and the cluster PMDA that
>> are still on my "dev" branch.
>> 
>> If I can get a quick review on the two other PMDAs I think I'll be
>> ready to merge them into the dev branch and cherry pick them over
>> to master.
>
>I've looked over the Lustre PMDA - here's some notes and a patch
>to cleanup some, but further work is in order here I think before
>merging:
>
>- lustrecomm/help needs some love - no text for any metrics yet?
>- theres no indom, so that first help line can go  [patched]
>- the main() line is a bit dated now - should use __pmSetProgname,
>  the OS-independent path separator stuff (even though Linux-only,
>  someone else may copy from here someday), the DSO side of things
>  is missing a bit of code  [patched]
>- when installing on my laptop, SIGSEGV received from the PMDA
>  during "pminfo -v lustrecomm" - doesn't handle no-procfs files?
>  [patched]
>- error handling should use the __pmNotify API for consistent and
>  timestamped messages [one example patched, more remain]
>- last four metrics in metrictab listed as 64 bits, but when we
>  extract values via file_indexed() we pass in PM_TYPE_32.
>- the fetch routine comment about PCP values not matching kernel
>  values is worrying - more details there?  these sometimes need
>  to be matching.  If size changes depending on 32/64 bit kernel
>  for counter metrics, then the agent needs to deal with that
>  (e.g. as pmdalinux does)
>- "aux" in fetch routine is always 10 - eh?
>- fetch_callback routine malloc has no error guard.
>- file_indexed malloc has no error guard
>- would be better if we had no mallocs on the fetch_callback code
>  path - see last paragraph of "design notes" below.
>- file_indexed SIGSEGV on the strcpy if refresh_file succeeds but
>  f_s->datap is still NULL (observed on my laptop)  [patched]
>- filestatetab - only first entry used?  (v4?)
>- the fetch callback initial check for invalid PMIDs should be
>  done differently - pass back PM_ERR_PMID after each switch(item)
>  and in the final else branch at the end of going through the
>  recognised idp->cluster values.
>- when no Lustre is installed, we should get PM_ERR_APPVERSION
>  instead of zero for the metric values.  A few of the values
>  returned aren't zero, but look like uninitialised data too
>  (timeout, and the 64 bit ones).  Lustre can probably be loaded
>  (and unloaded) as a kernel module, so this should get fixed.
>
>
>- General design notes:
>  - pmns.v4 - should be just one pmns file, and metrics that dont
>    exist for earlier versions should return PM_ERR_APPVERSION,
>    once the v4 support is done.
>  - the type abstraction in file_indexed/single seems like extra
>    complexity for no reason?  esp. since we only ever pass 32 in
>    here as the type... (and should be doing 64 too I think, but
>    no more).
>  - the use of sub-second file_time_offset to guard against fast
>    repeated lookups (I guess?) adds a fair bit of code ... is
>    this needed really?  The kernel agent doesn't even do this.
>    It was the underlying cause of the SIGSEGV, ... so complexity
>    is reducing resilience there.  Consider removing it?
>  - oh, wait - are you doing this because you don't want to read
>    the files multiple times per fetch?  If so, other PMDAs handle
>    this differently, using an (initial) fetch routine separate to
>    the fetch_callback - do the read once in the initial fetch,
>    into a global struct holding correctly typed values, and then
>    just copy those values into the pmAtomValue in the callback.
>
>    That might simplify things alot here.  I'd recommend using
>    the Linux kernel PMDA as a reference here - it uses lots of
>    procfs files of course and has the read-once-per-fetch trick
>    (see need_refresh[] in there).
>
>
>cheers.
>
>--
>Nathan
>

<Prev in Thread] Current Thread [Next in Thread>