Thank you for your detailed notes. I am new to creating
PMDAs and really appreciate the benefit of your experience. I was
concerned about supporting this as a DSO since instability in the
PDMA would hurt pmcd. Once your patch is applied I will look at the
other issues you raised.
Scott Emery
emery@xxxxxxx
In message <1235694555.4166.48.camel@xxxxxxxxxxxxxxxxxx>, Nathan Scott writes:
>On Thu, 2009-02-26 at 15:17 -0600, Martin Hicks wrote:
>I've pushed changes to the "master" branch in my git repository on oss.
>> This is my first crack at issuing a stable 2.7.9 release.
>>
>
>Great!
>
>> I'd still like to merge in the lustre PMDA and the cluster PMDA that
>> are still on my "dev" branch.
>>
>> If I can get a quick review on the two other PMDAs I think I'll be
>> ready to merge them into the dev branch and cherry pick them over
>> to master.
>
>I've looked over the Lustre PMDA - here's some notes and a patch
>to cleanup some, but further work is in order here I think before
>merging:
>
>- lustrecomm/help needs some love - no text for any metrics yet?
>- theres no indom, so that first help line can go [patched]
>- the main() line is a bit dated now - should use __pmSetProgname,
> the OS-independent path separator stuff (even though Linux-only,
> someone else may copy from here someday), the DSO side of things
> is missing a bit of code [patched]
>- when installing on my laptop, SIGSEGV received from the PMDA
> during "pminfo -v lustrecomm" - doesn't handle no-procfs files?
> [patched]
>- error handling should use the __pmNotify API for consistent and
> timestamped messages [one example patched, more remain]
>- last four metrics in metrictab listed as 64 bits, but when we
> extract values via file_indexed() we pass in PM_TYPE_32.
>- the fetch routine comment about PCP values not matching kernel
> values is worrying - more details there? these sometimes need
> to be matching. If size changes depending on 32/64 bit kernel
> for counter metrics, then the agent needs to deal with that
> (e.g. as pmdalinux does)
>- "aux" in fetch routine is always 10 - eh?
>- fetch_callback routine malloc has no error guard.
>- file_indexed malloc has no error guard
>- would be better if we had no mallocs on the fetch_callback code
> path - see last paragraph of "design notes" below.
>- file_indexed SIGSEGV on the strcpy if refresh_file succeeds but
> f_s->datap is still NULL (observed on my laptop) [patched]
>- filestatetab - only first entry used? (v4?)
>- the fetch callback initial check for invalid PMIDs should be
> done differently - pass back PM_ERR_PMID after each switch(item)
> and in the final else branch at the end of going through the
> recognised idp->cluster values.
>- when no Lustre is installed, we should get PM_ERR_APPVERSION
> instead of zero for the metric values. A few of the values
> returned aren't zero, but look like uninitialised data too
> (timeout, and the 64 bit ones). Lustre can probably be loaded
> (and unloaded) as a kernel module, so this should get fixed.
>
>
>- General design notes:
> - pmns.v4 - should be just one pmns file, and metrics that dont
> exist for earlier versions should return PM_ERR_APPVERSION,
> once the v4 support is done.
> - the type abstraction in file_indexed/single seems like extra
> complexity for no reason? esp. since we only ever pass 32 in
> here as the type... (and should be doing 64 too I think, but
> no more).
> - the use of sub-second file_time_offset to guard against fast
> repeated lookups (I guess?) adds a fair bit of code ... is
> this needed really? The kernel agent doesn't even do this.
> It was the underlying cause of the SIGSEGV, ... so complexity
> is reducing resilience there. Consider removing it?
> - oh, wait - are you doing this because you don't want to read
> the files multiple times per fetch? If so, other PMDAs handle
> this differently, using an (initial) fetch routine separate to
> the fetch_callback - do the read once in the initial fetch,
> into a global struct holding correctly typed values, and then
> just copy those values into the pmAtomValue in the callback.
>
> That might simplify things alot here. I'd recommend using
> the Linux kernel PMDA as a reference here - it uses lots of
> procfs files of course and has the read-once-per-fetch trick
> (see need_refresh[] in there).
>
>
>cheers.
>
>--
>Nathan
>
|