To: kenj@xxxxxxxxxxxxxxxx
Subject: Re: [pcp] How should we approach the loading of the IB pmda as a LOCAL context DSO?
From: nathans@xxxxxxxxxx
Date: Sat, 27 Mar 2010 08:23:38 +1100 (EST)
Cc: pcp@xxxxxxxxxxx, Corneliu Boac <cboac@xxxxxxx>
In-reply-to: <1254906689.178971269637761035.JavaMail.root@xxxxxxxxxxxxxxxxxx>
Sender: nscott@xxxxxxxxxx

----- "Ken McDonell" <kenj@xxxxxxxxxxxxxxxx> wrote:

> OK, there are several points here ...
> ...
> 2. I don't understand the "cluster pmda failed to load the ib pmda dso"

The code in question appears to be here:
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=pcp/pcp-pmda-cluster.git;a=blob;f=src/pmclusterd.c;h=c1b0a1d8c7bf60bd686b870255c983df13605216;hb=HEAD
around lines 440 - 460 or so.

> 3. PM_CONTEXT_LOCAL is a complete hack.  It was "invented" before the
> first release of PCP when one of the hurdles I needed to clear was a
> demonstrable proof to Akmal Khan (my boss at the time) that using PCP
> would not impose a horrific penalty in comparison to sar and/or
> vmstat ... to do this we needed to take the context switches and TCP/IP
> stack out of the picture, so the client app did not talk to pmcd.  So
> PM_CONTEXT_LOCAL was invented to support only the IRIX PMDA ... support
> for the sample PMDA was added later to help QA, but you don't want to
> start the sample PMDA in production every time a new PM_CONTEXT_LOCAL
> context is started, so the global "don't do it unless this magic
> environment variable is set" hack was added on top and we end up with
> tests for $PCP_LITE_SAMPLE (old style) and $PMDA_LOCAL_SAMPLE (new
> style) and $PMDA_LOCAL_PROC (IRIX only) and $PMDA_LOCAL_IB and ... .
> Everything that exists today in this area is built on this dodgy
> foundation.
> 
> Now I'd prefer to see PM_CONTEXT_LOCAL die.

IMO, the local context concept is sound; it's just the implementation
that is lacking.  There are certainly some situations where it has
advantages over pmcd, if all the needed metrics are from known DSO
PMDAs (e.g. the pmstat class of tool).

> I think Nathan has some real world cases where it is useful to be able
> to collect performance data on a specific host without needing
> infrastructure and sys admin cycles to ensure pmcd keeps running.

Yeah - no need to have the network up, which is attractive when
implementing local-focussed tools like top, vmstat, etc. (as an option
anyway; pmstat should ideally fall back to a local context if no pmcd
connection is possible).  And, as you mentioned, there is no need to
install a system service (root permissions) to try out pcp.  The
(revised) tutorial could potentially make use of this, as could an
online pcp demo.
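
To make the pmstat fallback idea concrete, here is a rough sketch of the
control flow only.  It is not existing pmstat code.  The opener is passed
in as a function pointer with the same shape as pmNewContext(type, name),
so the sketch compiles without libpcp; the CTX_* names, the two stub
openers and open_with_fallback() itself are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for PM_CONTEXT_HOST / PM_CONTEXT_LOCAL. */
enum ctx_type { CTX_HOST, CTX_LOCAL };

/* Same shape as pmNewContext(): >= 0 is a context handle, < 0 an error. */
typedef int (*new_context_fn)(int type, const char *name);

/*
 * Try pmcd on `host` first; if that fails, retry with a local context.
 * On success *used_type records which kind of context was opened.
 * On failure the error from the local attempt is returned.
 */
static int
open_with_fallback(new_context_fn new_context, const char *host, int *used_type)
{
    int ctx;

    assert(new_context != NULL && used_type != NULL);

    ctx = new_context(CTX_HOST, host);
    if (ctx >= 0) {
        *used_type = CTX_HOST;
        return ctx;
    }

    /* No pmcd reachable: fall back to in-process DSO PMDAs. */
    ctx = new_context(CTX_LOCAL, NULL);
    if (ctx >= 0)
        *used_type = CTX_LOCAL;
    return ctx;
}

/* Stub opener simulating "pmcd is down": only local contexts succeed. */
static int
stub_pmcd_down(int type, const char *name)
{
    (void)name;
    return type == CTX_LOCAL ? 0 : -1;
}

/* Stub opener simulating "pmcd is up": host contexts succeed. */
static int
stub_pmcd_up(int type, const char *name)
{
    (void)name;
    return type == CTX_HOST ? 0 : -1;
}
```

In a real tool the stubs would go away, pmNewContext would be passed as
the opener, and the PM_CONTEXT_* constants from <pcp/pmapi.h> would
replace the CTX_* values.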

Mark mentioned another case specific to the cluster/infiniband PMDA
to me the other day, related to scheduling issues (and I certainly
remember the pain we had in production on Windows with pmdawindows
as a daemon wrt fetch timeouts, which we no longer see with the DLL
PMDA).  Mark's on holiday though, so it may be a while till we hear
from him on this.

> So if we are to keep PM_CONTEXT_LOCAL, then I support Corneliu's
> suggestion that we should define new API support to add entries to
> dsotbl[] at run-time ... but how would this work with something like
> pminfo, where the metrics are arbitrary and the PCP client does not
> really know _which_ PMDAs need to be loaded into the local context?

I'd like to see this API cleaned up.  We have the unused argument to
pmNewContext, which could be used to give, perhaps, a comma-separated
list of DSO PMDAs... (pretty sure we audited this a year or two back
and checked that all callers were passing NULL in this argument at the
moment, if it helps).
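
As a strawman for what "a comma-separated list of DSO PMDAs" might look
like on the library side, here is a small parsing sketch.  Nothing like
parse_local_pmdas() exists in libpcp today; the function name, the
MAX_LOCAL_PMDAS limit and the example PMDA names are all assumptions for
discussion.  A NULL spec yields zero names, so existing callers that
pass NULL would see no change.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LOCAL_PMDAS 16   /* hypothetical limit, illustration only */

/*
 * Split a spec such as "linux,ib,sample" into individual PMDA names.
 * Empty entries (e.g. "linux,,ib") are skipped.
 * Returns the number of names stored in names[], or -1 on overflow or
 * allocation failure.  Each names[i] is malloc'd; the caller frees them.
 */
static int
parse_local_pmdas(const char *spec, char *names[], int maxnames)
{
    int n = 0;
    const char *p = spec;

    assert(names != NULL && maxnames > 0);

    if (spec == NULL)
        return 0;               /* NULL: no extra PMDAs requested */

    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length up to next comma */

        if (len > 0) {
            if (n == maxnames)
                goto fail;
            names[n] = malloc(len + 1);
            if (names[n] == NULL)
                goto fail;
            memcpy(names[n], p, len);
            names[n][len] = '\0';
            n++;
        }
        p += len;
        if (*p == ',')
            p++;                /* step over the separator */
    }
    return n;

fail:
    while (n > 0)
        free(names[--n]);
    return -1;
}
```

The resulting names would then still need to be mapped to DSO paths and
init routines, which is where the run-time dsotbl[] additions Corneliu
suggested would come in.  It also does not answer Ken's pminfo question:
a client with arbitrary metrics still has to know which names to pass.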

cheers.

-- 
Nathan
