OK, there are several points here ...
1. This looks like unintended fallout from moving the IB and cluster
PMDAs out of the main PCP build ... HAVE_IBDEV never gets set in the PCP
tree, nor indeed in the pcp-pmda-infiniband tree.
2. I don't understand the "cluster pmda failed to load the ib pmda dso"
part ... there should not be any nested dependency like this between
PMDAs ... it is possible the cluster PMDA cannot fetch the ib metrics if
they appear in the cluster PMDA's config file, but that is not likely to
be related to anything involving PM_CONTEXT_LOCAL ... Mark, please jump
in here if I've got this wrong.
3. PM_CONTEXT_LOCAL is a complete hack. It was "invented" before the
first release of PCP, when one of the hurdles I needed to clear was a
demonstrable proof to Akmal Khan (my boss at the time) that using PCP
would not impose a horrific penalty in comparison to sar and/or
vmstat ... to do this we needed to take the context switches and TCP/IP
stack out of the picture, so the client app did not talk to pmcd. So
PM_CONTEXT_LOCAL was invented to support only the IRIX PMDA. Support
for the sample PMDA was added later to help QA, but you don't want to
start the sample PMDA in production every time a new PM_CONTEXT_LOCAL
context is started, so the global "don't do it unless this magic
environment variable is set" hack was added on top. We end up with
tests for $PCP_LITE_SAMPLE (old style) and $PMDA_LOCAL_SAMPLE (new
style) and $PMDA_LOCAL_PROC (IRIX only) and $PMDA_LOCAL_IB and so on.
Everything that exists today in this area is built on this dodgy
foundation.
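For anyone who has not looked at this code, the "magic environment
variable" gate amounts to something like the sketch below. This is an
illustration only, not the libpcp source: local_dso_enabled() is a
hypothetical helper name, and treating an empty value as "set" is an
assumption.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, NOT a libpcp function: decide whether a DSO
 * guarded by the named environment variable may be loaded into a
 * PM_CONTEXT_LOCAL context.
 */
static int
local_dso_enabled(const char *envvar)
{
    /* A NULL name means the table entry is not gated at all. */
    if (envvar == NULL)
        return 1;
    /* Assumption: any value, even the empty string, counts as "set". */
    return getenv(envvar) != NULL;
}
```

With a gate like this, the sample PMDA would only be considered when
something like $PMDA_LOCAL_SAMPLE is present in the client's environment.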
Now I'd prefer to see PM_CONTEXT_LOCAL die.
However, I think Nathan has some real-world cases where it is useful to
be able to collect performance data on a specific host without needing
infrastructure and sysadmin cycles to ensure pmcd keeps running.
So if we are to keep PM_CONTEXT_LOCAL, then I support Corneliu's
suggestion that we should define new API support to add entries to
dsotab[] at run-time ... but how would this work with something like
pminfo, where the metrics are arbitrary and the PCP client does not
really know _which_ PMDAs need to be loaded into the local context?
I think we need some design by argument here.
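As a starting point for that argument, here is a minimal sketch of what
run-time registration into the DSO table could look like. Everything in
it is hypothetical: local_dso_t stands in for libpcp's __pmDSO, and
local_dso_add() / local_dso_lookup() are invented names, not an existing
or proposed PCP API. The fields simply mirror the entries in the
generated dsotbl.h (domain number, DSO path, init routine name).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for __pmDSO: domain, DSO path, init routine. */
typedef struct {
    int   domain;
    char *name;
    char *init;
} local_dso_t;

static local_dso_t *dsotab;   /* grows on demand, starts empty */
static int          numdso;

/* Portable string copy (strdup is not in ISO C before C23). */
static char *
dup_str(const char *s)
{
    size_t len = strlen(s) + 1;
    char  *p = malloc(len);

    if (p != NULL)
        memcpy(p, s, len);
    return p;
}

/*
 * Hypothetical API: append one entry to the table.
 * Returns 0 on success, -1 if the domain is already present or
 * memory cannot be allocated.
 */
int
local_dso_add(int domain, const char *name, const char *init)
{
    local_dso_t *tmp;
    char        *n, *i;
    int          k;

    for (k = 0; k < numdso; k++)
        if (dsotab[k].domain == domain)
            return -1;

    n = dup_str(name);
    i = dup_str(init);
    tmp = realloc(dsotab, (numdso + 1) * sizeof(dsotab[0]));
    if (tmp != NULL)
        dsotab = tmp;         /* the table may have moved; keep it valid */
    if (n == NULL || i == NULL || tmp == NULL) {
        free(n);
        free(i);
        return -1;
    }
    dsotab[numdso].domain = domain;
    dsotab[numdso].name = n;
    dsotab[numdso].init = i;
    numdso++;
    return 0;
}

/* Hypothetical API: find the entry for a domain, or NULL if absent. */
const local_dso_t *
local_dso_lookup(int domain)
{
    int k;

    for (k = 0; k < numdso; k++)
        if (dsotab[k].domain == domain)
            return &dsotab[k];
    return NULL;
}
```

A client such as the cluster PMDA could then call something like
local_dso_add(91, "ib/pmda_ib.so", "ib_init") before creating its local
context. Note that this does nothing for the pminfo case: a generic
client still has no way of knowing which domains to register.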
On Wed, 2010-03-24 at 11:50 -0500, Corneliu Boac wrote:
> Hello PCP group:
>
> I have run into a PCP bug: the cluster pmda failed to load the ib pmda dso.
> How should we approach the loading of the IB pmda as a LOCAL context DSO?
> Should we just permanently add it to the dsotab like in the following patch,
> or should we rethink the dsotab and allow it to grow dynamically via a new
> API?
> ===========================================================================
> diff -uprBw pcp-3.1.1.sgi.orig/src/libpcp/src/GNUmakefile pcp-3.1.1.sgi/src/libpcp/src/GNUmakefile
> --- pcp-3.1.1.sgi.orig/src/libpcp/src/GNUmakefile 2010-02-25 15:07:18.000000000 -0600
> +++ pcp-3.1.1.sgi/src/libpcp/src/GNUmakefile 2010-03-24 09:47:08.000000000 -0500
> @@ -91,11 +91,8 @@ else
> kernel_pmda_dso = $(TARGET_OS)
> endif
>
> -ifeq ($(HAVE_IBDEV),1)
> +# the pmdacluster needs the ib entry into the dsotab (even if HAVE_IBDEV is not defined by default)
> infiniband_pmda_dso = ib
> -else
> -infiniband_pmda_dso =
> -endif
>
> dsotbl.h: $(TOPDIR)/src/pmns/stdpmid
> echo '/* This file is automatically generated by build' > $@
> ===========================================================================
>
>
> After I apply this patch the libpcp/src/dsotbl.h looks like this:
> ---------------------------------------------------------------------------
> /* This file is automatically generated by build
> *
> * It contains list of DSO, supported by the CONTEXT_LOCAL
> */
> static __pmDSO dsotab[] = {
> #define LINUX_DSO 60
> { 60, "linux/pmda_linux.so", "linux_init" },
> #define IB_DSO 91
> { 91, "ib/pmda_ib.so", "ib_init" },
> #define MMV_DSO 70
> { 70, "mmv/pmda_mmv.so", "mmv_init" },
> #define SAMPLE_DSO 30
> { 30, "sample/pmda_sample.so", "sample_init" },
> };
> static int numdso = (sizeof(dsotab)/sizeof(dsotab[0]));
> ---------------------------------------------------------------------------
>
>
> Thank you,
> Cornel
> --
> Corneliu Boac - Software Engineer
> Silicon Graphics International
> 2750 Blue Water Road
> Eagan, MN 55121
> Phone: (651)683-7900
> E-mail: cboac@xxxxxxx
>