pcp

Re: IB pmda and writing out the default config file

To: Martin Hicks <mort@xxxxxxx>
Subject: Re: IB pmda and writing out the default config file
From: Max Matveev <makc@xxxxxxxxx>
Date: Tue, 9 Jun 2009 23:12:48 +1000
Cc: pcp@xxxxxxxxxxx
In-reply-to: <20090608171828.GA14199@xxxxxxxxxxxxxxxxxxxxxxxxx>
References: <20090608171828.GA14199@xxxxxxxxxxxxxxxxxxxxxxxxx>
>>>>> "mort" == Martin Hicks <mort@xxxxxxx> writes:

 mort> Hi Max and list,

 mort> The infiniband PMDA, on its first load, enumerates the local Infiniband
 mort> ports and, if a config file in /var/lib/pcp/pmdas/ib does not exist,
 mort> writes out a simple config file.

There are two levels of indirection here: first, you need a
semi-stable mapping between external and internal instance IDs;
second, you need to map the external instance IDs to HCAs/ports.

Originally I used GUIDs to map names like mlx4_1:1 to instance
IDs, but it was such a gross hack that I quickly got rid of it. The
current scheme attempts to tie the name to a GUID via the IB pmda
config file, and the external-to-internal mapping is then done via
pmdaCache.
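
To make the two levels concrete, here is a rough Python sketch. It is
not the PMDA code: the "name guid port" line format, parse_config and
InstanceCache are all assumptions made up for illustration, and the
toy cache only mimics the "same name, same internal ID" behaviour that
pmdaCache provides (the real one also persists across restarts).

```python
def parse_config(text):
    """Level 2: map external instance name -> (GUID, port).

    Assumes hypothetical lines of the form 'name guid port';
    '#' starts a comment.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, guid, port = line.split()
        mapping[name] = (int(guid, 16), int(port))
    return mapping


class InstanceCache:
    """Level 1: toy stand-in for pmdaCache (in memory only).

    Once a name has been given an internal ID it keeps that ID.
    """

    def __init__(self):
        self._ids = {}

    def store(self, name):
        if name not in self._ids:
            self._ids[name] = len(self._ids)
        return self._ids[name]


config = parse_config("mlx4_1:1 0x0002c9030000abcd 1  # made-up GUID\n")
cache = InstanceCache()
for name in config:
    cache.store(name)
```

The point is only that the name is the stable key: the GUID hangs off
the name via the config file, and the internal ID hangs off the name
via the cache.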

 mort> This creates certain problems if the HCA is replaced, since the GUIDs of
 mort> the local devices will change and the IB pmda will search the fabric for
 mort> nonexistent GUIDs.

 mort> I think there are two ways to solve this, but I'm not sure which is
 mort> better:

 mort> - the first would be just to not write out a default config file.
 mort>   Users who have written their own would at least know of the existence
 mort>   of this config file and would figure out quickly to update it if they
 mort>   changed their HCA.

 mort> - Or, the local port GUIDs would never appear in the config file.  The
 mort>   config file would only contain additional (remote) GUIDs which should
 mort>   be monitored by the PMDA.

HCA names are assigned in the order of discovery: if you add (not just
change) an HCA, then something which used to be mthca2 may become
mthca3. With the config file you have a name-to-GUID mapping, so if
you add an HCA it will not be reported by the PMDA until the config
file is updated. You need to decide what's more important: stable
instance mapping or automatic adjustment when the hardware changes.
I decided that stable instance mapping was more important.
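
A small Python sketch of that trade-off, under made-up assumptions:
the GUIDs are tiny fake numbers, and discovery_names and
visible_instances are hypothetical helpers, not anything in the PMDA.

```python
def discovery_names(guids_in_probe_order, prefix="mthca"):
    """Name devices by probe order, the way discovery-order naming works."""
    return {f"{prefix}{i}": guid for i, guid in enumerate(guids_in_probe_order)}


def visible_instances(config, discovered_guids):
    """Report only the configured names whose GUID is actually present."""
    present = set(discovered_guids)
    return {name: guid for name, guid in config.items() if guid in present}


# Before: three HCAs. After: a fourth (0xD) is added and probed third.
before = [0xA, 0xB, 0xC]
after = [0xA, 0xB, 0xD, 0xC]

# Discovery-order names shift: mthca2 now refers to a different device.
shifted = discovery_names(before)["mthca2"] != discovery_names(after)["mthca2"]

# Config-based names do not shift, but the new HCA is not reported
# until someone adds it to the config.
config = {"hca_a": 0xA, "hca_b": 0xB, "hca_c": 0xC}
stable = visible_instances(config, before) == visible_instances(config, after)
```

Here both shifted and stable come out true, which is the whole
argument in two lines.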

Perhaps a better way to deal with this would be to add some keywords
to the config file to indicate local HCAs and to complain bitterly if
they're not found.

BTW, the config file can be an executable too - nasmgr used that to
generate its own naming scheme.
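
Roughly, the idea looks like the Python sketch below. This is not how
the PMDA itself is written; read_config is a hypothetical helper, and
it assumes an executable config prints the same text that a plain
config file would contain.

```python
import os
import subprocess


def read_config(path):
    """Return config text: run the file if it is executable, else read it."""
    if os.access(path, os.X_OK):
        # Executable config: its standard output is taken as the config.
        result = subprocess.run(
            [path], capture_output=True, text=True, check=True
        )
        return result.stdout
    with open(path) as fh:
        return fh.read()
```

That lets a site generate names from whatever inventory it already
has, without the PMDA needing to know about it.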

max
