>>>>> "mort" == Martin Hicks <mort@xxxxxxx> writes:
mort> Hi Max and list,
mort> The infiniband PMDA, on its first load, enumerates the local Infiniband
mort> ports and, if a config file in /var/lib/pcp/pmdas/ib does not exist,
mort> writes out a simple config file.
There are two levels of indirection here: first, you need a
semi-stable mapping between external and internal instance IDs;
second, you need to map the external instance IDs to HCAs/ports.
Originally I used GUIDs to map names like mlx4_1:1 to instance IDs,
but it was such a gross hack that I quickly got rid of it. The current
scheme attempts to tie the name to a GUID via the IB PMDA config file,
and the external-to-internal mapping is then done via pmdaCache.
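
The two levels described above can be sketched roughly as follows. This
is an illustration only, not the PMDA's code: the "name guid port" line
format, the GUID values, and the InstanceCache class (a stand-in for the
role pmdaCache plays) are all assumptions made for the example.

```python
# Sketch only: the config format and all names below are assumptions.

class InstanceCache:
    """Stand-in for pmdaCache's role: external name -> stable internal ID."""

    def __init__(self):
        self._ids = {}

    def store(self, name):
        # Reuse the ID if the name was seen before, else allocate the next one.
        if name not in self._ids:
            self._ids[name] = len(self._ids)
        return self._ids[name]


def parse_config(text):
    """Map external instance name -> (GUID, port) from 'name guid port' lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, guid, port = line.split()
        mapping[name] = (int(guid, 16), int(port))
    return mapping


# Hypothetical config contents; the GUIDs are made up.
config_text = """
# name      guid                port
mlx4_0:1    0x0002c90300001111  1
mlx4_1:1    0x0002c90300002222  1
"""

ports = parse_config(config_text)                        # name -> (GUID, port)
cache = InstanceCache()
inst_ids = {name: cache.store(name) for name in ports}   # name -> internal ID
```

The point of keeping the two maps separate is that the internal ID
follows the name, so editing the GUID for a name in the config does not
disturb the instance numbering.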
mort> This creates certain problems if the HCA is replaced, since the GUIDs of
mort> the local devices will change and the IB pmda will search the fabric for
mort> non-existent GUIDs.
mort> I think there are two ways to solve this, but I'm not sure which is
mort> better:
mort> - the first would be just to not write out a default config file.
mort> Users who have written their own would at least know of the existence
mort> of this config file and would figure out quickly to update it if they
mort> changed their HCA.
mort> - Or, the local port GUIDs would never appear in the config file. The
mort> config file would only contain additional (remote) GUIDs which should
mort> be monitored by the PMDA.
HCA names are assigned in order of discovery: if you add (not just
change) an HCA, then something which used to be mthca2 may become
mthca3. With the config file you have a name-to-GUID mapping; if you
add an HCA, it will not be reported by the PMDA until the config file
is updated. You need to decide what's more important: stable instance
mapping or automatic adjustment when the hardware changes. I decided
that stable instance mapping was more important.
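
To make the trade-off concrete, here is a small sketch comparing the two
behaviours when an HCA is added. The GUID values, the probe order, and
both helper functions are invented for the illustration; real discovery
order depends on the system.

```python
def discovery_names(guids_in_probe_order, prefix="mthca"):
    """Name HCAs purely by discovery order: first found is <prefix>0, etc."""
    return {f"{prefix}{i}": guid for i, guid in enumerate(guids_in_probe_order)}


def reported(config, present_guids):
    """Config-pinned view: report only configured names whose GUID is present."""
    return {name: guid for name, guid in config.items() if guid in present_guids}


before = [0xA, 0xB, 0xC]
config = discovery_names(before)   # name -> GUID snapshot written at first load

# A new HCA (0xD) is added and happens to be discovered ahead of 0xC.
after = [0xA, 0xB, 0xD, 0xC]

by_discovery = discovery_names(after)     # mthca2 now refers to the new HCA
by_config = reported(config, set(after))  # mthca2 still refers to GUID 0xC
```

With discovery-order naming, mthca2 silently starts referring to
different hardware. With the config-pinned mapping, mthca2 keeps
pointing at the same GUID, but 0xD stays invisible until someone adds it
to the config.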
Perhaps a better way to deal with this would be to add some keywords
to the config file to indicate local HCAs and to complain bitterly if
they're not found.
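
A rough sketch of what such a check might look like. The "local" keyword
and the four-field line format are hypothetical; nothing like this
exists in the config syntax today.

```python
import sys


def check_local_hcas(config_lines, present_guids, warn=None):
    """Return names of entries marked 'local' whose GUID is not present.

    Assumes hypothetical lines of the form: name guid port [local]
    """
    if warn is None:
        warn = lambda msg: print(msg, file=sys.stderr)
    missing = []
    for line in config_lines:
        fields = line.split("#", 1)[0].split()  # strip comments, tokenize
        if len(fields) == 4 and fields[3] == "local":
            name, guid = fields[0], int(fields[1], 16)
            if guid not in present_guids:
                warn(f"ib config: local port {name} (GUID {guid:#018x}) "
                     "not found; was the HCA replaced?")
                missing.append(name)
    return missing


# Example: the second entry's HCA has been swapped out.
example_lines = [
    "mlx4_0:1 0x0002c90300001111 1 local",
    "mlx4_1:1 0x0002c90300002222 1 local",
    "remote_sw 0x0002c90300003333 1",      # remote port: not checked
]
example_missing = check_local_hcas(example_lines, {0x0002c90300001111},
                                   warn=lambda msg: None)
```

Remote entries are deliberately left alone here, since a remote GUID
being unreachable is a fabric problem rather than a stale local config.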
BTW, the config file can be an executable too - nasmgr used that to
generate its own naming scheme.
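
As an illustration of the idea, a loader could treat the config along
these lines: if the file is executable, run it and use its standard
output as the config text, otherwise read it as a plain file. This is a
sketch under that assumption, not the PMDA's actual loading code, and
read_config is a made-up name.

```python
import os
import subprocess


def read_config(path):
    """Return config text: the file's stdout if executable, else its contents."""
    if os.access(path, os.X_OK):
        # Executable config: run it and treat whatever it prints as the config.
        result = subprocess.run([path], check=True,
                                capture_output=True, text=True)
        return result.stdout
    with open(path) as f:
        return f.read()
```

A site-specific generator script can then emit whatever naming scheme it
likes, as long as its output is in the same format as a static file.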
max