Creating a Failover Policy

This chapter tells you how to create a failover policy. It describes the following topics:

- Contents of a failover policy
- Failover scripts
- Example failover policies

Contents of a Failover Policy

A failover policy is the method by which a resource group is failed over from one node to another. A failover policy consists of the following:

- Failover domain
- Failover attributes
- Failover scripts

Linux FailSafe uses the failover domain output from a failover script along with failover attributes to determine on which node a resource group should reside. The administrator must configure a failover policy for each resource group. The name of the failover policy must be unique within the pool.
Failover Domain

A failover domain is the ordered list of nodes on which a given resource group can be allocated. The nodes listed in the failover domain must be within the same cluster; however, the failover domain does not have to include every node in the cluster. The failover domain can also be used to statically load balance the resource groups in a cluster.

Examples:

- In a four-node cluster, a set of two nodes that have access to a particular XLV volume may be the failover domain of the resource group containing that XLV volume.

- In a cluster of nodes named venus, mercury, and pluto, you could configure the following initial failover domains for resource groups RG1 and RG2:

  - mercury, venus, pluto for RG1
  - pluto, mercury for RG2

The administrator defines the initial failover domain when configuring a failover policy. The initial failover domain is used when a cluster is first booted. The ordered list specified by the initial failover domain is transformed into a run-time failover domain by the failover script. With each failure, the failover script takes the current run-time failover domain and potentially modifies it; the initial failover domain is never used again. Depending on the run-time conditions and contents of the failover script, the initial and run-time failover domains may be identical.

For example, suppose the initial failover domain is:
N1 N2 N3

The run-time failover domain will vary based on the failover script:

- If ordered: N1 N2 N3
- If round-robin: N2 N3 N1
- If a customized failover script is used, the order could be any permutation, based on the contents of the script:
  N1 N2 N3
  N1 N3 N2
  N2 N3 N1
  N2 N1 N3
  N3 N2 N1
  N3 N1 N2

Linux FailSafe stores the run-time failover domain and uses it as
input to the next failover script invocation.

Failover Attributes

A failover attribute is a value that is passed to the failover script and used by Linux FailSafe to modify the run-time failover domain for a specific resource group. There are required and optional failover attributes, and you can also specify your own strings as attributes.

The following table shows the required failover attributes. You must specify one and only one of these attributes. Note that the starting conditions for the attributes differ: for the required attributes, the starting condition is that a node joins the cluster membership when the cluster is already providing HA services; for the optional attributes, the starting condition is that HA services are started and the resource group is running on only one node in the cluster.
Required Failover Attributes (mutually exclusive)

Auto_Failback
    Specifies that the resource group is made online based on the failover policy when a node joins the cluster. This attribute is best used when some type of load balancing is required. You must specify either this attribute or the Controlled_Failback attribute.

Controlled_Failback
    Specifies that the resource group remains on the same node when a node joins the cluster. This attribute is best used when client/server applications have expensive recovery mechanisms, such as databases or any application that uses TCP to communicate. You must specify either this attribute or the Auto_Failback attribute.
When defining a failover policy, you can optionally also choose one and only one of the recovery attributes shown in the following table. The recovery attribute determines the node on which a resource group will be allocated when its state changes to online and a member of the group is already allocated (such as when volumes are present).
Optional Failover Attributes (mutually exclusive)

Auto_Recovery
    Specifies that the resource group is made online based on the failover policy even when an exclusivity check shows that the resource group is running on a node. This attribute is optional and is mutually exclusive with the InPlace_Recovery attribute. If you specify neither of these attributes, Linux FailSafe will use this attribute by default if you have specified the Auto_Failback attribute.

InPlace_Recovery
    Specifies that the resource group is made online on the same node where the resource group is running. This attribute is optional and is mutually exclusive with the Auto_Recovery attribute. If you specify neither of these attributes, Linux FailSafe will use this attribute by default if you have specified the Controlled_Failback attribute.
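To make the run-time domain transformation from the Failover Domain section concrete (initial domain N1 N2 N3 becoming N2 N3 N1 under round-robin), the rotation can be sketched as a small shell function. This is an illustrative helper only, not the shipped round-robin policy script:

```shell
#!/bin/sh
# Illustrative sketch (not the shipped script): rotate a run-time
# failover domain so that the node after the current owner comes
# first and the owner moves to the end of the list.
rotate_domain() {
    owner=$1; shift
    before=""; after=""; seen=0
    for node in "$@"; do
        if [ "$node" = "$owner" ]; then
            seen=1
        elif [ "$seen" -eq 1 ]; then
            after="$after $node"
        else
            before="$before $node"
        fi
    done
    # Nodes after the owner, then nodes before it, then the owner.
    echo $after $before $owner
}

rotate_domain N1 N1 N2 N3    # prints: N2 N3 N1
```

With owner N1 and domain N1 N2 N3 this yields N2 N3 N1, matching the round-robin example given earlier.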
Failover Scripts

A failover script generates the run-time failover domain and returns it to the Linux FailSafe process. The Linux FailSafe process applies the failover attributes and then selects the first node in the returned failover domain that is also in the current node membership.

The run time of the failover script must be capped to a system-definable maximum; hence, any external calls must be guaranteed to return quickly. If the failover script takes too long to return, Linux FailSafe will kill the script process and use the previous run-time failover domain.

Failover scripts are stored in the /usr/lib/failsafe/policies directory.
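The selection step just described — take the first node of the returned run-time domain that is also in the current node membership — can be sketched as follows. This is an illustration of the rule, not FailSafe source code:

```shell
#!/bin/sh
# Sketch of the selection rule (not FailSafe source): given the
# run-time failover domain and the current node membership, print the
# first domain node that is also a member; return 1 if there is none.
select_owner() {
    domain=$1        # e.g. "N3 N1 N2"
    membership=$2    # e.g. "N1 N2"
    for d in $domain; do
        for m in $membership; do
            if [ "$d" = "$m" ]; then
                echo "$d"
                return 0
            fi
        done
    done
    return 1
}

select_owner "N3 N1 N2" "N1 N2"    # prints: N1
```

Here N3 heads the domain but is not in the membership, so the resource group goes to N1, the first domain node that is a current member.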
The ordered Failover Script

The ordered failover script is provided with the release. The ordered script never changes the initial domain; when using this script, the initial and run-time domains are equivalent. The script reads six lines from the input file and, in case of errors, logs the input parameters and/or the error to the script log.

The following example shows the contents of the ordered failover script. (Line breaks added for readability.)

#!/bin/sh
#
# Copyright (c) 2000 Silicon Graphics, Inc. All Rights Reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like. Any license provided herein, whether implied or
# otherwise, applies only to this software file. Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#
# Contact information: Silicon Graphics, Inc., 1600 Amphitheatre Pkwy,
# Mountain View, CA 94043, or:
#
# http://www.sgi.com
#
# For further information regarding this notice, see:
#
# http://oss.sgi.com/projects/GenInfo/NoticeExplan
#
# $1 - input file
# $2 - output file
#
# line 1 input file - version
# line 2 input file - name
# line 3 input file - owner field
# line 4 input file - attributes
# line 5 input file - list of possible owners
# line 6 input file - application failover domain
DIR=/usr/lib/failsafe/bin
LOG="${DIR}/ha_cilog -g ha_script -s script"
FILE=/usr/lib/failsafe/policies/ordered
input=$1
output=$2
{
read version
read name
read owner
read attr
read mem1 mem2 mem3 mem4 mem5 mem6 mem7 mem8
read afd1 afd2 afd3 afd4 afd5 afd6 afd7 afd8
} < ${input}
${LOG} -l 1 "${FILE}:" `/bin/cat ${input}`
if [ "${version}" -ne 1 ] ; then
${LOG} -l 1 "ERROR: ${FILE}: Different version no. Should be (1) rather than (${version})" ;
exit 1;
elif [ -z "${name}" ]; then
${LOG} -l 1 "ERROR: ${FILE}: Failover script not defined";
exit 1;
elif [ -z "${attr}" ]; then
${LOG} -l 1 "ERROR: ${FILE}: Attributes not defined";
exit 1;
elif [ -z "${mem1}" ]; then
${LOG} -l 1 "ERROR: ${FILE}: No node membership defined";
exit 1;
elif [ -z "${afd1}" ]; then
${LOG} -l 1 "ERROR: ${FILE}: No failover domain defined";
exit 1;
fi
found=0
for i in $afd1 $afd2 $afd3 $afd4 $afd5 $afd6 $afd7 $afd8; do
for j in $mem1 $mem2 $mem3 $mem4 $mem5 $mem6 $mem7 $mem8; do
if [ "X${j}" = "X${i}" ]; then
found=1;
break;
fi
done
done
if [ ${found} -eq 0 ]; then
mem="("$mem1")"" ""("$mem2")"" ""("$mem3")"" ""("$mem4")"" \
""("$mem5")"" ""("$mem6")"" ""("$mem7")"" ""("$mem8")";
afd="("$afd1")"" ""("$afd2")"" ""("$afd3")"" ""("$afd4")"" \
""("$afd5")"" ""("$afd6")"" ""("$afd7")"" ""("$afd8")";
${LOG} -l 1 "ERROR: ${FILE}: Policy script failed"
${LOG} -l 1 "ERROR: ${FILE}: " `/bin/cat ${input}`
${LOG} -l 1 "ERROR: ${FILE}: Nodes defined in membership do not match \
the ones in failure domain"
${LOG} -l 1 "ERROR: ${FILE}: Parameters read from input file: \
version = $version, name = $name, owner = $owner, attribute = $attr, \
nodes = $mem, afd = $afd"
exit 1;
fi
if [ ${found} -eq 1 ]; then
rm -f ${output}
echo $afd1 $afd2 $afd3 $afd4 $afd5 $afd6 $afd7 $afd8 > ${output}
exit 0
fi
exit 1

The round-robin Failover Script

The round-robin script selects the resource group owner in a round-robin (circular) fashion. This policy can be used for resource groups that can run on any node in the cluster.

The following example shows the contents of the round-robin failover script.

#!/bin/sh
#
# Copyright (c) 2000 Silicon Graphics, Inc. All Rights Reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like. Any license provided herein, whether implied or
# otherwise, applies only to this software file. Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#
# Contact information: Silicon Graphics, Inc., 1600 Amphitheatre Pkwy,
# Mountain View, CA 94043, or:
#
# http://www.sgi.com
#
# For further information regarding this notice, see:
#
# http://oss.sgi.com/projects/GenInfo/NoticeExplan
#
# $1 - input file
# $2 - output file
#
# line 1 input file - version
# line 2 input file - name
# line 3 input file - owner field
# line 4 input file - attributes
# line 5 input file - Possible list of owners
# line 6 input file - application failover domain
DIR=/usr/lib/failsafe/bin
LOG="${DIR}/ha_cilog -g ha_script -s script"
FILE=/usr/lib/failsafe/policies/round-robin
# Read input file
input=$1
output=$2
{
read version
read name
read owner
read attr
read mem1 mem2 mem3 mem4 mem5 mem6 mem7 mem8
read afd1 afd2 afd3 afd4 afd5 afd6 afd7 afd8
} < ${input}
# Validate input file
${LOG} -l 1 "${FILE}:" `/bin/cat ${input}`
if [ "${version}" -ne 1 ] ; then
${LOG} -l 1 "ERROR: ${FILE}: Different version no. Should be (1) \
rather than (${version})" ;
exit 1;
elif [ -z "${name}" ]; then
${LOG} -l 1 "ERROR: ${FILE}: Failover script not defined";
exit 1;
elif [ -z "${attr}" ]; then
${LOG} -l 1 "ERROR: ${FILE}: Attributes not defined";
exit 1;
elif [ -z "${mem1}" ]; then
${LOG} -l 1 "ERROR: ${FILE}: No node membership defined";
exit 1;
elif [ -z "${afd1}" ]; then
${LOG} -l 1 "ERROR: ${FILE}: No failover domain defined";
exit 1;
fi
# Return 0 if $1 is in the membership and return 1 otherwise.
check_in_mem()
{
for j in $mem1 $mem2 $mem3 $mem4 $mem5 $mem6 $mem7 $mem8; do
if [ "X${j}" = "X$1" ]; then
return 0;
fi
done
return 1;
}
# Check if owner has to be changed. There is no need to change owner if
# owner node is in the possible list of owners.
check_in_mem ${owner}
if [ $? -eq 0 ]; then
nextowner=${owner};
fi
# Search for the next owner
if [ "X${nextowner}" = "X" ]; then
next=0;
for i in $afd1 $afd2 $afd3 $afd4 $afd5 $afd6 $afd7 $afd8; do
if [ "X${i}" = "X${owner}" ]; then
next=1;
continue;
fi
if [ "X${owner}" = "XNO ONE" ]; then
next=1;
fi
if [ ${next} -eq 1 ]; then
# Check if ${i} is in membership
check_in_mem ${i};
if [ $? -eq 0 ]; then
# found next owner
nextowner=${i};
next=0;
break;
fi
fi
done
fi
if [ "X${nextowner}" = "X" ]; then
# wrap round the afd list.
for i in $afd1 $afd2 $afd3 $afd4 $afd5 $afd6 $afd7 $afd8; do
if [ "X${i}" = "X${owner}" ]; then
# Search for next owner complete
break;
fi
# Previous loop should have found new owner
if [ "X${owner}" = "XNO ONE" ]; then
break;
fi
if [ ${next} -eq 1 ]; then
check_in_mem ${i};
if [ $? -eq 0 ]; then
# found next owner
nextowner=${i};
next=0;
break;
fi
fi
done
fi
if [ "X${nextowner}" = "X" ]; then
${LOG} -l 1 "ERROR: ${FILE}: Policy script failed"
${LOG} -l 1 "ERROR: ${FILE}: " `/bin/cat ${input}`
${LOG} -l 1 "ERROR: ${FILE}: Could not find new owner"
exit 1;
fi
# nextowner is the new owner
print=0;
rm -f ${output};
# Print the new afd to the output file
echo -n "${nextowner} " > ${output};
for i in $afd1 $afd2 $afd3 $afd4 $afd5 $afd6 $afd7 $afd8;
do
if [ "X${nextowner}" = "X${i}" ]; then
print=1;
elif [ ${print} -eq 1 ]; then
echo -n "${i} " >> ${output}
fi
done
print=1;
for i in $afd1 $afd2 $afd3 $afd4 $afd5 $afd6 $afd7 $afd8; do
if [ "X${nextowner}" = "X${i}" ]; then
print=0;
elif [ ${print} -eq 1 ]; then
echo -n "${i} " >> ${output}
fi
done
echo >> ${output};
exit 0;

Creating a New Failover Script

If the ordered or round-robin scripts do not meet your needs, you can create a new failover script and place it in the /usr/lib/failsafe/policies directory. You can then configure the cluster configuration database to use your new failover script for the required resource groups.

Failover Script Interface

The following is passed to the failover script:
function(version, name, owner, attributes, possibleowners, domain)

version
    Linux FailSafe version. The Linux FailSafe release uses version number 1.

name
    Name of the failover script (used for error validation and logging purposes).

owner
    Logical name of the node that has the resource group allocated.

attributes
    Failover attributes (Auto_Failback or Controlled_Failback must be included).

possibleowners
    List of possible owners for the resource group. This list can be a subset of the current node membership.

domain
    Ordered list of nodes used at the last failover. (At the first failover, the initial failover domain is used.)

The failover script returns the newly generated run-time failover domain to Linux FailSafe, which then chooses the node on which the resource group should be allocated by applying the failover attributes and node membership to the run-time failover domain.

Example Failover Policies for Linux FailSafe

There are two general types of configuration, each of which can have from 2 through 8 nodes:

- N nodes that can potentially fail over their applications to any of the other nodes in the cluster.
- N primary nodes that can fail over to M backup nodes. For example, you could have 3 primary nodes and 1 backup node.

This section shows examples of failover policies for the following types of configuration, each of which can have from 2 through 8 nodes:

- N primary nodes and one backup node (N+1)
- N primary nodes and two backup nodes (N+2)
- N primary nodes and M backup nodes (N+M)

The diagrams in the following sections illustrate the configuration concepts discussed here, but they do not address all required or supported elements, such as reset hubs. For configuration details, see the Linux FailSafe Installation and Maintenance Instructions.
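As a starting point for a new failover script against the interface described above, the six-line input/output contract can be satisfied by a skeleton like the following. This is a hedged sketch, not a supported script: it is written as a shell function so it can be exercised inline (in practice the body would be the top level of a script in /usr/lib/failsafe/policies), the policy name "mypolicy" and the /tmp file names are purely illustrative, and it performs only minimal validation before returning the domain unchanged; a real script would reorder the domain at the marked point.

```shell
#!/bin/sh
# Illustrative skeleton for a custom failover policy script.
# $1 - input file (six lines: version, name, owner, attributes,
#      possible owners, application failover domain)
# $2 - output file (receives the new run-time failover domain)
policy() {
    input=$1
    output=$2
    {
        read version
        read name
        read owner
        read attr
        read members
        read domain
    } < "${input}"

    # Minimal validation, mirroring the shipped scripts.
    [ "${version}" = "1" ] || return 1
    [ -n "${name}" ] || return 1
    [ -n "${attr}" ] || return 1
    [ -n "${members}" ] || return 1
    [ -n "${domain}" ] || return 1

    # A real script would reorder ${domain} here based on ${owner},
    # ${attr}, and ${members}; this skeleton returns it unchanged.
    rm -f "${output}"
    echo ${domain} > "${output}"
    return 0
}

# Example: build a six-line input file and run the skeleton against it.
printf '1\nmypolicy\nN1\nAuto_Failback\nN1 N2 N3\nN1 N2 N3\n' > /tmp/fp_in
policy /tmp/fp_in /tmp/fp_out
cat /tmp/fp_out    # prints: N1 N2 N3
```

Because the framework simply invokes the script with an input and an output file, a skeleton like this can be tested by hand before it is configured in the cluster configuration database.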
N+1 Configuration for Linux FailSafe shows a specific instance of an N+1 configuration in which there are three primary nodes and one backup node. (This is also known as a star configuration.) The disks shown could each be disk farms.

[Figure: N+1 Configuration Concept]

You could configure the following failover policies for load balancing:

Failover policy for RG1:
- Initial failover domain = A, D
- Failover attribute = Auto_Failback
- Failover script = ordered

Failover policy for RG2:
- Initial failover domain = B, D
- Failover attribute = Auto_Failback
- Failover script = ordered

Failover policy for RG3:
- Initial failover domain = C, D
- Failover attribute = Auto_Failback
- Failover script = ordered

If node A fails, RG1 will fail over to node D. As soon as node A reboots, RG1 will be moved back to node A.

If you change the failover attribute to Controlled_Failback for RG1 and node A fails, RG1 will fail over to node D and will remain running on node D even if node A reboots.

N+2 Configuration shows a specific instance of an
N+2 configuration in which there are four
primary nodes and two backup nodes. The disks shown could each be disk
farms.

[Figure: N+2 Configuration Concept]

You could configure the following failover policies for resource groups RG7 and RG8:

Failover policy for RG7:
- Initial failover domain = A, E, F
- Failover attribute = Controlled_Failback
- Failover script = ordered

Failover policy for RG8:
- Initial failover domain = B, F, E
- Failover attribute = Auto_Failback
- Failover script = ordered

If node A fails, RG7 will fail over to node E. If node E also fails, RG7 will fail over to node F. If A is rebooted, RG7 will remain on node F.

If node B fails, RG8 will fail over to node F. If B is rebooted, RG8 will return to node B.

N+M Configuration for Linux FailSafe shows a specific instance of an
N+M configuration in which there are four primary nodes
and each can serve as a backup node. The disk shown could be a disk farm.

[Figure: N+M Configuration Concept]

You could configure the following failover policies for resource groups RG5 and RG6:

Failover policy for RG5:
- Initial failover domain = A, B, C, D
- Failover attribute = Controlled_Failback
- Failover script = ordered

Failover policy for RG6:
- Initial failover domain = C, A, D
- Failover attribute = Controlled_Failback
- Failover script = ordered

If node C fails, RG6 will fail over to node A. When node C reboots, RG6 will remain running on node A. If node A then fails, RG6 will return to node C and RG5 will move to node B. If node B then fails, RG5 moves to node C.