Creating a Failover Policy

This chapter tells you how to create a failover policy.

Contents of a Failover Policy

A failover policy is the method by which a resource group is failed over from one node to another. A failover policy consists of the following:

    Failover domain
    Failover attributes
    Failover scripts

Linux FailSafe uses the failover domain output from a failover script along with failover attributes to determine on which node a resource group should reside.

The administrator must configure a failover policy for each resource group. The name of the failover policy must be unique within the pool.

Failover Domain

A failover domain is the ordered list of nodes on which a given resource group can be allocated. The nodes listed in the failover domain must be within the same cluster; however, the failover domain does not have to include every node in the cluster. The failover domain can also be used to statically load balance the resource groups in a cluster.

Examples:

    In a four-node cluster, a set of two nodes that have access to a particular XLV volume may be the failover domain of the resource group containing that XLV volume.

    In a cluster of nodes named venus, mercury, and pluto, you could configure the following initial failover domains for resource groups RG1 and RG2:

        mercury, venus, pluto for RG1
        pluto, mercury for RG2

The administrator defines the initial failover domain when configuring a failover policy. The initial failover domain is used when a cluster is first booted. The ordered list specified by the initial failover domain is transformed into a run-time failover domain by the failover script. With each failure, the failover script takes the current run-time failover domain and potentially modifies it; the initial failover domain is never used again.
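The transformation from one run-time failover domain to the next can be pictured as a reordering of the node list. The following is a minimal sketch, assuming a hypothetical helper named rotate_domain that is not part of Linux FailSafe; it rotates a domain so that a chosen node comes first while keeping the circular order of the others, which is the kind of result a round-robin style script would hand back.

```shell
#!/bin/sh
# rotate_domain NEW_OWNER NODE...
# Hypothetical helper (not shipped with Linux FailSafe): prints the domain
# rotated so that NEW_OWNER is first and the circular order is preserved.
rotate_domain() {
    new_owner=$1
    shift
    head=""   # nodes from NEW_OWNER to the end of the list
    tail=""   # nodes that came before NEW_OWNER
    seen=0
    for node in "$@"; do
        if [ "$node" = "$new_owner" ]; then
            seen=1
        fi
        if [ "$seen" -eq 1 ]; then
            head="$head $node"
        else
            tail="$tail $node"
        fi
    done
    if [ "$seen" -eq 0 ]; then
        echo "rotate_domain: $new_owner is not in the domain" >&2
        return 1
    fi
    # Unquoted on purpose: word splitting collapses the extra spaces.
    echo $head $tail
}

rotate_domain N2 N1 N2 N3   # prints: N2 N3 N1
rotate_domain N3 N2 N3 N1   # prints: N3 N1 N2
```

The second call takes the output of the first as its input, mirroring the way each invocation starts from the stored run-time domain rather than from the initial one.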
Depending on the run-time conditions and contents of the failover script, the initial and run-time failover domains may be identical.

For example, suppose the initial failover domain is:

    N1 N2 N3

The run-time failover domain will vary based on the failover script:

    If ordered:

        N1 N2 N3

    If round-robin:

        N2 N3 N1

    If a customized failover script, the order could be any permutation, based on the contents of the script:

        N1 N2 N3
        N1 N3 N2
        N2 N3 N1
        N2 N1 N3
        N3 N2 N1
        N3 N1 N2

Linux FailSafe stores the run-time failover domain and uses it as input to the next failover script invocation.

Failover Attributes

A failover attribute is a value that is passed to the failover script and used by Linux FailSafe to modify the run-time failover domain for a specific resource group. There are required and optional failover attributes, and you can also specify your own strings as attributes.

The required failover attributes are listed below. You must specify one and only one of these attributes. Note that the starting conditions for the attributes differ: for the required attributes, the starting condition is that a node joins the cluster membership when the cluster is already providing HA services; for the optional attributes, the starting condition is that HA services are started and the resource group is running on only one node in the cluster.

Required Failover Attributes (mutually exclusive)

Auto_Failback
    Specifies that the resource group is made online based on the failover policy when a node joins the cluster. This attribute is best used when some type of load balancing is required. You must specify either this attribute or the Controlled_Failback attribute.

Controlled_Failback
    Specifies that the resource group remains on the same node when a node joins the cluster. This attribute is best used when client/server applications have expensive recovery mechanisms, such as databases or any application that uses TCP to communicate. You must specify either this attribute or the Auto_Failback attribute.
When defining a failover policy, you can optionally also choose one and only one of the recovery attributes listed below. The recovery attribute determines the node on which a resource group will be allocated when its state changes to online and a member of the group is already allocated (such as when volumes are present).

Optional Failover Attributes (mutually exclusive)

Auto_Recovery
    Specifies that the resource group is made online based on the failover policy even when an exclusivity check shows that the resource group is running on a node. This attribute is optional and is mutually exclusive with the InPlace_Recovery attribute. If you specify neither of these attributes, Linux FailSafe will use this attribute by default if you have specified the Auto_Failback attribute.

InPlace_Recovery
    Specifies that the resource group is made online on the same node where the resource group is running. This attribute is mutually exclusive with the Auto_Recovery attribute. If you specify neither of these attributes, Linux FailSafe will use this attribute by default if you have specified the Controlled_Failback attribute.
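The attribute rules above (exactly one failback attribute, at most one recovery attribute, and a default recovery attribute that follows the failback choice) can be checked mechanically. The following is a sketch only, assuming a hypothetical helper named effective_attrs; Linux FailSafe performs its own validation, and this function is not part of the product.

```shell
#!/bin/sh
# effective_attrs ATTR...
# Hypothetical helper: validates a failover attribute list against the rules
# described above and prints "<failback attribute> <recovery attribute>".
effective_attrs() {
    failback=""
    recovery=""
    for attr in "$@"; do
        case "$attr" in
            Auto_Failback|Controlled_Failback)
                if [ -n "$failback" ]; then
                    echo "error: $failback and $attr cannot be combined" >&2
                    return 1
                fi
                failback=$attr ;;
            Auto_Recovery|InPlace_Recovery)
                if [ -n "$recovery" ]; then
                    echo "error: $recovery and $attr cannot be combined" >&2
                    return 1
                fi
                recovery=$attr ;;
            *) ;;  # user-defined strings are allowed and ignored here
        esac
    done
    if [ -z "$failback" ]; then
        echo "error: Auto_Failback or Controlled_Failback is required" >&2
        return 1
    fi
    # Apply the documented default when no recovery attribute was given.
    if [ -z "$recovery" ]; then
        case "$failback" in
            Auto_Failback)       recovery=Auto_Recovery ;;
            Controlled_Failback) recovery=InPlace_Recovery ;;
        esac
    fi
    echo "$failback $recovery"
}

effective_attrs Auto_Failback                      # prints: Auto_Failback Auto_Recovery
effective_attrs Controlled_Failback                # prints: Controlled_Failback InPlace_Recovery
effective_attrs Auto_Failback InPlace_Recovery     # prints: Auto_Failback InPlace_Recovery
```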
Failover Scripts

A failover script generates the run-time failover domain and returns it to the Linux FailSafe process. The Linux FailSafe process applies the failover attributes and then selects the first node in the returned failover domain that is also in the current node membership.

The run-time of the failover script must be capped to a system-definable maximum. Hence, any external calls must be guaranteed to return quickly. If the failover script takes too long to return, Linux FailSafe will kill the script process and use the previous run-time failover domain.

Failover scripts are stored in the /usr/lib/failsafe/policies directory.

The ordered Failover Script

The ordered failover script is provided with the release. The ordered script never changes the initial domain; when using this script, the initial and run-time domains are equivalent. The script reads six lines from the input file and in case of errors logs the input parameters and/or the error to the script log.

The following example shows the contents of the ordered failover script. (Line breaks added for readability.)

    #!/bin/sh
    #
    # Copyright (c) 2000 Silicon Graphics, Inc.  All Rights Reserved.
    #
    # This program is free software; you can redistribute it and/or modify
    # it under the terms of version 2 of the GNU General Public License as
    # published by the Free Software Foundation.
    #
    # This program is distributed in the hope that it would be useful, but
    # WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    #
    # Further, this software is distributed without any warranty that it is
    # free of the rightful claim of any third person regarding infringement
    # or the like.  Any license provided herein, whether implied or
    # otherwise, applies only to this software file.  Patent licenses, if
    # any, provided herein do not apply to combinations of this program with
    # other software, or any other product whatsoever.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program; if not, write the Free Software Foundation,
    # Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
    #
    # Contact information: Silicon Graphics, Inc., 1600 Amphitheatre Pkwy,
    # Mountain View, CA 94043, or:
    #
    # http://www.sgi.com
    #
    # For further information regarding this notice, see:
    #
    # http://oss.sgi.com/projects/GenInfo/NoticeExplan
    #
    # $1 - input file
    # $2 - output file
    #
    # line 1 input file - version
    # line 2 input file - name
    # line 3 input file - owner field
    # line 4 input file - attributes
    # line 5 input file - list of possible owners
    # line 6 input file - application failover domain

    DIR=/usr/lib/failsafe/bin
    LOG="${DIR}/ha_cilog -g ha_script -s script"
    FILE=/usr/lib/failsafe/policies/ordered

    input=$1
    output=$2

    {
        read version
        read name
        read owner
        read attr
        read mem1 mem2 mem3 mem4 mem5 mem6 mem7 mem8
        read afd1 afd2 afd3 afd4 afd5 afd6 afd7 afd8
    } < ${input}

    ${LOG} -l 1 "${FILE}:" `/bin/cat ${input}`

    if [ "${version}" -ne 1 ] ; then
        ${LOG} -l 1 "ERROR: ${FILE}: Different version no. Should be (1) rather than (${version})" ;
        exit 1;
    elif [ -z "${name}" ]; then
        ${LOG} -l 1 "ERROR: ${FILE}: Failover script not defined";
        exit 1;
    elif [ -z "${attr}" ]; then
        ${LOG} -l 1 "ERROR: ${FILE}: Attributes not defined";
        exit 1;
    elif [ -z "${mem1}" ]; then
        ${LOG} -l 1 "ERROR: ${FILE}: No node membership defined";
        exit 1;
    elif [ -z "${afd1}" ]; then
        ${LOG} -l 1 "ERROR: ${FILE}: No failover domain defined";
        exit 1;
    fi

    found=0
    for i in $afd1 $afd2 $afd3 $afd4 $afd5 $afd6 $afd7 $afd8; do
        for j in $mem1 $mem2 $mem3 $mem4 $mem5 $mem6 $mem7 $mem8; do
            if [ "X${j}" = "X${i}" ]; then
                found=1;
                break;
            fi
        done
    done

    if [ ${found} -eq 0 ]; then
        mem="("$mem1")"" ""("$mem2")"" ""("$mem3")"" ""("$mem4")"" \
    ""("$mem5")"" ""("$mem6")"" ""("$mem7")"" ""("$mem8")";
        afd="("$afd1")"" ""("$afd2")"" ""("$afd3")"" ""("$afd4")"" \
    ""("$afd5")"" ""("$afd6")"" ""("$afd7")"" ""("$afd8")";
        ${LOG} -l 1 "ERROR: ${FILE}: Policy script failed"
        ${LOG} -l 1 "ERROR: ${FILE}: " `/bin/cat ${input}`
        ${LOG} -l 1 "ERROR: ${FILE}: Nodes defined in membership do not match \
    the ones in failure domain"
        ${LOG} -l 1 "ERROR: ${FILE}: Parameters read from input file: \
    version = $version, name = $name, owner = $owner, attribute = $attr, \
    nodes = $mem, afd = $afd"
        exit 1;
    fi

    if [ ${found} -eq 1 ]; then
        rm -f ${output}
        echo $afd1 $afd2 $afd3 $afd4 $afd5 $afd6 $afd7 $afd8 > ${output}
        exit 0
    fi
    exit 1

The round-robin Failover Script

The round-robin script selects the resource group owner in a round-robin (circular) fashion. This policy can be used for resource groups that can be run in any node in the cluster.

The following example shows the contents of the round-robin failover script.

    #!/bin/sh
    #
    # Copyright (c) 2000 Silicon Graphics, Inc.  All Rights Reserved.
    #
    # This program is free software; you can redistribute it and/or modify
    # it under the terms of version 2 of the GNU General Public License as
    # published by the Free Software Foundation.
    #
    # This program is distributed in the hope that it would be useful, but
    # WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    #
    # Further, this software is distributed without any warranty that it is
    # free of the rightful claim of any third person regarding infringement
    # or the like.  Any license provided herein, whether implied or
    # otherwise, applies only to this software file.  Patent licenses, if
    # any, provided herein do not apply to combinations of this program with
    # other software, or any other product whatsoever.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program; if not, write the Free Software Foundation,
    # Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
    #
    # Contact information: Silicon Graphics, Inc., 1600 Amphitheatre Pkwy,
    # Mountain View, CA 94043, or:
    #
    # http://www.sgi.com
    #
    # For further information regarding this notice, see:
    #
    # http://oss.sgi.com/projects/GenInfo/NoticeExplan
    #
    # $1 - input file
    # $2 - output file
    #
    # line 1 input file - version
    # line 2 input file - name
    # line 3 input file - owner field
    # line 4 input file - attributes
    # line 5 input file - Possible list of owners
    # line 6 input file - application failover domain

    DIR=/usr/lib/failsafe/bin
    LOG="${DIR}/ha_cilog -g ha_script -s script"
    FILE=/usr/lib/failsafe/policies/round-robin

    # Read input file
    input=$1
    output=$2

    {
        read version
        read name
        read owner
        read attr
        read mem1 mem2 mem3 mem4 mem5 mem6 mem7 mem8
        read afd1 afd2 afd3 afd4 afd5 afd6 afd7 afd8
    } < ${input}

    # Validate input file
    ${LOG} -l 1 "${FILE}:" `/bin/cat ${input}`

    if [ "${version}" -ne 1 ] ; then
        ${LOG} -l 1 "ERROR: ${FILE}: Different version no. Should be (1) \
    rather than (${version})" ;
        exit 1;
    elif [ -z "${name}" ]; then
        ${LOG} -l 1 "ERROR: ${FILE}: Failover script not defined";
        exit 1;
    elif [ -z "${attr}" ]; then
        ${LOG} -l 1 "ERROR: ${FILE}: Attributes not defined";
        exit 1;
    elif [ -z "${mem1}" ]; then
        ${LOG} -l 1 "ERROR: ${FILE}: No node membership defined";
        exit 1;
    elif [ -z "${afd1}" ]; then
        ${LOG} -l 1 "ERROR: ${FILE}: No failover domain defined";
        exit 1;
    fi

    # Return 0 if $1 is in the membership and return 1 otherwise.
    check_in_mem()
    {
        for j in $mem1 $mem2 $mem3 $mem4 $mem5 $mem6 $mem7 $mem8; do
            if [ "X${j}" = "X$1" ]; then
                return 0;
            fi
        done
        return 1;
    }

    # Check if owner has to be changed. There is no need to change owner if
    # owner node is in the possible list of owners.
    check_in_mem ${owner}
    if [ $? -eq 0 ]; then
        nextowner=${owner};
    fi

    # Search for the next owner
    if [ "X${nextowner}" = "X" ]; then
        next=0;
        for i in $afd1 $afd2 $afd3 $afd4 $afd5 $afd6 $afd7 $afd8; do
            if [ "X${i}" = "X${owner}" ]; then
                next=1;
                continue;
            fi
            if [ "X${owner}" = "XNO ONE" ]; then
                next=1;
            fi
            if [ ${next} -eq 1 ]; then
                # Check if ${i} is in membership
                check_in_mem ${i};
                if [ $? -eq 0 ]; then
                    # found next owner
                    nextowner=${i};
                    next=0;
                    break;
                fi
            fi
        done
    fi

    if [ "X${nextowner}" = "X" ]; then
        # wrap round the afd list.
        for i in $afd1 $afd2 $afd3 $afd4 $afd5 $afd6 $afd7 $afd8; do
            if [ "X${i}" = "X${owner}" ]; then
                # Search for next owner complete
                break;
            fi
            # Previous loop should have found new owner
            if [ "X${owner}" = "XNO ONE" ]; then
                break;
            fi
            if [ ${next} -eq 1 ]; then
                check_in_mem ${i};
                if [ $? -eq 0 ]; then
                    # found next owner
                    nextowner=${i};
                    next=0;
                    break;
                fi
            fi
        done
    fi

    if [ "X${nextowner}" = "X" ]; then
        ${LOG} -l 1 "ERROR: ${FILE}: Policy script failed"
        ${LOG} -l 1 "ERROR: ${FILE}: " `/bin/cat ${input}`
        ${LOG} -l 1 "ERROR: ${FILE}: Could not find new owner"
        exit 1;
    fi

    # nextowner is the new owner
    print=0;
    rm -f ${output};

    # Print the new afd to the output file
    echo -n "${nextowner} " > ${output};
    for i in $afd1 $afd2 $afd3 $afd4 $afd5 $afd6 $afd7 $afd8; do
        if [ "X${nextowner}" = "X${i}" ]; then
            print=1;
        elif [ ${print} -eq 1 ]; then
            echo -n "${i} " >> ${output}
        fi
    done

    print=1;
    for i in $afd1 $afd2 $afd3 $afd4 $afd5 $afd6 $afd7 $afd8; do
        if [ "X${nextowner}" = "X${i}" ]; then
            print=0;
        elif [ ${print} -eq 1 ]; then
            echo -n "${i} " >> ${output}
        fi
    done

    echo >> ${output};
    exit 0;

Creating a New Failover Script

If the ordered or round-robin scripts do not meet your needs, you can create a new failover script and place it in the /usr/lib/failsafe/policies directory. You can then configure the cluster configuration database to use your new failover script for the required resource groups.
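As an illustration of what a new failover script might look like, the following sketch defines a hypothetical policy called members-first. It is not shipped with Linux FailSafe, and both the policy name and the function name members_first_policy are invented for this example. It assumes the same six-line input file and single-line output file that the ordered and round-robin scripts use, and it returns the domain with nodes that are possible owners listed first, in their existing order, followed by the remaining nodes. Errors go to standard error here; a production script would presumably log through ha_cilog as the shipped scripts do.

```shell
#!/bin/sh
# Hypothetical "members-first" policy, written as a function so it can be
# tried outside a cluster. A policy file built from it would end with:
#     members_first_policy "$1" "$2"

# in_list NEEDLE ITEM...: succeeds if NEEDLE equals one of the items.
in_list() {
    needle=$1
    shift
    for item in "$@"; do
        [ "$item" = "$needle" ] && return 0
    done
    return 1
}

members_first_policy() {
    input=$1
    output=$2
    # Six input lines, in the order documented in the shipped scripts.
    # name, owner and attr are read to consume their lines but are unused.
    {
        read -r version
        read -r name
        read -r owner
        read -r attr
        read -r members
        read -r domain
    } < "$input"

    if [ "$version" != "1" ]; then
        echo "ERROR: unsupported version ($version)" >&2
        return 1
    fi
    if [ -z "$members" ] || [ -z "$domain" ]; then
        echo "ERROR: membership or failover domain missing" >&2
        return 1
    fi

    up=""     # domain nodes that are possible owners, in domain order
    down=""   # domain nodes that are not
    for node in $domain; do
        if in_list "$node" $members; then
            up="$up $node"
        else
            down="$down $node"
        fi
    done
    if [ -z "$up" ]; then
        echo "ERROR: no node in the failover domain is a possible owner" >&2
        return 1
    fi

    # Unquoted expansion collapses the leading and doubled spaces.
    echo $up $down > "$output"
}
```

To try it by hand, write six lines such as 1, members-first, N1, Auto_Failback, N3 N2, and N1 N2 N3 to a temporary file and pass that file and an output path to the function; the output file then contains N2 N3 N1.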
Failover Script Interface

The following is passed to the failover script:

    function(version, name, owner, attributes, possibleowners, domain)

version
    Linux FailSafe version. The Linux FailSafe release uses version number 1.

name
    Name of the failover script (used for error validations and logging purposes).

owner
    Logical name of the node that has the resource group allocated.

attributes
    Failover attributes (Auto_Failback or Controlled_Failback must be included).

possibleowners
    List of possible owners for the resource group. This list can be a subset of the current node membership.

domain
    Ordered list of nodes used at the last failover. (At the first failover, the initial failover domain is used.)

The failover script returns the newly generated run-time failover domain to Linux FailSafe, which then chooses the node on which the resource group should be allocated by applying the failover attributes and node membership to the run-time failover domain.

Example Failover Policies for Linux FailSafe

There are two general types of configuration, each of which can have from 2 through 8 nodes:

    N nodes that can potentially fail over their applications to any of the other nodes in the cluster.

    N primary nodes that can fail over to M backup nodes. For example, you could have 3 primary nodes and 1 backup node.

This section shows examples of failover policies for the following types of configuration, each of which can have from 2 through 8 nodes:

    N primary nodes and one backup node (N+1)
    N primary nodes and two backup nodes (N+2)
    N primary nodes and M backup nodes (N+M)

The diagrams in the following sections illustrate the configuration concepts discussed here, but they do not address all required or supported elements, such as reset hubs. For configuration details, see the Linux FailSafe Installation and Maintenance Instructions.
N+1 Configuration for Linux FailSafe

The following figure shows a specific instance of an N+1 configuration in which there are three primary nodes and one backup node. (This is also known as a star configuration.) The disks shown could each be disk farms.

Figure: N+1 Configuration Concept
You could configure the following failover policies for load balancing:

    Failover policy for RG1:
        Initial failover domain = A, D
        Failover attribute = Auto_Failback
        Failover script = ordered

    Failover policy for RG2:
        Initial failover domain = B, D
        Failover attribute = Auto_Failback
        Failover script = ordered

    Failover policy for RG3:
        Initial failover domain = C, D
        Failover attribute = Auto_Failback
        Failover script = ordered

If node A fails, RG1 will fail over to node D. As soon as node A reboots, RG1 will be moved back to node A.

If you change the failover attribute to Controlled_Failback for RG1 and node A fails, RG1 will fail over to node D and will remain running on node D even if node A reboots.
N+2 Configuration

The following figure shows a specific instance of an N+2 configuration in which there are four primary nodes and two backup nodes. The disks shown could each be disk farms.

Figure: N+2 Configuration Concept
You could configure the following failover policy for resource groups RG7 and RG8:

    Failover policy for RG7:
        Initial failover domain = A, E, F
        Failover attribute = Controlled_Failback
        Failover script = ordered

    Failover policy for RG8:
        Initial failover domain = B, F, E
        Failover attribute = Auto_Failback
        Failover script = ordered

If node A fails, RG7 will fail over to node E. If node E also fails, RG7 will fail over to node F. If A is rebooted, RG7 will remain on node F.

If node B fails, RG8 will fail over to node F. If B is rebooted, RG8 will return to node B.
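The RG7 and RG8 scenarios above can be traced with a small model of owner selection. This is a sketch under simplifying assumptions, not FailSafe's actual decision logic: the hypothetical function choose_owner keeps the current owner under Controlled_Failback for as long as that node is up, and otherwise picks the first node of the (ordered, hence unchanged) failover domain that is in the membership.

```shell
#!/bin/sh
# choose_owner ATTR CURRENT "DOMAIN" "UP_NODES"
# Simplified model (an assumption, not FailSafe source):
#   - Controlled_Failback: keep CURRENT while it is still up.
#   - Otherwise: pick the first node in DOMAIN that is up.
choose_owner() {
    attr=$1; current=$2; domain=$3; up=$4
    if [ "$attr" = "Controlled_Failback" ]; then
        for node in $up; do
            if [ "$node" = "$current" ]; then
                echo "$current"
                return 0
            fi
        done
    fi
    for node in $domain; do
        for member in $up; do
            if [ "$node" = "$member" ]; then
                echo "$node"
                return 0
            fi
        done
    done
    return 1   # no node in the domain is up
}

# RG7: domain A E F, Controlled_Failback, initially on A.
rg7=$(choose_owner Controlled_Failback A "A E F" "B C D E F")
echo "A fails:   RG7 on $rg7"     # prints: A fails:   RG7 on E
rg7=$(choose_owner Controlled_Failback "$rg7" "A E F" "B C D F")
echo "E fails:   RG7 on $rg7"     # prints: E fails:   RG7 on F
rg7=$(choose_owner Controlled_Failback "$rg7" "A E F" "A B C D F")
echo "A reboots: RG7 on $rg7"     # prints: A reboots: RG7 on F

# RG8: domain B F E, Auto_Failback, initially on B.
rg8=$(choose_owner Auto_Failback B "B F E" "A C D E F")
echo "B fails:   RG8 on $rg8"     # prints: B fails:   RG8 on F
rg8=$(choose_owner Auto_Failback "$rg8" "B F E" "A B C D E F")
echo "B reboots: RG8 on $rg8"     # prints: B reboots: RG8 on B
```

Swapping the attribute passed for RG7 to Auto_Failback in the last RG7 call would return A instead of F, which is the difference between the two required attributes in this model.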
N+M Configuration for Linux FailSafe

The following figure shows a specific instance of an N+M configuration in which there are four primary nodes and each can serve as a backup node. The disk shown could be a disk farm.

Figure: N+M Configuration Concept
You could configure the following failover policy for resource groups RG5 and RG6:

    Failover policy for RG5:
        Initial failover domain = A, B, C, D
        Failover attribute = Controlled_Failback
        Failover script = ordered

    Failover policy for RG6:
        Initial failover domain = C, A, D
        Failover attribute = Controlled_Failback
        Failover script = ordered

If node C fails, RG6 will fail over to node A. When node C reboots, RG6 will remain running on node A. If node A then fails, RG6 will return to node C and RG5 will move to node B. If node B then fails, RG5 moves to node C.