Linux FailSafe Cluster Configuration
This chapter describes administrative tasks you perform to configure
the components of a Linux FailSafe system. It describes how to perform tasks
using the FailSafe Cluster Manager Graphical User Interface (GUI) and the
FailSafe Cluster Manager Command Line Interface (CLI). The major sections
in this chapter are as follows:
Setting Configuration Defaults
Before you configure the components of a FailSafe system, you can set default
values for some of the components that Linux FailSafe will use when defining
the components.
Default cluster
Certain cluster manager commands require you to specify a cluster. You
can specify a default cluster to use as the default if you do not specify
a cluster explicitly.
Default node
Certain cluster manager commands require you to specify a node. With
the Cluster Manager CLI, you can specify a default node to use as the default
if you do not specify a node explicitly.
Default resource type
Certain cluster manager commands require you to specify a resource type.
With the Cluster Manager CLI, you can specify a default resource type to use
as the default if you do not specify a resource type explicitly.
Setting Default Cluster with the Cluster Manager GUI
The GUI prompts you to enter the name of the default cluster when you
have not specified one. Alternately, you can set the default cluster by clicking
the “Select Cluster...” button at the bottom of the FailSafe Manager
window.
When using the GUI, there is no need to set a default node or resource
type.
Setting and Viewing Configuration Defaults with the Cluster Manager
CLI
When you are using the Cluster Manager CLI, you can use the following
commands to specify default values. The default values are in effect only
for the current session of the Cluster Manager CLI.
Use the following command to specify a default cluster:
cmgr> set cluster A
Use the following command to specify a default node:
cmgr> set node A
Use the following command to specify a default resource type:
cmgr> set resource_type A
You can view the current default configuration values of the Cluster
Manager CLI with the following command:
cmgr> show set defaults
Name Restrictions
When you specify the names of the various components of a Linux FailSafe
system, the name cannot begin with an underscore (_) or include any whitespace
characters. In addition, the name of any Linux FailSafe component cannot
contain an unprintable character or any of the characters *, ?, \, or #.
The following is the list of permitted characters for the name of a
Linux FailSafe component:
alphanumeric characters
/
.
- (hyphen)
_ (underscore)
:
" (double quotation mark)
=
@
' (apostrophe)
These character restrictions hold true whether you are configuring your
system with the Cluster Manager GUI or the Cluster Manager CLI.
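These restrictions can be expressed as a short validation routine. The following Python sketch is illustrative only (the function name and regular expression are not part of Linux FailSafe); it checks a proposed component name against the rules above:

```python
import re

# Characters the chapter permits in a component name: alphanumerics
# plus / . - _ : " = @ and the apostrophe.
_ALLOWED = re.compile(r"^[A-Za-z0-9/.\-_:\"=@']+$")

def is_valid_component_name(name):
    """Return True if name satisfies the documented restrictions."""
    if not name or name.startswith("_"):
        return False                      # must not begin with an underscore
    if any(ch.isspace() for ch in name):
        return False                      # no whitespace characters
    return bool(_ALLOWED.match(name))     # rejects *, ?, \, # and unprintables
```

For example, venus and node-1 pass, while _node, node 1, and node* are rejected.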
Configuring Timeout Values and Monitoring Intervals
When you configure the components of a Linux FailSafe system, you configure
various timeout values and monitoring intervals that determine the application
downtime of a highly available system when there is a failure. To determine
reasonable values to set for your system, consider the following equation:
application downtime =
failure detection + time to handle failure +
failure recovery
Failure detection depends on the type of failure that is detected:
When a node goes down, there will be a node failure detection
after the node timeout; this is an HA parameter that you can modify. All failures
that translate into a node failure (such as heartbeat failure and OS failure)
fall into this failure category. Node timeout has a default value of 15 seconds.
For information on modifying the node timeout value, see “Linux FailSafe HA Parameters”.
When there is a resource failure, there is a monitor failure
of a resource. The amount of time this will take is determined by the following:
The monitoring interval for the resource type
The monitor timeout for the resource type
The number of restarts defined for the resource type, if the
restart mode is enabled
For information on setting values for a resource type, see .
Reducing these values results in a shorter failover time, but it can also
significantly increase Linux FailSafe overhead on system performance and can
lead to false failovers.
The time to handle a failure is something that the user cannot
control. In general, this should take a few seconds.
The failure recovery time is determined by the total time it takes for
Linux FailSafe to perform the following:
Execute the failover policy script (approximately five seconds).
Run the stop action script for all resources in the resource
group. This is not required for node failure; the failing node will be reset.
Run the start action script for all resources in the resource
group.
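The downtime equation above can be made concrete with a small worst-case estimator. In this Python sketch, only the 15-second default node timeout comes from this chapter; the handling and recovery figures, and the function names, are illustrative assumptions:

```python
def resource_failure_detection(monitor_interval, monitor_timeout, restarts):
    """Worst-case detection time for a resource failure: every allowed
    local restart is attempted and each monitor attempt times out."""
    return (monitor_interval + monitor_timeout) * (restarts + 1)

def application_downtime(detection, handling, recovery):
    """downtime = failure detection + time to handle failure + recovery."""
    return detection + handling + recovery

# Node failure: the documented 15 s default node timeout, an assumed 2 s
# to handle the failure, the ~5 s failover policy script plus an assumed
# 10 s of start action scripts (stop scripts are skipped; the node is reset).
downtime = application_downtime(detection=15, handling=2, recovery=5 + 10)
```

Tightening the monitoring interval or timeout shrinks the detection term for resource failures, at the overhead cost described above.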
Cluster Configuration
To set up a Linux FailSafe system, you configure the cluster that will
support the highly available services. This requires the following steps:
Defining the local host
Defining any additional nodes that are eligible to be included
in the cluster
Defining the cluster
The following subsections describe these tasks.
Defining Cluster Nodes
A cluster node is a single Linux image. Usually, a cluster node
is an individual computer. The term node is also used
in this guide for brevity.
The pool is the entire set of nodes available
for clustering.
The first node you define must be the local host, which is the host
you have logged into to perform cluster administration.
When you are defining multiple nodes, it is advisable to wait for a
minute or so between each node definition. When nodes are added to the configuration
database, the contents of the configuration database are also copied to the
node being added. The node definition operation is completed when the new
node configuration is added to the database, at which point the database configuration
is synchronized. If you define two nodes one after another, the second operation
might fail because the first database synchronization is not complete.
To add a logical node definition to the pool of nodes that are eligible
to be included in a cluster, you must provide the following information about
the node:
Logical name: This name can contain letters and numbers but
not spaces or pound signs. The name must be composed of no more than 255 characters.
Any legal hostname is also a legal node name. For example, for a node whose
hostname is “venus.eng.company.com” you can use a node name of “venus”, “node1”,
or whatever is most convenient.
hostname
Hostname: The fully qualified name of the host, such as “server1.company.com”.
Hostnames cannot begin with an underscore, include any whitespace, or be longer
than 255 characters. This hostname should be the same as the output of the
hostname command on the node you are defining. The IP address associated with
this hostname should not be the same as any IP address you define as highly
available when you define a Linux FailSafe IP address resource. Linux FailSafe
will not accept an IP address (such as “192.0.2.22”) for this
input.
Node ID: This number must be unique for each node in the pool
and be in the range 1 through 32767.
system controller
defining for nodeSystem controller information.
If the node has a system controller and you want Linux FailSafe to use the
controller to reset the node, you must provide the following information about
the system controller:
Type of system controller: chal,
msc, or mmsc
System controller port password (optional)
Administrative status, which you can set to determine whether
Linux FailSafe can use the port: enabled,
disabled
Logical node name of the system controller owner (that is, the node
that is physically attached to the system controller)
Device name of port on owner node that is attached to the
system controller
Type of owner device: tty
A list of control networks, which are the networks used for heartbeats,
reset messages, and other Linux FailSafe messages. For each network, provide
the following:
Hostname or IP address. This address must not be the same
as any IP address you define as highly available when you define a Linux FailSafe
IP address resource, and it must be resolved in the /etc/hosts
file.
Flags (hb for heartbeats, ctrl
for control messages, and priority). At least
two control networks must use heartbeats, and at least one must use control
messages.
Linux FailSafe requires multiple heartbeat networks. Usually a node
sends heartbeat messages to another node on only one network at a time. However,
there are times when a node might send heartbeat messages to another node
on multiple networks simultaneously. This happens when the sender node does
not know which networks are up and which others are down. This is a transient
state and eventually the heartbeat network converges towards the highest priority
network that is up.
Note that at any time different pairs of nodes might be using different
networks for heartbeats.
Although all nodes in the Linux FailSafe cluster should have two control
networks, it is possible to define a node to add to the pool with one control
network.
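Two of the rules above lend themselves to a quick sanity check before you run the definition task. This Python sketch is illustrative only (Linux FailSafe performs its own validation; the function name is hypothetical):

```python
import re

def validate_node_inputs(nodeid, hostname):
    """Check the documented node ID range and hostname rules."""
    if not 1 <= nodeid <= 32767:
        raise ValueError("node ID must be in the range 1 through 32767")
    # FailSafe will not accept a dotted-decimal IP address for this input.
    if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", hostname):
        raise ValueError("hostname must be a name such as server1.company.com")
    if hostname.startswith("_") or len(hostname) > 255 \
            or any(ch.isspace() for ch in hostname):
        raise ValueError("hostname breaks the documented name restrictions")
    return True
```

For example, validate_node_inputs(1, "server1.company.com") passes, while a node ID of 40000 or a hostname of "192.0.2.22" raises an error.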
Defining a Node with the Cluster Manager GUI
To define a node with the Cluster Manager GUI, perform the following
steps:
Launch the FailSafe Manager.
On the left side of the display, click on the “Nodes
& Cluster” category.
On the right side of the display click on the “Define
a Node” task link to launch the task.
Enter the selected inputs on this screen. Click on “Next”
at the bottom of the screen and continue entering information on the second
screen.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
Defining a Node with the Cluster Manager CLI
Use the following command to add a logical node definition:
cmgr> define node A
Entering this command specifies the name of the node you are defining
and puts you in a mode that enables you to define the parameters of the node.
These parameters correspond to the items defined in .
The following prompts appear:
Enter commands, when finished enter either "done" or "cancel"
A?
When this node name prompt appears, you enter the node parameters
in the following format:
set hostname to B
set nodeid to C
set sysctrl_type to D
set sysctrl_password to E
set sysctrl_status to F
set sysctrl_owner to G
set sysctrl_device to H
set sysctrl_owner_type to I
add nic J
You use the add nic J
command to define the network interfaces, entering the command once for each
network interface you want to define. When you enter this command, the following prompt appears:
Enter network interface commands, when finished enter "done" or "cancel"
NIC - J?
When this prompt appears, you use the following commands to specify
the flags for the control network:
set heartbeat to K
set ctrl_msgs to L
set priority to M
After you have defined a network interface, you can use the following
command from the node name prompt to remove it:
cmgr> remove nic N
When you have finished defining a node, enter done.
The following example defines a node called cm1a, with one controller:
cmgr> define node cm1a
Enter commands, when finished enter either "done" or "cancel"
cm1a? set hostname to cm1a
cm1a? set nodeid to 1
cm1a? set sysctrl_type to msc
cm1a? set sysctrl_password to [ ]
cm1a? set sysctrl_status to enabled
cm1a? set sysctrl_owner to cm2
cm1a? set sysctrl_device to /dev/ttyd2
cm1a? set sysctrl_owner_type to tty
cm1a? add nic cm1
Enter network interface commands, when finished enter “done”
or “cancel”
NIC - cm1 > set heartbeat to true
NIC - cm1 > set ctrl_msgs to true
NIC - cm1 > set priority to 0
NIC - cm1 > done
cm1a? done
cmgr>
If you have invoked the Cluster Manager CLI with the -p
option, or if you have entered the set prompting on command, the
display appears as in the following example:
cmgr> define node cm1a
Enter commands, when finished enter either "done" or "cancel"
Nodename [optional]? cm1a
Node ID? 1
Do you wish to define system controller info[y/n]:y
Sysctrl Type <null>? (null)
Sysctrl Password[optional]? ( )
Sysctrl Status <enabled|disabled>? enabled
Sysctrl Owner? cm2
Sysctrl Device? /dev/ttyd2
Sysctrl Owner Type <tty>? (tty)
Number of Network Interfaces ? (1)
NIC 1 - IP Address? cm1
NIC 1 - Heartbeat HB (use network for heartbeats) <true|false>?
true
NIC 1 - Priority <1,2,...>? 0
NIC 2 - IP Address? cm2
NIC 2 - Heartbeat HB (use network for heartbeats) <true|false>?
true
NIC 2 - (use network for control messages) <true|false>? false
NIC 2 - Priority <1,2,...>? 1
Modifying and Deleting Cluster Nodes
After you have defined a cluster
node, you can modify or delete the node with the Cluster Manager GUI or
the Cluster Manager CLI. You must remove a node from a cluster before you
can delete the node.
Modifying a Node with the Cluster Manager GUI
To modify a node with the Cluster Manager GUI, perform the following
steps:
Launch the FailSafe Manager.
On the left side of the display, click on the “Nodes
& Cluster” category.
On the right side of the display click on the “Modify
a Node Definition” task link to launch the task.
Modify the node parameters.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
Modifying a Node with the Cluster Manager CLI
You can use the following command to modify an existing node. After
entering this command, you can execute any of the commands you use to define
a node.
cmgr> modify node A
Deleting a Node with the Cluster Manager GUI
To delete a node with the Cluster Manager GUI, perform the following
steps:
Launch the FailSafe Manager.
On the left side of the display, click on the “Nodes
& Cluster” category.
On the right side of the display click on the “Delete
a Node” task link to launch the task.
Enter the name of the node to delete.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
Deleting a Node with the Cluster Manager CLI
After defining a node, you can delete it with the following command:
cmgr> delete node A
You can delete a node only if the node is not currently part of a cluster.
Before you can delete the node, you must modify any cluster that contains it
so that the cluster no longer includes that node.
Displaying Cluster Nodes
After you define cluster nodes, you can perform the
following display tasks:
display the attributes of a node
display the nodes that are members of a specific cluster
display all the nodes that have been defined
You can perform any of these tasks with the FailSafe Cluster Manager
GUI or the Linux FailSafe Cluster Manager CLI.
Displaying Nodes with the Cluster Manager GUI
The Cluster Manager GUI provides a convenient graphic display of the
defined nodes of a cluster and the attributes of those nodes through the
FailSafe Cluster View. You can launch the FailSafe Cluster View directly,
or you can bring it up at any time by clicking on “FailSafe Cluster
View” at the bottom of the “FailSafe Manager” display.
From the View menu of the FailSafe Cluster View, you can select “Nodes
in Pool” to view all nodes defined in the Linux FailSafe pool. You can
also select “Nodes In Cluster” to view all nodes that belong to
the default cluster. Click any node's name or icon to view detailed status
and configuration information about the node.
Displaying Nodes with the Cluster Manager CLI
After you have defined a node, you can display the node's parameters
with the following command:
cmgr> show node A
A show node command on node cm1 would yield the
following display:
cmgr> show node cm1
Logical Node Name: cm1
Hostname: cm1
Nodeid: 1
Reset type: reset
System Controller: msc
System Controller status: enabled
System Controller owner: cm2
System Controller owner device: /dev/ttyd2
System Controller owner type: tty
ControlNet Ipaddr: cm1
ControlNet HB: true
ControlNet Control: true
ControlNet Priority: 0
You can see a list of all of the nodes that have been defined with the
following command:
cmgr> show nodes in pool
You can see a list of all of the nodes that have been defined for a specified
cluster with the following command:
cmgr> show nodes [in cluster
A]
If you have specified a default cluster, you do not need to specify
a cluster when you use this command and it will display the nodes defined
in the default cluster.
Linux FailSafe HA Parameters
There are several parameters that determine the behavior of the nodes
in a cluster of a Linux FailSafe system.
The Linux FailSafe parameters are as follows:
The tie-breaker node, which is the logical name of a machine
used to compute node membership in situations where 50% of the nodes in a
cluster can talk to each other. If you do not specify a tie-breaker node,
the node with the lowest node ID number is used.
The tie-breaker node is a cluster-wide parameter.
It is recommended that you configure a tie-breaker node even if there
is an odd number of nodes in the cluster, since one node may be deactivated,
leaving an even number of nodes to determine membership.
In a heterogeneous cluster, where the nodes are of different sizes and
capabilities, the largest node in the cluster with the most important application
or the maximum number of resource groups should be configured as the tie-breaker
node.
Node timeout, which is the timeout period, in milliseconds.
If no heartbeat is received from a node in this period of time, the node is
considered to be dead and is not considered part of the cluster membership.
The node timeout must be at least 5 seconds. In addition, the node timeout
must be at least 10 times the heartbeat interval for proper Linux FailSafe
operation; otherwise, false failovers may be triggered.
Node timeout is a cluster-wide parameter.
The interval, in milliseconds, between heartbeat messages.
This interval must be greater than 500 milliseconds and it must not be greater
than one-tenth the value of the node timeout period. This interval is set
to one second, by default. Heartbeat interval is a cluster-wide parameter.
The higher the number of heartbeats (smaller heartbeat interval), the
greater the potential for slowing down the network. Conversely, the fewer
the number of heartbeats (larger heartbeat interval), the greater the potential
for reducing availability of resources.
The node wait time, in milliseconds, which is the time a node
waits for other nodes to join the cluster before declaring a new cluster membership.
If the value is not set for the cluster, Linux FailSafe assumes the value
to be the node timeout times the number of nodes.
The powerfail mode, which indicates whether a special power
failure algorithm should be run when no response is received from a system
controller after a reset request. This can be set to ON
or OFF. Powerfail is a node-specific parameter, and should
be defined for the machine that performs the reset operation.
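The constraints among these parameters can be checked mechanically. The following Python sketch encodes the rules stated above (the function name is illustrative, not a FailSafe interface); it raises an error on an invalid combination and returns the effective node wait time, applying the documented default when none is set:

```python
def check_ha_parameters(node_timeout_ms, heartbeat_ms,
                        node_wait_ms=None, num_nodes=2):
    """Validate the documented relationships between HA parameters."""
    if heartbeat_ms <= 500:
        raise ValueError("heartbeat interval must exceed 500 ms")
    if node_timeout_ms < 5000:
        raise ValueError("node timeout must be at least 5 seconds")
    if node_timeout_ms < 10 * heartbeat_ms:
        raise ValueError("node timeout must be at least 10x the "
                         "heartbeat interval")
    # If node wait is unset, FailSafe assumes node timeout times node count.
    if node_wait_ms is None:
        node_wait_ms = node_timeout_ms * num_nodes
    return node_wait_ms
```

For example, a 15-second node timeout with a 1-second heartbeat in a three-node cluster yields a 45-second default node wait time, while a 2-second heartbeat would be rejected.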
Resetting Linux FailSafe Parameters with the Cluster Manager GUI
To set Linux FailSafe parameters with the Cluster Manager GUI, perform
the following steps:
Launch the FailSafe Manager from a menu or the command line.
On the left side of the display, click on the “Nodes
& Cluster” category.
On the right side of the display click on the “Set Linux
FailSafe HA Parameters” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
Resetting Linux FailSafe Parameters with the Cluster Manager CLI
You can modify the Linux FailSafe parameters with the following command:
cmgr> modify ha_parameters [on node
A] [in cluster
B]
If you have specified a default node or a default cluster, you do not
have to specify a node or a cluster in this command. Linux FailSafe will use
the default.
Enter commands, when finished enter either "done" or "cancel"
A?
When this prompt appears, you enter the Linux FailSafe
parameters you wish to modify in the following format:
set node_timeout to A
set heartbeat to B
set run_pwrfail to C
set tie_breaker to D
Defining a Cluster
A cluster is a collection of one or more nodes
coupled with each other by networks or other similar interconnects. In Linux
FailSafe, a cluster is identified by a simple name. A given node may be a
member of only one cluster.
To define a cluster, you must provide the following information:
The logical name of the cluster, with a maximum length of
255 characters.
The mode of operation: normal (the default)
or experimental. Experimental mode allows you to configure
a Linux FailSafe cluster in which resource groups do not fail over when a
node failure is detected. This mode can be useful when you are tuning node
timeouts or heartbeat values. When a cluster is configured in normal mode,
Linux FailSafe fails over resource groups when it detects failure in a node
or resource group.
(Optional) The email address to use to notify the system administrator
when problems occur in the cluster (for example, root@system)
(Optional) The email program to use to notify the system administrator
when problems occur in the cluster (for example, /usr/bin/mail).
Specifying the email program is optional and you can specify only the
notification address in order to receive notifications by mail. If an address
is not specified, notification will not be sent.
Adding Nodes to a Cluster
After you have added nodes to the pool and defined a cluster, you must
provide the names of the nodes to include in the cluster.
Defining a Cluster with the Cluster Manager GUI
To define a cluster with the Cluster Manager GUI, perform the following
steps:
Launch the Linux FailSafe Manager.
On the left side of the display, click on “Guided Configuration”.
On the right side of the display click on “Set Up a
New Cluster” to launch the task link.
In the resulting window, click each task link in turn, as
it becomes available. Enter the selected inputs for each task.
When finished, click “OK” to close the taskset
window.
Defining a Cluster with the Cluster Manager CLI
When you define a cluster with the CLI, you define the cluster and add
nodes to the cluster with the same command.
Use the following cluster manager CLI command to define a cluster:
cmgr> define cluster A
Entering this command specifies the name of the cluster you are defining
and puts you in a mode that allows you to add nodes to the cluster. The following
prompt appears:
cluster A?
When this prompt appears during cluster creation, you can specify nodes
to include in the cluster and you can specify an email address to direct messages
that originate in this cluster.
You specify nodes to include in the cluster with the following command:
cluster A? add node C
cluster A?
You can add as many nodes as you want to include in the cluster.
You specify an email program to use to direct messages with the following
command:
cluster A? set notify_cmd to B
cluster A?
You specify an email address to direct messages with the following command:
cluster A? set notify_addr to B
cluster A?
You specify a mode for the cluster (normal or experimental) with the
following command:
cluster A? set ha_mode to D
cluster A?
When you are finished defining the cluster, enter done
to return to the cmgr prompt.
Modifying and Deleting Clusters
After you have defined a cluster, you can modify the attributes of the
cluster or you can delete the cluster. You cannot delete a cluster that contains
nodes; you must move those nodes out of the cluster first.
Modifying and Deleting a Cluster with the Cluster Manager GUI
To modify a cluster with the Cluster Manager GUI, perform the following
procedure:
Launch the Linux FailSafe Manager.
On the left side of the display, click on the “Nodes
& Cluster” category.
On the right side of the display click on the “Modify
a Cluster Definition” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
To delete a cluster with the Cluster Manager GUI, perform the following
procedure:
Launch the Linux FailSafe Manager.
On the left side of the display, click on the “Nodes
& Cluster” category.
On the right side of the display click on the “Delete
a Cluster” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
Modifying and Deleting a Cluster with the Cluster Manager CLI
To modify an existing cluster, enter the following command:
cmgr> modify cluster A
Entering this command specifies the name of the cluster you are modifying
and puts you in a mode that allows you to modify the cluster. The following
prompt appears:
cluster A?
When this prompt appears, you can modify the cluster definition with
the following commands:
cluster A? set notify_addr to
B
cluster A? set notify_cmd to
B
cluster A? add node
C
cluster A? remove node
D
cluster A?
When you are finished modifying the cluster, enter done
to return to the cmgr prompt.
You can delete a defined cluster with the following command:
cmgr> delete cluster A
Displaying Clusters
You can display defined clusters with the Cluster Manager GUI or the
Cluster Manager CLI.
Displaying a Cluster with the Cluster Manager GUI
The Cluster Manager GUI provides a convenient display of a cluster and
its components through the FailSafe Cluster View. You can launch the FailSafe
Cluster View directly, or you can bring it up at any time by clicking on the “FailSafe
Cluster View” prompt at the bottom of the “FailSafe Manager”
display.
From the View menu of the FailSafe Cluster View, you can choose elements
within the cluster to examine. To view details of the cluster, click on the
cluster name or icon. Status and configuration information will appear in
a new window. To view this information within the FailSafe Cluster View window,
select Options. When you then click on the Show Details option, the status
details will appear in the right side of the window.
Displaying a Cluster with the Cluster Manager CLI
After you have defined a cluster, you can display the nodes in that
cluster with the following command:
cmgr> show cluster A
You can see a list of the clusters that have been defined with the following
command:
cmgr> show clusters
Resource Configuration
A resource is a single
physical or logical entity that provides a service to clients or other resources.
A resource is generally available for use on two or more nodes in a cluster,
although only one node controls the resource at any given time. For example,
a resource can be a single disk volume, a particular network address, or an
application such as a web server.
Defining Resources
Resources are identified by a resource name
and a resource type. A resource name identifies a specific
instance of a resource type. A resource type is a particular
class of resource. All of the resources in a given resource type can be handled
in the same way for the purposes of failover. Every resource is an instance
of exactly one resource type.
A resource type is identified with a simple name. A resource type can
be defined for a specific logical node, or it can be defined for an entire
cluster. A resource type that is defined for a node will override a clusterwide
resource type definition of the same name; this allows an individual node
to override global settings from a clusterwide resource type definition.
The Linux FailSafe software includes many predefined resource types.
If these types fit the application you want to make into a highly available
service, you can reuse them. If none fit, you can define additional resource
types.
To define a resource, you provide the following information:
The name of the resource to define, with a maximum length
of 255 characters.
The type of resource to define. The Linux FailSafe system
contains some pre-defined resource types (template and
IP_Address). You can define your own resource type as well.
The name of the cluster that contains the resource.
The logical name of the node that contains the resource (optional).
If you specify a node, a local version of the resource will be defined on
that node.
Resource type-specific attributes for the resource. Each resource
type may require specific parameters to define for the resource, as described
in the following subsections.
You can define up to 100 resources in a Linux FailSafe configuration.
IP Address Resource Attributes
The IP Address resources
are the IP addresses used by clients to access the highly available services
within the resource group. These IP addresses are moved from one node to another
along with the other resources in the resource group when a failure is detected.
You specify the resource name of an IP address in dotted decimal notation.
IP names that require name resolution should not be used. For example, 192.26.50.1
is a valid resource name of the IP Address resource type.
The IP address you define as a Linux FailSafe resource must not be the
same as the IP address of a node hostname or the IP address of a node's control
network.
When you define an IP address, you can optionally specify the following
parameters. If you specify any of these parameters, you must specify all of
them.
The broadcast address for the IP address.
The network mask of the IP address.
A comma-separated list of interfaces on which the IP address
can be configured. This ordered list is a superset of all the interfaces on
all nodes where this IP address might be allocated. Hence, in a mixed cluster
with different ethernet drivers, an IP address might be placed on eth0 on
one system and ln0 on another. In this case the interfaces
field would be eth0,ln0 or ln0,eth0.
The order of the list of interfaces determines the priority order for
determining which IP address will be used for local restarts of the node.
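The ordered interface list behaves like a priority search: the first configured interface that actually exists on a node wins. A hypothetical Python sketch of that selection (the function name is illustrative):

```python
def pick_interface(interfaces_attr, present_on_node):
    """Return the highest-priority interface from the comma-separated
    interfaces attribute (e.g. "eth0,ln0") that exists on the given
    node, or None if no configured interface is present."""
    for nic in interfaces_attr.split(","):
        if nic in present_on_node:
            return nic
    return None
```

On a node that has only ln0, the attribute "eth0,ln0" places the IP address on ln0; on a node with both interfaces, eth0 is preferred because it appears first.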
Adding Dependency to a Resource
One resource can be dependent on one or more other
resources; if so, it will not be able to start (that is, be made available
for use) unless the resources it depends on are also started. The resources
it depends on must be part of the same resource group.
Like resources, a resource type can be dependent on one or more other
resource types. If such a dependency exists, at least one instance of each
of the dependent resource types must be defined. For example, a resource type
named Netscape_web might have resource type dependencies
on resource types named IP_address and volume.
If a resource named ws1 is defined with the
Netscape_web resource type, then the resource group containing
ws1 must also contain at least one resource of the type
IP_address and one resource of the type volume.
You cannot make resources mutually dependent. For example, if resource
A is dependent on resource B, then you cannot make resource B dependent on
resource A. In addition, you cannot define cyclic dependencies. For example,
if resource A is dependent on resource B, and resource B is dependent on resource
C, then resource C cannot be dependent on resource A.
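The prohibition on mutual and cyclic dependencies is the standard requirement that the dependency graph be acyclic. This illustrative Python sketch shows how such a configuration could be detected (Linux FailSafe performs its own check; this is not its implementation):

```python
def has_cycle(dependencies):
    """Depth-first search for a cycle in a resource dependency graph.
    dependencies maps a resource name to the names it depends on."""
    visiting, done = set(), set()

    def visit(res):
        if res in done:
            return False
        if res in visiting:
            return True               # back edge: a cyclic dependency
        visiting.add(res)
        if any(visit(dep) for dep in dependencies.get(res, ())):
            return True
        visiting.discard(res)
        done.add(res)
        return False

    return any(visit(res) for res in dependencies)
```

For example, the legal graph in which ws1 depends on IP_address and volume passes, while the cycle A depends on B, B on C, and C on A would be refused.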
When you add a dependency to a resource definition, you provide the
following information:
The name of the existing resource to which you are adding
a dependency.
The resource type of the existing resource to which you are
adding a dependency.
The name of the cluster that contains the resource.
Optionally, the logical node name of the node in the cluster
that contains the resource. If specified, resource dependencies are added
to the node's definition of the resource. If this is not specified, resource
dependencies are added to the cluster-wide resource definition.
The resource name of the resource dependency.
The resource type of the resource dependency.
Defining a Resource with the Cluster Manager GUI
To define a resource with the Cluster Manager GUI, perform the following
steps:
Launch the FailSafe Manager.
On the left side of the display, click on the “Resources
& Resource Types” category.
On the right side of the display click on the “Define
a New Resource” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task.
On the right side of the display, click on the “Add/Remove
Dependencies for a Resource Definition” to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task.
When you use this task, you define a cluster-wide
resource that is not specific to a node. For information on defining a node-specific
resource, see .
Defining a Resource with the Cluster Manager CLI
Use the following CLI command to define a clusterwide resource:
cmgr> define resource A [
of resource_type B] [
in cluster C]
Entering this command specifies the name and resource type of the resource
you are defining within a specified cluster. If you have specified a default
cluster or a default resource type, you do not need to specify a resource
type or a cluster in this command and the CLI will use the default.
When you use this command to define a resource, you define a clusterwide
resource that is not specific to a node. For information on defining a node-specific
resource, see .
The following prompt appears:
resource A?
When this prompt appears during resource creation, you can enter the
following commands to specify the attributes of the resource you are defining
and to add and remove dependencies from the resource:
resource A? set key
to value
resource A? add dependency E
of type F
resource A? remove dependency E
of type F
The attributes you define with the set
key to value
command will depend on the type of resource you are defining, as described
in .
For detailed information on how to determine the format for defining
resource attributes, see .
When you are finished defining the resource and its dependencies, enter
done to return to the cmgr prompt.
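For example, a hypothetical session that defines an IP_address resource, sets one attribute, and adds a dependency might look as follows; the address, netmask value, dependency, and cluster names are examples only:
cmgr> define resource 192.26.50.1 of resource_type IP_address in cluster test-cluster
resource 192.26.50.1? set NetworkMask to 0xffffff00
resource 192.26.50.1? add dependency shared_vol of type volume
resource 192.26.50.1? done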
Specifying Resource Attributes with the Cluster Manager CLI
To see the format in which you can specify the user-specific attributes
that you need to set for a particular resource type, you can enter the following
command to see the full definition of that resource type:
cmgr> show resource_type A
in cluster B
For example, to see the key attributes you
define for a resource of a defined resource type IP_address,
you would enter the following command:
cmgr> show resource_type IP_address in cluster nfs-cluster
Name: IP_address
Predefined: true
Order: 401
Restart mode: 1
Restart count: 2
Action name: stop
Executable: /usr/lib/failsafe/resource_types/IP_address/stop
Maximum execution time: 80000ms
Monitoring interval: 0ms
Start monitoring time: 0ms
Action name: exclusive
Executable: /usr/lib/failsafe/resource_types/IP_address/exclusive
Maximum execution time: 100000ms
Monitoring interval: 0ms
Start monitoring time: 0ms
Action name: start
Executable: /usr/lib/failsafe/resource_types/IP_address/start
Maximum execution time: 80000ms
Monitoring interval: 0ms
Start monitoring time: 0ms
Action name: restart
Executable: /usr/lib/failsafe/resource_types/IP_address/restart
Maximum execution time: 80000ms
Monitoring interval: 0ms
Start monitoring time: 0ms
Action name: monitor
Executable: /usr/lib/failsafe/resource_types/IP_address/monitor
Maximum execution time: 40000ms
Monitoring interval: 20000ms
Start monitoring time: 50000ms
Type specific attribute: NetworkMask
Data type: string
Type specific attribute: interfaces
Data type: string
Type specific attribute: BroadcastAddress
Data type: string
No resource type dependencies
The display reflects the format in which you can specify the type-specific
attributes of the resource. A similar display for a resource of type
volume reflects the format in which you can specify the group
ID, the device owner, and the device file permissions for the volume. In that
case, the devname-group key specifies the group ID of
the device file, the devname_owner key specifies the
owner of the device file, and the devname_mode key specifies
the device file permissions.
For example, to set the group ID to sys, enter
the following command:
resource A? set devname-group to sys
The remainder of this section summarizes the attributes you specify
for the predefined Linux FailSafe resource types with the set
key to value command of the Cluster Manager CLI.
IP address
When you define an IP address, you specify the following attributes:
NetworkMask
The subnet mask of the IP address
interfaces
A comma-separated list of interfaces on which the IP address can be
configured
BroadcastAddress
The broadcast address for the IP address
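For example, a hypothetical cmgr session might set these attributes as follows; the address, netmask value, and interface names are examples only:
resource 192.26.50.1? set NetworkMask to 0xffffff00
resource 192.26.50.1? set interfaces to eth0,eth1
resource 192.26.50.1? set BroadcastAddress to 192.26.50.255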
Defining a Node-Specific Resource
You can redefine an existing resource with a
resource definition that applies only to a particular node. Only existing
clusterwide resources can be redefined; resources already defined for a specific
cluster node cannot be redefined.
You use this feature when you configure heterogeneous clusters for an
IP_address resource. For example, IP_address
192.26.50.2 can be configured on et0 on an SGI Challenge node and on eth0
on all other Linux servers. The clusterwide resource definition for 192.26.50.2 will have
the interfaces field set to eth0, and the node-specific
definition for the Challenge node will have et0 as the interfaces
field.
Defining a Node-Specific Resource with the Cluster Manager GUI
Using the Cluster Manager GUI, you can take an existing clusterwide
resource definition and redefine it for use on a specific node in the cluster:
Launch the FailSafe Manager.
On the left side of the display, click on the “Resources
& Resource Types” category.
On the right side of the display click on the “Redefine
a Resource For a Specific Node” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task.
Defining a Node-Specific Resource with the Cluster
Manager CLI
You can use the Cluster Manager CLI to redefine a clusterwide resource
to be specific to a node just as you define a clusterwide resource, except
that you specify a node on the define resource command.
Use the following CLI command to define a node-specific resource:
cmgr> define resource A
of resource_type B
on node C [in cluster
D]
If you have specified a default cluster, you do not need to specify
a cluster in this command and the CLI will use the default.
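For example, the following hypothetical session redefines the IP_address resource from the heterogeneous-cluster example above so that it uses a different interface on one node; the node and cluster names are examples only:
cmgr> define resource 192.26.50.2 of resource_type IP_address on node challenge1 in cluster test-cluster
resource 192.26.50.2? set interfaces to et0
resource 192.26.50.2? done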
Modifying and Deleting Resources
After you have defined resources, you can modify and delete them.
You can modify only the type-specific attributes for a resource. You
cannot rename a resource once it has been defined.
There are some resource attributes whose modification does not take
effect until the resource group containing that resource is brought online
again. For example, if you modify the export options of a resource of type
NFS, the modifications do not take effect immediately; they take effect when
the resource group is next brought online.
Modifying and Deleting Resources with the Cluster Manager GUI
To modify a resource with the Cluster Manager GUI, perform the following
procedure:
Launch the FailSafe Manager.
On the left side of the display, click on the “Resources
& Resource Types” category.
On the right side of the display click on the “Modify
a Resource Definition” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
To delete a resource with the Cluster Manager GUI, perform the following
procedure:
Launch the FailSafe Manager.
On the left side of the display, click on the “Resources
& Resource Types” category.
On the right side of the display click on the “Delete
a Resource” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
Modifying and Deleting Resources with the Cluster Manager CLI
Use the following CLI command to modify a resource:
cmgr> modify resource A
of resource_type B [
in cluster C]
Entering this command specifies the name and resource type of the resource
you are modifying within a specified cluster. If you have specified a default
cluster, you do not need to specify a cluster in this command and the CLI
will use the default.
You modify a resource using the same commands you use to define a resource.
You can use the following command to delete a resource definition:
cmgr> delete resource A
of resource_type B [
in cluster C]
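For example, a hypothetical session that modifies the export options of an NFS resource might look as follows; the resource and cluster names are examples only:
cmgr> modify resource /this_disk of resource_type NFS in cluster test-cluster
resource /this_disk? set export-info to rw,sync
resource /this_disk? done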
Displaying Resources
You can display resources in various ways. You can
display the attributes of a particular defined resource, you can display all
of the defined resources in a specified resource group, or you can display
all the defined resources of a specified resource type.
Displaying Resources with the Cluster Manager GUI
The Cluster Manager GUI provides a convenient display of resources through
the FailSafe Cluster View. You can launch the FailSafe Cluster View directly,
or you can bring it up at any time by clicking on the “FailSafe Cluster
View” button at the bottom of the “FailSafe Manager” display.
From the View menu of the FailSafe Cluster View, select Resources to
see all defined resources. The status of these resources will be shown in
the icon (green indicates online, grey indicates offline). Alternately, you
can select “Resources of Type” from the View menu to see resources
organized by resource type, or you can select “Resources by Group”
to see resources organized by resource group.
Displaying Resources with the Cluster Manager CLI
Use the following command to view the parameters of a defined resource:
cmgr> show resource A
of resource_type B
Use the following command to view all of the defined resources in a
resource group:
cmgr> show resources in resource_group
A [in cluster B]
If you have specified a default cluster, you do not need to specify
a cluster in this command and the CLI will use the default.
Use the following command to view all of the defined resources of a
particular resource type in a specified cluster:
cmgr> show resources of resource_type
A [in cluster B]
If you have specified a default cluster, you do not need to specify
a cluster in this command and the CLI will use the default.
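For example, with hypothetical resource, group, and cluster names, these display commands might be invoked as follows:
cmgr> show resource 192.26.50.1 of resource_type IP_address
cmgr> show resources in resource_group nfs-group in cluster test-cluster
cmgr> show resources of resource_type NFS in cluster test-cluster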
Defining a Resource Type
The Linux FailSafe software includes many
predefined resource types. If these types fit the application you want to
make into a highly available service, you can reuse them. If none fits, you
can define additional resource types.
Complete information on defining resource types is provided in the
Linux FailSafe Programmer's Guide. This manual provides a summary
of that information.
To define a new resource type, you must have the following information:
Name of the resource type, with a maximum length of 255 characters.
Name of the cluster to which the resource type will apply.
Node on which the resource type will apply, if the resource
type is to be restricted to a specific node.
Order of performing the action scripts for resources of this
type in relation to resources of other types:
Resources are started in the increasing order of this value
Resources are stopped in the decreasing order of this value
See the Linux FailSafe Programmer's Guide for
a full description of the order ranges available.
Restart mode, which can be one of the following values:
0 = Do not restart on monitoring failures
1 = Restart a fixed number of times
Number of local restarts (when restart mode is 1).
Location of the executable script. This is always
/usr/lib/failsafe/resource_types/rtname,
where rtname is the resource type name.
Monitoring interval, which is the time period (in milliseconds)
between successive executions of the monitor action script;
this is only valid for the monitor action script.
Starting time for monitoring. When the resource group is brought
online on a cluster node, Linux FailSafe will start monitoring the resources
after the specified time period (in milliseconds).
Action scripts to be defined for this resource type. You must
specify scripts for start, stop,
exclusive, and monitor, although the
monitor script may contain only a return-success function if you
wish. If you specify 1 for the restart mode, you must specify a
restart script.
Type-specific attributes to be defined for this resource type.
The action scripts use this information to start, stop, and monitor a resource
of this resource type. For example, NFS requires the following resource keys:
export-point, which takes a value that
defines the export disk name. This name is used as input to the
exportfs command. For example:
export-point = /this_disk
export-info, which takes a value that
defines the export options for the filesystem. These options are used in the
exportfs command. For example:
export-info = rw,sync,no_root_squash
filesystem, which takes a value that
defines the raw filesystem. This name is used as input to the
mount command. For example:
filesystem = /dev/sda1
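Put together, a hypothetical session defining an NFS resource would set these keys as follows; the resource, cluster, and device names are examples only:
cmgr> define resource /this_disk of resource_type NFS in cluster test-cluster
resource /this_disk? set export-point to /this_disk
resource /this_disk? set export-info to rw,sync,no_root_squash
resource /this_disk? set filesystem to /dev/sda1
resource /this_disk? done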
To define a new resource type, you use the Cluster Manager GUI or the
Cluster Manager CLI.
Defining a Resource Type with the Cluster Manager GUI
To define a resource type with the Cluster Manager GUI, perform the
following steps:
Launch the FailSafe Manager.
On the left side of the display, click on the “Resources
& Resource Types” category.
On the right side of the display click on the “Define
a Resource Type” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task.
Defining a Resource Type with the Cluster Manager CLI
The following steps show the use of cluster_mgr interactively
to define a resource type called test_rt.
Log in as root.
Execute the cluster_mgr command using the
-p option to prompt you for information (the command name can be
abbreviated to cmgr):
# /usr/lib/failsafe/bin/cluster_mgr -p
Welcome to Linux FailSafe Cluster Manager Command-Line Interface
cmgr>
Use the set subcommand to specify the default
cluster used for cluster_mgr operations. In this example,
we use a cluster named test:
cmgr> set cluster test
If you prefer, you can specify the cluster name as needed with each
subcommand.
Use the define resource_type subcommand.
By default, the resource type will apply across the cluster; if you wish to
limit the resource_type to a specific node, enter the node name when prompted.
If you wish to enable restart mode, enter 1 when prompted.
The following example only shows the prompts and answers for two
action scripts (start and stop) for
a new resource type named test_rt.
cmgr> define resource_type test_rt
(Enter "cancel" at any time to abort)
Node[optional]?
Order ? 300
Restart Mode ? (0)
DEFINE RESOURCE TYPE OPTIONS
0) Modify Action Script.
1) Add Action Script.
2) Remove Action Script.
3) Add Type Specific Attribute.
4) Remove Type Specific Attribute.
5) Add Dependency.
6) Remove Dependency.
7) Show Current Information.
8) Cancel. (Aborts command)
9) Done. (Exits and runs command)
Enter option:1
No current resource type actions
Action name ? start
Executable Time? 40000
Monitoring Interval? 0
Start Monitoring Time? 0
0) Modify Action Script.
1) Add Action Script.
2) Remove Action Script.
3) Add Type Specific Attribute.
4) Remove Type Specific Attribute.
5) Add Dependency.
6) Remove Dependency.
7) Show Current Information.
8) Cancel. (Aborts command)
9) Done. (Exits and runs command)
Enter option:1
Current resource type actions:
Action - 1: start
Action name stop
Executable Time? 40000
Monitoring Interval? 0
Start Monitoring Time? 0
0) Modify Action Script.
1) Add Action Script.
2) Remove Action Script.
3) Add Type Specific Attribute.
4) Remove Type Specific Attribute.
5) Add Dependency.
6) Remove Dependency.
7) Show Current Information.
8) Cancel. (Aborts command)
9) Done. (Exits and runs command)
Enter option:3
No current type specific attributes
Type Specific Attribute ? integer-att
Datatype ? integer
Default value[optional] ? 33
0) Modify Action Script.
1) Add Action Script.
2) Remove Action Script.
3) Add Type Specific Attribute.
4) Remove Type Specific Attribute.
5) Add Dependency.
6) Remove Dependency.
7) Show Current Information.
8) Cancel. (Aborts command)
9) Done. (Exits and runs command)
Enter option:3
Current type specific attributes:
Type Specific Attribute - 1: export-point
Type Specific Attribute ? string-att
Datatype ? string
Default value[optional] ? rw
0) Modify Action Script.
1) Add Action Script.
2) Remove Action Script.
3) Add Type Specific Attribute.
4) Remove Type Specific Attribute.
5) Add Dependency.
6) Remove Dependency.
7) Show Current Information.
8) Cancel. (Aborts command)
9) Done. (Exits and runs command)Enter option:5
No current resource type dependencies
Dependency name ? filesystem
0) Modify Action Script.
1) Add Action Script.
2) Remove Action Script.
3) Add Type Specific Attribute.
4) Remove Type Specific Attribute.
5) Add Dependency.
6) Remove Dependency.
7) Show Current Information.
8) Cancel. (Aborts command)
9) Done. (Exits and runs command)
Enter option:7
Current resource type actions:
Action - 1: start
Action - 2: stop
Current type specific attributes:
Type Specific Attribute - 1: integer-att
Type Specific Attribute - 2: string-att
No current resource type dependencies
Resource dependencies to be added:
Resource dependency - 1: filesystem
0) Modify Action Script.
1) Add Action Script.
2) Remove Action Script.
3) Add Type Specific Attribute.
4) Remove Type Specific Attribute.
5) Add Dependency.
6) Remove Dependency.
7) Show Current Information.
8) Cancel. (Aborts command)
9) Done. (Exits and runs command)
Enter option:9
Successfully created resource_type test_rt
cmgr> show resource_types
NFS
template
Netscape_web
test_rt
statd
Oracle_DB
MAC_address
IP_address
INFORMIX_DB
filesystem
volume
cmgr> exit
#
Defining a Node-Specific Resource Type
You can redefine an existing
resource type with a resource type definition that applies only to a particular
node. Only existing clusterwide resource types can be redefined; resource
types already defined for a specific cluster node cannot be redefined.
A resource type that is defined for a node overrides a cluster-wide
resource type definition with the same name; this allows an individual node
to override global settings from a clusterwide resource type definition. You
can use this feature if you want to have different script timeouts for a node
or you want to restart a resource on only one node in the cluster.
For example, the IP_address resource has local restart
enabled by default. If you would like to have an IP address type without local
restart for a particular node, you can make a copy of the IP_address
clusterwide resource type with all of the parameters the same except
for restart mode, which you set to 0.
Defining a Node-Specific Resource Type with the Cluster Manager GUI
Using the Cluster Manager GUI, you can take an existing clusterwide
resource type definition and redefine it for use on a specific node in the
cluster. Perform the following tasks:
Launch the FailSafe Manager.
On the left side of the display, click on the “Resources
& Resource Types” category.
On the right side of the display click on the “Redefine
a Resource Type For a Specific Node” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task.
Defining a Node-Specific Resource Type with the Cluster Manager CLI
With the Cluster Manager CLI, you redefine a node-specific resource
type just as you define a cluster-wide resource type, except that you specify
a node on the define resource_type command.
Use the following CLI command to define a node-specific resource type:
cmgr> define resource_type
A on node B [
in cluster C]
If you have specified a default cluster, you do not need to specify
a cluster in this command and the CLI will use the default.
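For example, the following hypothetical command creates a node-specific copy of the IP_address resource type for one node; the node and cluster names are examples only:
cmgr> define resource_type IP_address on node venus in cluster test-cluster
After entering this command, you proceed through the same prompts shown in the clusterwide example above; to disable local restart on this node only, you would enter 0 for the restart mode.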
Adding Dependencies to a Resource Type
Like resources, a resource type can be
dependent on one or more other resource types. If such a dependency exists,
at least one instance of each of the dependent resource types must be defined.
For example, a resource type named Netscape_web might have
resource type dependencies on resource types named IP_address
and volume. If a resource named ws1
is defined with the Netscape_web resource type, then the
resource group containing ws1 must also contain at least
one resource of the type IP_address and one resource of
the type volume.
When using the Cluster Manager GUI, you add or remove dependencies for
a resource type by selecting the “Add/Remove Dependencies for a Resource
Type” from the “Resources & Resource Types” display
and providing the indicated input. When using the Cluster Manager CLI, you
add or remove dependencies when you define or modify the resource type.
Modifying and Deleting Resource Types
After you have defined resource types, you can modify and delete them.
Modifying and Deleting Resource Types with the Cluster Manager GUI
To modify a resource type with the Cluster Manager GUI, perform the
following procedure:
Launch the FailSafe Manager.
On the left side of the display, click on the “Resources
& Resource Types” category.
On the right side of the display click on the “Modify
a Resource Type Definition” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
To delete a resource type with the Cluster Manager GUI, perform the
following procedure:
Launch the FailSafe Manager.
On the left side of the display, click on the “Resources
& Resource Types” category.
On the right side of the display click on the “Delete
a Resource Type” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
Modifying and Deleting Resource Types with the Cluster Manager CLI
Use the following CLI command to modify a resource type:
cmgr> modify resource_type
A [in cluster B]
Entering this command specifies the resource type you are modifying
within a specified cluster. If you have specified a default cluster, you do
not need to specify a cluster in this command and the CLI will use the default.
You modify a resource type using the same commands you use to define
a resource type.
You can use the following command to delete a resource type:
cmgr> delete resource_type
A [in cluster B]
Installing (Loading) a Resource Type on a Cluster
When you define a cluster, Linux
FailSafe installs a set of resource type definitions, with default values,
that you can use. If you need to install additional standard Silicon
Graphics-supplied resource type definitions on the cluster, or if you delete
a standard resource type definition and wish to reinstall it, you can load
that resource type definition on the cluster.
The resource type definition you are installing must not already exist
on the cluster.
Installing a Resource Type with the Cluster Manager GUI
To install a resource type using the GUI, select the “Load a Resource
Type” task from the “Resources & Resource Types” task page and enter
the resource type to load.
Installing a Resource Type with the Cluster Manager CLI
Use the following CLI command to install a resource type on a cluster:
cmgr> install resource_type
A [in cluster B]
If you have specified a default cluster, you do not need to specify
a cluster in this command and the CLI will use the default.
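For example, assuming the standard NFS resource type definition had been deleted from a cluster named test-cluster (the cluster name is an example only), the following command would reinstall it:
cmgr> install resource_type NFS in cluster test-cluster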
Displaying Resource Types
After you have defined resource types, you can display them.
Displaying Resource Types with the Cluster Manager GUI
The Cluster Manager GUI provides a convenient display of resource types
through the FailSafe Cluster View. You can launch the FailSafe Cluster View
directly, or you can bring it up at any time by clicking on the “FailSafe
Cluster View” button at the bottom of the “FailSafe Manager”
display.
From the View menu of the FailSafe Cluster View, select Types to see
all defined resource types. You can then click on any of the resource type
icons to view the parameters of the resource type.
Displaying Resource Types with the Cluster Manager CLI
Use the following command to view the parameters of a defined resource
type in a specified cluster:
cmgr> show resource_type A [
in cluster B]
If you have specified a default cluster, you do not need to specify
a cluster in this command and the CLI will use the default.
Use the following command to view all of the defined resource types
in a cluster:
cmgr> show resource_types [in cluster
A]
If you have specified a default cluster, you do not need to specify
a cluster in this command and the CLI will use the default.
Use the following command to view all of the defined resource types
that have been installed:
cmgr> show resource_types
installed
Defining a Failover Policy
Before you can configure your resources into a resource group, you must
determine which failover policy to apply to the resource group. To define
a failover policy, you provide the following information:
The name of the failover policy, with a maximum length of
63 characters, which must be unique within the pool.
The name of an existing failover script.
The initial failover domain, which is an ordered list of the
nodes on which the resource group may execute. The administrator supplies
the initial failover domain when configuring the failover policy; this is
input to the failover script, which generates the runtime failover domain.
The failover attributes, which modify the behavior of the
failover script.
Complete information on failover policies and failover scripts, with
an emphasis on writing your own failover policies and scripts, is provided
in the Linux FailSafe Programmer's Guide.
Failover Scripts
A failover script
helps determine the node that is chosen for a failed resource group. The failover
script takes the initial failover domain and transforms it into the runtime
failover domain. Depending upon the contents of the script, the initial and
the runtime domains may be identical.
The ordered failover script is provided with the
Linux FailSafe release. The ordered script never changes
the initial domain; when using this script, the initial and runtime domains
are equivalent.
The round-robin failover script is also provided
with the Linux FailSafe release. The round-robin script
selects the resource group owner in a round-robin (circular) fashion. This
policy can be used for resource groups that can run on any node in the
cluster.
Failover scripts are stored in the /usr/lib/failsafe/policies
directory. If the ordered script does not meet
your needs, you can define a new failover script and place it in the
/usr/lib/failsafe/policies directory. When you are using the FailSafe
GUI, the GUI automatically detects your script and presents it to you as a
choice for you to use. You can configure the Linux FailSafe database to use
your new failover script for the required resource groups. For information
on defining failover scripts, see the Linux FailSafe Programmer's
Guide.
Failover Domain
A failover domain is the ordered list of nodes
on which a given resource group can be allocated. The nodes listed in the
failover domain must be within the same cluster; however, the failover domain
does not have to include every node in the cluster. The failover domain can
be used to statically load balance the resource groups in a cluster.
Examples:
In a four-node cluster, two nodes might share a volume. The
failover domain of the resource group containing the volume will be the two
nodes that share the volume.
If you have a cluster of nodes named venus, mercury, and pluto,
you could configure the following initial failover domains for resource groups
RG1 and RG2:
venus, mercury, pluto for RG1
pluto, mercury for RG2
When you define a failover policy, you specify the initial
failover domain. The initial failover domain is used when a cluster
is first booted. The ordered list specified by the initial failover domain
is transformed into a run-time failover domain by
the failover script. With each failure, the failover script takes the current
run-time failover domain and potentially modifies it; the initial failover
domain is never used again. Depending on the run-time conditions and contents
of the failover script, the initial and run-time failover domains may be identical.
Linux FailSafe stores the run-time failover domain and uses it as input
to the next failover script invocation.
Failover Attributes
A failover attribute is a value that
is passed to the failover script and used by Linux FailSafe for the purpose
of modifying the run-time failover domain used for a specific resource group.
You can specify a failover attribute of Auto_Failback,
Controlled_Failback, Auto_Recovery, or InPlace_Recovery.
Auto_Failback and Controlled_Failback
are mutually exclusive, but you must specify
one or the other. Auto_Recovery and InPlace_Recovery
are mutually exclusive, but whether you specify one or the other
is optional.
A failover attribute of Auto_Failback specifies
that the resource group will be run on the first available node in the runtime
failover domain. If the first node fails, the next available node will be
used; when the first node reboots, the resource group will return to it. This
attribute is best used when some type of load balancing is required.
A failover attribute of Controlled_Failback specifies
that the resource group will be run on the first available node in the runtime
failover domain, and will remain running on that node until it fails. If the
first node fails, the next available node will be used; the resource group
will remain on this new node even after the first node reboots. This attribute
is best used when client/server applications have expensive recovery mechanisms,
such as databases or any application that uses TCP to communicate.
The recovery attributes Auto_Recovery and
InPlace_Recovery determine the node on which a resource group will
be allocated when its state changes to online and a member of the group is
already allocated (such as when volumes are present). Auto_Recovery
specifies that the failover policy will be used to allocate the
resource group; this is the default recovery attribute if you have specified
the Auto_Failback attribute. InPlace_Recovery
specifies that the resource group will be allocated on the node
that already contains part of the resource group; this is the default recovery
attribute if you have specified the Controlled_Failback
attribute.
See the Linux FailSafe Programmer's Guide for
a full discussion of example failover policies.
Defining a Failover Policy with the Cluster Manager GUI
To define a failover policy using the GUI, perform the following steps:
Launch the FailSafe Manager.
On the left side of the display, click on the “Failover
Policies & Resource Groups” category.
On the right side of the display click on the “Define
a Failover Policy” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task.
Defining a Failover Policy with the Cluster Manager CLI
To define a failover policy, enter the following command at the
cmgr prompt to specify the name of the failover policy:
cmgr> define failover_policy
A
The following prompt appears:
failover_policy A?
When this prompt appears you can use the following commands to specify
the components of a failover policy:
failover_policy A? set attribute to
B
failover_policy A? set script to
C
failover_policy A? set domain to
D
When you define a failover policy, you can set as many attributes and
domains as your setup requires by executing the add attribute
and add domain commands with different values. The CLI
also allows you to specify multiple domains in one command of the following
format:
failover_policy A? set domain to
A B C ...
The components of a failover policy are described in detail in the
Linux FailSafe Programmer's Guide and in summary in .
When you are finished defining the failover policy, enter
done to return to the cmgr prompt.
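As a complete illustration, the following session defines a failover policy; the policy name, script, and node names here are hypothetical examples only:

```
cmgr> define failover_policy web-policy
Enter commands, when finished enter either "done" or "cancel"
failover_policy web-policy? set attribute to Controlled_Failback
failover_policy web-policy? set script to ordered
failover_policy web-policy? set domain to node1 node2
failover_policy web-policy? done
cmgr>
```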
Modifying and Deleting Failover Policies
After you have defined a failover policy, you can modify or delete it.
Modifying and Deleting Failover Policies with the Cluster Manager GUI
To modify a failover policy with the Cluster Manager GUI, perform the
following procedure:
Launch the FailSafe Manager.
On the left side of the display, click on the “Failover
Policies & Resource Groups” category.
On the right side of the display click on the “Modify
a Failover Policy Definition” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
To delete a failover policy with the Cluster Manager GUI, perform the
following procedure:
Launch the FailSafe Manager.
On the left side of the display, click on the “Failover
Policies & Resource Groups” category.
On the right side of the display click on the “Delete
a Failover Policy” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
Modifying and Deleting Failover Policies with the Cluster Manager CLI
Use the following CLI command to modify a failover policy:
cmgr> modify failover_policy
A
You modify a failover policy using the same commands you use to define
a failover policy.
You can use the following command to delete a failover policy definition:
cmgr> delete failover_policy
A
Displaying Failover Policies
You can use Linux FailSafe to display any of the following:
The components of a specified failover policy
All of the failover policies that have been defined
All of the failover policy attributes that have been defined
All of the failover policy scripts that have been defined
Displaying Failover Policies with the Cluster Manager GUI
The Cluster Manager GUI provides a convenient display of failover policies
through the FailSafe Cluster View. You can launch the FailSafe Cluster View
directly, or you can bring it up at any time by clicking on the “FailSafe
Cluster View” prompt at the bottom of the “FailSafe Manager”
display.
From the View menu of the FailSafe Cluster View, select Failover Policies
to see all defined failover policies.
Displaying Failover Policies with the Cluster Manager CLI
Use the following command to view the parameters of a defined failover
policy:
cmgr> show failover_policy A
Use the following command to view all of the defined failover policies:
cmgr> show failover_policies
Use the following command to view all of the defined failover policy
attributes:
cmgr> show failover_policy attributes
Use the following command to view all of the defined failover policy
scripts:
cmgr> show failover_policy scripts
Defining Resource Groups
resource group
definitionResources are configured together into
resource groups. A resource group is a collection of interdependent
resources. If any individual resource in a resource group becomes unavailable
for its intended use, then the entire resource group is considered unavailable.
Therefore, a resource group is the unit of failover for Linux FailSafe.
For example, a resource group could contain all of the resources that
are required for the operation of a web server, such as the web server itself,
the IP address with which it communicates to the outside world, and the disk
volumes containing the content that it serves.
When you define a resource group, you specify a failover
policy. A failover policy controls the behavior of a resource
group in failure situations.
To define a resource group, you provide the following information:
The name of the resource group, with a maximum length of 63
characters.
The name of the cluster to which the resource group is available
The resources to include in the resource group, and their
resource types
The name of the failover policy that determines which node
will take over the services of the resource group on failure
Linux FailSafe does not allow resource groups that do not contain any
resources to be brought online.
You can define up to 100 resources configured in any number of resource
groups.
Defining a Resource Group with the Cluster Manager GUI
To define a resource group with the Cluster Manager GUI, perform the
following steps:
Launch the FailSafe Manager.
On the left side of the display, click on “Guided Configuration”.
On the right side of the display click on “Set Up Highly
Available Resource Groups” to launch the task link.
In the resulting window, click each task link in turn, as
it becomes available. Enter the selected inputs for each task.
When finished, click “OK” to close the taskset
window.
Defining a Resource Group with the Cluster Manager CLI
To configure a resource group, enter the following command at the
cmgr prompt to specify the name of a resource group and the cluster
to which the resource group is available:
cmgr> define resource_group A
[in cluster B]
Entering this command specifies the name of the resource group you are
defining within a specified cluster. If you have specified a default cluster,
you do not need to specify a cluster in this command and the CLI will use
the default.
The following prompt appears:
Enter commands, when finished enter either "done" or "cancel"
resource_group A?
When this prompt appears you can use the following commands to specify
the resources to include in the resource group and the failover policy to
apply to the resource group:
resource_group A? add resource
B of resource_type
C
resource_group A? set failover_policy to
D
After you have set the failover policy and you have finished adding
resources to the resource group, enter done to return
to the cmgr prompt.
For a full example of resource group creation using the Cluster Manager
CLI, see .
Modifying and Deleting Resource Groups
resource group
modifying resource
groupdeletingAfter you have defined
resource groups, you can modify and delete the resource groups. You can change
the failover policy of a resource group by specifying a new failover policy
associated with that resource group, and you can add or delete resources to
the existing resource group. Note, however, that since you cannot have a resource
group online that does not contain any resources, Linux FailSafe does not
allow you to delete all resources from a resource group once the resource
group is online. Likewise, Linux FailSafe does not allow you to bring a resource
group online if it has no resources. Also, resources must be added and deleted
in atomic units; this means that resources which are interdependent must be
added and deleted together.
Modifying and Deleting Resource Groups with the Cluster Manager GUI
To modify a resource group with the Cluster Manager GUI, perform the
following procedure:
Launch the FailSafe Manager.
On the left side of the display, click on the “Failover
Policies & Resource Groups” category.
On the right side of the display click on the “Modify
a Resource Group Definition” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
To add or delete resources to a resource group definition with the Cluster
Manager GUI, perform the following procedure:
Launch the FailSafe Manager.
On the left side of the display, click on the “Failover
Policies & Resource Groups” category.
On the right side of the display click on the “Add/Remove
Resources in Resource Group” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
To delete a resource group with the Cluster Manager GUI, perform the
following procedure:
Launch the FailSafe Manager.
On the left side of the display, click on the “Failover
Policies & Resource Groups” category.
On the right side of the display click on the “Delete
a Resource Group” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task, or click on “Cancel” to cancel.
Modifying and Deleting Resource Groups with the Cluster Manager CLI
Use the following CLI command to modify a resource group:
cmgr> modify resource_group A
[in cluster B]
If you have specified a default cluster, you do not need to specify
a cluster in this command and the CLI will use the default. You modify a resource
group using the same commands you use to define a resource group:
resource_group A? add resource
B of resource_type
C
resource_group A? set failover_policy to
D
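For example, because an NFS resource depends on a filesystem resource, the two are added together in a single session; the resource, group, and cluster names below are illustrative only:

```
cmgr> modify resource_group nfs-group in cluster HA-cluster
Enter commands, when finished enter either "done" or "cancel"
resource_group nfs-group? add resource /hafs2 of resource_type filesystem
resource_group nfs-group? add resource /hafs2 of resource_type NFS
resource_group nfs-group? done
cmgr>
```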
You can use the following command to delete a resource group definition:
cmgr> delete resource_group A
[in cluster B]
If you have specified a default cluster, you do not need to specify
a cluster in this command and the CLI will use the default.
Displaying Resource Groups
resource group
displayingYou can display the parameters of a defined
resource group, and you can display all of the resource groups defined for
a cluster.
Displaying Resource Groups with the Cluster Manager GUI
The Cluster Manager GUI provides a convenient display of resource groups
through the FailSafe Cluster View. You can launch the FailSafe Cluster View
directly, or you can bring it up at any time by clicking on the “FailSafe
Cluster View” prompt at the bottom of the “FailSafe Manager”
display.
From the View menu of the FailSafe Cluster View, select Groups to see
all defined resource groups.
To display which nodes are currently running which groups, select “Groups
owned by Nodes.” To display which groups are running which failover
policies, select “Groups by Failover Policies.”
Displaying Resource Groups with the Cluster Manager CLI
Use the following command to view the parameters of a defined resource
group:
cmgr> show resource_group A [
in cluster B]
If you have specified a default cluster, you do not need to specify
a cluster in this command and the CLI will use the default.
Use the following command to view all of the defined resource groups:
cmgr> show resource_groups [in cluster
A]
Linux FailSafe System Log Configuration
log groupsLinux
FailSafe maintains system logs for each of the Linux FailSafe daemons. You
can customize the system logs according to the level of logging you wish to
maintain.
A log group is a set of processes that log to the same log file according
to the same logging configuration. All Linux FailSafe daemons make one log
group each. Linux FailSafe maintains the following log groups:
cli log
cli
Commands log
crsd log
crsd
Cluster reset services (crsd) log
diags log
diags
Diagnostics log
ha_agent
logha_agent
HA monitoring agents (ha_ifmx2) log
ha_cmsd
logha_cmsd
Cluster membership daemon (ha_cmsd) log
ha_fsd
logha_fsd
Linux FailSafe daemon (ha_fsd) log
ha_gcd
logha_gcd
Group communication daemon (ha_gcd) log
ha_ifd
logha_ifd
network interface monitoring daemon (ha_ifd) log
ha_script
logha_script
Action and Failover policy scripts log
ha_srmd
logha_srmd
System resource manager (ha_srmd) log
Log group configuration information is maintained for all nodes in the
pool for the cli and crsd log groups,
or for all nodes in the cluster for all other log groups. You can also customize
the log group configuration for a specific node in the cluster or pool.
When you configure a log group, you specify the following information:
The log level, specified as a character string with the GUI
and numerically (0 to 19) with the CLI, as described below
The log file to log to
The node whose specified log group you are customizing (optional)
log level
The log level specifies the verbosity of the logging, controlling the amount
of log messages that Linux FailSafe will write into an associated log group's
file. There are 10 debug levels. The following table shows the
logging levels as you specify them with the GUI and the CLI.
Log Levels

GUI level            CLI level  Meaning
Off                  0          No logging
Minimal              1          Logs notification of critical errors
                                and normal operation
Info                 2          Logs minimal notification plus warnings
Default              5          Logs all Info messages plus additional
                                notifications
Debug0 ... Debug9    10 ... 19  Debug0 through Debug9 (11-19 in CLI) log
                                increasingly more debug information,
                                including data structures. Many megabytes
                                of disk space can be consumed on the
                                server when debug levels are used in a
                                log configuration.
Notifications of critical errors and normal operations are always sent
to /var/log/failsafe/. Changes you make to the log level
for a log group do not affect SYSLOG.
The Linux FailSafe software appends the node name to the name of the
log file you specify. For example, when you specify the log file name for
a log group as /var/log/failsafe/cli, the file name will
be /var/log/failsafe/cli_nodename.
log filesThe
default log file names are as follows.
/var/log/failsafe/cmsd_
nodename
log file for cluster membership services daemon in node
nodename
/var/log/failsafe/gcd_
nodename
log file for group communication daemon in node nodename
/var/log/failsafe/srmd_
nodename
log file for system resource manager daemon in node nodename
/var/log/failsafe/failsafe_
nodename
log file for Linux FailSafe daemon, a policy implementor for resource
groups, in node nodename
/var/log/failsafe/agent_nodename
log file for monitoring agent named agent
in node nodename. For example, ifd_
nodename is the log file for the interface daemon monitoring
agent that monitors interfaces and IP addresses and performs local failover
of IP addresses.
/var/log/failsafe/crsd_
nodename
log file for reset daemon in node nodename
/var/log/failsafe/script_
nodename
log file for scripts in node nodename
/var/log/failsafe/cli_
nodename
log file for internal administrative commands in node nodename
invoked by the Cluster Manager GUI and Cluster Manager CLI
For information on using log groups in system recovery, see .
Configuring Log Groups with the Cluster Manager GUI
To configure a log group with the Cluster Manager GUI, perform the following
steps:
Launch the FailSafe Manager.
On the left side of the display, click on the “Nodes
& Clusters” category.
On the right side of the display click on the “Set Log
Configuration” task link to launch the task.
Enter the selected inputs.
Click on “OK” at the bottom of the screen to complete
the task.
Configuring Log Groups with the Cluster Manager CLI
You can configure a log group with the following CLI command:
cmgr> define log_group A [
on node B] [in cluster
C]
You specify the node if you wish to customize the log group configuration
for a specific node only. If you have specified a default cluster, you do
not have to specify a cluster in this command; Linux FailSafe will use the
default.
The following prompt appears:
Enter commands, when finished enter either "done" or "cancel"
log_group A?
When this prompt appears, enter the log group parameters
you wish to modify in the following format:
log_group A? set log_level to
B
log_group A? add log_file
C
log_group A? remove log_file
C
When you are finished configuring the log group, enter done
to return to the cmgr prompt.
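For example, a session that raises the logging level of the cli log group on one node might look like the following; the node, cluster, and level values are examples only:

```
cmgr> define log_group cli on node sleepy in cluster failsafe-cluster
Enter commands, when finished enter either "done" or "cancel"
log_group cli? set log_level to 5
log_group cli? done
cmgr>
```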
Modifying Log Groups with the Cluster Manager CLI
Use the following CLI command to modify a log group:
cmgr> modify log_group A
on [node B] [
in cluster C]
You modify a log group using the same commands you use to define a log
group.
Displaying Log Group Definitions with the Cluster Manager GUI
To display log group definitions with the Cluster Manager GUI, run “Set
Log Configuration” and choose the log group to display from the rollover
menu. The current log level and log file for that log group will be displayed
in the task window, where you can change those settings if you desire.
Displaying Log Group Definitions with the Cluster Manager CLI
Use the following command to view the parameters of the defined log groups:
cmgr> show log_groups
This command shows all of the log groups currently defined, with the
log group name, the logging levels and the log files.
For information on viewing the contents of the log file, see .
Resource Group Creation Example
resource group
creation exampleUse the following procedure to create
a resource group using the Cluster Manager CLI:
Determine the list of resources that belong to the resource
group you are defining. The list of resources that belong to a resource group
are the resources that move from one node to another as one unit.
resourceNFS
resource type
NFSA resource group that provides
NFS services would contain a resource of each of the following types:
IP_address
volume
filesystem
NFS
All resource and resource type dependencies of resources in a resource
group must be satisfied. For example, the NFS resource
type depends on the filesystem resource type, so a resource
group containing a resource of NFS resource type should
also contain a resource of filesystem resource type.
Determine the failover policy to be used by the resource group.
Use the template cluster_mgr script available
in the /usr/lib/failsafe/cmgr-templates/cmgr-create-resource_group
file.
This example shows a script that creates a resource group with the following
characteristics:
The resource group is named nfs-group
The resource group is in cluster HA-cluster
The resource group uses the failover policy
n1_n2_ordered
The resource group contains IP_address,
volume, filesystem, and NFS
resources
The following script can be used to create this resource group:
define resource_group nfs-group in cluster HA-cluster
set failover_policy to n1_n2_ordered
add resource 192.0.2.34 of resource_type IP_address
add resource havol1 of resource_type volume
add resource /hafs1 of resource_type filesystem
add resource /hafs1 of resource_type NFS
done
Run this script using the -f option of
the cluster_mgr command.
Linux FailSafe Configuration Example CLI Script
The following Cluster Manager CLI script provides an example which shows
how to configure a cluster in the cluster database. The script illustrates
the CLI commands that you execute when you define a cluster. You will use
the parameters of your own system when you configure your cluster. After you
create a CLI script, you can set the execute permissions and execute the script
directly.
For general information on CLI scripts, see .
For information on the CLI template files that you can use to create your
own configuration script, see .
#!/usr/lib/failsafe/bin/cluster_mgr -f
#################################################################
# #
# Sample cmgr script to create a 2-node cluster in the cluster #
# database (cdb). #
# This script is created using cmgr template files under #
# /usr/lib/failsafe/cmgr-scripts directory. #
# The cluster has 2 resource groups: #
# 1. nfs-group - Has 2 NFS, 2 filesystem, 2 volume, 1 statd and #
# 1 IP_address resources. #
# 2. web-group - Has 1 Netscape_web and 1 IP_address resources. #
# #
# NOTE: After running this script to define the cluster in the #
# cdb, the user has to enable the two resource groups using the #
# cmgr admin online resource_group command. #
# #
#################################################################
#
# Create the first node.
# Information to create a node is obtained from template script:
# /usr/lib/failsafe/cmgr-templates/cmgr-create-node
#
#
#
# logical name of the node. It is recommended that the logical name of
# the node be the output of the hostname(1) command.
#
define node sleepy
#
# Hostname of the node. This is optional. If this field is not
# specified, the logical name of the node is assumed to be the hostname.
# This value has to be the output of the hostname(1) command.
#
set hostname to sleepy
#
# Node identifier. Node identifier is a 16 bit integer that uniquely
# identifies the node. This field is optional. If a value is
# not provided, the cluster software generates a node identifier.
# Example value: 1
set nodeid to 101
#
# Description of the system controller of this node.
# System controller can be “chalL” or “msc” or “mmsc”. If the node is a
# Challenge DM/L/XL, then system controller type is “chalL”. If the
# node is Origin 200 or deskside Origin 2000, then the system
# controller type is “msc”. If the node is rackmount Origin 2000, the
# system controller type is “mmsc”.
# Possible values: msc, mmsc, chalL
#
set sysctrl_type to msc
#
# You can enable or disable the system controller definition. Users are
# expected to enable the system controller definition after verifying the
# serial reset cables connected to this node.
# Possible values: enabled, disabled
#
set sysctrl_status to enabled
#
# The system controller password for doing privileged system controller
# commands.
# This field is optional.
#
set sysctrl_password to none
#
# System controller owner. The node name of the machine that is
# connected using serial cables to system controller of this node.
# System controller node also has to be defined in the CDB.
#
set sysctrl_owner to grumpy
#
# System controller device. The absolute device path name of the tty
# to which the serial cable is connected in this node.
# Example value: /dev/ttyd2
#
set sysctrl_device to /dev/ttyd2
#
# Currently, the system controller owner can be connected to the system
# controller on this node using “tty” device.
# Possible value: tty
#
set sysctrl_owner_type to tty
#
# List of control networks. There can be multiple control networks
# specified for a node. HA cluster software uses these control
# networks for communication between nodes. At least two control
# networks should be specified for heartbeat messages and one
# control network for failsafe control messages.
# For each control network for the node, please add one more
# control network section.
#
# Name of control network IP address. This IP address must
# be configured on the network interface in /etc/rc.config
# file in the node.
# It is recommended that the IP address be provided in internet dot
# notation.
# Example value: 192.26.50.3
#
add nic 192.26.50.14
#
# Flag to indicate if the control network can be used for sending
# heartbeat messages.
# Possible values: true, false
#
set heartbeat to true
#
# Flag to indicate if the control network can be used for sending
# failsafe control messages.
# Possible values: true, false
#
set ctrl_msgs to true
#
# Priority of the control network. The higher the priority value, the
# lower the priority of the control network.
# Example value: 1
#
set priority to 1
#
# Control network information complete
#
done
#
# Add more control networks information here.
#
# Name of control network IP address. This IP address must be
# configured on the network interface in /etc/rc.config
# file in the node.
# It is recommended that the IP address be provided in internet dot
# notation.
# Example value: 192.26.50.3
#
add nic 150.166.41.60
#
# Flag to indicate if the control network can be used for sending
# heartbeat messages.
# Possible values: true, false
#
set heartbeat to true
#
# Flag to indicate if the control network can be used for sending
# failsafe control messages.
# Possible values: true, false
#
set ctrl_msgs to false
#
# Priority of the control network. The higher the priority value, the
# lower the priority of the control network.
# Example value: 1
#
set priority to 2
#
# Control network information complete
#
done
#
# Node definition complete
#
done
#
# Create the second node.
# Information to create a node is obtained from template script:
# /usr/lib/failsafe/cmgr-templates/cmgr-create-node
#
#
#
# logical name of the node. It is recommended that logical name of
# the node be output of hostname(1) command.
#
define node grumpy
#
# Hostname of the node. This is optional. If this field is not
# specified, the logical name of the node is assumed to be the hostname.
# This value has to be the output of the hostname(1) command.
#
set hostname to grumpy
#
# Node identifier. Node identifier is a 16 bit integer that uniquely
# identifies the node. This field is optional. If a value is
# not provided, the cluster software generates a node identifier.
# Example value: 1
set nodeid to 102
#
# Description of the system controller of this node.
# System controller can be “chalL” or “msc” or “mmsc”. If the node is a
# Challenge DM/L/XL, then system controller type is “chalL”. If the
# node is Origin 200 or deskside Origin 2000, then the system
# controller type is “msc”. If the node is rackmount Origin 2000,
# the system controller type is “mmsc”.
# Possible values: msc, mmsc, chalL
#
set sysctrl_type to msc
#
# You can enable or disable the system controller definition. Users are
# expected to enable the system controller definition after verifying the
# serial reset cables connected to this node.
# Possible values: enabled, disabled
#
set sysctrl_status to enabled
#
# The system controller password for doing privileged system controller
# commands.
# This field is optional.
#
set sysctrl_password to none
#
# System controller owner. The node name of the machine that is
# connected using serial cables to system controller of this node.
# System controller node also has to be defined in the CDB.
#
set sysctrl_owner to sleepy
#
# System controller device. The absolute device path name of the tty
# to which the serial cable is connected in this node.
# Example value: /dev/ttyd2
#
set sysctrl_device to /dev/ttyd2
#
# Currently, the system controller owner can be connected to the system
# controller on this node using “tty” device.
# Possible value: tty
#
set sysctrl_owner_type to tty
#
# List of control networks. There can be multiple control networks
# specified for a node. HA cluster software uses these control
# networks for communication between nodes. At least two control
# networks should be specified for heartbeat messages and one
# control network for failsafe control messages.
# For each control network for the node, please add one more
# control network section.
#
# Name of control network IP address. This IP address must be
# configured on the network interface in /etc/rc.config
# file in the node.
# It is recommended that the IP address be provided in internet dot
# notation.
# Example value: 192.26.50.3
#
add nic 192.26.50.15
#
# Flag to indicate if the control network can be used for sending
# heartbeat messages.
# Possible values: true, false
#
set heartbeat to true
#
# Flag to indicate if the control network can be used for sending
# failsafe control messages.
# Possible values: true, false
#
set ctrl_msgs to true
#
# Priority of the control network. The higher the priority value, the
# lower the priority of the control network.
# Example value: 1
#
set priority to 1
#
# Control network information complete
#
done
#
# Add more control networks information here.
#
# Name of control network IP address. This IP address must be
# configured on the network interface in /etc/rc.config
# file in the node.
# It is recommended that the IP address be provided in internet dot
# notation.
# Example value: 192.26.50.3
#
add nic 150.166.41.61
#
# Flag to indicate if the control network can be used for sending
# heartbeat messages.
# Possible values: true, false
#
set heartbeat to true
#
# Flag to indicate if the control network can be used for sending
# failsafe control messages.
# Possible values: true, false
#
set ctrl_msgs to false
#
# Priority of the control network. The higher the priority value, the
# lower the priority of the control network.
# Example value: 1
#
set priority to 2
#
# Control network information complete
#
done
#
# Node definition complete
#
done
#
# Define (create) the cluster.
# Information to create the cluster is obtained from template script:
# /usr/lib/failsafe/cmgr-templates/cmgr-create-cluster
#
#
# Name of the cluster.
#
define cluster failsafe-cluster
#
# Notification command for the cluster. This is optional. If this
# field is not specified, /usr/bin/mail command is used for
# notification. Notification is sent when there is a change in the status
# of the cluster, a node, or a resource group.
#
set notify_cmd to /usr/bin/mail
#
# Notification address for the cluster. This field value is passed as
# argument to the notification command. Specifying the notification
# command is optional and user can specify only the notification
# address in order to receive notifications by mail. If address is
# not specified, notification will not be sent.
# Example value: failsafe_alias@sysadm.company.com
set notify_addr to robinhood@sgi.com princejohn@sgi.com
#
# List of nodes added to the cluster.
# Repeat the following line for each node to be added to the cluster.
# Node should be already defined in the CDB and logical name of the
# node has to be specified.
add node sleepy
#
# Add more nodes to the cluster here.
#
add node grumpy
#
# Cluster definition complete
#
done
#
# Create failover policies
# Information to create the failover policies is obtained from
# template script:
# /usr/lib/failsafe/cmgr-templates/cmgr-create-failover_policy
#
#
# Create the first failover policy.
#
#
# Name of the failover policy.
#
define failover_policy sleepy-primary
#
# Failover policy attribute. This field is mandatory.
# Possible values: Auto_Failback, Controlled_Failback, Auto_Recovery,
# InPlace_Recovery
#
set attribute to Auto_Failback
set attribute to Auto_Recovery
#
# Failover policy script. The failover policy scripts have to
# be present in
# /usr/lib/failsafe/policies directory. This field is mandatory.
# Example value: ordered (file name not the full path name).
set script to ordered
#
# Failover policy domain. Ordered list of nodes in the cluster
# separated by spaces. This field is mandatory.
#
set domain to sleepy grumpy
#
# Failover policy definition complete
#
done
#
# Create the second failover policy.
#
#
# Name of the failover policy.
#
define failover_policy grumpy-primary
#
# Failover policy attribute. This field is mandatory.
# Possible values: Auto_Failback, Controlled_Failback, Auto_Recovery,
# InPlace_Recovery
#
set attribute to Auto_Failback
set attribute to InPlace_Recovery
#
# Failover policy script. The failover policy scripts have
# to be present in
# /usr/lib/failsafe/policies directory. This field is mandatory.
# Example value: ordered (file name not the full path name).
set script to ordered
#
# Failover policy domain. Ordered list of nodes in the cluster
# separated by spaces. This field is mandatory.
#
set domain to grumpy sleepy
#
# Failover policy definition complete
#
done
#
# Create the IP_address resources.
# Information to create an IP_address resource is obtained from:
# /usr/lib/failsafe/cmgr-templates/cmgr-create-resource-IP_address
#
#
# If multiple resources of resource type IP_address have to be created,
# repeat the following IP_address definition template.
#
# Name of the IP_address resource. The name of the resource has to
# be an IP address in internet dot notation. This IP address is used
# by clients to access highly available resources.
# Example value: 192.26.50.140
#
define resource 150.166.41.179 of resource_type IP_address in cluster failsafe-cluster
#
# The network mask for the IP address. The network mask value is used
# to configure the IP address on the network interface.
# Example value: 0xffffff00
set NetworkMask to 0xffffff00
#
# The ordered list of interfaces that can be used to configure the IP
# address. The list of interface names is separated by commas.
# Example value: eth0, eth1
set interfaces to eth1
#
# The broadcast address for the IP address.
# Example value: 192.26.50.255
set BroadcastAddress to 150.166.41.255
#
# IP_address resource definition for the cluster complete
#
done
#
# Name of the IP_address resource. The name of the resource has to be
# an IP address in internet dot notation. This IP address is used by
# clients to access highly available resources.
# Example value: 192.26.50.140
#
define resource 150.166.41.99 of resource_type IP_address in cluster failsafe-cluster
#
# The network mask for the IP address. The network mask value is used
# to configure the IP address on the network interface.
# Example value: 0xffffff00
set NetworkMask to 0xffffff00
#
# The ordered list of interfaces that can be used to configure the IP
# address. The interface names in the list are separated by commas.
# Example value: eth0, eth1
set interfaces to eth1
#
# The broadcast address for the IP address.
# Example value: 192.26.50.255
set BroadcastAddress to 150.166.41.255
#
# IP_address resource definition for the cluster complete
#
done
#
# Create the volume resources.
# Information to create a volume resource is obtained from:
# /usr/lib/failsafe/cmgr-templates/cmgr-create-resource-volume
#
#
# If multiple resources of resource type volume have to be created,
# repeat the following volume definition template.
#
# Name of the volume. The name of the resource has to be the volume
# name only, not the device path.
# Example value: HA_vol (not /dev/xlv/HA_vol)
#
define resource bagheera of resource_type volume in cluster failsafe-cluster
#
# The owner (user name) of the device file. This field is optional. If
# this field is not specified, the value “root” is used.
# Example value: oracle
set devname-owner to root
#
# The group of the device file. This field is optional.
# If this field is not specified, the value “sys” is used.
# Example value: oracle
set devname-group to sys
#
# The device file permissions. This field is optional. If this
# field is not specified, the value “666” is used. The file permissions
# have to be specified in octal notation. See chmod(1) for more
# information.
# Example value: 666
set devname-mode to 666
#
# Volume resource definition for the cluster complete
#
done
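The octal permission values used above follow chmod(1) notation. As an illustration (not part of the cmgr script), a short Python sketch converts such an octal mode to the familiar permission string:

```python
import stat

# Convert the octal device-file mode used above (e.g. 666) to the
# familiar rwx permission string, per chmod(1)-style octal notation.
def mode_string(octal_mode: str) -> str:
    # stat.filemode prepends a file-type character; drop it.
    return stat.filemode(int(octal_mode, 8))[1:]

print(mode_string("666"))  # → rw-rw-rw-
```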
#
# Name of the volume. The name of the resource has to be the volume
# name only, not the device path.
# Example value: HA_vol (not /dev/xlv/HA_vol)
#
define resource bhaloo of resource_type volume in cluster failsafe-cluster
#
# The owner (user name) of the device file. This field is optional. If this
# field is not specified, the value “root” is used.
# Example value: oracle
set devname-owner to root
#
# The group of the device file. This field is optional.
# If this field is not specified, the value “sys” is used.
# Example value: oracle
set devname-group to sys
#
# The device file permissions. This field is optional. If this field is
# not specified, the value “666” is used. The file permissions
# have to be specified in octal notation. See chmod(1) for more
# information.
# Example value: 666
set devname-mode to 666
#
# Volume resource definition for the cluster complete
#
done
#
# Create the filesystem resources.
# Information to create a filesystem resource is obtained from:
# /usr/lib/failsafe/cmgr-templates/cmgr-create-resource-filesystem
#
#
# The filesystem resource type is for XFS filesystems only.
# If multiple resources of resource type filesystem have to be created,
# repeat the following filesystem definition template.
#
# Name of the filesystem. The name of the filesystem resource has
# to be the absolute path name of the filesystem mount point.
# Example value: /shared_vol
#
define resource /haathi of resource_type filesystem in cluster failsafe-cluster
#
# The name of the volume resource corresponding to the filesystem. This
# resource should be the same as the volume dependency, see below.
# This field is mandatory.
# Example value: HA_vol
set volume-name to bagheera
#
# The options to be used when mounting the filesystem. This field is
# mandatory. For the list of mount options, see fstab(4).
# Example value: “rw”
set mount-options to rw
#
# The monitoring level for the filesystem. This field is optional. If
# this field is not specified, value “1” is used.
# Monitoring level can be
# 1 - Checks whether the filesystem appears in the mtab file (see
# mtab(4)). This is a lighter-weight check than monitoring level 2.
# 2 - Checks whether the filesystem is mounted, using the stat(1M) command.
#
set monitoring-level to 2
done
#
# Add filesystem resource type dependency
#
modify resource /haathi of resource_type filesystem in cluster failsafe-cluster
#
# The filesystem resource type definition also contains a resource
# dependency on a volume resource.
# This field is mandatory.
# Example value: HA_vol
add dependency bagheera of type volume
#
# filesystem resource definition for the cluster complete
#
done
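As described above, monitoring level 1 only looks for the mount point in the mtab file. A hedged Python sketch of an analogous check (illustrative only, not FailSafe's actual monitor script) might look like:

```python
# Illustrative sketch of a level-1 style filesystem check: look for the
# mount point in mtab-format text. This is an analogy to the check
# described above, not the actual FailSafe monitor script.
def is_mounted(mount_point: str, mtab_text: str) -> bool:
    for line in mtab_text.splitlines():
        fields = line.split()
        # mtab format: device mount-point fstype options dump pass
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False

# Hypothetical mtab line matching this script's example resources.
sample_mtab = "/dev/xlv/bagheera /haathi xfs rw 0 0\n"
print(is_mounted("/haathi", sample_mtab))  # → True
```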
#
# Name of the filesystem. The name of the filesystem resource has
# to be the absolute path name of the filesystem mount point.
# Example value: /shared_vol
#
define resource /sherkhan of resource_type filesystem in cluster failsafe-cluster
#
# The name of the volume resource corresponding to the filesystem. This
# resource should be the same as the volume dependency, see below.
# This field is mandatory.
# Example value: HA_vol
set volume-name to bhaloo
#
# The options to be used when mounting the filesystem. This field is
# mandatory. For the list of mount options, see fstab(4).
# Example value: “rw”
set mount-options to rw
#
# The monitoring level for the filesystem. This field is optional. If
# this field is not specified, value “1” is used.
# Monitoring level can be
# 1 - Checks whether the filesystem appears in the mtab file (see
# mtab(4)). This is a lighter-weight check than monitoring level 2.
# 2 - Checks whether the filesystem is mounted, using the stat(1M) command.
#
set monitoring-level to 2
done
#
# Add filesystem resource type dependency
#
modify resource /sherkhan of resource_type filesystem in cluster failsafe-cluster
#
# The filesystem resource type definition also contains a resource
# dependency on a volume resource.
# This field is mandatory.
# Example value: HA_vol
add dependency bhaloo of type volume
#
# filesystem resource definition for the cluster complete
#
done
#
# Create the statd resource.
# Information to create a statd resource is obtained from:
# /usr/lib/failsafe/cmgr-templates/cmgr-create-resource-statd
#
#
# If multiple resources of resource type statd have to be created,
# repeat the following statd definition template.
#
# Name of the statd resource. The name of the resource has to be the
# location of the NFS/lockd directory.
# Example value: /disk1/statmon
#
define resource /haathi/statmon of resource_type statd in cluster failsafe-cluster
#
# The IP address on which the NFS clients connect. This resource should
# be the same as the IP_address dependency, see below.
# This field is mandatory.
# Example value: 128.1.2.3
set InterfaceAddress to 150.166.41.99
done
#
# Add the statd resource type dependencies
#
modify resource /haathi/statmon of resource_type statd in cluster failsafe-cluster
#
# The statd resource type definition also contains a resource
# dependency on an IP_address resource.
# This field is mandatory.
# Example value: 128.1.2.3
add dependency 150.166.41.99 of type IP_address
#
# The statd resource type definition also contains a resource
# dependency on a filesystem resource. It defines the location of
# the NFS lock directory filesystem.
# This field is mandatory.
# Example value: /disk1
add dependency /haathi of type filesystem
#
# statd resource definition for the cluster complete
#
done
#
# Create the NFS resources.
# Information to create an NFS resource is obtained from:
# /usr/lib/failsafe/cmgr-templates/cmgr-create-resource-NFS
#
#
# If multiple resources of resource type NFS have to be created, repeat
# the following NFS definition template.
#
# Name of the NFS export point. The name of the NFS resource has to be
# the export path name of the filesystem mount point.
# Example value: /disk1
#
define resource /haathi of resource_type NFS in cluster failsafe-cluster
#
# The export options to be used when exporting the filesystem. For the
# list of export options, see exportfs(1M).
# This field is mandatory.
# Example value: “rw,wsync,anon=root”
set export-info to rw
#
# The name of the filesystem resource corresponding to the export
# point. This resource should be the same as the filesystem dependency,
# see below.
# This field is mandatory.
# Example value: /disk1
set filesystem to /haathi
done
#
# Add the resource type dependency
#
modify resource /haathi of resource_type NFS in cluster failsafe-cluster
#
# The NFS resource type definition also contains a resource dependency
# on a filesystem resource.
# This field is mandatory.
# Example value: /disk1
add dependency /haathi of type filesystem
#
# The NFS resource type also contains a pseudo resource dependency
# on a statd resource. A statd resource must be associated with an NFS
# resource so that the NFS locks can be failed over.
# This field is mandatory.
# Example value: /disk1/statmon
add dependency /haathi/statmon of type statd
#
# NFS resource definition for the cluster complete
#
done
#
# Name of the NFS export point. The name of the NFS resource has to be
# the export path name of the filesystem mount point.
# Example value: /disk1
#
define resource /sherkhan of resource_type NFS in cluster failsafe-cluster
#
# The export options to be used when exporting the filesystem. For the
# list of export options, see exportfs(1M).
# This field is mandatory.
# Example value: “rw,wsync,anon=root”
set export-info to rw
#
# The name of the filesystem resource corresponding to the export
# point. This resource should be the same as the filesystem dependency,
# see below.
# This field is mandatory.
# Example value: /disk1
set filesystem to /sherkhan
done
#
# Add the resource type dependency
#
modify resource /sherkhan of resource_type NFS in cluster failsafe-cluster
#
# The NFS resource type definition also contains a resource dependency
# on a filesystem resource.
# This field is mandatory.
# Example value: /disk1
add dependency /sherkhan of type filesystem
#
# The NFS resource type also contains a pseudo resource dependency
# on a statd resource. A statd resource must be associated with an NFS
# resource so that the NFS locks can be failed over.
# This field is mandatory.
# Example value: /disk1/statmon
add dependency /haathi/statmon of type statd
#
# NFS resource definition for the cluster complete
#
done
#
# Create the Netscape_web resource.
# Information to create a Netscape_web resource is obtained from:
# /usr/lib/failsafe/cmgr-templates/cmgr-create-resource-Netscape_web
#
#
# If multiple resources of resource type Netscape_web have to be
# created, repeat the following Netscape_web definition template.
#
# Name of the Netscape Web server. The name of the resource has to be
# a unique identifier.
# Example value: ha80
#
define resource web-server of resource_type Netscape_web in cluster failsafe-cluster
#
# The location of the server's startup and stop scripts.
# This field is mandatory.
# Example value: /usr/ns-home/ha86
set admin-scripts to /var/netscape/suitespot/https-control3
#
# The TCP port number on which the server listens.
# This field is mandatory.
# Example value: 80
set port-number to 80
#
# The desired monitoring level. The user can specify either:
# 1 - checks for process existence
# 2 - issues an HTTP query to the server.
# This field is mandatory.
# Example value: 2
set monitor-level to 2
#
# The location of the Web server's initial HTML page.
# This field is mandatory.
# Example value: /var/www/htdocs
set default-page-location to /var/www/htdocs
#
# The Web server's IP address. This must be a configured IP_address
# resource. This resource should be the same as the IP_address
# dependency, see below.
# This field is mandatory.
# Example value: 28.12.9.5
set web-ipaddr to 150.166.41.179
done
#
# Add the resource dependency
#
modify resource web-server of resource_type Netscape_web in cluster failsafe-cluster
#
# The Netscape_web resource type definition also contains a resource
# dependency on an IP_address resource.
# This field is mandatory.
# Example value: 28.12.9.5
add dependency 150.166.41.179 of type IP_address
#
# Netscape_web resource definition for the cluster complete
#
done
#
# Create the resource groups.
# Information to create a resource group is obtained from:
# /usr/lib/failsafe/cmgr-templates/cmgr-create-resource_group
#
#
# Name of the resource group. The name of the resource group must be
# unique in the cluster.
#
define resource_group nfs-group in cluster failsafe-cluster
#
# Failover policy for the resource group. This field is mandatory.
# The failover policy must already be defined in the CDB.
#
set failover_policy to sleepy-primary
#
# List of resources in the resource group.
# Repeat the following line for each resource to be added to the
# resource group.
add resource 150.166.41.99 of resource_type IP_address
#
# Add more resources to the resource group here.
#
add resource bagheera of resource_type volume
add resource bhaloo of resource_type volume
add resource /haathi of resource_type filesystem
add resource /sherkhan of resource_type filesystem
add resource /haathi/statmon of resource_type statd
add resource /haathi of resource_type NFS
add resource /sherkhan of resource_type NFS
#
# Resource group definition complete
#
done
#
# Name of the resource group. The name of the resource group must be
# unique in the cluster.
#
define resource_group web-group in cluster failsafe-cluster
#
# Failover policy for the resource group. This field is mandatory.
# The failover policy must already be defined in the CDB.
#
set failover_policy to grumpy-primary
#
# List of resources in the resource group.
# Repeat the following line for each resource to be added to the
# resource group.
add resource 150.166.41.179 of resource_type IP_address
#
# Add more resources to the resource group here.
#
add resource web-server of resource_type Netscape_web
#
# Resource group definition complete
#
done
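The add dependency commands in this script imply a bring-up order within each resource group: volumes before filesystems, filesystems and IP addresses before statd, and statd before NFS. FailSafe orders resources itself; purely as an illustration, this Python sketch topologically sorts the dependencies declared above to show one valid order:

```python
from graphlib import TopologicalSorter

# Dependencies exactly as declared by the "add dependency" commands in
# this script: each resource maps to the resources it depends on.
deps = {
    "/haathi (filesystem)":    {"bagheera (volume)"},
    "/sherkhan (filesystem)":  {"bhaloo (volume)"},
    "/haathi/statmon (statd)": {"150.166.41.99 (IP_address)",
                                "/haathi (filesystem)"},
    "/haathi (NFS)":   {"/haathi (filesystem)", "/haathi/statmon (statd)"},
    "/sherkhan (NFS)": {"/sherkhan (filesystem)", "/haathi/statmon (statd)"},
    "web-server (Netscape_web)": {"150.166.41.179 (IP_address)"},
}

# One valid bring-up order: every resource appears after its dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)
```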
#
# Script complete. This should be the last line of the script.
#
quit