Introduction to Writing Application Scripts
Linux FailSafe provides several highly available services for a two-node
cluster. These services are monitored by the Linux FailSafe software. You
can create additional services that are highly available by using the instructions
in this guide.
This chapter provides an introduction to Linux FailSafe programming.
For an overview of the software layers, communication paths, and cluster
configuration database, see the Linux FailSafe Administrator's
Guide.
Concepts
In order to use Linux FailSafe, you must understand the concepts in
this section.
Cluster Node (or Node)
A cluster node is a single Linux execution environment;
in other words, a single physical or virtual machine. In current Linux environments
this will always be an individual computer. In this guide, the term node
is used with this meaning for brevity, as opposed to other meanings such as
a network node.
Pool
A pool is the entire set of nodes having membership
in a group of clusters. The clusters are usually close together and should
always serve a common purpose. A replicated cluster configuration database
is stored on each node in the pool.
Cluster
A cluster is a collection of one or more nodes
coupled to each other by networks or other similar interconnections. A cluster
belongs to one pool and only one pool. A cluster is identified by a simple
name; this name must be unique within the pool. A particular node may be
a member of only one cluster. All nodes in a cluster are also in the pool;
however, not all nodes in the pool are necessarily in the cluster.
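The membership rules above can be checked mechanically. The following is a minimal sketch, not part of Linux FailSafe: it assumes a pool is modeled as a set of node names and each cluster as a named set of nodes, and all node and cluster names are hypothetical.

```python
from typing import Dict, List, Set


def validate_pool(pool: Set[str], clusters: Dict[str, Set[str]]) -> List[str]:
    """Return a list of rule violations; an empty list means the layout is valid."""
    problems: List[str] = []
    owner: Dict[str, str] = {}  # node name -> first cluster that claimed it
    for cluster_name, members in sorted(clusters.items()):
        for node in sorted(members):
            # Every node in a cluster must also be in the pool.
            if node not in pool:
                problems.append(f"{node} is in {cluster_name} but not in the pool")
            # A node may be a member of only one cluster.
            if node in owner:
                problems.append(f"{node} is in both {owner[node]} and {cluster_name}")
            else:
                owner[node] = cluster_name
    return problems


# Hypothetical layout: n5 is in the pool but belongs to no cluster, which is allowed.
pool = {"n1", "n2", "n3", "n4", "n5"}
valid = {"web": {"n1", "n2"}, "db": {"n3", "n4"}}
invalid = {"web": {"n1", "n2"}, "db": {"n2", "n9"}}

print(validate_pool(pool, valid))
print(validate_pool(pool, invalid))
```

Using a dictionary keyed by cluster name also reflects the rule that a cluster name is unique within the pool.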
Node Membership
A node membership is the list of nodes in a cluster
on which Linux FailSafe can allocate resource groups.
Process Membership
A process membership is the list of process instances in a cluster that
form a process group. There can be multiple process groups per node.
Resource
A resource is a single physical or logical entity
that provides a service to clients or other resources. For example, a resource
can be a single disk volume, a particular network address, or an application
such as a web server. A resource is generally available for use over time
on two or more nodes in a cluster, although it can only be allocated to one
node at any given time.
Resources are identified by a resource name and a resource type. One
resource can be dependent on one or more other resources; if so, it will not
be able to start (that is, be made available for use) unless the resources
on which it depends are also started. These resources must be part of the same
resource group and are identified in a resource dependency list.
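One consequence of resource dependencies is that a valid start order exists only if each resource comes after everything it depends on. The sketch below illustrates this with a topological sort from the Python standard library. It is an illustration only, not a description of how Linux FailSafe orders resources internally; the dependency lists are hypothetical and borrow resource names used as examples elsewhere in this chapter.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9 or later

# Hypothetical dependency lists: each resource maps to the resources it depends on.
dependencies = {
    "web1": {"/fs1", "10.10.48.22"},
    "/fs1": {"vol1"},
    "vol1": set(),
    "10.10.48.22": set(),
}


def start_order(deps):
    """Return resources ordered so each appears after everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = start_order(dependencies)
print(order)
```

Stopping resources in the reverse of this order keeps a resource from losing something it depends on while it is still running.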
Resource Type
A resource type is a particular class of resource.
All of the resources in a particular resource type can be handled in the same
way for the purposes of failover. Every resource is an instance of exactly
one resource type.
A resource type is identified by a simple name; this name should be
unique within the cluster. A resource type can be defined for a specific node,
or it can be defined for an entire cluster. A resource type definition for
a specific node overrides a clusterwide resource type definition with the
same name; this allows an individual node to override global settings from
a clusterwide resource type definition.
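The override rule can be pictured as a two-step lookup. The following is a minimal sketch under stated assumptions: definitions are held in a dictionary keyed by type name and node, a node of None marks the clusterwide definition, and the monitor_interval setting is a hypothetical placeholder rather than a real Linux FailSafe parameter.

```python
from typing import Dict, Optional, Tuple

# Keys are (resource type name, node); node None marks a clusterwide definition.
Definitions = Dict[Tuple[str, Optional[str]], dict]


def effective_definition(defs: Definitions, type_name: str, node: str) -> dict:
    """Prefer the node-specific definition; fall back to the clusterwide one."""
    if (type_name, node) in defs:
        return defs[(type_name, node)]
    return defs[(type_name, None)]


# Hypothetical data: node2 overrides the clusterwide IP_address definition.
defs = {
    ("IP_address", None): {"monitor_interval": 10},
    ("IP_address", "node2"): {"monitor_interval": 30},
}

print(effective_definition(defs, "IP_address", "node1"))
print(effective_definition(defs, "IP_address", "node2"))
```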
Like resources, a resource type can be dependent on one or more other
resource types. If such a dependency exists, at least one instance of each
of the dependent resource types must be defined. For example, a resource type
named Netscape_web might have resource type dependencies
on resource types named IP_address and volume.
If a resource named web1 is defined with the
Netscape_web resource type, then the resource group containing
web1 must also contain at least one resource of the type
IP_address and one resource of the type volume.
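The rule in the Netscape_web example can be expressed as a simple check on a resource group. This is a sketch only: the dictionary layout and the function name missing_types are assumptions made for illustration, and the type dependencies are the ones given in the example above.

```python
from typing import Dict, List, Set

# Resource type dependencies taken from the Netscape_web example in the text.
TYPE_DEPENDENCIES: Dict[str, Set[str]] = {
    "Netscape_web": {"IP_address", "volume"},
    "IP_address": set(),
    "volume": set(),
}


def missing_types(group: Dict[str, str]) -> List[str]:
    """group maps resource name -> resource type. Return required types that are absent."""
    present = set(group.values())
    required: Set[str] = set()
    for resource_type in present:
        required |= TYPE_DEPENDENCIES.get(resource_type, set())
    return sorted(required - present)


complete = {"web1": "Netscape_web", "10.10.48.22": "IP_address", "vol1": "volume"}
incomplete = {"web1": "Netscape_web", "vol1": "volume"}

print(missing_types(complete))
print(missing_types(incomplete))
```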
The Linux FailSafe software includes some predefined resource types.
If these types fit the application you want to make highly available, you
can reuse them. If none fit, you can create additional resource types by using
the instructions in this guide.
Resource Name
A resource name identifies a specific instance
of a resource type. A resource name must be unique for a given resource type.
Resource Group
A resource group is a collection of interdependent
resources. A resource group is identified by a simple name; this name must
be unique within a cluster. The following table shows an example
of the resources and their corresponding resource types for a resource group
named WebGroup.
Example Resource Group

Resource       Resource Type
10.10.48.22    IP_address
/fs1           filesystem
vol1           volume
web1           Netscape_web

If any individual resource in a resource group becomes unavailable for
its intended use, then the entire resource group is considered unavailable.
Therefore, a resource group is the unit of failover.
Resource groups cannot overlap; that is, two resource groups cannot
contain the same resource.
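Both rules, that a group is available only when all of its resources are available and that groups cannot share a resource, are easy to state as set operations. The sketch below assumes each resource group is modeled as a set of resource names; the NfsGroup group and the nfs1 resource are hypothetical and exist only to show an overlap.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def overlapping_groups(groups: Dict[str, Set[str]]) -> List[Tuple[str, str, Set[str]]]:
    """Return (group_a, group_b, shared resources) for every pair that overlaps."""
    clashes = []
    for (name_a, res_a), (name_b, res_b) in combinations(sorted(groups.items()), 2):
        shared = res_a & res_b
        if shared:
            clashes.append((name_a, name_b, shared))
    return clashes


def group_available(resources: Set[str], healthy: Set[str]) -> bool:
    """A group is available only if every one of its resources is available."""
    return resources <= healthy


web_group = {"10.10.48.22", "/fs1", "vol1", "web1"}
groups = {"WebGroup": web_group, "NfsGroup": {"vol1", "nfs1"}}  # vol1 is shared: invalid

print(overlapping_groups(groups))
print(group_available(web_group, healthy=web_group - {"web1"}))
```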
For information about configuring resource groups, see the
Linux FailSafe Administrator's Guide.
Resource Dependency List
A resource dependency list is a list of resources
upon which a resource depends. Each resource instance must have resource dependencies
that satisfy its resource type dependencies before it can be added to a resource
group.
Resource Type Dependency List
A resource type dependency list is a list of
resource types upon which a resource type depends. For example, the
filesystem resource type depends upon the volume
resource type, and the Netscape_web resource type depends
upon the filesystem and IP_address resource
types.
For example, suppose a file system instance fs1 is
mounted on volume vol1. Before fs1 can
be added to a resource group, fs1 must be defined to depend
on vol1. Linux FailSafe only knows that a file system instance
must have one volume instance in its dependency list. This requirement is
inferred from the resource type dependency list.
Failover
A failover is the process of allocating a resource
group (or application) to another node, according to a failover policy. A
failover may be triggered by the failure of a resource, a change in the node
membership (such as when a node fails or starts), or a manual request by the
administrator.
Failover Policy
A failover policy is the method used by Linux
FailSafe to determine the destination node of a failover. A failover policy
consists of the following:
Failover domain
Failover attributes
Failover script
Linux FailSafe uses the failover domain output from a failover script
along with failover attributes to determine on which node a resource group
should reside.
The administrator must configure a failover policy for each resource
group. A failover policy name must be unique within the pool. Linux FailSafe
includes predefined failover policies, but you can define your own failover
algorithms as well.
Failover Domain
A failover domain is the ordered list of nodes
on which a given resource group can be allocated. The nodes listed in the
failover domain must be within the same cluster; however, the failover domain
does not have to include every node in the cluster.
The administrator defines the initial failover domain when creating
a failover policy. This list is transformed into a run-time failover domain
by the failover script; Linux FailSafe uses the run-time failover domain along
with failover attributes and the node membership to determine the node on
which a resource group should reside. Linux FailSafe stores the run-time failover
domain and uses it as input to the next failover script invocation. Depending
on the run-time conditions and contents of the failover script, the initial
and run-time failover domains may be identical.
In general, Linux FailSafe allocates a given resource group to the first
node listed in the run-time failover domain that is also in the node membership;
the point at which this allocation takes place is affected by the failover
attributes.
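The general selection rule can be sketched in a few lines. This is an illustration, not the Linux FailSafe implementation: it deliberately ignores failover attributes, which affect when the allocation takes place, and the node names are hypothetical.

```python
from typing import List, Optional, Set


def select_node(runtime_domain: List[str], node_membership: Set[str]) -> Optional[str]:
    """Return the first node in the run-time failover domain that is a current member."""
    for node in runtime_domain:
        if node in node_membership:
            return node
    return None  # no node in the domain is currently in the node membership


domain = ["nodeA", "nodeB", "nodeC"]

print(select_node(domain, {"nodeA", "nodeB", "nodeC"}))
print(select_node(domain, {"nodeB", "nodeC"}))
print(select_node(domain, set()))
```

Because the domain is an ordered list, the position of a node in it acts as its priority.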
Failover Attribute
A failover attribute is a string that affects
the allocation of a resource group in a cluster. The administrator must specify
system attributes (such as Auto_Failback or Controlled_Failback),
and can optionally supply site-specific attributes.
Failover Scripts
A failover script is a shell script that generates
a run-time failover domain and returns it to the Linux FailSafe process. The
Linux FailSafe process ha_fsd applies the failover attributes
and then selects the first node in the returned failover domain that is also
in the current node membership.
The following failover scripts are provided with the Linux FailSafe
release:
ordered, which never changes the initial
failover domain. When using this script, the initial and run-time failover
domains are equivalent.
round-robin, which selects the resource
group owner in a round-robin (circular) fashion. This policy can be used for
resource groups that can be run on any node in the cluster.
If these scripts do not meet your needs, you can create a new failover
script using the information in this guide.
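To make the difference between the two provided scripts concrete, the following sketch models each as a function that turns a failover domain into a run-time failover domain. Real failover scripts are shell scripts, and their inputs and outputs are not described in this chapter; this Python model is only an illustration. In particular, the rotation used for round_robin is an assumption about what "circular" selection could look like, not the logic of the shipped script.

```python
from typing import List


def ordered(initial_domain: List[str]) -> List[str]:
    """Leave the domain unchanged, so initial and run-time domains are equivalent."""
    return list(initial_domain)


def round_robin(domain: List[str], current_owner: str) -> List[str]:
    """ASSUMPTION: rotate the list so the node after the current owner comes first."""
    if current_owner not in domain:
        return list(domain)
    i = (domain.index(current_owner) + 1) % len(domain)
    return domain[i:] + domain[:i]


domain = ["nodeA", "nodeB", "nodeC"]  # hypothetical node names

print(ordered(domain))
print(round_robin(domain, "nodeA"))
print(round_robin(domain, "nodeC"))
```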
Action Scripts
The action scripts are the set of scripts that
determine how a resource is started, monitored, and stopped. There must be
a set of action scripts specified for each resource type.
The following is the complete set of action scripts that can be specified
for each resource type:
exclusive, which verifies that a resource
is not already running
start, which starts a resource
stop, which stops a resource
monitor, which monitors a resource
restart, which restarts a resource on the
same server after a monitoring failure occurs
The release includes action scripts for predefined resource types. If
these scripts fit the resource type that you want to make highly available,
you can reuse them by copying them and modifying them as needed. If none fits,
you can create additional action scripts by using the instructions in this
guide.
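The roles of the five action scripts can be sketched as five functions acting on one resource. This is a conceptual model only: real action scripts are separate scripts whose arguments and exit codes are not covered in this chapter, so the in-memory FakeResource class and the boolean return values are assumptions made for illustration.

```python
class FakeResource:
    """In-memory stand-in for a real service; purely illustrative."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = False


def exclusive(resource: FakeResource) -> bool:
    """Succeed only if the resource is not already running."""
    return not resource.running


def start(resource: FakeResource) -> bool:
    resource.running = True
    return True


def stop(resource: FakeResource) -> bool:
    resource.running = False
    return True


def monitor(resource: FakeResource) -> bool:
    """Report whether the resource is currently healthy."""
    return resource.running


def restart(resource: FakeResource) -> bool:
    """Restart on the same server after a monitoring failure."""
    return stop(resource) and start(resource)


ACTIONS = {
    "exclusive": exclusive,
    "start": start,
    "stop": stop,
    "monitor": monitor,
    "restart": restart,
}


def run_action(action: str, resource: FakeResource) -> bool:
    """Dispatch one named action against one resource."""
    return ACTIONS[action](resource)


web1 = FakeResource("web1")
for action in ("exclusive", "start", "monitor", "restart", "stop"):
    print(action, run_action(action, web1))
```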
Highly Available Services Included with Linux FailSafe
The base release includes the software required
to make IP addresses (the IP_address resource type) highly
available.
Plug-Ins
Optional software packages, known as plug-ins,
are available to make additional applications highly available.
The following plug-ins are available for Linux FailSafe:
Logical volumes (the volume resource type),
such as those provided by LVM
Filesystems such as reiserfs and
ext2fs (the filesystem resource type)
MAC addresses (the MAC_address resource
type)
Linux FailSafe Samba
Linux FailSafe NFS
Linux FailSafe NFS is not part of the core Linux FailSafe software,
but it is documented with the base release.
If you want to create new highly available services, or change the functionality
of the provided failover scripts and action scripts by writing new scripts,
you will use the instructions in this guide. However, not all resources can
be made highly available; see "Characteristics that Permit an Application to be
Highly Available".
Characteristics that Permit an Application to be
Highly Available
The characteristics of an application that can be made highly available
are as follows:
The application can be easily restarted and monitored.
It should be able to recover from failures, as most client/server
software does. The failure could be a hardware failure, an operating system failure,
or an application failure. If a node crashes and reboots, client/server software
should be able to attach again automatically.
The application must have a start and stop procedure.
When the application fails over, the instances of the application are
stopped on one node using the stop procedure and restarted on the other node
using the start procedure.
The application can be moved from one node to another after
failures.
If the resource has failed, it must still be possible to run the resource
stop procedure. In addition, the resource must recover from the failed state
when the resource start procedure is executed on another node.
Ensure that there is no affinity for a specific node.
The application does not depend on knowing the primary host
name (as returned by hostname); that is, required resources
can be configured to work with an IP address.
Other resources on which the application depends can be made
highly available. If they are not provided by Linux FailSafe and its optional
products, you must make these resources
highly available, using the information in this guide.
An application itself is not modified to make it highly available.
Overview of the Programming Steps
If you do not want to write the scripts yourself, you can establish
a contract with the Silicon Graphics Professional Services group to create
customized scripts. See:
http://www.sgi.com/services/index.html.
To make an application highly available, follow these steps:
Understand the application and determine:
The configuration required for the application, such as user
names, permissions, data location (volumes), and so on. For more information
about configuration, see the Linux FailSafe Administrator's Guide.
The other resources on which the application depends. All
interdependent resources must be part of the same resource group.
The resource type that best suits this application.
The number of instances of the resource type that will constitute
the application. (Each instance of a given application, or resource
type, is a separate resource.) For example, a web server may depend
upon two filesystem resources.
The commands and arguments required to start, stop, and monitor
this application (that is, the resources in the resource group).
The order in which all resources in the resource group must
be started and stopped.
Determine whether existing action scripts can be reused. If
they cannot, write a new set of action scripts, using existing scripts and
the templates in /usr/lib/failsafe/resource_types/template
as a guide.
Determine whether the existing ordered
or round-robin failover scripts can be reused for the resource
group. If they cannot, write a new failover script.
Determine whether an existing resource type can be reused.
If none applies, create a new resource type or modify an existing resource
type.
Configure the following in the cluster configuration database
(for more information, see the Linux FailSafe Administrator's Guide):
Resource group
Resource type
Failover policy
Test the action scripts and failover script.
Do not modify the scripts included with the Linux FailSafe product.
New or customized scripts must have different names from the files included
with the release.