Introduction to Writing Application Scripts

Linux FailSafe provides several highly available services for a two-node cluster. These services are monitored by the Linux FailSafe software. You can create additional services that are highly available by using the instructions in this guide.

This chapter provides an introduction to Linux FailSafe programming. The sections are as follows:

For an overview of the software layers, communication paths, and cluster configuration database, see the Linux FailSafe Administrator's Guide.

Concepts

In order to use Linux FailSafe, you must understand the concepts in this section.

Cluster Node (or Node)

A cluster node is a single Linux execution environment; that is, a single physical or virtual machine. In current Linux environments this will always be an individual computer. For brevity, this guide uses the term node with this meaning, as opposed to any other meaning such as a network node.

Pool

A pool is the entire set of nodes having membership in a group of clusters. The clusters are usually close together and should always serve a common purpose. A replicated cluster configuration database is stored on each node in the pool.

Cluster

A cluster is a collection of one or more nodes coupled to each other by networks or other similar interconnections. A cluster belongs to one pool and only one pool. A cluster is identified by a simple name; this name must be unique within the pool. A particular node may be a member of only one cluster. All nodes in a cluster are also in the pool; however, not all nodes in the pool are necessarily in the cluster.

Node Membership

A node membership is the list of nodes in a cluster on which Linux FailSafe can allocate resource groups.

Process Membership

A process membership is the list of process instances in a cluster that form a process group. There can be multiple process groups per node.

Resource

A resource is a single physical or logical entity that provides a service to clients or other resources. For example, a resource can be a single disk volume, a particular network address, or an application such as a web server. A resource is generally available for use over time on two or more nodes in a cluster, although it can only be allocated to one node at any given time.

Resources are identified by a resource name and a resource type. One resource can be dependent on one or more other resources; if so, it will not be able to start (that is, be made available for use) unless the resources on which it depends are also started. Dependent resources must be part of the same resource group and are identified in a resource dependency list.

Resource Type

A resource type is a particular class of resource. All of the resources in a particular resource type can be handled in the same way for the purposes of failover. Every resource is an instance of exactly one resource type.

A resource type is identified by a simple name; this name should be unique within the cluster. A resource type can be defined for a specific node, or it can be defined for an entire cluster. A resource type definition for a specific node overrides a clusterwide resource type definition with the same name; this allows an individual node to override global settings from a clusterwide resource type definition.

Like resources, a resource type can be dependent on one or more other resource types. If such a dependency exists, at least one instance of each of the dependent resource types must be defined. For example, a resource type named Netscape_web might have resource type dependencies on resource types named IP_address and volume. If a resource named web1 is defined with the Netscape_web resource type, then the resource group containing web1 must also contain at least one resource of the type IP_address and one resource of the type volume.

The Linux FailSafe software includes some predefined resource types. If these types fit the application you want to make highly available, you can reuse them. If none fit, you can create additional resource types by using the instructions in this guide.

Resource Name

A resource name identifies a specific instance of a resource type. A resource name must be unique for a given resource type.

Resource Group

A resource group is a collection of interdependent resources. A resource group is identified by a simple name; this name must be unique within a cluster. The following table shows an example of the resources and their corresponding resource types for a resource group named WebGroup.

Example Resource Group

Resource        Resource Type
10.10.48.22     IP_address
/fs1            filesystem
vol1            volume
web1            Netscape_web

If any individual resource in a resource group becomes unavailable for its intended use, then the entire resource group is considered unavailable. Therefore, a resource group is the unit of failover. Resource groups cannot overlap; that is, two resource groups cannot contain the same resource. For information about configuring resource groups, see the Linux FailSafe Administrator's Guide.
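
The resource group rules above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical in-memory model of the WebGroup example; the names TYPE_DEPENDENCIES, WEB_GROUP, missing_type_dependencies, and overlapping_resources are illustrative only and are not part of Linux FailSafe, which keeps this information in the cluster configuration database.

```python
# Hypothetical in-memory model for illustration only; Linux FailSafe stores
# this information in the cluster configuration database.

# Resource type dependencies, taken from the Netscape_web example in the text.
TYPE_DEPENDENCIES = {
    "Netscape_web": {"IP_address", "volume"},
}

# The WebGroup example: resource name -> resource type.
WEB_GROUP = {
    "10.10.48.22": "IP_address",
    "/fs1": "filesystem",
    "vol1": "volume",
    "web1": "Netscape_web",
}


def missing_type_dependencies(group, type_dependencies):
    """Return {resource name: sorted missing types} for unmet type dependencies."""
    present_types = set(group.values())
    missing = {}
    for name, rtype in group.items():
        unmet = type_dependencies.get(rtype, set()) - present_types
        if unmet:
            missing[name] = sorted(unmet)
    return missing


def overlapping_resources(groups):
    """Return the (name, type) pairs that appear in more than one resource group."""
    seen = set()
    overlaps = set()
    for group in groups.values():
        for item in group.items():
            if item in seen:
                overlaps.add(item)
            seen.add(item)
    return overlaps
```

An empty result from both functions means that the group contains at least one instance of every dependent resource type and that no resource is shared between groups. A resource is identified here by its name together with its type, because a resource name only has to be unique within its resource type.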

Resource Dependency List

A resource dependency list is a list of resources upon which a resource depends. Each resource instance must have resource dependencies that satisfy its resource type dependencies before it can be added to a resource group.

Resource Type Dependency List

A resource type dependency list is a list of resource types upon which a resource type depends. For example, the filesystem resource type depends upon the volume resource type, and the Netscape_web resource type depends upon the filesystem and IP_address resource types.

For example, suppose a file system instance fs1 is mounted on volume vol1. Before fs1 can be added to a resource group, fs1 must be defined to depend on vol1. Linux FailSafe only knows that a file system instance must have one volume instance in its dependency list. This requirement is inferred from the resource type dependency list.

Failover

A failover is the process of allocating a resource group (or application) to another node, according to a failover policy. A failover may be triggered by the failure of a resource, a change in the node membership (such as when a node fails or starts), or a manual request by the administrator.

Failover Policy

A failover policy is the method used by Linux FailSafe to determine the destination node of a failover. A failover policy consists of the following:

- Failover domain
- Failover attributes
- Failover script

Linux FailSafe uses the failover domain output from a failover script along with failover attributes to determine on which node a resource group should reside.

The administrator must configure a failover policy for each resource group. A failover policy name must be unique within the pool. Linux FailSafe includes predefined failover policies, but you can define your own failover algorithms as well.

Failover Domain

A failover domain is the ordered list of nodes on which a given resource group can be allocated. The nodes listed in the failover domain must be within the same cluster; however, the failover domain does not have to include every node in the cluster.

The administrator defines the initial failover domain when creating a failover policy. This list is transformed into a run-time failover domain by the failover script; Linux FailSafe uses the run-time failover domain along with failover attributes and the node membership to determine the node on which a resource group should reside. Linux FailSafe stores the run-time failover domain and uses it as input to the next failover script invocation. Depending on the run-time conditions and contents of the failover script, the initial and run-time failover domains may be identical.

In general, Linux FailSafe allocates a given resource group to the first node listed in the run-time failover domain that is also in the node membership; the point at which this allocation takes place is affected by the failover attributes.

Failover Attribute

A failover attribute is a string that affects the allocation of a resource group in a cluster. The administrator must specify system attributes (such as Auto_Failback or Controlled_Failback), and can optionally supply site-specific attributes.

Failover Scripts

A failover script is a shell script that generates a run-time failover domain and returns it to the Linux FailSafe process. The Linux FailSafe process ha_fsd applies the failover attributes and then selects the first node in the returned failover domain that is also in the current node membership.

The following failover scripts are provided with the Linux FailSafe release:

- ordered, which never changes the initial failover domain. When using this script, the initial and run-time failover domains are equivalent.
- round-robin, which selects the resource group owner in a round-robin (circular) fashion. This policy can be used for resource groups that can be run on any node in the cluster.

If these scripts do not meet your needs, you can create a new failover script using the information in this guide.

Action Scripts

The action scripts are the set of scripts that determine how a resource is started, monitored, and stopped. There must be a set of action scripts specified for each resource type.

The following is the complete set of action scripts that can be specified for each resource type:

- exclusive, which verifies that a resource is not already running
- start, which starts a resource
- stop, which stops a resource
- monitor, which monitors a resource
- restart, which restarts a resource on the same server after a monitoring failure occurs

The release includes action scripts for predefined resource types. If these scripts fit the resource type that you want to make highly available, you can reuse them by copying them and modifying them as needed. If none fits, you can create additional action scripts by using the instructions in this guide.
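
The failover domain rules described earlier in this section (the ordered and round-robin failover scripts, and the selection of the first node that is also in the node membership) can be sketched as follows. This is an illustrative model only, not the scripts shipped with Linux FailSafe: real failover scripts are shell scripts, the function names ordered, round_robin, and select_node are hypothetical, the rotate-by-one behavior is only one possible reading of "round-robin (circular) fashion", and failover attributes are not modeled.

```python
def ordered(domain):
    """Model of the 'ordered' script: the run-time domain equals its input."""
    return list(domain)


def round_robin(domain):
    """Assumed model of 'round-robin': rotate the domain left by one position,
    so that the next invocation favors the next node in the circle."""
    domain = list(domain)
    return domain[1:] + domain[:1]


def select_node(runtime_domain, node_membership):
    """Return the first node in the run-time failover domain that is also in
    the node membership, or None if no such node exists."""
    for node in runtime_domain:
        if node in node_membership:
            return node
    return None
```

For example, with a run-time failover domain of n1, n2, n3 and a node membership containing only n2 and n3, select_node returns n2. Because Linux FailSafe stores the run-time failover domain and passes it to the next script invocation, a rotating script of this kind would yield a different preferred owner on each invocation.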

Highly Available Services Included with Linux FailSafe

The base release includes the software required to make IP addresses (the IP_address resource type) highly available.

Plug-Ins

Optional software packages, known as plug-ins, are available to make additional applications highly available. The following plug-ins are available for Linux FailSafe:

- Logical volumes, such as those provided by LVM (the volume resource type)
- Filesystems such as reiserfs and ext2fs (the filesystem resource type)
- MAC addresses (the MAC_address resource type)
- Linux FailSafe Samba
- Linux FailSafe NFS

Linux FailSafe NFS is not part of the core Linux FailSafe software, but it is documented with the base release.

If you want to create new highly available services, or change the functionality of the provided failover scripts and action scripts by writing new scripts, you will use the instructions in this guide. However, not all resources can be made highly available; see .

Characteristics that Permit an Application to be Highly Available

The characteristics of an application that can be made highly available are as follows:

- The application can be easily restarted and monitored. It should be able to recover from failures, as most client/server software does. The failure could be a hardware failure, an operating system failure, or an application failure. If a node crashes and reboots, client/server software should be able to attach again automatically.
- The application must have a start and stop procedure. When the application fails over, the instances of the application are stopped on one node using the stop procedure and restarted on the other node using the start procedure.
- The application can be moved from one node to another after failures. If the resource has failed, it must still be possible to run the resource stop procedure. In addition, the resource must recover from the failed state when the resource start procedure is executed on another node. Ensure that there is no affinity for a specific node.
- The application does not depend on knowing the primary host name (as returned by hostname); that is, required resources can be configured to work with an IP address.
- Other resources on which the application depends can be made highly available. If they are not provided by Linux FailSafe and its optional products (see ), you must make these resources highly available, using the information in this guide.

An application itself is not modified to make it highly available.

Overview of the Programming Steps

If you do not want to write the scripts yourself, you can establish a contract with the Silicon Graphics Professional Services group to create customized scripts. See: http://www.sgi.com/services/index.html.

To make an application highly available, follow these steps:

- Understand the application and determine:
  - The configuration required for the application, such as user names, permissions, data location (volumes), and so on. For more information about configuration, see the Linux FailSafe Administrator's Guide.
  - The other resources on which the application depends. All interdependent resources must be part of the same resource group.
  - The resource type that best suits this application.
  - The number of instances of the resource type that will constitute the application. (Each instance of a given application, or resource type, is a separate resource.) For example, a web server may depend upon two filesystem resources.
  - The commands and arguments required to start, stop, and monitor this application (that is, the resources in the resource group).
  - The order in which all resources in the resource group must be started and stopped.
- Determine whether existing action scripts can be reused. If they cannot, write a new set of action scripts, using existing scripts and the templates in /usr/lib/failsafe/resource_types/template as a guide. See .
- Determine whether the existing ordered or round-robin failover scripts can be reused for the resource group. If they cannot, write a new failover script. See .
- Determine whether an existing resource type can be reused. If none applies, create a new resource type or modify an existing resource type. See .
- Configure the following in the cluster configuration database (for more information, see the Linux FailSafe Administrator's Guide):
  - Resource group
  - Resource type
  - Failover policy
- Test the action scripts and failover script. See , and .

Do not modify the scripts included with the Linux FailSafe product. New or customized scripts must have different names from the files included with the release.
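
One of the items to determine in the steps above is the order in which the resources in a resource group must be started and stopped. The following is a minimal sketch of deriving such an order from resource dependencies, using the resource names from the WebGroup example. The RESOURCE_DEPENDENCIES mapping and the function names are hypothetical, and stopping in the reverse of the start order is an assumption made for illustration, not a rule stated in this guide.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9 or later

# Hypothetical dependency lists: resource -> resources it depends on.
# web1 depends on a filesystem and an IP address; /fs1 depends on vol1.
RESOURCE_DEPENDENCIES = {
    "web1": {"/fs1", "10.10.48.22"},
    "/fs1": {"vol1"},
    "vol1": set(),
    "10.10.48.22": set(),
}


def start_order(dependencies):
    """Return a start order in which every resource follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def stop_order(dependencies):
    """Assumed stop order: the reverse of the start order."""
    return list(reversed(start_order(dependencies)))
```

With these dependencies, vol1 is started before /fs1, and web1 is started last because it depends, directly or indirectly, on every other resource in the group. The relative order of resources that do not depend on each other (here, vol1 and 10.10.48.22) is not fixed by the dependencies.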