<!-- Fragment document type declaration subset:
ArborText, Inc., 1988-1997, v.4001
<!DOCTYPE SET PUBLIC "-//Davenport//DTD DocBook V3.0//EN" [
<!ENTITY ha.cluster.messages SYSTEM "figures/ha.cluster.messages.eps" NDATA eps>
<!ENTITY machine.not.in.ha.cluster SYSTEM "figures/machine.not.in.ha.cluster.eps" NDATA eps>
<!ENTITY ha.cluster.config.info.flow SYSTEM "figures/ha.cluster.config.info.flow.eps" NDATA eps>
<!ENTITY software.layers SYSTEM "figures/software.layers.eps" NDATA eps>
<!ENTITY n1n4 SYSTEM "figures/n1n4.eps" NDATA eps>
<!ENTITY example.sgml SYSTEM "example.sgml">
<!ENTITY appupgrade.sgml SYSTEM "appupgrade.sgml">
<!ENTITY a1-1.failsafe.components SYSTEM "figures/a1-1.failsafe.components.eps" NDATA eps>
<!ENTITY a1-6.disk.storage.takeover SYSTEM "figures/a1-6.disk.storage.takeover.eps" NDATA eps>
<!ENTITY a2-3.non.shared.disk.config SYSTEM "figures/a2-3.non.shared.disk.config.eps" NDATA eps>
<!ENTITY a2-4.shared.disk.config SYSTEM "figures/a2-4.shared.disk.config.eps" NDATA eps>
<!ENTITY a2-5.shred.disk.2active.cnfig SYSTEM "figures/a2-5.shred.disk.2active.cnfig.eps" NDATA eps>
<!ENTITY a2-1.examp.interface.config SYSTEM "figures/a2-1.examp.interface.config.eps" NDATA eps>
<!ENTITY intro.sgml SYSTEM "intro.sgml">
<!ENTITY planning.sgml SYSTEM "planning.sgml">
<!ENTITY nodeconfig.sgml SYSTEM "nodeconfig.sgml">
<!ENTITY admintools.sgml SYSTEM "admintools.sgml">
<!ENTITY config.sgml SYSTEM "config.sgml">
<!ENTITY operate.sgml SYSTEM "operate.sgml">
<!ENTITY diag.sgml SYSTEM "diag.sgml">
<!ENTITY recover.sgml SYSTEM "recover.sgml">
<!ENTITY clustproc.sgml SYSTEM "clustproc.sgml">
<!ENTITY appfiles.sgml SYSTEM "appfiles.sgml">
<!ENTITY gloss.sgml SYSTEM "gloss.sgml">
<!ENTITY preface.sgml SYSTEM "preface.sgml">
<!ENTITY index.sgml SYSTEM "index.sgml">
]>
-->
<chapter id="LE73529-PARENT">
<title id="LE73529-TITLE">Overview of the Linux FailSafe System</title>
<para>This chapter provides an overview of the components and operation of
the Linux FailSafe system. It contains these major sections:</para>
<itemizedlist><?Pub Dtl>
<listitem><para><xref linkend="LE27299-PARENT"></para>
</listitem>
<listitem><para><xref linkend="LE89728-PARENT"></para>
</listitem>
<listitem><para><xref linkend="LE94860-PARENT"></para>
</listitem>
<listitem><para><xref linkend="LE20463-PARENT"></para>
</listitem>
<listitem><para><xref linkend="LE32900-PARENT"></para>
</listitem>
<listitem><para><xref linkend="LE45765-PARENT"></para>
</listitem>
<listitem><para><xref linkend="LE79484-PARENT"></para>
</listitem>
<listitem><para><xref linkend="LE85141-PARENT"></para>
</listitem>
<listitem><para><xref linkend="LE19101-PARENT"></para>
</listitem>
<listitem><para><xref linkend="LE19267-PARENT"></para>
</listitem>
<listitem><para><xref linkend="LE24477-PARENT"></para>
</listitem>
</itemizedlist>
<sect1 id="LE27299-PARENT">
<title id="LE27299-TITLE">High Availability and Linux FailSafe</title>
<para>In the world of mission-critical computing, the availability of information
and computing resources is extremely important. The availability of a system
is affected by how long it is unavailable after a failure in any of its components.
Different degrees of availability are provided by different types of systems:
</para>
<itemizedlist>
<listitem><para><indexterm id="IToverview-0"><primary>fault-tolerant systems,
definition</primary></indexterm> Fault-tolerant systems (continuous availability).
These systems use redundant components and specialized logic to ensure continuous
operation and to provide complete data integrity. On these systems the degree
of availability is extremely high. Some of these systems can also tolerate
outages due to hardware or software upgrades (continuous availability). This
solution is very expensive and requires specialized hardware and software.
</para>
</listitem>
<listitem><para>Highly available systems. These systems survive single points
of failure by using redundant off-the-shelf components and specialized software.
They provide a lower degree of availability than the fault-tolerant systems,
but at much lower cost. Typically these systems provide high availability
only for client/server applications, and base their redundancy on cluster
architectures with shared resources.</para>
</listitem>
</itemizedlist>
<para>The Silicon Graphics&reg; Linux FailSafe product provides a general
facility for making services highly available. Linux FailSafe provides
highly available services for a cluster that contains multiple nodes (<replaceable>
N</replaceable>-node configuration). Using Linux FailSafe, you can configure
a highly available system in any of the following topologies:</para>
<itemizedlist>
<listitem><para>Basic two-node configuration</para>
</listitem>
<listitem><para>Ring configuration</para>
</listitem>
<listitem><para>Star configuration, in which multiple applications running
on multiple nodes are backed up by one node</para>
</listitem>
<listitem><para>Symmetric pool configuration</para>
</listitem>
</itemizedlist>
<para>These configurations provide redundancy of processors and I/O controllers.
Redundancy of storage can be obtained either through multi-hosted RAID disk
devices and mirrored disks or through redundant disk systems that are kept
in synchronization.</para>
<para>If one of the nodes in the cluster or one of the nodes' components fails,
a different node in the cluster restarts the highly available services of
the failed node. To clients, the services on the replacement node are indistinguishable
from the original services before failure occurred. It appears as if the original
node has crashed and rebooted quickly. The clients notice only a brief interruption
in the highly available service.</para>
<para>In a Linux FailSafe highly available system, nodes can serve as backup
for other nodes. Unlike the backup resources in a fault-tolerant system, which
serve purely as redundant hardware for backup in case of failure, the resources
of each node in a highly available system can be used during normal operation
to run other applications that are not necessarily highly available services.
All highly available services are owned and accessed by one node at a time.
</para>
<para>Highly available services are monitored by the Linux FailSafe software.
During normal operation, if a failure is detected in any of these services or
their underlying components, a <firstterm>failover</firstterm> process is initiated.
Using Linux FailSafe, you can define a failover policy to establish which node
will take over the services under what conditions. This process consists of
resetting the failed node (to ensure data consistency), doing any recovery required
by the failed-over services, and quickly restarting the services on the node
that will take them over.</para>
<para>Linux FailSafe supports <firstterm>selective failover</firstterm> in
which individual highly available applications can be failed over to a backup
node independent of the other highly available applications on that node.
</para>
<para>Linux FailSafe highly available services fall into two groups: highly
available resources and highly available applications. Highly available resources
include network interfaces, logical volumes, and filesystems such as ext2
or ReiserFS that have been configured for Linux FailSafe. Silicon Graphics
has also developed Linux FailSafe NFS. Highly available applications include
services such as NFS and Apache.</para>
<para><indexterm id="IToverview-1"><primary>Linux FailSafe</primary><secondary><emphasis>
See</emphasis> Linux FailSafe</secondary></indexterm> Linux FailSafe provides
a framework for making additional applications into highly available services.
If you want to add highly available applications on a Linux FailSafe cluster,
you must write scripts to handle application monitoring functions. Information
on developing these scripts is provided in the <citetitle>Linux FailSafe
Programmer's Guide</citetitle>. If you need assistance in this regard, contact
SGI Global Services, which offers custom Linux FailSafe agent development
and HA integration services.</para>
</sect1>
<sect1 id="LE89728-PARENT">
<title id="LE60545-TITLE">Concepts</title>
<para>In order to use Linux FailSafe, you must understand the concepts in
this section.<indexterm><primary>concepts</primary></indexterm></para>
<sect2>
<title>Cluster Node (or Node)</title>
<para>A <firstterm>cluster node</firstterm> is a single Linux execution environment;
in other words, it is a single physical or virtual machine. In current Linux environments
this will always be an individual computer. For brevity, this guide uses the
term <firstterm>node</firstterm> to mean a cluster node, as opposed to any other
meaning such as a network node. <indexterm><primary>cluster node</primary>
</indexterm> <indexterm id="IToverview-2"><primary>node</primary></indexterm></para>
</sect2>
<sect2>
<title>Pool</title>
<para>A <firstterm>pool</firstterm> is the entire set of nodes having membership
in a group of clusters. The clusters are usually close together and should
always serve a common purpose. A replicated cluster configuration database
is stored on each node in the pool. <indexterm id="IToverview-3"><primary>
pool</primary></indexterm></para>
</sect2>
<sect2>
<title>Cluster</title>
<para>A <firstterm>cluster</firstterm> is a collection of one or more nodes
coupled to each other by networks or other similar interconnections. A cluster
belongs to one pool and only one pool. A cluster is identified by a simple
name; this name must be unique within the pool. A particular node may be
a member of only one cluster. All nodes in a cluster are also in the pool;
however, not all nodes in the pool are necessarily in the cluster.<indexterm
id="IToverview-4"><primary>cluster</primary></indexterm></para>
</sect2>
<sect2>
<title>Node Membership</title>
<para>A <firstterm>node membership</firstterm> is the list of nodes in a cluster
on which Linux FailSafe can allocate resource groups.<indexterm id="IToverview-5">
<primary>node membership</primary></indexterm> <indexterm id="IToverview-6">
<primary>membership</primary></indexterm></para>
</sect2>
<sect2>
<title>Process Membership</title>
<para>A <indexterm id="IToverview-7"><primary>process</primary><secondary>
membership</secondary></indexterm> <firstterm>process membership</firstterm>
is the list of process instances in a cluster that form a process group. There
can be multiple process groups per node.</para>
</sect2>
<sect2>
<title>Resource</title>
<para>A <firstterm>resource</firstterm> is a single physical or logical entity
that provides a service to clients or other resources. For example, a resource
can be a single disk volume, a particular network address, or an application
such as a web server. A resource is generally available for use over time
on two or more nodes in a cluster, although it can only be allocated to one
node at any given time. <indexterm id="IToverview-8"><primary>resource</primary>
<secondary>definition</secondary></indexterm></para>
<para>Resources are identified by a resource name and a resource type. One
resource can be dependent on one or more other resources; if so, it will not
be able to start (that is, be made available for use) unless the resources it
depends on are also started. The resources upon which a resource depends must
be part of the same resource group and are identified in a resource dependency
list.</para>
</sect2>
<sect2>
<title>Resource Type</title>
<para>A <firstterm>resource type</firstterm> is a particular class of resource.
All of the resources in a particular resource type can be handled in the same
way for the purposes of failover. Every resource is an instance of exactly
one resource type.<indexterm id="IToverview-10"><primary>resource type</primary>
<secondary>description</secondary></indexterm></para>
<para>A resource type is identified by a simple name; this name should be
unique within the cluster. A resource type can be defined for a specific node,
or it can be defined for an entire cluster. A resource type definition for
a specific node overrides a clusterwide resource type definition with the
same name; this allows an individual node to override global settings from
a clusterwide resource type definition.</para>
<para>Like resources, a resource type can be dependent on one or more other
resource types. If such a dependency exists, at least one instance of each
of the dependent resource types must be defined. For example, a resource type
named <literal>Netscape_web</literal> might have resource type dependencies
on resource types named <literal>IP_address</literal> and <literal>volume
</literal>. If a resource named <literal>web1</literal> is defined with the <literal>
Netscape_web</literal> resource type, then the resource group containing <literal>
web1</literal> must also contain at least one resource of the type <literal>
IP_address</literal> and one resource of the type <literal>volume</literal>.
</para>
<para>The Linux FailSafe software includes some predefined resource types.
If these types fit the application you want to make highly available, you
can reuse them. If none fit, you can create additional resource types by using
the instructions in the <citetitle>Linux FailSafe Programmer's Guide</citetitle>.
</para>
</sect2>
<sect2>
<title>Resource Name</title>
<para>A <firstterm>resource name</firstterm> identifies a specific instance
of a resource type. A resource name must be unique for a given resource type.<indexterm
id="IToverview-9"><primary>resource</primary><secondary>name</secondary></indexterm></para>
</sect2>
<sect2>
<title>Resource Group</title>
<para>A <firstterm>resource group</firstterm> is a collection of interdependent
resources. A resource group is identified by a simple name; this name must
be unique within a cluster. <xref linkend="LE99232-PARENT"> shows an example
of the resources and their corresponding resource types for a resource group
named <literal>WebGroup</literal>. <indexterm id="IToverview-11"><primary>
resource group</primary><secondary>definition</secondary></indexterm></para>
<table frame="topbot" id="LE99232-PARENT">
<title id="LE99232-TITLE">Example Resource Group</title>
<tgroup cols="2" colsep="0" rowsep="0">
<colspec colwidth="198*">
<colspec colwidth="198*">
<thead>
<row rowsep="1"><entry align="left" valign="bottom"><para>Resource</para></entry>
<entry align="left" valign="bottom"><para>Resource Type</para></entry></row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><para><literal>10.10.48.22</literal></para></entry>
<entry align="left" valign="top"><para><literal>IP_address</literal></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>/fs1</literal></para></entry>
<entry align="left" valign="top"><para><literal>filesystem</literal></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>vol1</literal></para></entry>
<entry align="left" valign="top"><para><literal>volume</literal></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>web1</literal></para></entry>
<entry align="left" valign="top"><para><literal>Netscape_web</literal></para></entry>
</row>
</tbody>
</tgroup>
</table>
<para>If any individual resource in a resource group becomes unavailable for
its intended use, then the entire resource group is considered unavailable.
Therefore, a resource group is the unit of failover.</para>
<para>Resource groups cannot overlap; that is, two resource groups cannot
contain the same resource.</para>
</sect2>
<sect2>
<title>Resource Dependency List</title>
<para>A <firstterm>resource dependency list</firstterm> is a list of resources
upon which a resource depends. Each resource instance must have resource dependencies
that satisfy its resource type dependencies before it can be added to a resource
group.</para>
</sect2>
<sect2>
<title>Resource Type Dependency List</title>
<para>A <firstterm>resource type dependency list</firstterm> is a list of
resource types upon which a resource type depends. For example, the <literal>
filesystem</literal> resource type depends upon the <literal>volume</literal>
resource type, and the <literal>Netscape_web</literal> resource type depends
upon the <literal>filesystem</literal> and <literal>IP_address</literal> resource
types.<indexterm id="IToverview-12"><primary>resource type</primary><secondary>
dependency list</secondary></indexterm> <indexterm id="IToverview-13"><primary>
dependency list</primary></indexterm></para>
<para>For example, suppose a file system instance <literal>fs1</literal> is
mounted on volume <literal>vol1</literal>. Before <literal>fs1</literal> can
be added to a resource group, <literal>fs1</literal> must be defined to depend
on <literal>vol1</literal>. Linux FailSafe only knows that a file system instance
must have one volume instance in its dependency list. This requirement is
inferred from the resource type dependency list. <indexterm id="IToverview-14">
<primary>resource</primary><secondary>dependency list</secondary></indexterm></para>
</sect2>
<sect2>
<title>Failover</title>
<para>A <firstterm>failover</firstterm> is the process of allocating a resource
group (or application) to another node, according to a failover policy. A
failover may be triggered by the failure of a resource, a change in the node
membership (such as when a node fails or starts), or a manual request by the
administrator.<indexterm id="IToverview-15"><primary>failover</primary></indexterm></para>
</sect2>
<sect2>
<title>Failover Policy</title>
<para>A <firstterm>failover policy</firstterm> is the method used by Linux
FailSafe to determine the destination node of a failover. A failover policy
consists of the following:</para>
<itemizedlist>
<listitem><para>Failover domain</para>
</listitem>
<listitem><para>Failover attributes</para>
</listitem>
<listitem><para>Failover script</para>
</listitem>
</itemizedlist>
<para>Linux FailSafe uses the failover domain output from a failover script
along with failover attributes to determine on which node a resource group
should reside.</para>
<para>The administrator must configure a failover policy for each resource
group. A failover policy name must be unique within the pool. Linux FailSafe
includes predefined failover policies, but you can define your own failover
algorithms as well. <indexterm id="IToverview-16"><primary>failover policy
</primary></indexterm></para>
</sect2>
<sect2>
<title>Failover Domain</title>
<para>A <firstterm>failover domain</firstterm> is the ordered list of nodes
on which a given resource group can be allocated. The nodes listed in the
failover domain must be within the same cluster; however, the failover domain
does not have to include every node in the cluster.<indexterm id="IToverview-17">
<primary>failover domain</primary></indexterm> <indexterm id="IToverview-18">
<primary>domain</primary></indexterm> <indexterm id="IToverview-19"><primary>
application failover domain</primary></indexterm>  </para>
<para>The administrator defines the initial failover domain when creating
a failover policy. This list is transformed into a run-time failover domain
by the failover script; Linux FailSafe uses the run-time failover domain along
with failover attributes and the node membership to determine the node on
which a resource group should reside. Linux FailSafe stores the run-time failover
domain and uses it as input to the next failover script invocation. Depending
on the run-time conditions and contents of the failover script, the initial
and run-time failover domains may be identical.</para>
<para>In general, Linux FailSafe allocates a given resource group to the first
node listed in the run-time failover domain that is also in the node membership;
the point at which this allocation takes place is affected by the failover
attributes.</para>
</sect2>
<sect2>
<title>Failover Attribute</title>
<para>A <firstterm>failover attribute</firstterm> is a string that affects
the allocation of a resource group in a cluster. The administrator must specify
system attributes (such as <?Pub _nolinebreak><literal>Auto_Failback</literal><?Pub /_nolinebreak> or <?Pub _nolinebreak><literal>
Controlled_Failback</literal><?Pub /_nolinebreak>), and can optionally supply
site-specific attributes.<indexterm id="IToverview-20"><primary>failover attributes
</primary></indexterm></para>
</sect2>
<sect2>
<title>Failover Scripts</title>
<para>A <firstterm>failover script</firstterm> is a shell script that generates
a run-time failover domain and returns it to the Linux FailSafe process. The
Linux FailSafe process <literal>ha_fsd</literal> applies the failover attributes
and then selects the first node in the returned failover domain that is also
in the current node membership.<indexterm id="IToverview-21"><primary>failover
script</primary><secondary>description</secondary></indexterm></para>
<para>The following failover scripts are provided with the Linux FailSafe
release:</para>
<itemizedlist>
<listitem><para><filename>ordered</filename>, which never changes the initial
failover domain. When using this script, the initial and run-time failover
domains are equivalent.</para>
</listitem>
<listitem><para><filename>round-robin</filename>, which selects the resource
group owner in a round-robin (circular) fashion. This policy can be used for
resource groups that can run on any node in the cluster.</para>
</listitem>
</itemizedlist>
<para>If these scripts do not meet your needs, you can create a new failover
script using the information in this guide.</para>
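<para>The following is a minimal, hypothetical sketch of an <filename>ordered</filename>-style
failover script, shown only to illustrate the general idea. The exact argument and
output conventions expected by <literal>ha_fsd</literal> are defined in the <citetitle>
Linux FailSafe Programmer's Guide</citetitle>; this sketch simply assumes that the
initial failover domain is passed as command-line arguments and that the run-time
failover domain is written to standard output:</para>
<programlisting>
#!/bin/sh
#
# Hypothetical ordered-style failover script (sketch only).
# Assumption: the initial failover domain arrives as command-line
# arguments and the run-time failover domain is printed to standard
# output unchanged, preserving the administrator-defined order.

if [ $# -eq 0 ]
then
    # No failover domain supplied; nothing to return.
    exit 1
fi

# Return the initial domain verbatim as the run-time domain.
echo "$@"
exit 0
</programlisting>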
</sect2>
<sect2>
<title>Action Scripts</title>
<para>The <firstterm>action scripts</firstterm> are the set of scripts that
determine how a resource is started, monitored, and stopped. There must be
a set of action scripts specified for each resource type.<indexterm id="IToverview-22">
<primary>action scripts</primary></indexterm></para>
<para>The following is the complete set of action scripts that can be specified
for each resource type:</para>
<itemizedlist>
<listitem><para><literal>exclusive</literal>, which verifies that a resource
is not already running</para>
</listitem>
<listitem><para><literal>start</literal>, which starts a resource</para>
</listitem>
<listitem><para><literal>stop</literal>, which stops a resource</para>
</listitem>
<listitem><para><literal>monitor</literal>, which monitors a resource</para>
</listitem>
<listitem><para><literal>restart</literal>, which restarts a resource on the
same server after a monitoring failure occurs</para>
</listitem>
</itemizedlist>
<para>The release includes action scripts for predefined resource types. If
these scripts fit the resource type that you want to make highly available,
you can reuse them by copying them and modifying them as needed. If none fits,
you can create additional action scripts by using the instructions in the <citetitle>
Linux FailSafe Programmer's Guide</citetitle>.</para>
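<para>As an illustration only, the following sketch shows the general shape of a <literal>
monitor</literal> action script for a hypothetical daemon named <literal>mydaemon</literal>.
The actual parameter handling, exit-code conventions, and library functions required by
Linux FailSafe are described in the <citetitle>Linux FailSafe Programmer's Guide</citetitle>;
this sketch only assumes that an exit status of 0 reports a healthy resource and that a
nonzero exit status reports a monitoring failure:</para>
<programlisting>
#!/bin/sh
#
# Hypothetical monitor action script (sketch only).
# Assumption: exit status 0 means the resource is healthy; a nonzero
# exit status reports a monitoring failure to Linux FailSafe.

DAEMON=mydaemon          # hypothetical resource being monitored

# Check whether the daemon process is currently running.
if pidof "$DAEMON" >/dev/null
then
    exit 0               # resource is healthy
fi

exit 1                   # monitoring failure
</programlisting>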
</sect2>
</sect1>
<sect1 id="LE94860-PARENT">
<title id="LE94860-TITLE">Additional Linux FailSafe Features</title>
<para><indexterm><primary>Linux FailSafe</primary><secondary>features</secondary>
</indexterm>Linux FailSafe provides the following features to increase the
flexibility and ease of operation of a highly available system:</para>
<itemizedlist>
<listitem><para>Dynamic management</para>
</listitem>
<listitem><para>Fine grain failover</para>
</listitem>
<listitem><para>Local restarts</para>
</listitem>
</itemizedlist>
<para>These features are summarized in the following sections.</para>
<sect2>
<title>Dynamic Management</title>
<para>Linux FailSafe allows you to perform a variety of administrative tasks
while the system is running:</para>
<itemizedlist>
<listitem><para>Dynamically managed application monitoring</para>
<para>Linux FailSafe allows you to turn monitoring of an application on and
off while other highly available applications continue to run. This allows
you to perform online application upgrades without bringing down the Linux
FailSafe system.</para>
</listitem>
<listitem><para>Dynamically managed Linux FailSafe resources</para>
<para>Linux FailSafe allows you to add resources while the Linux FailSafe
system is online.</para>
</listitem>
<listitem><para>Dynamically managed Linux FailSafe upgrades</para>
<para>Linux FailSafe allows you to upgrade Linux FailSafe software on one
node at a time without taking down the entire Linux FailSafe cluster.</para>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Fine Grain Failover</title>
<para>Using Linux FailSafe, you can specify <firstterm>fine-grain failover
</firstterm>. Fine-grain failover is a process in which a specific resource
group is failed over from one node to another node while other resource groups
continue to run on the first node, where possible. Fine-grain failover is
possible in Linux FailSafe because the unit of failover is the resource group,
and not the entire node.</para>
</sect2>
<sect2>
<title>Local Restarts</title>
<para>Linux FailSafe allows you to fail over a resource group onto the same
node. This feature enables you to configure a single-node system, where backup
for a particular application is provided on the same machine, if possible.
It also enables you to indicate that a specified number of local restarts
be attempted before the resource group fails over to a different node.</para>
</sect2>
</sect1>
<sect1 id="LE20463-PARENT">
<title id="LE20463-TITLE">Linux FailSafe Administration</title>
<para>You can perform all Linux FailSafe administrative tasks by means of
the Linux FailSafe Cluster Manager Graphical User Interface (GUI). The Linux
FailSafe GUI provides a guided interface to configure, administer, and monitor
a Linux FailSafe-controlled highly available cluster. The Linux FailSafe GUI
also provides screen-by-screen help text.</para>
<para>If you wish, you can perform Linux FailSafe administrative tasks directly
by means of the Linux FailSafe Cluster Manager CLI, a command-line interface
to the same administration tasks.</para>
<para>For information on Linux FailSafe Cluster Manager tools, see <xref linkend="LE73346-PARENT">.
</para>
<para>For information on Linux FailSafe configuration and administration tasks,
see <xref linkend="LE94219-PARENT">, and <xref linkend="LE99367-PARENT">.
</para>
</sect1>
<sect1 id="LE32900-PARENT">
<title id="LE32900-TITLE">Hardware Components of a Linux FailSafe Cluster
</title>
<para><indexterm><primary>Linux FailSafe</primary><secondary>hardware components
</secondary></indexterm> <xref linkend="LE72758-PARENT">, shows an example
of Linux FailSafe hardware components, in this case for a two-node system.
</para>
<para><figure id="LE72758-PARENT">
<title id="LE72758-TITLE">Sample Linux FailSafe System Components</title>
<graphic entityref="a1-1.failsafe.components"></graphic>
</figure></para>
<para>The hardware components of the Linux FailSafe system are as follows:
</para>
<itemizedlist>
<listitem><para>Up to eight Linux nodes</para>
</listitem>
<listitem><para>Two or more interfaces on each node connected to control networks
(Ethernet, FDDI, or any other available network interface)</para>
<para>At least two network interfaces on each node are required for the control
network <firstterm>heartbeat</firstterm> connection, by which each node monitors
the state of other nodes. The Linux FailSafe software also uses this connection
to pass <firstterm>control</firstterm> messages between nodes. These interfaces
have distinct IP addresses.</para>
</listitem>
<listitem><para>A mechanism for remote reset of nodes</para>
<para>A reset ensures that the failed node is not using the shared disks when
the replacement node takes them over.</para>
</listitem>
<listitem><para>Disk storage and SCSI bus shared by the nodes in the cluster
</para>
<para>The nodes in the Linux FailSafe system can share dual-hosted disk storage
over a shared fast and wide SCSI bus where this is supported by the SCSI controller
and Linux driver. <note>
<para>Note that few Linux drivers are currently known to implement this correctly.
Please check hardware compatibility lists if this is a configuration you
plan to use. Fibre Channel solutions should universally support this.</para>
</note> <note>
<para>The Linux FailSafe system is designed to survive a single point of failure.
Therefore, when a system component fails, it must be restarted, repaired,
or replaced as soon as possible to avoid the possibility of two or more failed
components.</para>
</note></para>
</listitem>
</itemizedlist>
</sect1>
<sect1 id="LE45765-PARENT">
<title id="LE45765-TITLE">Linux FailSafe Disk Connections</title>
<para>A Linux FailSafe system supports the following disk connections:</para>
<itemizedlist>
<listitem><para>RAID support</para>
<itemizedlist>
<listitem><para>Single controller or dual controllers</para>
</listitem>
<listitem><para>Single or dual hubs</para>
</listitem>
<listitem><para>Single or dual pathing</para>
</listitem>
</itemizedlist>
</listitem>
<listitem><para>JBOD support</para>
<itemizedlist>
<listitem><para>Single or dual vaults</para>
</listitem>
<listitem><para>Single or dual hubs</para>
</listitem>
</itemizedlist>
</listitem>
<listitem><para>Network-mirrored support</para>
<itemizedlist>
<listitem><para>Clustered filesystems such as GFS</para>
</listitem>
<listitem><para>Network-mirrored block devices, such as those provided by DRBD</para>
</listitem>
</itemizedlist>
</listitem>
</itemizedlist>
<note>
<para>Network-mirrored devices are not discussed in the examples within this
guide. However, the Linux FailSafe configuration items that are set for shared
storage apply equally to network-mirrored storage.</para>
</note>
<para>SCSI disks can be connected to two machines only. Fibre Channel disks
can be connected to multiple machines.</para>
</sect1>
<sect1 id="LE79484-PARENT">
<title id="LE79484-TITLE">Linux FailSafe Supported Configurations</title>
<para>Linux FailSafe supports the following highly available configurations:
</para>
<itemizedlist>
<listitem><para>Basic two-node configuration</para>
</listitem>
<listitem><para>Star configuration of multiple primary nodes and one backup node</para>
</listitem>
<listitem><para>Ring configuration</para>
</listitem>
</itemizedlist>
<para>You can use the following reset models when configuring a Linux FailSafe
system:</para>
<itemizedlist>
<listitem><para>Server-to-server. Each server is directly connected to another
for reset. May be unidirectional.</para>
</listitem>
<listitem><para>Network. Each server can reset any other by sending a signal
over the control network to a multiplexer.</para>
</listitem>
</itemizedlist>
<para>The following sections provide descriptions of the different Linux FailSafe
configurations.</para>
<sect2>
<title>Basic Two-Node Configuration</title>
<para>In a basic two-node configuration, the following arrangements are possible:
</para>
<itemizedlist>
<listitem><para>All highly available services run on one node. The other node
is the backup node. After failover, the services run on the backup node. In
this case, the backup node is a hot standby for failover purposes only. The
backup node can run other applications that are not highly available services.
</para>
</listitem>
<listitem><para>Highly available services run concurrently on both nodes.
For each service, the other node serves as a backup node. For example, both
nodes can be exporting different NFS filesystems. If a failover occurs, one
node then exports all of the NFS filesystems.</para>
</listitem>
</itemizedlist>
</sect2>
</sect1>
<sect1 id="LE85141-PARENT">
<title id="LE85141-TITLE">Highly Available Resources</title>
<para>This section discusses the highly available resources that are provided
on a Linux FailSafe system.</para>
<sect2>
<title>Nodes</title>
<para>If a node crashes or hangs (for example, due to a parity error or bus
error), the Linux FailSafe software detects this. A different node, determined
by the failover policy, takes over the failed node's services after resetting
the failed node.</para>
<para>If a node fails, its interfaces, access to storage, and services also
become unavailable. See the succeeding sections for descriptions of how the
Linux FailSafe system handles or eliminates these points of failure.</para>
</sect2>
<sect2 id="LE80214-PARENT">
<title id="LE80214-TITLE">Network Interfaces and IP Addresses</title>
<para><indexterm><primary>network interface</primary><secondary>overview</secondary>
</indexterm> <indexterm><primary>IP address</primary><secondary>overview</secondary>
</indexterm>Clients access the highly available services provided by the Linux
FailSafe cluster using IP addresses. Each highly available service can use
multiple IP addresses. The IP addresses are not tied to a particular highly
available service; they can be shared by all the highly available services
in the cluster.</para>
<para>Linux FailSafe uses the IP aliasing mechanism to support multiple IP
addresses on a single network interface. Clients can use a highly available
service that uses multiple IP addresses even when there is only one network
interface in the server node.</para>
<para>The IP aliasing mechanism allows a Linux FailSafe configuration that
has a node with multiple network interfaces to be backed up by a node with
a single network interface. IP addresses configured on multiple network interfaces
are moved to the single interface on the other node in case of a failure.
</para>
<para>Linux FailSafe requires that each network interface in a cluster have
an IP address that does not fail over. These IP addresses, called <firstterm>
fixed IP addresses</firstterm>, are used to monitor network interfaces. Each
fixed IP address must be configured to a network interface at system boot
time. All other IP addresses in the cluster are configured as <firstterm>
highly available IP addresses</firstterm>.</para>
<para>Highly available IP addresses are configured on a network interface.
During failover and recovery processes, they are moved to another network interface
on the other node by Linux FailSafe. Highly available IP addresses are specified
when you configure the Linux FailSafe system. Linux FailSafe uses the <command>
ifconfig</command> command to configure an IP address on a network interface
and to move IP addresses from one interface to another.</para>
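<para>For example, on a typical Linux system a highly available IP address can be added
to or removed from a network interface as an IP alias with commands of the following
form (the interface name and addresses shown here are illustrative only):</para>
<programlisting>
# Add the highly available address 192.26.50.1 as an alias on eth0
# (illustrative interface name and addresses).
ifconfig eth0:1 192.26.50.1 netmask 255.255.255.0 up

# Remove the alias when the address moves to another node.
ifconfig eth0:1 down
</programlisting>
<para>When an address fails over, Linux FailSafe removes the alias from the interface
on the failed node and configures it on an interface of the takeover node.</para>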
<para>In some networking implementations, IP addresses cannot be moved from
one interface to another by using only the <command>ifconfig</command> command.
Linux FailSafe uses <firstterm>re-MACing</firstterm> (<firstterm>MAC address
impersonation</firstterm>) to support these networking implementations. Re-MACing
moves the physical (MAC) address of a network interface to another interface.
It is done by using the <command>macconfig</command> command. Re-MACing is
done in addition to the standard <command>ifconfig</command> process that
Linux FailSafe uses to move IP addresses. To do re-MACing in Linux FailSafe,
a resource of type <literal>MAC_Address</literal> is used.</para>
<note>
<para>Re-MACing can be used only on Ethernet networks. It cannot be used on
FDDI networks.</para>
</note>
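<para>Conceptually, moving a MAC address resembles the following standard Linux commands.
This is an illustration only; Linux FailSafe performs re-MACing through the <command>
macconfig</command> command and a <literal>MAC_Address</literal> resource, not by running
these commands directly, and the interface name and MAC address shown are hypothetical:</para>
<programlisting>
# Conceptual illustration only -- not the Linux FailSafe mechanism.
# Assign the MAC address of the primary interface to the backup interface.
ifconfig eth1 down
ifconfig eth1 hw ether 08:00:69:12:34:56    # illustrative MAC address
ifconfig eth1 up
</programlisting>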
<para>Re-MACing is required when packets called gratuitous ARP packets are
not passed through the network. These packets are generated automatically
when an IP address is added to an interface (as in a failover process). They
announce a new mapping of an IP address to MAC address. This tells clients
on the local subnet that a particular interface now has a particular IP address.
Clients then update their internal ARP caches with the new MAC address for
the IP address. (The IP address just moved from interface to interface.) When
gratuitous ARP packets are not passed through the network, the internal ARP
caches of subnet clients cannot be updated. In these cases, re-MACing is used.
This moves the MAC address of the original interface to the new interface.
Thus, both the IP address and the MAC address are moved to the new interface
and the internal ARP caches of clients do not need updating.</para>
<para>Re-MACing is not done by default; you must specify that it be done for
each pair of primary and secondary interfaces that requires it. A procedure
in the section <xref linkend="LE93615-PARENT"> describes how you can determine
whether re-MACing is required. In general, routers and PC/NFS clients may
require re-MACing.</para>
<para>A side effect of re-MACing is that the original MAC address of an interface
that has received a new MAC address is no longer available for use. Because
of this, each network interface has to be backed up by a dedicated backup
interface. This backup interface cannot be used by clients as a primary interface.
(After a failover to this interface, packets sent to the original MAC address
are ignored by every node on the network.) Each backup interface backs up
only one network interface.</para>
</sect2>
<sect2>
<title>Disks</title>
<para>The Linux FailSafe cluster can include shared SCSI-based storage in
the form of individual disks, RAID systems, or Fibre Channel storage systems.
</para>
<para><indexterm><primary>disks, shared</primary><secondary>and disk failure
</secondary></indexterm> <indexterm><primary>disks, shared</primary><secondary>
and disk controller failure</secondary></indexterm>With mirrored volumes on
the disks in a RAID or Fibre Channel system, the storage device itself should
provide redundancy; no participation by the Linux FailSafe system software is
required to handle a disk failure. If a disk controller fails, the Linux FailSafe
system software initiates the failover process.</para>
<para><indexterm><primary>failover</primary><secondary>of disk storage</secondary>
</indexterm><xref linkend="LE77061-PARENT">, shows disk storage takeover on
a two-node system. The surviving node takes over the shared disks and recovers
the logical volumes and filesystems on the disks. This process is expedited
by a journaling filesystem such as ReiserFS or XFS, because journaling technology
eliminates the need to run the <command>fsck</command> command for filesystem
consistency checking.</para>
<para><figure id="LE77061-PARENT">
<title id="LE77061-TITLE">Disk Storage Failover on a Two-Node System</title>
<graphic entityref="a1-6.disk.storage.takeover"></graphic>
</figure></para>
</sect2>
</sect1>
<sect1 id="LE19101-PARENT">
<title id="LE19101-TITLE">Highly Available Applications</title>
<para>Each application has a primary node and up to seven additional nodes
that you can use as backup nodes, according to the failover policy you define.
The primary node is the node on which the application runs when Linux FailSafe
is in <firstterm>normal state</firstterm>. When a failure of any highly available
resource or highly available application is detected by Linux FailSafe software,
all highly available resources in the affected resource group on the failed
node are failed over to a different node and the highly available applications
on the failed node are stopped. When these operations are complete, the highly
available applications are started on the backup node.</para>
<para>All information about highly available applications, including the primary
node, components of the resource group, and failover policy for the application
and monitoring, is specified when you configure your Linux FailSafe system
with the Cluster Manager GUI or with the Cluster Manager CLI. Information
on configuring the system is provided in <xref linkend="LE94219-PARENT">.
Monitoring scripts detect the failure of a highly available application.</para>
<para>The Linux FailSafe software provides a framework for making applications
highly available services. By writing scripts and configuring the system in
accordance with those scripts, you can turn client/server applications into
highly available applications. For information, see the <citetitle>Linux
FailSafe Programmer's Guide</citetitle>.</para>
</sect1>
<sect1 id="LE19267-PARENT"><?Pub Dtl>
<title id="LE19267-TITLE">Failover and Recovery Processes</title>
<para><indexterm><primary>failover</primary><secondary>description</secondary>
</indexterm> <indexterm><primary>failover</primary><secondary>and recovery
processes</secondary></indexterm>When a failure is detected on one node (the
node has crashed, hung, or been shut down, or a highly available service is
no longer operating), a different node performs a failover of the highly available
services that are being provided on the node with the failure (called the <firstterm>
failed node</firstterm>). Failover allows all of the highly available services,
including those provided by the failed node, to remain available within the
cluster.</para>
<para>A failure in a highly available service can be detected by Linux FailSafe
processes running on the failed node or on another node. Depending on which node
detects the failure, the sequence of actions following the failure is different.</para>
<para>If the failure is detected by the Linux FailSafe software running on
the same node, the failed node performs these operations:</para>
<itemizedlist>
<listitem><para>Stops the highly available resource group running on the node
</para>
</listitem>
<listitem><para>Moves the highly available resource group to a different node,
according to the defined failover policy for the resource group</para>
</listitem>
<listitem><para>Sends a message to the node that will take over the services
to start providing all resource group services previously provided by the
failed node</para>
</listitem>
</itemizedlist>
<para>When it receives the message, the node that is taking over the resource
group performs these operations:</para>
<itemizedlist>
<listitem><para>Transfers ownership of the resource group from the failed
node to itself</para>
</listitem>
<listitem><para>Starts offering the resource group services that were running
on the failed node</para>
</listitem>
</itemizedlist>
<para>If the failure is detected by Linux FailSafe software running on a different
node, the node detecting the failure performs these operations:</para>
<itemizedlist>
<listitem><para>Using the serial connection between the nodes, reboots the
failed node to prevent corruption of data</para>
</listitem>
<listitem><para>Transfers ownership of the resource group from the failed
node to the other nodes in the cluster, based on the resource group failover
policy.</para>
</listitem>
<listitem><para>Starts offering the resource group services that were running
on the failed node</para>
</listitem>
</itemizedlist>
<para>When a failed node comes back up, whether the node automatically starts
to provide highly available services again depends on the failover policy
you define. For information on defining failover policies, see <xref linkend="fs-definefailover">.
</para>
<para>Normally, a node that experiences a failure automatically reboots and
resumes providing highly available services. This scenario works well for
transient errors (as well as for planned outages for equipment and software
upgrades). However, if the errors are persistent, an automatic reboot can lead
to a repeating cycle of recovery followed by an immediate failover. To prevent
this, the Linux FailSafe software checks how long the rebooted node has been up
since the last time it was started. If the interval is less than five minutes
(by default), the Linux FailSafe software automatically disables itself on the
failed node and does not start up on that node. It also writes error messages
to <filename>/var/log/failsafe</filename> and to the appropriate log file.</para>
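<para>Conceptually, this check resembles the following shell fragment. This is a sketch
only, with an assumed five-minute threshold; the actual test is internal to the Linux
FailSafe software:</para>
<programlisting>
#!/bin/sh
# Conceptual sketch of the "recently rebooted" test (not Linux FailSafe code).
# The first field of /proc/uptime is the number of seconds since boot.

THRESHOLD=300    # assumed default of five minutes

uptime_secs=`cut -d' ' -f1 /proc/uptime | cut -d. -f1`

if [ "$uptime_secs" -lt "$THRESHOLD" ]
then
    echo "Node was rebooted less than five minutes ago; not starting HA services."
    exit 1
fi

exit 0
</programlisting>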
</sect1>
<sect1 id="LE24477-PARENT"><?Pub Dtl>
<title id="LE24477-TITLE">Overview of Configuring and Testing a New Linux
FailSafe Cluster</title>
<para>After the Linux FailSafe cluster hardware has been installed, follow
this general procedure to configure and test the Linux FailSafe system:</para>
<orderedlist>
<listitem><para>Become familiar with Linux FailSafe terms by reviewing this
chapter.</para>
</listitem>
<listitem><para>Plan the configuration of highly available applications and
services on the cluster using <xref linkend="LE88622-PARENT">.</para>
</listitem>
<listitem><para>Perform various administrative tasks, including the installation
of prerequisite software, that are required by Linux FailSafe, as described
in <xref linkend="LE32854-PARENT">.</para>
</listitem>
<listitem><para>Define the Linux FailSafe configuration as explained in <xref
linkend="LE94219-PARENT">.</para>
</listitem>
<listitem><para>Test the Linux FailSafe system in three phases: test individual
components prior to starting Linux FailSafe software, test normal operation
of the Linux FailSafe system, and simulate failures to test the operation
of the system after a failure occurs.</para>
</listitem>
</orderedlist>
</sect1>
<sect1 id="LE15726-PARENT">
<title id="LE15726-TITLE">Linux FailSafe System Software </title>
<para>This section describes the software layers, communication paths, and
cluster configuration database.</para>
<sect2>
<title>Layers</title>
<para>A Linux FailSafe system has the following software layers:<indexterm
id="IToverview-36"><primary>system software</primary><secondary>layers</secondary>
</indexterm> <indexterm id="IToverview-37"><primary>layers</primary></indexterm></para>
<itemizedlist>
<listitem><para>Plug-ins, which create highly available services. If the
application plug-in you want is not available, you can hire the Silicon Graphics
Global Services group to develop the required software, or you can use the <citetitle>
Linux FailSafe Programmer's Guide</citetitle> to write the software yourself.<indexterm
id="IToverview-38"><primary>plug-ins</primary></indexterm></para>
</listitem>
<listitem><para>Linux FailSafe base, which includes the ability to define
resource groups and failover policies<indexterm id="IToverview-39"><primary>
base</primary></indexterm></para>
</listitem>
<listitem><para>High-availability cluster infrastructure that lets you define
clusters, resources, and resource types (this consists of the <literal>cluster_services
</literal> installation package) <indexterm id="IToverview-40"><primary>infrastructure
</primary></indexterm> <indexterm id="IToverview-41"><primary>high-availability
</primary><secondary>infrastructure</secondary></indexterm> <indexterm id="IToverview-42">
<primary><literal>cluster_ha</literal> subsystem</primary></indexterm> <indexterm
id="IToverview-43"><primary><literal>cluster_admin</literal> subsystem</primary>
</indexterm> <indexterm id="IToverview-44"><primary><literal>cluster_control
</literal> subsystem</primary></indexterm></para>
</listitem>
<listitem><para>Cluster software infrastructure, which lets you do the following:
</para>
<itemizedlist>
<listitem><para>Perform node logging</para>
</listitem>
<listitem><para>Administer the cluster</para>
</listitem>
<listitem><para>Define nodes</para>
</listitem>
</itemizedlist>
<para>The cluster software infrastructure consists of the <?Pub _nolinebreak><literal>
cluster_admin</literal><?Pub /_nolinebreak> and <?Pub _nolinebreak><literal>
cluster_control</literal><?Pub /_nolinebreak> subsystems.</para>
</listitem>
</itemizedlist>
<para><xref linkend="LE28867-PARENT"> shows a graphic representation of these
layers. <xref linkend="LE12498-PARENT"> describes the layers for Linux FailSafe,
which are located in the <filename>/usr/lib/failsafe/bin</filename> directory.
</para>
<para><figure id="LE28867-PARENT">
<title id="LE28867-TITLE">Software Layers</title>
<graphic entityref="software.layers"></graphic>
</figure></para>
<table frame="topbot" pgwide="1" id="LE12498-PARENT">
<title id="LE12498-TITLE">Contents of <filename>/usr/lib/failsafe/bin</filename></title>
<tgroup cols="4" colsep="0" rowsep="0">
<colspec colwidth="66*">
<colspec colwidth="95*">
<colspec colwidth="89*">
<colspec colwidth="146*">
<thead>
<row rowsep="1"><entry align="left" valign="bottom"><para>Layer</para></entry>
<entry align="left" valign="bottom"><para>Subsystem</para></entry><entry align="left"
valign="bottom"><para>Process</para></entry><entry align="left" valign="bottom"><para>
Description</para></entry></row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><para>Linux FailSafe Base</para></entry>
<entry align="left" valign="top"><para><literal>failsafe2</literal></para></entry>
<entry align="left" valign="top"><para><literal>ha_fsd</literal></para></entry>
<entry align="left" valign="top"><para>Linux FailSafe daemon. Provides basic
component of the Linux FailSafe software.<indexterm id="IToverview-49"><primary><literal>
failsafe2</literal> subsystem</primary></indexterm> <indexterm id="IToverview-50">
<primary><literal>ha_fsd</literal> process</primary></indexterm></para></entry>
</row>
<row>
<entry align="left" valign="top"><para>High-availability cluster infrastructure
</para></entry>
<entry align="left" valign="top"><para><literal>cluster_ha </literal></para></entry>
<entry align="left" valign="top"><para><literal>ha_cmsd</literal></para></entry>
<entry align="left" valign="top"><para>Cluster membership daemon. Provides
the list of nodes, called <firstterm>node membership</firstterm>, available
to the cluster.<indexterm id="IToverview-51"><primary><literal>ha_cmsd</literal>
process</primary></indexterm> <indexterm id="IToverview-52"><primary><literal>
cluster_ha</literal> subsystem</primary></indexterm></para></entry>
</row>
<row>
<entry align="left" valign="top"><para></para></entry>
<entry align="left" valign="top"><para></para></entry>
<entry align="left" valign="top"><para><literal>ha_gcd</literal></para></entry>
<entry align="left" valign="top"><para>Group membership daemon. Provides group
membership and reliable communication services in the presence of failures
to Linux FailSafe processes<indexterm id="IToverview-53"><primary><literal>
ha_gcd</literal> process</primary></indexterm>.</para></entry>
</row>
<row>
<entry align="left" valign="top"><para></para></entry>
<entry align="left" valign="top"><para></para></entry>
<entry align="left" valign="top"><para><literal>ha_srmd</literal></para></entry>
<entry align="left" valign="top"><para>System resource manager daemon. Manages
resources, resource groups, and resource types. Executes action scripts for
resources.<indexterm id="IToverview-54"><primary><literal>ha_srmd</literal>
process</primary></indexterm></para></entry>
</row>
<row>
<entry align="left" valign="top"><para></para></entry>
<entry align="left" valign="top"><para></para></entry>
<entry align="left" valign="top"><para><literal>ha_ifd</literal></para></entry>
<entry align="left" valign="top"><para>Interface agent daemon. Monitors the
local node's network interfaces.<indexterm id="IToverview-55"><primary><literal>
ha_ifd</literal> process</primary></indexterm></para></entry>
</row>
<row>
<entry align="left" valign="top"><para>Cluster software infrastructure</para></entry>
<entry align="left" valign="top"><para><literal>cluster_admin </literal></para></entry>
<entry align="left" valign="top"><para><literal>cad </literal></para></entry>
<entry align="left" valign="top"><para>Cluster administration daemon. Provides
administration services.<indexterm id="IToverview-56"><primary><literal>cluster_admin
</literal> subsystem</primary></indexterm></para></entry>
</row>
<row>
<entry align="left" valign="top"><para></para></entry>
<entry align="left" valign="top"><para><?Pub _nolinebreak><literal>cluster_control
</literal><?Pub /_nolinebreak></para></entry>
<entry align="left" valign="top"><para><literal>crsd </literal></para></entry>
<entry align="left" valign="top"><para>Node control daemon. Monitors the serial
connection to other nodes. Has the ability to reset other nodes.<indexterm
id="IToverview-57"><primary><literal>cluster_control</literal> subsystem
</primary></indexterm> <indexterm id="IToverview-58"><primary><literal>crsd
</literal> process</primary></indexterm></para></entry>
</row>
<row>
<entry align="left" valign="top"><para></para></entry>
<entry align="left" valign="top"><para></para></entry>
<entry align="left" valign="top"><para><literal>cmond</literal></para></entry>
<entry align="left" valign="top"><para>Daemon that manages all other daemons.
This process starts other processes on all nodes in the cluster and restarts
them on failures.</para></entry>
</row>
<row>
<entry align="left" valign="top"><para></para></entry>
<entry align="left" valign="top"><para></para></entry>
<entry align="left" valign="top"><para><literal>cdbd</literal></para></entry>
<entry align="left" valign="top"><para>Manages the configuration database
and keeps each copy in sync on all nodes in the pool.</para></entry>
</row>
</tbody>
</tgroup>
</table>
<para></para>
</sect2>
<sect2><?Pub Dtl>
<title>Communication Paths</title>
<para>The following figures show communication paths in Linux FailSafe. Note
that they do not represent <literal>cmond</literal>. <indexterm id="IToverview-59">
<primary>communication paths</primary></indexterm> <indexterm id="IToverview-60">
<primary>system software</primary><secondary>communication paths</secondary>
</indexterm></para>
<para><figure><indexterm id="IToverview-61"><primary>read/write actions to
the cluster configuration database diagram</primary></indexterm>
<title> Read/Write Actions to the Cluster Configuration Database</title>
<graphic entityref="ha.cluster.config.info.flow"></graphic>
</figure></para>
<para><xref linkend="LE25208-PARENT"> shows the communication path for a node
that is in the pool but not in a cluster. </para>
<para><figure id="LE25208-PARENT"><indexterm id="IToverview-63"><primary>
node not in a cluster diagram</primary></indexterm>
<title id="LE25208-TITLE">Communication Path for a Node that is Not in a Cluster
</title>
<graphic entityref="machine.not.in.ha.cluster"></graphic>
</figure></para>
</sect2>
<sect2>
<title>Conditions Under Which Action Scripts are Executed</title>
<para>Action scripts are executed under the following conditions:</para>
<itemizedlist>
<listitem><para><literal>exclusive</literal>: the resource group is made online
by the user or HA processes are started</para>
</listitem>
<listitem><para><literal>start</literal>: the resource group is made online
by the user, HA processes are started, or there is a resource group failover
</para>
</listitem>
<listitem><para><literal>stop</literal>: the resource group is made offline,
HA processes are stopped, the resource group fails over, or the node is shut
down</para>
</listitem>
<listitem><para><literal>monitor</literal>: the resource group is online
</para>
</listitem>
<listitem><para><literal>restart</literal>: the <literal>monitor</literal>
script fails</para>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title id="Z942863066lhj">When Does FailSafe Execute Action and Failover Scripts
</title>
<para>The order of execution is as follows:</para>
<orderedlist>
<listitem><para>Linux FailSafe is started, usually at node boot or manually,
and reads the resource group information from the cluster configuration database.
</para>
</listitem>
<listitem><para>Linux FailSafe asks the system resource manager (SRM) to run <literal>
exclusive</literal> scripts for all resource groups that are in the <literal>
Online ready</literal> state.</para>
</listitem>
<listitem><para>SRM returns one of the following states for each resource
group:<itemizedlist>
<listitem><para><literal>running</literal></para>
</listitem>
<listitem><para><literal>partially running</literal></para>
</listitem>
<listitem><para><literal>not running</literal></para>
</listitem>
</itemizedlist></para>
</listitem>
<listitem><para>If a resource group has a state of <literal>not running</literal>
in a node where HA services have been started, the following occurs:<orderedlist>
<listitem><para>Linux FailSafe runs the failover policy script associated
with the resource group. The failover policy script takes the list of nodes
that are capable of running the resource group (the <firstterm>failover domain
</firstterm>) as a parameter.</para>
</listitem>
<listitem><para>The failover policy script returns an ordered list of nodes
in descending order of priority (the <firstterm>run-time failover domain</firstterm>)
where the resource group can be placed.</para>
</listitem>
<listitem><para>Linux FailSafe sends a request to SRM to move the resource
group to the first node in the run-time failover domain.</para>
</listitem>
<listitem><para>SRM executes the <literal>start</literal> action script for
all resources in the resource group:<itemizedlist>
<listitem><para>If the <literal>start</literal> script fails, the resource
group is marked <literal>online</literal> on that node with an <literal>srmd
executable error</literal>.</para>
</listitem>
<listitem><para>If the <literal>start</literal> script is successful, SRM
automatically starts monitoring those resources. After the specified start
monitoring time passes, SRM executes the <literal>monitor</literal> action
script for the resources in the resource group.</para>
</listitem>
</itemizedlist></para>
</listitem>
</orderedlist></para>
</listitem>
<listitem><para>If the state of the resource group is <literal>running</literal>
or <literal>partially running</literal> on only one node in the cluster,
Linux FailSafe runs the associated failover policy script:<itemizedlist>
<listitem><para>If the highest priority node is the same node where the resource
group is partially running or running, the resource group is made online on
that node. In the <literal>partially running</literal> case, Linux FailSafe
asks SRM to execute <literal>start</literal> scripts for the resources in the
resource group that are not running.</para>
</listitem>
<listitem><para>If the highest priority node is another node in the cluster,
Linux FailSafe asks SRM to execute <literal>stop</literal> action scripts
for the resources in the resource group. Linux FailSafe then makes the resource
group online on the highest priority node in the cluster.</para>
</listitem>
</itemizedlist></para>
</listitem>
<listitem><para>If the state of the resource group is <literal>running</literal>
or <literal>partially running</literal> on multiple nodes in the cluster,
the resource group is marked with an <literal>exclusivity error</literal>.
These resource groups require operator intervention before they can be brought
online in the cluster.</para>
</listitem>
</orderedlist>
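<para>The following sketch illustrates the kind of work a failover policy
script performs in the procedure above: it receives the failover domain and
prints the run-time failover domain in descending order of priority. The
conventions shown (nodes passed as command-line arguments, the ordered list
written to standard output) are assumptions made for this sketch; the scripts
shipped in <filename>/usr/lib/failsafe/policies/</filename> define the actual
interface.</para>
<programlisting>
#!/bin/sh
#
# Sketch of a trivial "ordered" failover policy (not a shipped script).
# Assumptions for this sketch: the failover domain is passed as command-line
# arguments, and the run-time failover domain is written to standard output,
# one node per line, highest priority first.

# Preserve the configured order of the failover domain, so that the first
# node listed in the cluster configuration database is always preferred.
for node in "$@"
do
    echo "$node"
done
exit 0
</programlisting>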
<para><xref linkend="Z944249330lhj-PARENT"> shows the message paths for action
scripts and failover policy scripts.</para>
<figure id="Z944249330lhj-PARENT"><indexterm id="IToverview-62"><primary>
message paths diagram</primary></indexterm>
<title id="Z944249330lhj">Message Paths for Action Scripts and Failover Policy
Scripts</title>
<graphic entityref="ha.cluster.messages"></graphic>
</figure>
</sect2>
<sect2>
<title>Components</title>
<para>The cluster configuration database is a key component of Linux FailSafe
software. It contains all information about the following:<indexterm id="IToverview-64">
<primary>system software</primary><secondary>components</secondary></indexterm> <indexterm
id="IToverview-65"><primary>components</primary></indexterm></para>
<itemizedlist>
<listitem><para>Resources</para>
</listitem>
<listitem><para>Resource types</para>
</listitem>
<listitem><para>Resource groups</para>
</listitem>
<listitem><para>Failover policies</para>
</listitem>
<listitem><para>Nodes</para>
</listitem>
<listitem><para>Clusters</para>
</listitem>
</itemizedlist>
<para>The cluster configuration database daemon (<literal>cdbd</literal>)
maintains identical databases on each node in the cluster.<indexterm id="IToverview-66">
<primary>cluster administration daemon</primary></indexterm> <indexterm id="IToverview-67">
<primary>administration daemon</primary></indexterm></para>
<para>The following are the contents of the failsafe directories under the <filename>
/usr/lib</filename> and <filename>/var</filename> hierarchies:</para>
<itemizedlist>
<listitem><para><filename>/var/run/failsafe/comm/</filename></para>
<para>Directory that contains the files that the various daemons use to communicate
with each other.</para>
</listitem>
<listitem><para><filename>/usr/lib/failsafe/common_scripts/</filename></para>
<para>Directory that contains the script library (the common functions that
may be used in action scripts).</para>
</listitem>
<listitem><para><filename>/var/log/failsafe/</filename></para>
<para>Directory that contains the logs of all scripts and daemons executed
by Linux FailSafe. The outputs and errors from the commands within the scripts
are logged in the <filename>script_<replaceable>nodename</replaceable></filename>
file.</para>
</listitem>
<listitem><para><filename>/usr/lib/failsafe/policies/</filename></para>
<para>Directory that contains the failover policy scripts used for resource groups.
</para>
</listitem>
<listitem><para><filename>/usr/lib/failsafe/resource_types/template</filename></para>
<para>Directory that contains the template action scripts (see the example
following this list).</para>
</listitem>
<listitem><para><filename>/usr/lib/failsafe/resource_types/<replaceable>rt_name
</replaceable></filename></para>
<para>Directory that contains the action scripts for the <replaceable>rt_name
</replaceable> resource type. For example, <?Pub _nolinebreak><literal>/usr/lib/failsafe/resource_types/filesystem
</literal><?Pub /_nolinebreak>.</para>
</listitem>
<listitem><para><filename>resource_types/<replaceable>rt_name</replaceable>/exclusive
</filename></para>
<para>Script that verifies that a resource of this resource type is not already
running. For example, <literal>resource_types/filesystem/exclusive</literal>.
</para>
</listitem>
<listitem><para><filename>resource_types/<replaceable>rt_name</replaceable>/monitor
</filename></para>
<para>Script that monitors a resource of this type.</para>
</listitem>
<listitem><para><filename>resource_types/<replaceable>rt_name</replaceable>/restart
</filename></para>
<para>Script that restarts a resource of this resource type on the same node
after a monitoring failure.</para>
</listitem>
<listitem><para><filename>resource_types/<replaceable>rt_name</replaceable>/start
</filename></para>
<para>Script that starts a resource of this resource type.</para>
</listitem>
<listitem><para><filename>resource_types/<replaceable>rt_name</replaceable>/stop
</filename></para>
<para>Script that stops a resource of this resource type.</para>
</listitem>
</itemizedlist>
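<para>As an illustration of how these directories are typically used, the
following commands create the action script directory for a hypothetical new
resource type by copying the supplied templates. The resource type name
<literal>newdb</literal> is an example only:</para>
<programlisting>
# Copy the template action scripts to create a directory for a hypothetical
# resource type named "newdb", then make the copied scripts executable.
cp -r /usr/lib/failsafe/resource_types/template \
      /usr/lib/failsafe/resource_types/newdb
chmod +x /usr/lib/failsafe/resource_types/newdb/*
</programlisting>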
<para><xref linkend="LE21811-PARENT"> shows the administrative commands available
for use in scripts.</para>
<table frame="topbot" pgwide="1" id="LE21811-PARENT"><indexterm id="IToverview-70">
<primary><literal>ha_cilog</literal> command</primary></indexterm><indexterm
id="IToverview-69"><primary>administrative commands</primary></indexterm>
<indexterm id="IToverview-68"><primary>commands</primary></indexterm>
<title id="LE21811-TITLE">Administrative Commands for Use in Scripts</title>
<tgroup cols="2" colsep="0" rowsep="0">
<colspec colwidth="95*">
<colspec colwidth="301*">
<thead>
<row rowsep="1"><entry align="left" valign="bottom"><para>Command</para></entry>
<entry align="left" valign="bottom"><para>Purpose</para></entry></row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><para><literal>ha_cilog</literal></para></entry>
<entry align="left" valign="top"><para>Logs messages to the <filename>script_
</filename> <userinput></userinput><replaceable>nodename</replaceable> log
files.<indexterm id="IToverview-71"><primary>log messages</primary></indexterm> <indexterm
id="IToverview-72"><primary>message logging</primary></indexterm></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>ha_execute_lock</literal></para></entry>
<entry align="left" valign="top"><para>Executes a command with a file lock.
This allows command execution to be serialized</para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>ha_exec2</literal></para></entry>
<entry align="left" valign="top"><para>Executes a command and retries the
command on failure or timeout.<indexterm id="IToverview-73"><primary>monitoring
</primary><secondary>processes</secondary></indexterm> <indexterm id="IToverview-74">
<primary>process</primary><secondary>monitoring</secondary></indexterm><indexterm
id="IToverview-75"><primary><literal>ha_cilog</literal> command</primary>
</indexterm></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>ha_filelock</literal></para></entry>
<entry align="left" valign="top"><para>Locks a file.<indexterm id="IToverview-76">
<primary><literal>ha_filelock</literal> command</primary></indexterm> <indexterm
id="IToverview-77"><primary>lock a file</primary></indexterm> <indexterm id="IToverview-78">
<primary>file locking and unlocking</primary></indexterm></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>ha_fileunlock</literal></para></entry>
<entry align="left" valign="top"><para>Unlocks a file.<indexterm id="IToverview-79">
<primary><literal>ha_fileunlock</literal> command</primary></indexterm> <indexterm
id="IToverview-80"><primary>unlock a file</primary></indexterm></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>ha_ifdadmin</literal></para></entry>
<entry align="left" valign="top"><para>Communicates with the <literal>ha_ifd
</literal> network interface agent daemon.<indexterm id="IToverview-81"><primary><literal>
ha_ifdadmin</literal> command</primary></indexterm> <indexterm id="IToverview-82">
<primary>communicate with the network interface agent daemon</primary></indexterm></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>ha_http_ping2 </literal></para></entry>
<entry align="left" valign="top"><para>Checks if a web server is running.<indexterm
id="IToverview-83"><primary><literal>ha_http_ping2</literal> command</primary>
</indexterm> <indexterm id="IToverview-84"><primary>Netscape node check</primary>
</indexterm></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>ha_macconfig2 </literal></para></entry>
<entry align="left" valign="top"><para>Displays or modifies MAC addresses
of a network interface.<indexterm id="IToverview-85"><primary><literal>ha_macconfig2
</literal> command</primary></indexterm> <indexterm id="IToverview-86"><primary>
MAC address modification and display</primary></indexterm></para></entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
</sect1>
</chapter>
<?Pub *0000066416>