Revision 1.1, Wed Nov 29 22:01:12 2000 UTC by vasa
Branch: MAIN
CVS Tags: HEAD

New documentation files for the Programmers' Guide.

<!-- Fragment document type declaration subset:
ArborText, Inc., 1988-1997, v.4001
<!DOCTYPE SET PUBLIC "-//Davenport//DTD DocBook V3.0//EN" [
<!ENTITY scriptlib.sgml SYSTEM "scriptlib.sgml">
<!ENTITY scriptlibapp.sgml SYSTEM "scriptlibapp.sgml">
<!ENTITY startgui.sgml SYSTEM "startgui.sgml">
<!ENTITY preface.sgml SYSTEM "preface.sgml">
<!ENTITY action.sgml SYSTEM "action.sgml">
<!ENTITY failover.sgml SYSTEM "failover.sgml">
<!ENTITY database.sgml SYSTEM "database.sgml">
<!ENTITY install.sgml SYSTEM "install.sgml">
<!ENTITY gloss.sgml SYSTEM "gloss.sgml">
<!ENTITY index.sgml SYSTEM "index.sgml">
<!ENTITY monitor SYSTEM "figures/monitor.eps" NDATA eps>
<!ENTITY resource.ai SYSTEM "figures/resource.ai.eps" NDATA eps>
<!ENTITY optional.ai SYSTEM "figures/optional.ai.eps" NDATA eps>
<!ENTITY manager.ai SYSTEM "figures/manager.ai.eps" NDATA eps>
<!ENTITY depend.ai SYSTEM "figures/depend.ai.eps" NDATA eps>
<!ENTITY type.ai SYSTEM "figures/type.ai.eps" NDATA eps>
<!ENTITY attrib.ai SYSTEM "figures/attrib.ai.eps" NDATA eps>
<!ENTITY action.ai SYSTEM "figures/action.ai.eps" NDATA eps>
<!ENTITY star.configuration SYSTEM "figures/star.configuration.eps" NDATA eps>
<!ENTITY n.plus.2.configuration SYSTEM "figures/n.plus.2.configuration.eps" NDATA eps>
<!ENTITY square.configuration SYSTEM "figures/square.configuration.eps" NDATA eps>
]>
-->
<chapter id="LE63369-PARENT">
<title id="LE63369-TITLE">Introduction to Writing Application Scripts</title>
<para>Linux FailSafe provides several highly available services for a two-node
cluster. These services are monitored by the Linux FailSafe software. You
can create additional services that are highly available by using the instructions
in this guide.</para>
<para>This chapter provides an introduction to Linux FailSafe programming.
The sections are as follows:</para>
<itemizedlist>
<listitem><para><xref linkend="LE60545-PARENT"></para>
</listitem>
<listitem><para><xref linkend="LE37432-PARENT"></para>
</listitem>
<listitem><para><xref linkend="plugin"></para>
</listitem>
<listitem><para><xref linkend="LE56070-PARENT"></para>
</listitem>
<listitem><para><xref linkend="LE37841-PARENT"></para>
</listitem>
</itemizedlist>
<para>For an overview of the software layers, communication paths, and cluster
configuration database, see the <citetitle>Linux FailSafe Administrator's
Guide</citetitle>.</para>
<sect1 id="LE60545-PARENT"><?Pub Dtl>
<title id="LE60545-TITLE">Concepts</title>
<para>In order to use Linux FailSafe, you must understand the concepts in
this section.<indexterm id="IToverview-0"><primary>concepts</primary></indexterm></para>
<sect2>
<title>Cluster Node (or Node)</title>
<para>A <firstterm>cluster node</firstterm> is a single Linux execution environment;
in other words, a single physical or virtual machine. In current Linux environments,
this is always an individual computer. For brevity, this guide uses the term <firstterm>node</firstterm>
in this sense, as opposed to any other meaning such as a network node. <indexterm id="IToverview-1"><primary>
cluster node</primary></indexterm> <indexterm id="IToverview-2"><primary>
node</primary></indexterm></para>
</sect2>
<sect2>
<title>Pool</title>
<para>A <firstterm>pool</firstterm> is the entire set of nodes having membership
in a group of clusters. The clusters are usually close together and should
always serve a common purpose. A replicated cluster configuration database
is stored on each node in the pool. <indexterm id="IToverview-3"><primary>
pool</primary></indexterm></para>
</sect2>
<sect2>
<title>Cluster</title>
<para>A <firstterm>cluster</firstterm> is a collection of one or more nodes
coupled to each other by networks or other similar interconnections. A cluster
belongs to one pool and only one pool. A cluster is identified by a simple
name; this name must be unique within the pool. A particular node may be
a member of only one cluster. All nodes in a cluster are also in the pool;
however, not all nodes in the pool are necessarily in the cluster.<indexterm
id="IToverview-4"><primary>cluster</primary></indexterm></para>
</sect2>
<sect2>
<title>Node Membership</title>
<para>A <firstterm>node membership</firstterm> is the list of nodes in a cluster
on which Linux FailSafe can allocate resource groups.<indexterm id="IToverview-5">
<primary>node membership</primary></indexterm> <indexterm id="IToverview-6">
<primary>membership</primary></indexterm></para>
</sect2>
<sect2>
<title>Process Membership</title>
<para>A <indexterm id="IToverview-7"><primary>process</primary><secondary>
membership</secondary></indexterm> <firstterm>process membership</firstterm>
is the list of process instances in a cluster that form a process group. There
can be multiple process groups per node.</para>
</sect2>
<sect2>
<title>Resource</title>
<para>A <firstterm>resource</firstterm> is a single physical or logical entity
that provides a service to clients or other resources. For example, a resource
can be a single disk volume, a particular network address, or an application
such as a web server. A resource is generally available for use over time
on two or more nodes in a cluster, although it can only be allocated to one
node at any given time. <indexterm id="IToverview-8"><primary>resource</primary>
<secondary>definition</secondary></indexterm></para>
<para>Resources are identified by a resource name and a resource type. One
resource can be dependent on one or more other resources; if so, it cannot
start (that is, be made available for use) unless the resources on which it
depends are also started. These resources must be part of the same resource
group and are identified in a resource dependency list.</para>
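<para>As a rough illustration of the start rule, the following Python sketch
checks whether a resource may start given the set of resources that are already
started. It is only a model of the concept, not a Linux FailSafe interface; the
function name <literal>can_start</literal> and the dependency data are hypothetical.</para>
<programlisting>
```python
def can_start(resource, dependencies, started):
    """Return True when every resource that `resource` depends on is started."""
    return all(dep in started for dep in dependencies.get(resource, ()))


# Hypothetical resource dependency lists (resource name to the names it depends on).
dependencies = {
    "web1": ["/fs1", "10.10.48.22"],
    "/fs1": ["vol1"],
}

started = {"vol1"}
print(can_start("/fs1", dependencies, started))  # /fs1 depends only on vol1
print(can_start("web1", dependencies, started))  # web1 still waits for /fs1 and the IP address
```
</programlisting>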
</sect2>
<sect2>
<title>Resource Type</title>
<para>A <firstterm>resource type</firstterm> is a particular class of resource.
All of the resources in a particular resource type can be handled in the same
way for the purposes of failover. Every resource is an instance of exactly
one resource type.<indexterm id="IToverview-10"><primary>resource type</primary>
<secondary>description</secondary></indexterm></para>
<para>A resource type is identified by a simple name; this name should be
unique within the cluster. A resource type can be defined for a specific node,
or it can be defined for an entire cluster. A resource type definition for
a specific node overrides a clusterwide resource type definition with the
same name; this allows an individual node to override global settings from
a clusterwide resource type definition.</para>
<para>Like resources, a resource type can be dependent on one or more other
resource types. If such a dependency exists, at least one instance of each
resource type on which it depends must be defined. For example, a resource type
named <literal>Netscape_web</literal> might have resource type dependencies
on resource types named <literal>IP_address</literal> and <literal>volume
</literal>. If a resource named <literal>web1</literal> is defined with the <literal>
Netscape_web</literal> resource type, then the resource group containing <literal>
web1</literal> must also contain at least one resource of the type <literal>
IP_address</literal> and one resource of the type <literal>volume</literal>.
</para>
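<para>The rule in this example can be modeled with a short Python sketch that
reports which resource types a resource group still lacks. This is an illustration
under stated assumptions, not FailSafe code: the table <literal>TYPE_DEPENDENCIES</literal>
and the function <literal>missing_types</literal> are hypothetical names.</para>
<programlisting>
```python
# Hypothetical resource type dependency lists (type name to required type names).
TYPE_DEPENDENCIES = {
    "Netscape_web": {"IP_address", "volume"},
}


def missing_types(group, type_dependencies):
    """Return the resource types that the group needs but does not contain.

    `group` maps each resource name to its resource type.
    """
    present = set(group.values())
    missing = set()
    for resource_type in present:
        missing |= type_dependencies.get(resource_type, set()) - present
    return missing


# web1 is present with an IP address, but the group has no volume resource yet.
group = {"web1": "Netscape_web", "10.10.48.22": "IP_address"}
print(missing_types(group, TYPE_DEPENDENCIES))
```
</programlisting>
<para>Adding a resource of type <literal>volume</literal> to the group makes the
result empty, which corresponds to a group that satisfies the type dependencies.</para>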
<para>The Linux FailSafe software includes some predefined resource types.
If these types fit the application you want to make highly available, you
can reuse them. If none fit, you can create additional resource types by using
the instructions in this guide.</para>
</sect2>
<sect2>
<title>Resource Name</title>
<para>A <firstterm>resource name</firstterm> identifies a specific instance
of a resource type. A resource name must be unique for a given resource type.<indexterm
id="IToverview-9"><primary>resource</primary><secondary>name</secondary></indexterm></para>
</sect2>
<sect2>
<title>Resource Group</title>
<para>A <firstterm>resource group</firstterm> is a collection of interdependent
resources. A resource group is identified by a simple name; this name must
be unique within a cluster. <xref linkend="LE99232-PARENT"> shows an example
of the resources and their corresponding resource types for a resource group
named <literal>WebGroup</literal>. <indexterm id="IToverview-11"><primary>
resource group</primary><secondary>definition</secondary></indexterm></para>
<table frame="topbot" id="LE99232-PARENT">
<title id="LE99232-TITLE">Example Resource Group</title>
<tgroup cols="2" colsep="0" rowsep="0">
<colspec colwidth="198*">
<colspec colwidth="198*">
<thead>
<row rowsep="1"><entry align="left" valign="bottom"><para>Resource</para></entry>
<entry align="left" valign="bottom"><para>Resource Type</para></entry></row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><para><literal>10.10.48.22</literal></para></entry>
<entry align="left" valign="top"><para><literal>IP_address</literal></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>/fs1</literal></para></entry>
<entry align="left" valign="top"><para><literal>filesystem</literal></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>vol1</literal></para></entry>
<entry align="left" valign="top"><para><literal>volume</literal></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>web1</literal></para></entry>
<entry align="left" valign="top"><para><literal>Netscape_web</literal></para></entry>
</row>
</tbody>
</tgroup>
</table>
<para>If any individual resource in a resource group becomes unavailable for
its intended use, then the entire resource group is considered unavailable.
Therefore, a resource group is the unit of failover.</para>
<para>Resource groups cannot overlap; that is, two resource groups cannot
contain the same resource.</para>
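<para>The non-overlap rule can be checked mechanically. The following Python
sketch is only a model: it represents each resource as a (name, type) pair, and
the second group, <literal>OtherGroup</literal>, is invented for the illustration.</para>
<programlisting>
```python
def shared_resources(group_a, group_b):
    """Return the resources that appear in both groups; a valid pair shares none."""
    return set(group_a).intersection(group_b)


# Each resource is modeled as a (resource name, resource type) pair.
web_group = {("web1", "Netscape_web"), ("vol1", "volume")}
other_group = {("vol1", "volume"), ("10.10.48.23", "IP_address")}  # hypothetical OtherGroup

overlap = shared_resources(web_group, other_group)
if overlap:
    print("groups overlap on:", sorted(overlap))
```
</programlisting>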
<para>For information about configuring resource groups, see the <citetitle>
Linux FailSafe Administrator's Guide</citetitle>.</para>
</sect2>
<sect2>
<title>Resource Dependency List</title>
<para>A <firstterm>resource dependency list</firstterm> is a list of resources
upon which a resource depends. Each resource instance must have resource dependencies
that satisfy its resource type dependencies before it can be added to a resource
group.</para>
</sect2>
<sect2>
<title>Resource Type Dependency List</title>
<para>A <firstterm>resource type dependency list</firstterm> is a list of
resource types upon which a resource type depends. For example, the <literal>
filesystem</literal> resource type depends upon the <literal>volume</literal>
resource type, and the <literal>Netscape_web</literal> resource type depends
upon the <literal>filesystem</literal> and <literal>IP_address</literal> resource
types.<indexterm id="IToverview-12"><primary>resource type</primary><secondary>
dependency list</secondary></indexterm> <indexterm id="IToverview-13"><primary>
dependency list</primary></indexterm></para>
<para>For example, suppose a file system instance <literal>fs1</literal> is
mounted on volume <literal>vol1</literal>. Before <literal>fs1</literal> can
be added to a resource group, <literal>fs1</literal> must be defined to depend
on <literal>vol1</literal>. Linux FailSafe only knows that a file system instance
must have one volume instance in its dependency list. This requirement is
inferred from the resource type dependency list. <indexterm id="IToverview-14">
<primary>resource</primary><secondary>dependency list</secondary></indexterm></para>
</sect2>
<sect2>
<title>Failover</title>
<para>A <firstterm>failover</firstterm> is the process of allocating a resource
group (or application) to another node, according to a failover policy. A
failover may be triggered by the failure of a resource, a change in the node
membership (such as when a node fails or starts), or a manual request by the
administrator.<indexterm id="IToverview-15"><primary>failover</primary></indexterm></para>
</sect2>
<sect2>
<title>Failover Policy</title>
<para>A <firstterm>failover policy</firstterm> is the method used by Linux
FailSafe to determine the destination node of a failover. A failover policy
consists of the following:</para>
<itemizedlist>
<listitem><para>Failover domain</para>
</listitem>
<listitem><para>Failover attributes</para>
</listitem>
<listitem><para>Failover script</para>
</listitem>
</itemizedlist>
<para>Linux FailSafe uses the failover domain output from a failover script
along with failover attributes to determine on which node a resource group
should reside.</para>
<para>The administrator must configure a failover policy for each resource
group. A failover policy name must be unique within the pool. Linux FailSafe
includes predefined failover policies, but you can define your own failover
algorithms as well. <indexterm id="IToverview-16"><primary>failover policy
</primary></indexterm></para>
</sect2>
<sect2>
<title>Failover Domain</title>
<para>A <firstterm>failover domain</firstterm> is the ordered list of nodes
on which a given resource group can be allocated. The nodes listed in the
failover domain must be within the same cluster; however, the failover domain
does not have to include every node in the cluster.<indexterm id="IToverview-17">
<primary>failover domain</primary></indexterm> <indexterm id="IToverview-18">
<primary>domain</primary></indexterm> <indexterm id="IToverview-19"><primary>
application failover domain</primary></indexterm></para>
<para>The administrator defines the initial failover domain when creating
a failover policy. This list is transformed into a run-time failover domain
by the failover script; Linux FailSafe uses the run-time failover domain along
with failover attributes and the node membership to determine the node on
which a resource group should reside. Linux FailSafe stores the run-time failover
domain and uses it as input to the next failover script invocation. Depending
on the run-time conditions and contents of the failover script, the initial
and run-time failover domains may be identical.</para>
<para>In general, Linux FailSafe allocates a given resource group to the first
node listed in the run-time failover domain that is also in the node membership;
the point at which this allocation takes place is affected by the failover
attributes.</para>
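<para>The general allocation rule can be sketched in a few lines of Python.
This is a simplified model that ignores failover attributes; the function name
<literal>select_node</literal> and the node names are hypothetical.</para>
<programlisting>
```python
def select_node(runtime_domain, node_membership):
    """Return the first node of the run-time failover domain that is a current member."""
    for node in runtime_domain:
        if node in node_membership:
            return node
    return None  # no eligible node, so the resource group cannot be allocated


runtime_domain = ["node1", "node2", "node3"]  # ordered list from the failover script
node_membership = {"node2", "node3"}          # node1 is currently not a member
print(select_node(runtime_domain, node_membership))  # prints node2
```
</programlisting>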
</sect2>
<sect2>
<title>Failover Attribute</title>
<para>A <firstterm>failover attribute</firstterm> is a string that affects
the allocation of a resource group in a cluster. The administrator must specify
system attributes (such as <literal>Auto_Failback</literal> or <literal>Controlled_Failback
</literal>), and can optionally supply site-specific attributes.<indexterm
id="IToverview-20"><primary>failover attributes</primary></indexterm></para>
</sect2>
<sect2>
<title>Failover Scripts</title>
<para>A <firstterm>failover script</firstterm> is a shell script that generates
a run-time failover domain and returns it to the Linux FailSafe process. The
Linux FailSafe process <literal>ha_fsd</literal> applies the failover attributes
and then selects the first node in the returned failover domain that is also
in the current node membership.<indexterm id="IToverview-21"><primary>failover
script</primary><secondary>description</secondary></indexterm></para>
<para>The following failover scripts are provided with the Linux FailSafe
release:</para>
<itemizedlist>
<listitem><para><filename>ordered</filename>, which never changes the initial
failover domain. When using this script, the initial and run-time failover
domains are equivalent.</para>
</listitem>
<listitem><para><filename>round-robin</filename>, which selects the resource
group owner in a round-robin (circular) fashion. This policy can be used for
resource groups that can be run on any node in the cluster.</para>
</listitem>
</itemizedlist>
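<para>To make the difference between the two policies concrete, the following
Python sketch models how each might transform a failover domain. It does not
reproduce the shipped shell scripts. In particular, the rotation rule in
<literal>round_robin</literal> is an assumption about what a circular policy
could look like, and both function names are hypothetical.</para>
<programlisting>
```python
def ordered(initial_domain):
    """Model of the ordered policy: the run-time domain equals the initial domain."""
    return list(initial_domain)


def round_robin(previous_domain, current_owner):
    """One possible circular policy (an assumption, not the shipped script).

    Rotate the domain so that the node after the current owner comes first.
    """
    domain = list(previous_domain)
    if current_owner not in domain:
        return domain
    start = (domain.index(current_owner) + 1) % len(domain)
    return domain[start:] + domain[:start]


initial = ["node1", "node2", "node3"]  # hypothetical node names
print(ordered(initial))
print(round_robin(initial, "node1"))
```
</programlisting>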
<para>If these scripts do not meet your needs, you can create a new failover
script using the information in this guide.</para>
</sect2>
<sect2>
<title>Action Scripts</title>
<para>The <firstterm>action scripts</firstterm> are the set of scripts that
determine how a resource is started, monitored, and stopped. There must be
a set of action scripts specified for each resource type.<indexterm id="IToverview-22">
<primary>action scripts</primary></indexterm></para>
<para>The following is the complete set of action scripts that can be specified
for each resource type:</para>
<itemizedlist>
<listitem><para><literal>exclusive</literal>, which verifies that a resource
is not already running</para>
</listitem>
<listitem><para><literal>start</literal>, which starts a resource</para>
</listitem>
<listitem><para><literal>stop</literal>, which stops a resource</para>
</listitem>
<listitem><para><literal>monitor</literal>, which monitors a resource</para>
</listitem>
<listitem><para><literal>restart</literal>, which restarts a resource on the
same server after a monitoring failure occurs</para>
</listitem>
</itemizedlist>
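<para>The roles of the five actions can be illustrated with an in-memory model.
The Python sketch below is not an action script and does not use any FailSafe
interface; the class <literal>FakeResource</literal>, the helper functions, and
the calling sequence they show are assumptions made for the illustration.</para>
<programlisting>
```python
class FakeResource:
    """In-memory stand-in for a resource with the five actions."""

    def __init__(self):
        self.running = False

    def exclusive(self):
        return not self.running  # True when the resource is not already running

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def monitor(self):
        return self.running  # True when the resource looks healthy

    def restart(self):
        self.stop()
        self.start()


def bring_up(resource):
    """Assumed sequence: verify exclusivity, start, then confirm by monitoring."""
    if not resource.exclusive():
        raise RuntimeError("resource is already running")
    resource.start()
    return resource.monitor()


def monitor_once(resource):
    """Restart on the same node after a monitoring failure; report the final health."""
    if resource.monitor():
        return True
    resource.restart()
    return resource.monitor()


web1 = FakeResource()
print(bring_up(web1))
web1.running = False  # simulate an application failure
print(monitor_once(web1))
```
</programlisting>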
<para>The release includes action scripts for predefined resource types. If
these scripts fit the resource type that you want to make highly available,
you can reuse them by copying them and modifying them as needed. If none fits,
you can create additional action scripts by using the instructions in this
guide.</para>
</sect2>
</sect1>
<sect1 id="LE37432-PARENT"><?Pub Dtl>
<title id="LE37432-TITLE">Highly Available Services Included with Linux FailSafe
</title>
<para><indexterm id="IToverview-23"><primary>highly available</primary><secondary>
services</secondary></indexterm>The base release includes the software required
to make IP addresses (the <literal>IP_address</literal> resource type) highly
available.<indexterm id="IToverview-24"><primary>resource type</primary><secondary>
provided with Linux FailSafe</secondary></indexterm><indexterm id="IToverview-25">
<primary>IP address service</primary></indexterm></para>
</sect1>
<sect1 id="plugin">
<title>Plug-Ins</title>
<para>Optional software packages, known as <firstterm>plug-ins</firstterm>,
are available to make additional applications highly available. </para>
<para>The following plug-ins are available for Linux FailSafe:<indexterm id="IToverview-32">
<primary>plug-ins</primary></indexterm></para>
<itemizedlist>
<listitem><para>Logical volumes (the <literal>volume</literal> resource type)
such as provided by <literal>LVM</literal> <indexterm id="IToverview-26">
<primary>LVM logical volume service</primary></indexterm> <indexterm id="IToverview-27">
<primary><literal>volume</literal> resource type</primary></indexterm></para>
</listitem>
<listitem><para>Filesystems such as <literal>reiserfs</literal> and <literal>
ext2fs</literal> (the <literal>filesystem</literal> resource type)<indexterm
id="IToverview-28"><primary>XFS file system service</primary></indexterm>
 <indexterm id="IToverview-29"><primary><literal>filesystem</literal> resource
type</primary></indexterm></para>
</listitem>
<listitem><para>MAC addresses (the <literal>MAC_address</literal> resource
type)<indexterm id="IToverview-30"><primary>MAC address service</primary>
</indexterm> <indexterm id="IToverview-31"><primary><literal>MAC_address</literal>
 resource type</primary></indexterm></para>
</listitem>
<listitem><para>Linux FailSafe Samba</para>
</listitem>
<listitem><para>Linux FailSafe NFS </para>
<note>
<para>Linux FailSafe NFS is not part of the core Linux FailSafe software,
but it is documented with the base release.</para>
</note>
</listitem>
</itemizedlist>
<para>If you want to create new highly available services, or change the functionality
of the provided failover scripts and action scripts by writing new scripts,
you will use the instructions in this guide. However, not all resources can
be made highly available; see <xref linkend="LE56070-PARENT">.</para>
</sect1>
<sect1 id="LE56070-PARENT"><?Pub Dtl>
<title id="LE56070-TITLE">Characteristics that Permit an Application to be
Highly Available</title>
<para>The characteristics of an application that can be made highly available
are as follows:<indexterm id="IToverview-33"><primary>high availability characteristics
</primary></indexterm></para>
<itemizedlist>
<listitem><para>The application can be easily restarted and monitored.</para>
<para>It should be able to recover from failures, as most client/server
software does. The failure could be a hardware failure, an operating system failure,
or an application failure. If a node crashes and reboots, client/server software
should be able to attach again automatically.</para>
</listitem>
<listitem><para>The application must have a start and stop procedure.</para>
<para>When the application fails over, the instances of the application are
stopped on one node using the stop procedure and restarted on the other node
using the start procedure. </para>
</listitem>
<listitem><para>The application can be moved from one node to another after
failures.</para>
<para>If the resource has failed, it must still be possible to run the resource
stop procedure. In addition, the resource must recover from the failed state
when the resource start procedure is executed on another node.</para>
<para>Ensure that there is no affinity for a specific node. </para>
</listitem>
<listitem><para>The application does not depend on knowing the primary host
name (as returned by <command>hostname</command>); that is, required resources
can be configured to work with an IP address.</para>
</listitem>
<listitem><para>Other resources on which the application depends can be made
highly available. If they are not provided by Linux FailSafe and its optional
products (see <xref linkend="LE37432-PARENT">), you must make these resources
highly available, using the information in this guide.</para>
<note>
<para>An application itself is not modified to make it highly available.</para>
</note>
</listitem>
</itemizedlist>
</sect1>
<sect1 id="LE37841-PARENT">
<title id="LE37841-TITLE">Overview of the Programming Steps</title>
<note>
<para>If you do not want to write the scripts yourself, you can establish
a contract with the Silicon Graphics Professional Services group to create
customized scripts. See: <ulink url="http://www.sgi.com/services/index.html">
http://www.sgi.com/services/index.html</ulink>.<indexterm id="IToverview-34">
<primary>overview of the programming steps</primary></indexterm> <indexterm
id="IToverview-35"><primary>programming steps overview</primary></indexterm></para>
</note>
<para>To make an application highly available, follow these steps:</para>
<orderedlist>
<listitem><para>Understand the application and determine:</para>
<itemizedlist>
<listitem><para>The configuration required for the application, such as user
names, permissions, data location (volumes), and so on. For more information
about configuration, see the <citetitle>Linux FailSafe Administrator's Guide
</citetitle>.</para>
</listitem>
<listitem><para>The other resources on which the application depends. All
interdependent resources must be part of the same resource group.</para>
</listitem>
<listitem><para>The resource type that best suits this application.</para>
</listitem>
<listitem><para>The number of instances of the resource type that will constitute
the application. (Each instance of a given application, or <firstterm>resource
type</firstterm>, is a separate resource.) For example, a web server may depend
upon two filesystem resources.</para>
</listitem>
<listitem><para>The commands and arguments required to start, stop, and monitor
this application (that is, the resources in the resource group).</para>
</listitem>
<listitem><para>The order in which all resources in the resource group must
be started and stopped.</para>
</listitem>
</itemizedlist>
</listitem>
<listitem><para>Determine whether existing action scripts can be reused. If
they cannot, write a new set of action scripts, using existing scripts and
the templates in <?Pub _nolinebreak><filename>/usr/lib/failsafe/resource_types/template
</filename><?Pub /_nolinebreak> as a guide. See <xref linkend="LE77672-PARENT">.
</para>
</listitem>
<listitem><para>Determine whether the existing <literal>ordered</literal>
or <literal>round-robin</literal> failover scripts can be reused for the resource
group. If they cannot, write a new failover script. See <xref linkend="LE43007-PARENT">.
</para>
</listitem>
<listitem><para>Determine whether an existing resource type can be reused.
If none applies, create a new resource type or modify an existing resource
type. See <xref linkend="LE43007-PARENT">.</para>
</listitem>
<listitem><para>Configure the following in the cluster configuration database
(for more information, see the <citetitle>Linux FailSafe Administrator's Guide
</citetitle>):</para>
<itemizedlist>
<listitem><para>Resource group</para>
</listitem>
<listitem><para>Resource type</para>
</listitem>
<listitem><para>Failover policy</para>
</listitem>
</itemizedlist>
</listitem>
<listitem><para>Test the action scripts and failover script. See <xref linkend="LE96600-PARENT">
and <xref linkend="Z943900191lhj">.</para>
<note>
<para>Do not modify the scripts included with the Linux FailSafe product.
New or customized scripts must have different names from the files included
with the release.</para>
</note>
</listitem>
</orderedlist>
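<para>When working out the start and stop order in step 1, it can help to derive
a candidate order from the resource dependencies. The following Python sketch
assumes that a resource starts after everything it depends on and that the stop
order is the reverse of the start order; both are assumptions for illustration,
and the dependency data is hypothetical. It uses the standard
<literal>graphlib</literal> module, available in Python 3.9 and later.</para>
<programlisting>
```python
from graphlib import TopologicalSorter  # standard library, Python 3.9 or later

# Hypothetical dependency lists: each resource maps to the resources it depends on.
dependencies = {
    "web1": {"/fs1", "10.10.48.22"},
    "/fs1": {"vol1"},
    "vol1": set(),
    "10.10.48.22": set(),
}

# static_order() yields each resource only after all of its dependencies.
start_order = list(TopologicalSorter(dependencies).static_order())
stop_order = list(reversed(start_order))

print("start:", start_order)
print("stop: ", stop_order)
```
</programlisting>
<para>A circular dependency makes <literal>static_order()</literal> raise
<literal>graphlib.CycleError</literal>, which is a useful early warning that the
dependency lists need to be revisited.</para>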
</sect1>
</chapter>
<?Pub *0000025278>