<glossary>
<title>Glossary</title>
<glossentry><glossterm>action scripts</glossterm>
<glossdef>
<para>The set of scripts that determine how a resource is started, monitored,
and stopped. A set of action scripts must be specified for each resource
type. The possible action scripts are <literal>exclusive</literal>,
<literal>start</literal>, <literal>stop</literal>, <literal>monitor</literal>,
and <literal>restart</literal>.</para>
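<para>In practice, each action is a separate executable that Linux FailSafe
invokes for a resource and that reports success or failure through its exit
status. The following minimal sketch (hypothetical Python, used only to
illustrate the dispatch convention; the real action scripts are standalone
programs) models the five actions:</para>
<programlisting><![CDATA[
# Hypothetical model of the per-resource-type action set; the action
# names come from the definition above, everything else is illustrative.
ACTIONS = ("exclusive", "start", "stop", "monitor", "restart")

def run_action(action, resource):
    """Dispatch one action for a resource; return 0 on success,
    nonzero on failure (the exit-status convention of action scripts)."""
    if action not in ACTIONS:
        raise ValueError("unknown action: %s" % action)
    # A real action script would start, stop, or probe the resource here.
    print("running %s for resource %s" % (action, resource))
    return 0

run_action("monitor", "web1")
]]></programlisting>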
</glossdef>
</glossentry>
<glossentry><glossterm>cluster</glossterm>
<glossdef>
<para>A collection of one or more cluster nodes coupled to each other by networks
or other similar interconnections. A cluster is identified by a simple name;
this name must be unique within the pool. A particular node may be a member
of only one cluster.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>cluster administrator</glossterm>
<glossdef>
<para>The person responsible for managing and maintaining a cluster.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>cluster configuration database</glossterm>
<glossdef>
<para>Contains configuration information about all resources, resource types,
resource groups, failover policies, nodes, and clusters.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>cluster node</glossterm>
<glossdef>
<para>A single Linux execution environment. In other words, a single physical
machine or single running Linux kernel. In current Linux environments this
will be an individual computer. The term <firstterm>node</firstterm> is used
within this guide to indicate this meaning, as opposed to any alternate meaning
such as a network node.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>control messages</glossterm>
<glossdef>
<para>Messages that cluster software sends between the cluster nodes to request
operations on or distribute information about cluster nodes and resource groups.
Linux FailSafe sends control messages for the purpose of ensuring nodes and
groups remain highly available. Control messages and heartbeat messages are
sent through a node's network interfaces that have been attached to a control
network. A node can be attached to multiple control networks.</para>
</glossdef>
<glossdef>
<para>A node's control networks should not be set to accept control messages
if the node is not a dedicated Linux FailSafe node. Otherwise, end users who
run other jobs on the machine can have their jobs killed unexpectedly when
Linux FailSafe resets the node.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>control network</glossterm>
<glossdef>
<para>The network that connects nodes through their network interfaces (typically
Ethernet) such that Linux FailSafe can maintain a cluster's high availability
by sending heartbeat messages and control messages through the network to
the attached nodes. Linux FailSafe uses the highest-priority network interface
on the control network; it uses a network interface with lower priority when
all higher-priority network interfaces on the control network fail.</para>
</glossdef>
<glossdef>
<para>A node must have at least one control network interface for heartbeat
messages and one for control messages (both heartbeat and control messages
can be configured to use the same interface). A node can have no more than
eight control network interfaces.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>database</glossterm>
<glossdef>
<para>See <glossterm>cluster configuration database</glossterm>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>dependency list</glossterm>
<glossdef>
<para>See <glossterm>resource dependency</glossterm> or <glossterm>resource
type dependency</glossterm>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>failover</glossterm>
<glossdef>
<para>The process of allocating a resource group to another node according
to a failover policy. A failover may be triggered by the failure of a resource,
a change in the node membership (such as when a node fails or starts), or
a manual request by the administrator.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>failover attribute</glossterm>
<glossdef>
<para>A string that affects the allocation of a resource group in a cluster.
The administrator must specify system-defined attributes (such as <literal>
Auto_Failback</literal> or <literal>Controlled_Failback</literal>), and can
optionally supply site-specific attributes.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>failover domain</glossterm>
<glossdef>
<para>The ordered list of nodes on which a particular resource group can be
allocated. The nodes listed in the failover domain must be within the same
cluster; however, the failover domain does not have to include every node
in the cluster. The administrator defines the initial failover domain when
creating a failover policy. This list is transformed into the run-time failover
domain by the failover script; the run-time failover domain is what is actually
used to select the failover node. Linux FailSafe stores the run-time failover
domain and uses it as input to the next failover script invocation. The initial
and run-time failover domains may be identical, depending upon the contents
of the failover script. In general, Linux FailSafe allocates a given resource
group to the first node listed in the run-time failover domain that is also
in the node membership; the point at which this allocation takes place is
affected by the failover attributes.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>failover policy</glossterm>
<glossdef>
<para>The method used by Linux FailSafe to determine the destination node
of a failover. A failover policy consists of a failover domain, failover attributes,
and a failover script. A failover policy name must be unique within the pool.
</para>
</glossdef>
</glossentry>
<glossentry><glossterm>failover script</glossterm>
<glossdef>
<para>A failover policy component that generates a run-time failover domain
and returns it to the Linux FailSafe process. The process applies the failover
attributes and then selects the first node in the returned failover domain
that is also in the current node membership.</para>
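<para>The selection step described above can be sketched as follows
(hypothetical Python, for illustration only; the actual failover scripts
are external programs invoked by Linux FailSafe):</para>
<programlisting><![CDATA[
def select_failover_node(runtime_domain, node_membership):
    """Return the first node of the run-time failover domain that is
    also in the current node membership, or None if none qualifies."""
    for node in runtime_domain:
        if node in node_membership:
            return node
    return None

# Example: with run-time domain ["n1", "n2", "n3"] and membership
# {"n2", "n3"}, the resource group is allocated to "n2".
print(select_failover_node(["n1", "n2", "n3"], {"n2", "n3"}))
]]></programlisting>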
</glossdef>
</glossentry>
<glossentry><glossterm>FailSafe database</glossterm>
<glossdef>
<para>See <glossterm>cluster configuration database</glossterm>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>heartbeat messages</glossterm>
<glossdef>
<para>Messages that cluster software sends between the nodes to indicate
that a node is up and running. Heartbeat messages and control messages are sent
through a node's network interfaces that have been attached to a control network.
A node can be attached to multiple control networks.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>heartbeat interval</glossterm>
<glossdef>
<para>The interval between heartbeat messages. The node timeout value must
be at least 10 times the heartbeat interval for proper Linux FailSafe operation
(otherwise false failovers may be triggered). A smaller heartbeat interval
(more frequent heartbeats) increases the potential for slowing down the
network; conversely, a larger heartbeat interval (fewer heartbeats) increases
the potential for reduced resource availability, because node failures take
longer to detect.</para>
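<para>The timing rule can be checked with a few lines of code (a minimal
sketch, assuming both values are given in milliseconds; the function name
is illustrative):</para>
<programlisting><![CDATA[
def check_timing(heartbeat_interval_ms, node_timeout_ms):
    """Enforce the rule that the node timeout be at least 10 times
    the heartbeat interval; otherwise false failovers may occur."""
    if node_timeout_ms < 10 * heartbeat_interval_ms:
        raise ValueError("node timeout must be at least 10 times "
                         "the heartbeat interval")

# Example: a 1-second heartbeat interval requires a node timeout
# of at least 10 seconds.
check_timing(1000, 10000)
]]></programlisting>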
</glossdef>
</glossentry>
<glossentry><glossterm>initial failover domain</glossterm>
<glossdef>
<para>The ordered list of nodes, defined by the administrator when a failover
policy is first created, that is used the first time a cluster is booted. The
ordered list specified by the initial failover domain is transformed into
a run-time failover domain by the failover script; the run-time failover domain
is used along with failover attributes to determine the node on which a resource
group should reside. With each failure, the failover script takes the current
run-time failover domain and potentially modifies it; the initial failover
domain is never used again. Depending on the run-time conditions and contents
of the failover script, the initial and run-time failover domains may be identical.
See also <glossterm>run-time failover domain</glossterm>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>key/value attribute</glossterm>
<glossdef>
<para>A set of information that must be defined for a particular resource
type. For example, for the resource type <literal>filesystem</literal> one
key/value pair might be <replaceable>mount_point=/fs1</replaceable>, where
<replaceable>mount_point</replaceable> is the key and <replaceable>/fs1</replaceable>
is the value specific to the particular resource being defined. Depending
on the value, you specify either a <literal>string</literal> or <literal>integer</literal>
data type. In the previous example, you would specify <literal>string</literal>
as the data type for the value <replaceable>/fs1</replaceable>.
</para>
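<para>Such typed key/value pairs might be represented as follows (a
hypothetical Python sketch; Linux FailSafe actually stores these in the
cluster configuration database):</para>
<programlisting><![CDATA[
# Hypothetical representation: each key maps to a (data type, value)
# pair, where the data type is either "string" or "integer".
attributes = {
    "mount_point": ("string", "/fs1"),
}

for key, (data_type, value) in attributes.items():
    print("%s = %r (declared as %s)" % (key, value, data_type))
]]></programlisting>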
</glossdef>
</glossentry>
<glossentry><glossterm>log configuration</glossterm>
<glossdef>
<para>A log configuration has two parts: a log level and a log file, both
associated with a log group. The cluster administrator can customize the location
and amount of log output, and can specify a log configuration for all nodes
or for only one node. For example, the <literal>crsd</literal> log group can
be configured to log detailed level-10 messages to the <filename>/var/log/failsafe/crsd_foo</filename> log only on the
node <literal>foo</literal> and to write only minimal level-1 messages to
the <literal>crsd</literal> log on all other nodes.</para>
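<para>The <literal>crsd</literal> example above could be modeled like this
(a hypothetical Python structure, not a Linux FailSafe configuration file
format):</para>
<programlisting><![CDATA[
# Hypothetical model: per log group, a default (level, file) pair
# plus optional per-node overrides, matching the crsd example above.
log_config = {
    "crsd": {
        "default": (1, "/var/log/failsafe/crsd"),
        "per_node": {"foo": (10, "/var/log/failsafe/crsd_foo")},
    },
}

def config_for(group, node):
    """Return the (log level, log file) pair in effect for a node."""
    entry = log_config[group]
    return entry["per_node"].get(node, entry["default"])

print(config_for("crsd", "foo"))  # (10, '/var/log/failsafe/crsd_foo')
print(config_for("crsd", "bar"))  # (1, '/var/log/failsafe/crsd')
]]></programlisting>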
</glossdef>
</glossentry>
<glossentry><glossterm>log file</glossterm>
<glossdef>
<para>A file containing Linux FailSafe notifications for a particular log
group. A log file is part of the log configuration for a log group. By default,
log files reside in the <filename>/var/log/failsafe</filename> directory,
but the cluster administrator can customize this. Note: Linux FailSafe logs
both normal operations and critical errors to <filename>/var/log/failsafe</filename>,
as well as to individual
logs for specific log groups.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>log group</glossterm>
<glossdef>
<para>A set of one or more Linux FailSafe processes that use the same log
configuration. A log group usually corresponds to one daemon, such as <literal>
gcd</literal>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>log level</glossterm>
<glossdef>
<para>A number that controls how many log messages Linux FailSafe writes
to the associated log group's log file. A log level is part of the log
configuration for a log group.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>node</glossterm>
<glossdef>
<para>See <glossterm>cluster node</glossterm>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>node ID</glossterm>
<glossdef>
<para>A 16-bit positive integer that uniquely identifies a cluster node. During
node definition, Linux FailSafe will assign a node ID if one has not been
assigned by the cluster administrator. Once assigned, the node ID cannot be
modified.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>node membership</glossterm>
<glossdef>
<para>The list of nodes in a cluster on which Linux FailSafe can allocate
resource groups.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>node timeout</glossterm>
<glossdef>
<para>If no heartbeat is received from a node in this period of time, the
node is considered to be dead. The node timeout value must be at least 10
times the heartbeat interval for proper Linux FailSafe operation (otherwise
false failovers may be triggered).</para>
</glossdef>
</glossentry>
<glossentry><glossterm>notification command</glossterm>
<glossdef>
<para>The command used to notify the cluster administrator of changes or failures
in the cluster, nodes, and resource groups. The command must exist on every
node in the cluster.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>offline resource group</glossterm>
<glossdef>
<para>A resource group that is not highly available in the cluster. To put
a resource group in offline state, Linux FailSafe stops the group (if needed)
and stops monitoring the group. An offline resource group can be running on
a node, yet not under Linux FailSafe control. If the cluster administrator
specifies the <literal>detach only</literal> option while taking the group
offline, then Linux FailSafe will not stop the group but will stop monitoring
the group.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>online resource group</glossterm>
<glossdef>
<para>A resource group that is highly available in the cluster. When Linux
FailSafe detects a failure that degrades the resource group availability,
it moves the resource group to another node in the cluster. To put a resource
group in online state, Linux FailSafe starts the group (if needed) and begins
monitoring the group. If the cluster administrator specifies the <literal>attach only</literal> option while bringing the group online, then Linux
FailSafe will not start the group but will begin monitoring the group.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>owner host</glossterm>
<glossdef>
<para>A system that can control a node remotely, for example by power-cycling
the node. At run time, the owner host must be defined as a node in the pool.
</para>
</glossdef>
</glossentry>
<glossentry><glossterm>owner TTY name</glossterm>
<glossdef>
<para>The device file name of the terminal port (TTY) on the owner host to
which the system controller serial cable is connected. The other end of the
cable connects to the node with the system controller port, so the node can
be controlled remotely by the owner host.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>pool</glossterm>
<glossdef>
<para>The entire set of nodes involved with a group of clusters. The clusters
in the group are usually close together and should always serve a common purpose.
A replicated cluster configuration database is stored on each node in the
pool.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>port password</glossterm>
<glossdef>
<para>The password for the system controller port, usually set once in firmware
or by setting jumper wires. (This is not the same as the node's <literal>
root</literal> password.)</para>
</glossdef>
</glossentry>
<glossentry><glossterm>powerfail mode</glossterm>
<glossdef>
<para>When powerfail mode is turned <literal>on</literal>, Linux FailSafe
tracks the response from a node's system controller as it makes reset requests
to a cluster node. When these requests fail to reset the node successfully,
Linux FailSafe uses heuristics to try to estimate whether the machine has
been powered down. If the heuristic algorithm returns with success, Linux
FailSafe assumes the remote machine has been reset successfully. When powerfail
mode is turned <literal>off</literal>, the heuristics are not used and Linux
FailSafe may not be able to detect node power failures.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>process group</glossterm>
<glossdef>
<para>A group of application instances. Each application instance can consist
of one or more UNIX processes and spans only one node.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>process membership</glossterm>
<glossdef>
<para>A list of process instances in a cluster that form a process group.
There can be multiple process groups per node.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource</glossterm>
<glossdef>
<para>A single physical or logical entity that provides a service to clients
or other resources. For example, a resource can be a single disk volume, a
particular network address, or an application such as a web server. A resource
is generally available for use over time on two or more nodes in a cluster,
although it can be allocated to only one node at any given time. Resources
are identified by a resource name and a resource type. Dependent resources
must be part of the same resource group and are identified in a resource dependency
list. </para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource dependency</glossterm>
<glossdef>
<para>The condition in which a resource requires the existence of other resources.
</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource dependency list</glossterm>
<glossdef>
<para>A list of resources upon which a resource depends. Each resource instance
must have resource dependencies that satisfy its resource type dependencies
before it can be added to a resource group.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource group</glossterm>
<glossdef>
<para>A collection of resources. A resource group is identified by a simple
name; this name must be unique within a cluster. Resource groups cannot overlap;
that is, two resource groups cannot contain the same resource. All interdependent
resources must be part of the same resource group. If any individual resource
in a resource group becomes unavailable for its intended use, then the entire
resource group is considered unavailable. Therefore, a resource group is the
unit of failover.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource keys</glossterm>
<glossdef>
<para>Variables that define a resource of a given resource type. The action
scripts use this information to start, stop, and monitor a resource of this
resource type.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource name</glossterm>
<glossdef>
<para>The simple name that identifies a specific instance of a resource type.
A resource name must be unique within a given resource type.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource type</glossterm>
<glossdef>
<para>A particular class of resource. All of the resources in a particular
resource type can be handled in the same way for the purposes of failover.
Every resource is an instance of exactly one resource type. A resource type
is identified by a simple name; this name must be unique within a cluster.
A resource type can be defined for a specific node or for an entire cluster.
A resource type that is defined for a node overrides a cluster-wide resource
type definition with the same name; this allows an individual node to override
global settings from a cluster-wide resource type definition.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource type dependency</glossterm>
<glossdef>
<para>A set of resource types upon which a resource type depends. For example,
the <literal>filesystem</literal> resource type depends upon the <literal>volume</literal>
resource type, and the <literal>Netscape_web</literal> resource type depends
upon the <literal>filesystem</literal> and <literal>IP_address</literal>
resource types.</para>
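<para>These dependencies form a graph. A sketch of collecting a resource
type's transitive dependencies (hypothetical Python, using the example
resource types named above):</para>
<programlisting><![CDATA[
# Dependency table for the resource types named in the example above.
type_depends = {
    "filesystem": ["volume"],
    "Netscape_web": ["filesystem", "IP_address"],
    "volume": [],
    "IP_address": [],
}

def all_dependencies(rtype, table=type_depends):
    """Return the transitive set of resource types rtype depends on."""
    seen = set()
    stack = list(table.get(rtype, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(table.get(dep, []))
    return seen

# all_dependencies("Netscape_web") yields
# {"filesystem", "IP_address", "volume"}.
]]></programlisting>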
</glossdef>
</glossentry>
<glossentry><glossterm>resource type dependency list</glossterm>
<glossdef>
<para>A list of resource types upon which a resource type depends.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>run-time failover domain</glossterm>
<glossdef>
<para>The ordered set of nodes on which the resource group can execute upon
failures, as modified by the failover script. The run-time failover domain
is used along with failover attributes to determine the node on which a resource
group should reside. See also <glossterm>initial failover domain</glossterm>.
</para>
</glossdef>
</glossentry>
<glossentry><glossterm>start/stop order</glossterm>
<glossdef>
<para>Each resource type has a start/stop order, which is a non-negative
integer. In a resource group, the start/stop orders of the resource types
determine the order in which the resources will be started when Linux FailSafe
brings the group online and will be stopped when Linux FailSafe takes the
group offline. The group's resources are started in increasing order, and
stopped in decreasing order; resources of the same type are started and stopped
in indeterminate order. For example, if resource type <literal>volume</literal>
has order 10 and resource type <literal>filesystem</literal> has order 20,
then when Linux FailSafe brings a resource group online, all volume resources
in the group will be started before all file system resources in the group.
</para>
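<para>The ordering rule can be sketched as follows (hypothetical Python;
the resource names and order values are illustrative, following the volume
and filesystem example above):</para>
<programlisting><![CDATA[
# Resources tagged with the start/stop order of their resource type.
resources = [("web1", "Netscape_web", 30),
             ("fs1", "filesystem", 20),
             ("vol1", "volume", 10)]

def start_sequence(res):
    """Start in increasing start/stop order."""
    return sorted(res, key=lambda r: r[2])

def stop_sequence(res):
    """Stop in decreasing start/stop order."""
    return sorted(res, key=lambda r: r[2], reverse=True)

# start_sequence(resources) starts vol1, then fs1, then web1;
# stop_sequence(resources) stops them in the reverse order.
]]></programlisting>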
</glossdef>
</glossentry>
<glossentry><glossterm>system controller port</glossterm>
<glossdef>
<para>A port located on a node that provides a way to power-cycle the node
remotely. One example of this in the x86-based hardware arena is the Intel
EMP (Emergency Management Port) supplied on some Intel motherboards. Enabling
or disabling a system controller port in the cluster configuration database
(CDB) tells Linux FailSafe whether it can perform operations on the system
controller port. (When the port is enabled, serial cables must attach the
port to another node, the owner host.) System controller port information
is optional for a node in the pool, but is required if the node will be added
to a cluster; otherwise, resources running on that node will never be highly
available.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>tie-breaker node</glossterm>
<glossdef>
<para>A node identified as a tie-breaker for Linux FailSafe to use in the
process of computing node membership for the cluster, when exactly half the
nodes in the cluster are up and can communicate with each other. If a tie-breaker
node is not specified, Linux FailSafe will use the node with the lowest node
ID in the cluster as the tie-breaker node.</para>
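<para>The default rule can be sketched as follows (hypothetical Python;
Linux FailSafe performs this selection internally):</para>
<programlisting><![CDATA[
def tie_breaker(cluster_nodes, configured=None):
    """Return the tie-breaker node: the configured one if set,
    otherwise the node with the lowest node ID.

    cluster_nodes maps node name to node ID."""
    if configured is not None:
        return configured
    return min(cluster_nodes, key=cluster_nodes.get)

# Example: with nodes {"a": 3, "b": 1, "c": 2} and no configured
# tie-breaker, node "b" (lowest node ID) breaks the tie.
print(tie_breaker({"a": 3, "b": 1, "c": 2}))
]]></programlisting>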
</glossdef>
</glossentry>
<glossentry><glossterm>type-specific attribute</glossterm>
<glossdef>
<para>Required information used to define a resource of a particular resource
type. For example, for a resource of type <literal>filesystem</literal> you
must enter attributes for the resource's volume name (where the file system
is located) and specify options for how to mount the file system (for example,
as readable and writable).</para>
</glossdef>
</glossentry>
</glossary>