<!-- Fragment document type declaration subset:
ArborText, Inc., 1988-1997, v.4001
<!DOCTYPE SET PUBLIC "-//Davenport//DTD DocBook V3.0//EN" [
<!ENTITY ha.cluster.messages SYSTEM "figures/ha.cluster.messages.eps" NDATA eps>
<!ENTITY machine.not.in.ha.cluster SYSTEM "figures/machine.not.in.ha.cluster.eps" NDATA eps>
<!ENTITY ha.cluster.config.info.flow SYSTEM "figures/ha.cluster.config.info.flow.eps" NDATA eps>
<!ENTITY software.layers SYSTEM "figures/software.layers.eps" NDATA eps>
<!ENTITY n1n4 SYSTEM "figures/n1n4.eps" NDATA eps>
<!ENTITY example.sgml SYSTEM "example.sgml">
<!ENTITY appupgrade.sgml SYSTEM "appupgrade.sgml">
<!ENTITY a1-1.failsafe.components SYSTEM "figures/a1-1.failsafe.components.eps" NDATA eps>
<!ENTITY a1-6.disk.storage.takeover SYSTEM "figures/a1-6.disk.storage.takeover.eps" NDATA eps>
<!ENTITY a2-3.non.shared.disk.config SYSTEM "figures/a2-3.non.shared.disk.config.eps" NDATA eps>
<!ENTITY a2-4.shared.disk.config SYSTEM "figures/a2-4.shared.disk.config.eps" NDATA eps>
<!ENTITY a2-5.shred.disk.2active.cnfig SYSTEM "figures/a2-5.shred.disk.2active.cnfig.eps" NDATA eps>
<!ENTITY a2-1.examp.interface.config SYSTEM "figures/a2-1.examp.interface.config.eps" NDATA eps>
<!ENTITY intro.sgml SYSTEM "intro.sgml">
<!ENTITY overview.sgml SYSTEM "overview.sgml">
<!ENTITY planning.sgml SYSTEM "planning.sgml">
<!ENTITY nodeconfig.sgml SYSTEM "nodeconfig.sgml">
<!ENTITY admintools.sgml SYSTEM "admintools.sgml">
<!ENTITY config.sgml SYSTEM "config.sgml">
<!ENTITY operate.sgml SYSTEM "operate.sgml">
<!ENTITY diag.sgml SYSTEM "diag.sgml">
<!ENTITY recover.sgml SYSTEM "recover.sgml">
<!ENTITY clustproc.sgml SYSTEM "clustproc.sgml">
<!ENTITY appfiles.sgml SYSTEM "appfiles.sgml">
<!ENTITY preface.sgml SYSTEM "preface.sgml">
<!ENTITY index.sgml SYSTEM "index.sgml">
]>
-->
<glossary>
<title>Glossary</title>
<glossentry><glossterm>action scripts</glossterm>
<glossdef>
<para>The set of scripts that determine how a resource is started, monitored,
and stopped. There must be a set of action scripts specified for each resource
type. The possible set of action scripts is: <command>probe</command>, <command>
exclusive</command>, <command>start</command>, <command>stop</command>, <command>
monitor</command>, and <command>restart</command>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>cluster</glossterm>
<glossdef>
<para>A collection of one or more <glossterm>cluster node</glossterm><firstterm>
s</firstterm> coupled to each other by networks or other similar interconnections.
A cluster is identified by a simple name; this name must be unique within
the <firstterm>pool</firstterm>. A particular node may be a member of only
one cluster.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>cluster administrator</glossterm>
<glossdef>
<para>The person responsible for managing and maintaining a Linux FailSafe
cluster.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>cluster configuration database</glossterm>
<glossdef>
<para>Contains configuration information about all resources, resource types,
resource groups, failover policies, nodes, and clusters.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>cluster node</glossterm>
<glossdef>
<para>A single Linux image. Usually, a cluster node is an individual computer.
The term <emphasis>node</emphasis> is also used in this guide for brevity.
</para>
</glossdef>
</glossentry>
<glossentry><glossterm>control messages</glossterm>
<glossdef>
<para>Messages that cluster software sends between the cluster nodes to request
operations on or distribute information about cluster nodes and resource groups.
Linux FailSafe sends control messages for the purpose of ensuring nodes and
groups remain highly available. Control messages and heartbeat messages are
sent through a node's network interfaces that have been attached to a control
network. A node can be attached to multiple control networks.</para>
</glossdef>
<glossdef>
<para>A node's control networks should not be set to accept control messages
if the node is not a dedicated Linux FailSafe node. Otherwise, end users who
run non-Linux FailSafe jobs on the machine can have their jobs killed unexpectedly
when Linux FailSafe resets the node.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>control network</glossterm>
<glossdef>
<para>The network that connects nodes through their network interfaces (typically
Ethernet) such that Linux FailSafe can maintain a cluster's high availability
by sending heartbeat messages and control messages through the network to
the attached nodes. Linux FailSafe uses the highest priority network interface
on the control network; it uses a network interface with lower priority when
all higher-priority network interfaces on the control network fail.</para>
</glossdef>
<glossdef>
<para>A node must have at least one control network interface for heartbeat
messages and one for control messages (both heartbeat and control messages
can be configured to use the same interface). A node can have no more than
eight control network interfaces.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>dependency list</glossterm>
<glossdef>
<para>See <glossterm>resource dependency</glossterm><firstterm>&ensp;list
</firstterm> or <glossterm>resource type dependency</glossterm><firstterm>
&ensp;list</firstterm>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>failover</glossterm>
<glossdef>
<para>The process of allocating a <firstterm>resource group</firstterm> to
another <firstterm>node</firstterm>, according to a <firstterm>failover
policy</firstterm>. A failover may be triggered by the failure of
a resource, a change in the node membership (such as when a node fails or
starts), or a manual request by the administrator.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>failover attribute</glossterm>
<glossdef>
<para>A string that affects the allocation of a resource group in a cluster.
The administrator must specify system-defined attributes (such as <firstterm>
AutoFailback</firstterm> or <firstterm>ControlledFailback</firstterm>), and
can optionally supply site-specific attributes.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>failover domain</glossterm>
<glossdef>
<para>The ordered list of <glossterm>node</glossterm><firstterm>s</firstterm>
on which a particular <glossterm>resource group</glossterm> can be allocated.
The nodes listed in the failover domain must be within the same cluster; however,
the failover domain does not have to include every node in the cluster. The
administrator defines the <firstterm>initial failover domain</firstterm> when
creating a failover policy. This list is transformed into the <firstterm>
running</firstterm>&ensp;<firstterm>failover domain</firstterm> by the <firstterm>
failover script</firstterm>; the runtime failover domain is what is actually
used to select the failover node. Linux FailSafe stores the runtime failover
domain and uses it as input to the next failover script invocation. The initial
and runtime failover domains may be identical, depending upon the contents
of the failover script. In general, Linux FailSafe allocates a given resource
group to the first node listed in the runtime failover domain that is also
in the node membership; the point at which this allocation takes place is
affected by the <glossterm>failover attribute</glossterm><firstterm>s</firstterm>.
</para>
</glossdef>
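<glossdef>
<para>The following fragment is an illustrative sketch only, not Linux FailSafe
code; the node names and membership list are hypothetical. It models the selection
described above: the first node in the runtime failover domain that is also in
the node membership is chosen (the effect of failover attributes is not shown):</para>
<programlisting><![CDATA[
# Illustrative sketch only; not part of Linux FailSafe.
# Node names and the membership list are hypothetical.

def select_failover_node(runtime_failover_domain, node_membership):
    """Return the first node in the runtime failover domain that is
    also in the current node membership, or None if there is none."""
    members = set(node_membership)
    for node in runtime_failover_domain:
        if node in members:
            return node
    return None

# Runtime failover domain as produced by a failover script.
runtime_domain = ["node-a", "node-b", "node-c"]
membership = ["node-b", "node-c"]        # node-a is currently down

print(select_failover_node(runtime_domain, membership))   # prints: node-b
]]></programlisting>
</glossdef>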
</glossentry>
<glossentry><glossterm>failover policy</glossterm>
<glossdef>
<para>The method used by Linux FailSafe to determine the destination node
of a failover. A failover policy consists of a <glossterm>failover domain
</glossterm>, <glossterm>failover attribute</glossterm><firstterm>s</firstterm>,
and a <glossterm>failover script</glossterm>. A failover policy name must
be unique within the <glossterm>pool</glossterm>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>failover script</glossterm>
<glossdef>
<para>A failover policy component that generates a <firstterm>runtime failover
domain</firstterm> and returns it to the Linux FailSafe process. The Linux
FailSafe process applies the failover attributes and then selects the first
node in the returned failover domain that is also in the current node membership.
</para>
</glossdef>
</glossentry>
<glossentry><glossterm>heartbeat messages</glossterm>
<glossdef>
<para>Messages that cluster software sends between the nodes that indicate
a node is up and running. Heartbeat messages and <glossterm>control messages
</glossterm> are sent through a node's network interfaces that have been attached
to a control network. A node can be attached to multiple control networks.
</para>
</glossdef>
</glossentry>
<glossentry><glossterm>heartbeat interval</glossterm>
<glossdef>
<para>Interval between heartbeat messages. The node timeout value must be
at least 10 times the heartbeat interval for proper Linux FailSafe operation
(otherwise false failovers may be triggered). The more frequent the heartbeats
(smaller heartbeat interval), the greater the potential for slowing down the
network. Conversely, the less frequent the heartbeats (larger heartbeat
interval), the longer a node failure can go undetected, which can reduce the
availability of resources.
</para>
</glossdef>
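<glossdef>
<para>The following fragment is an illustrative sketch only; the interval and
timeout values are hypothetical examples, not recommended settings. It simply
checks the rule that the node timeout must be at least 10 times the heartbeat
interval:</para>
<programlisting><![CDATA[
# Illustrative sketch only; values are hypothetical, not recommended settings.

def timeout_is_valid(heartbeat_interval, node_timeout):
    """Check that the node timeout is at least 10 times the heartbeat interval."""
    return node_timeout >= 10 * heartbeat_interval

print(timeout_is_valid(1.0, 15.0))   # True:  15s timeout with a 1s heartbeat interval
print(timeout_is_valid(2.0, 15.0))   # False: 15s timeout is less than 10 x 2s
]]></programlisting>
</glossdef>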
</glossentry>
<glossentry><glossterm>initial failover domain</glossterm>
<glossdef>
<para>The ordered list of nodes, defined by the administrator when a failover
policy is first created, that is used the first time a cluster is booted. The
ordered list specified by the initial failover domain is transformed into
a <glossterm>runtime failover domain</glossterm> by the <glossterm>failover
script</glossterm>; the runtime failover domain is used along with failover
attributes to determine the node on which a resource group should reside.
With each failure, the failover script takes the current runtime failover
domain and potentially modifies it; the initial failover domain is never used
again. Depending on the runtime conditions and contents of the failover script,
the initial and runtime failover domains may be identical. See also <glossterm>
runtime failover domain</glossterm>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>key/value attribute</glossterm>
<glossdef>
<para>A set of information that must be defined for a particular resource
type. For example, for the resource type <literal>filesystem</literal>, one
key/value pair might be <literal>mount_point=/fs1</literal>, where <literal>
mount_point</literal> is the key and <literal>/fs1</literal> is the value specific
to the particular resource being defined. Depending on the value, you specify
either a <literal>string</literal> or <literal>integer</literal> data type.
In the previous example, you would specify <literal>string</literal> as the
data type for the value<literal>&ensp;/fs1</literal>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>log configuration</glossterm>
<glossdef>
<para>A log configuration has two parts: a <glossterm>log level</glossterm>
and a <glossterm>log file</glossterm>, both associated with a <glossterm>
log group</glossterm>. The cluster administrator can customize the location
and amount of log output, and can specify a log configuration for all nodes
or for only one node. For example, the <command>crsd</command> log group can
be configured to log detailed level-10 messages to the <?Pub _nolinebreak><filename>
/var/log/failsafe/crsd-foo</filename><?Pub /_nolinebreak> log only on the
node <literal>foo</literal>, and to write only minimal level-1 messages to
the <command>crsd</command> log on all other nodes.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>log file</glossterm>
<glossdef>
<para>A file containing Linux FailSafe notifications for a particular <glossterm>
log group</glossterm>. A log file is part of the <glossterm>log configuration
</glossterm> for a log group. By default, log files reside in the <?Pub _nolinebreak><filename>
/var/log/failsafe</filename><?Pub /_nolinebreak> directory, but the cluster
administrator can customize this. Note: Linux FailSafe logs both normal operations
and critical errors to <?Pub _nolinebreak><filename>/var/log/messages</filename><?Pub /_nolinebreak><?Pub Caret>,
as well as to individual logs for specific log groups.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>log group</glossterm>
<glossdef>
<para>A set of one or more Linux FailSafe processes that use the same log
configuration. A log group usually corresponds to one Linux FailSafe daemon,
such as <command>gcd</command>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>log level</glossterm>
<glossdef>
<para>A number that controls how many log messages Linux FailSafe
will write into an associated log group's log file. A log level is part of
the log configuration for a log group.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>node</glossterm>
<glossdef>
<para>See <glossterm>cluster node</glossterm>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>node ID</glossterm>
<glossdef>
<para>A 16-bit positive integer that uniquely defines a cluster node. During
node definition, Linux FailSafe will assign a node ID if one has not been
assigned by the cluster administrator. Once assigned, the node ID cannot be
modified.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>node membership</glossterm>
<glossdef>
<para>The list of nodes in a cluster on which Linux FailSafe can allocate
resource groups.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>node timeout</glossterm>
<glossdef>
<para>If no heartbeat is received from a node in this period of time, the
node is considered to be dead. The node timeout value must be at least 10
times the heartbeat interval for proper Linux FailSafe operation (otherwise
false failovers may be triggered).</para>
</glossdef>
</glossentry>
<glossentry><glossterm>notification command</glossterm>
<glossdef>
<para>The command used to notify the cluster administrator of changes or failures
in the cluster, nodes, and resource groups. The command must exist on every
node in the cluster.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>offline resource group</glossterm>
<glossdef>
<para>A resource group that is not highly available in the cluster. To put
a resource group in offline state, Linux FailSafe stops the group (if needed)
and stops monitoring the group. An offline resource group can be running on
a node, yet not under Linux FailSafe control. If the cluster administrator
specifies the <emphasis>detach only</emphasis> option while taking the group
offline, then Linux FailSafe will not stop the group but will stop monitoring
the group.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>online resource group</glossterm>
<glossdef>
<para>A resource group that is highly available in the cluster. When Linux
FailSafe detects a failure that degrades the resource group availability,
it moves the resource group to another node in the cluster. To put a resource
group in online state, Linux FailSafe starts the group (if needed) and begins
monitoring the group. If the cluster administrator specifies the <literal>
attach only</literal> option while bringing the group online, then Linux FailSafe
will not start the group but will begin monitoring the group.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>owner host</glossterm>
<glossdef>
<para>A system that can control a Linux FailSafe node remotely (such as power-cycling
the node). Serial cables must physically connect the two systems through the
node's system controller port. At run time, the owner host must be defined
as a node in the Linux FailSafe pool.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>owner TTY name</glossterm>
<glossdef>
<para>The device file name of the terminal port (TTY) on the <glossterm>owner
host</glossterm> to which the system controller serial cable is connected.
The other end of the cable connects to the Linux FailSafe node with the system
controller port, so the node can be controlled remotely by the owner host.
</para>
</glossdef>
</glossentry>
<glossentry><glossterm>pool</glossterm>
<glossdef>
<para>The entire set of <glossterm>node</glossterm><firstterm>s</firstterm>
involved with a group of clusters. The clusters in the group are usually close
together and should always serve a common purpose. A replicated database is
stored on each node in the pool.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>port password</glossterm>
<glossdef>
<para>The password for the system controller port, usually set once in firmware
or by setting jumper wires. (This is not the same as the node's root password.)
</para>
</glossdef>
</glossentry>
<glossentry><glossterm>powerfail mode</glossterm>
<glossdef>
<para>When powerfail mode is turned <literal>on</literal>, Linux FailSafe
tracks the response from a node's system controller as it makes reset requests
to a cluster node. When these requests fail to reset the node successfully,
Linux FailSafe uses heuristics to try to estimate whether the machine has
been powered down. If the heuristic algorithm returns with success, Linux
FailSafe assumes the remote machine has been reset successfully. When powerfail
mode is turned <literal>off</literal>, the heuristics are not used and Linux
FailSafe may not be able to detect node power failures.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>process membership</glossterm>
<glossdef>
<para>A list of process instances in a cluster that form a process group.
There can be one or more processes per node.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource</glossterm>
<glossdef>
<para>A single physical or logical entity that provides a service to clients
or other resources. For example, a resource can be a single disk volume, a
particular network address, or an application such as a web server. A resource
is generally available for use over time on two or more <glossterm>node</glossterm><firstterm>
s</firstterm> in a <glossterm>cluster</glossterm>, although it can be allocated
to only one node at any given time. Resources are identified by a <glossterm>
resource name</glossterm> and a <glossterm>resource type</glossterm>. Dependent
resources must be part of the same <glossterm>resource group</glossterm> and
are identified in a <firstterm>resource dependency list</firstterm>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource dependency</glossterm>
<glossdef>
<para>The condition in which a resource requires the existence of other resources.
</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource group</glossterm>
<glossdef>
<para>A collection of <glossterm>resource</glossterm><firstterm>s</firstterm>.
A resource group is identified by a simple name; this name must be unique
within a cluster. Resource groups cannot overlap; that is, two resource groups
cannot contain the same resource. All interdependent resources must be part
of the same resource group. If any individual resource in a resource group
becomes unavailable for its intended use, then the entire resource group is
considered unavailable. Therefore, a resource group is the unit of failover
for Linux FailSafe.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource keys</glossterm>
<glossdef>
<para>Variables that define a resource of a given resource type. The action
scripts use this information to start, stop, and monitor a resource of this
resource type.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource name</glossterm>
<glossdef>
<para>The simple name that identifies a specific instance of a <glossterm>
resource type</glossterm>. A resource name must be unique within a cluster.
</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource type</glossterm>
<glossdef>
<para>A particular class of <glossterm>resource</glossterm>. All of the resources
in a particular resource type can be handled in the same way for the purposes
of <glossterm>failover</glossterm>. Every resource is an instance of exactly
one resource type. A resource type is identified by a simple name; this name
must be unique within a cluster. A resource type can be defined for a specific
node or for an entire cluster. A resource type that is defined for a node
overrides a cluster-wide resource type definition with the same name; this
allows an individual node to override global settings from a cluster-wide
resource type definition.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>resource type dependency</glossterm>
<glossdef>
<para>A set of resource types upon which a resource type depends. For example,
the <?Pub _nolinebreak><literal>filesystem</literal><?Pub /_nolinebreak> resource
type depends upon the <literal>volume</literal> resource type, and the <?Pub _nolinebreak><literal>
Netscape_web</literal><?Pub /_nolinebreak> resource type depends upon the <literal>
filesystem</literal> and <literal>IP_address</literal> resource types.</para>
</glossdef>
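<glossdef>
<para>The following fragment is an illustrative sketch only; the dependency
table simply restates the examples in this entry and is not a complete
configuration. It shows how the direct and indirect dependencies of a resource
type can be derived:</para>
<programlisting><![CDATA[
# Illustrative sketch only; the table restates the examples in this entry.

depends_on = {
    "filesystem": ["volume"],
    "Netscape_web": ["filesystem", "IP_address"],
}

def all_dependencies(resource_type):
    """Return every resource type the given type depends on, directly or indirectly."""
    result = set()
    for dep in depends_on.get(resource_type, []):
        result.add(dep)
        result.update(all_dependencies(dep))
    return result

print(sorted(all_dependencies("Netscape_web")))  # ['IP_address', 'filesystem', 'volume']
]]></programlisting>
</glossdef>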
</glossentry>
<glossentry><glossterm>runtime failover domain</glossterm>
<glossdef>
<para>The ordered set of nodes on which the resource group can execute upon
failures, as modified by the <glossterm>failover script</glossterm>. The runtime
failover domain is used along with failover attributes to determine the node
on which a resource group should reside. See also <glossterm>initial failover
domain</glossterm>.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>start/stop order</glossterm>
<glossdef>
<para>Each resource type has a start/stop order, which is a non-negative integer.
In a resource group, the start/stop orders of the resource types determine
the order in which the resources will be started when Linux FailSafe brings
the group online and will be stopped when Linux FailSafe takes the group offline.
The group's resources are started in increasing order, and stopped in decreasing
order; resources of the same type are started and stopped in indeterminate
order. For example, if resource type <literal>volume</literal> has order 10
and resource type <literal>filesystem</literal> has order 20, then when Linux
FailSafe brings a resource group online, all volume resources in the group
will be started before all filesystem resources in the group.</para>
</glossdef>
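<glossdef>
<para>The following fragment is an illustrative sketch only, not Linux FailSafe
code; the resource names are hypothetical, and the orders match the volume and
filesystem example above. It shows resources being started in increasing order,
and stopped in decreasing order, of their resource type's start/stop order:</para>
<programlisting><![CDATA[
# Illustrative sketch only; resource names are hypothetical.

# Start/stop order for each resource type in the group (from the example above).
type_order = {"volume": 10, "filesystem": 20}

# (resource name, resource type) pairs in a hypothetical resource group.
resources = [("fs1", "filesystem"), ("vol1", "volume"), ("vol2", "volume")]

# Start in increasing order of the type's start/stop order ...
start_sequence = sorted(resources, key=lambda r: type_order[r[1]])
# ... and stop in decreasing order.
stop_sequence = sorted(resources, key=lambda r: type_order[r[1]], reverse=True)

print([name for name, _ in start_sequence])  # volumes before filesystems
print([name for name, _ in stop_sequence])   # filesystems before volumes
]]></programlisting>
</glossdef>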
</glossentry>
<glossentry><glossterm>system controller port</glossterm>
<glossdef>
<para>A port sitting on a node that provides a way to power-cycle the node
remotely. Enabling or disabling a system controller port in the cluster configuration
database (CDB) tells Linux FailSafe whether it can perform operations on the
system controller port. (When the port is enabled, serial cables must attach
the port to another node, the owner host.) System controller port information
is optional for a node in the pool, but is required if the node will be added
to a cluster; otherwise, resources running on that node will never be highly
available.</para>
</glossdef>
</glossentry>
<glossentry><glossterm>tie-breaker node</glossterm>
<glossdef>
<para>A node identified as a tie-breaker for Linux FailSafe to use in the
process of computing node membership for the cluster, when exactly half the
nodes in the cluster are up and can communicate with each other. If a tie-breaker
node is not specified, Linux FailSafe will use the node with the lowest node
ID in the cluster as the tie-breaker node.</para>
</glossdef>
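<glossdef>
<para>The following fragment is an illustrative sketch only; the node IDs are
hypothetical. It shows the tie-breaker choice described above: the configured
tie-breaker node if one exists, otherwise the node with the lowest node ID in
the cluster:</para>
<programlisting><![CDATA[
# Illustrative sketch only; node IDs are hypothetical.

def choose_tie_breaker(cluster_node_ids, configured_tie_breaker=None):
    """Return the configured tie-breaker node ID if set,
    otherwise the lowest node ID in the cluster."""
    if configured_tie_breaker is not None:
        return configured_tie_breaker
    return min(cluster_node_ids)

print(choose_tie_breaker([4, 7, 12]))      # 4 (lowest node ID)
print(choose_tie_breaker([4, 7, 12], 7))   # 7 (explicitly configured)
]]></programlisting>
</glossdef>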
</glossentry>
<glossentry><glossterm>type-specific attribute</glossterm>
<glossdef>
<para>Required information used to define a resource of a particular resource
type. For example, for a resource of type <literal>filesystem</literal>, you
must enter attributes for the resource's volume name (where the filesystem
is located) and specify options for how to mount the filesystem (for example,
as readable and writable).</para>
</glossdef>
</glossentry>
</glossary>
<?Pub *0000023837>