Overview of the Linux FailSafe System

This chapter provides an overview of the components and operation of the Linux FailSafe system.

High Availability and Linux FailSafe

In the world of mission-critical computing, the availability of information
and computing resources is extremely important. The availability of a system
is affected by how long it is unavailable after a failure in any of its components.
Different degrees of availability are provided by different types of systems:

Fault-tolerant systems (continuous availability).
These systems use redundant components and specialized logic to ensure continuous
operation and to provide complete data integrity. On these systems the degree
of availability is extremely high. Some of these systems can also tolerate
outages due to hardware or software upgrades (continuous availability). This
solution is very expensive and requires specialized hardware and software.

Highly available systems. These systems survive single points
of failure by using redundant off-the-shelf components and specialized software.
They provide a lower degree of availability than the fault-tolerant systems,
but at much lower cost. Typically these systems provide high availability
only for client/server applications, and base their redundancy on cluster
architectures with shared resources.

The Silicon Graphics® Linux FailSafe product provides a general
facility for providing highly available services. Linux FailSafe provides
highly available services for a cluster that contains multiple nodes (
N-node configuration). Using Linux FailSafe, you can configure
a highly available system in any of the following topologies:

- Basic two-node configuration
- Ring configuration
- Star configuration, in which multiple applications running on multiple nodes are backed up by one node
- Symmetric pool configuration

These configurations provide redundancy of processors and I/O controllers.
Redundancy of storage can be obtained either through the use of multi-hosted RAID disk devices and mirrored disks, or through redundant disk systems that are kept in synchronization.

If one of the nodes in the cluster or one of the nodes' components fails,
a different node in the cluster restarts the highly available services of
the failed node. To clients, the services on the replacement node are indistinguishable
from the original services before failure occurred. It appears as if the original
node has crashed and rebooted quickly. The clients notice only a brief interruption
in the highly available service.

In a Linux FailSafe highly available system, nodes can serve as backup
for other nodes. Unlike the backup resources in a fault-tolerant system, which
serve purely as redundant hardware for backup in case of failure, the resources
of each node in a highly available system can be used during normal operation
to run other applications that are not necessarily highly available services.
All highly available services are owned and accessed by one node at a time.
Highly available services are monitored by the Linux FailSafe software.
During normal operation, if a failure is detected on any of these components,
a failover process is initiated. Using Linux FailSafe,
you can define a failover policy to establish which node will take over the
services under what conditions. This process consists of resetting the failed
node (to ensure data consistency), doing any recovery required by the failed-over services, and quickly restarting the services on the node that will take
them over.

Linux FailSafe supports selective failover, in which individual highly available applications can be failed over to a backup node independently of the other highly available applications on that node.

Linux FailSafe highly available services fall into two groups: highly
available resources and highly available applications. Highly available resources
include network interfaces, logical volumes, and filesystems such as ext2 or ReiserFS that have been configured for Linux FailSafe. Silicon Graphics
has also developed Linux FailSafe NFS. Highly available applications can include
applications such as NFS and Apache.

Linux FailSafe provides
a framework for making additional applications into highly available services.
If you want to add highly available applications on a Linux FailSafe cluster,
you must write scripts to handle application monitoring functions. Information
on developing these scripts is described in the Linux FailSafe
Programmer's Guide. If you need assistance in this regard, contact
SGI Global Services, which offers custom Linux FailSafe agent development
and HA integration services.

Concepts

In order to use Linux FailSafe, you must understand the concepts in this section.

Cluster Node (or Node)

A cluster node is a single Linux execution environment; in other words, a single physical or virtual machine. In current Linux environments
this will always be an individual computer. The term node
is used to indicate this meaning in this guide for brevity, as opposed to
any other meaning, such as a network node.

Pool

A pool is the entire set of nodes having membership
in a group of clusters. The clusters are usually close together and should
always serve a common purpose. A replicated cluster configuration database
is stored on each node in the pool.
Cluster

A cluster is a collection of one or more nodes
coupled to each other by networks or other similar interconnections. A cluster
belongs to one pool and only one pool. A cluster is identified by a simple
name; this name must be unique within the pool. A particular node may be
a member of only one cluster. All nodes in a cluster are also in the pool;
however, not all nodes in the pool are necessarily in the cluster.

Node Membership

A node membership is the list of nodes in a cluster
on which Linux FailSafe can allocate resource groups.

Process Membership

A process membership is the list of process instances in a cluster that form a process group. There
can be multiple process groups per node.

Resource

A resource is a single physical or logical entity
that provides a service to clients or other resources. For example, a resource
can be a single disk volume, a particular network address, or an application
such as a web server. A resource is generally available for use over time
on two or more nodes in a cluster, although it can only be allocated to one
node at any given time.

Resources are identified by a resource name and a resource type. One
resource can be dependent on one or more other resources; if so, it will not
be able to start (that is, be made available for use) unless the dependent
resources are also started. Dependent resources must be part of the same resource
group and are identified in a resource dependency list.

Resource Type

A resource type is a particular class of resource.
All of the resources in a particular resource type can be handled in the same
way for the purposes of failover. Every resource is an instance of exactly
one resource type.

A resource type is identified by a simple name; this name should be
unique within the cluster. A resource type can be defined for a specific node,
or it can be defined for an entire cluster. A resource type definition for
a specific node overrides a clusterwide resource type definition with the
same name; this allows an individual node to override global settings from
a clusterwide resource type definition.Like resources, a resource type can be dependent on one or more other
resource types. If such a dependency exists, at least one instance of each
of the dependent resource types must be defined. For example, a resource type
named Netscape_web might have resource type dependencies
on resource types named IP_address and volume. If a resource named web1 is defined with the
Netscape_web resource type, then the resource group containing
web1 must also contain at least one resource of the type
IP_address and one resource of the type volume.
The Linux FailSafe software includes some predefined resource types.
If these types fit the application you want to make highly available, you
can reuse them. If none fit, you can create additional resource types by using
the instructions in the Linux FailSafe Programmer's Guide.
Resource Name

A resource name identifies a specific instance
of a resource type. A resource name must be unique for a given resource type.

Resource Group

A resource group is a collection of interdependent
resources. A resource group is identified by a simple name; this name must
be unique within a cluster. The following table shows an example of the resources and their corresponding resource types for a resource group named WebGroup.

Example Resource Group

    Resource      Resource Type
    10.10.48.22   IP_address
    /fs1          filesystem
    vol1          volume
    web1          Netscape_web

If any individual resource in a resource group becomes unavailable for
its intended use, then the entire resource group is considered unavailable.
Therefore, a resource group is the unit of failover.

Resource groups cannot overlap; that is, two resource groups cannot
contain the same resource.

Resource Dependency List

A resource dependency list is a list of resources
upon which a resource depends. Each resource instance must have resource dependencies
that satisfy its resource type dependencies before it can be added to a resource
group.

Resource Type Dependency List

A resource type dependency list is a list of
resource types upon which a resource type depends. For example, the
filesystem resource type depends upon the volume
resource type, and the Netscape_web resource type depends
upon the filesystem and IP_address resource
types.

For example, suppose a file system instance fs1 is
mounted on volume vol1. Before fs1 can
be added to a resource group, fs1 must be defined to depend
on vol1. Linux FailSafe only knows that a file system instance
must have one volume instance in its dependency list. This requirement is
inferred from the resource type dependency list.

Failover

A failover is the process of allocating a resource
group (or application) to another node, according to a failover policy. A
failover may be triggered by the failure of a resource, a change in the node
membership (such as when a node fails or starts), or a manual request by the
administrator.

Failover Policy

A failover policy is the method used by Linux
FailSafe to determine the destination node of a failover. A failover policy
consists of the following:

- Failover domain
- Failover attributes
- Failover script

Linux FailSafe uses the failover domain output from a failover script
along with failover attributes to determine on which node a resource group
should reside.

The administrator must configure a failover policy for each resource
group. A failover policy name must be unique within the pool. Linux FailSafe
includes predefined failover policies, but you can define your own failover
algorithms as well.

Failover Domain

A failover domain is the ordered list of nodes
on which a given resource group can be allocated. The nodes listed in the
failover domain must be within the same cluster; however, the failover domain
does not have to include every node in the cluster.

The administrator defines the initial failover domain when creating
a failover policy. This list is transformed into a run-time failover domain
by the failover script; Linux FailSafe uses the run-time failover domain along
with failover attributes and the node membership to determine the node on
which a resource group should reside. Linux FailSafe stores the run-time failover
domain and uses it as input to the next failover script invocation. Depending
on the run-time conditions and contents of the failover script, the initial
and run-time failover domains may be identical.

In general, Linux FailSafe allocates a given resource group to the first
node listed in the run-time failover domain that is also in the node membership;
the point at which this allocation takes place is affected by the failover
attributes.

Failover Attribute

A failover attribute is a string that affects
the allocation of a resource group in a cluster. The administrator must specify
system attributes (such as Auto_Failback or
Controlled_Failback), and can optionally supply
site-specific attributes.

Failover Scripts

A failover script is a shell script that generates
a run-time failover domain and returns it to the Linux FailSafe process. The
Linux FailSafe process ha_fsd applies the failover attributes
and then selects the first node in the returned failover domain that is also
in the current node membership.

The following failover scripts are provided with the Linux FailSafe
release:

- ordered, which never changes the initial failover domain. When using this script, the initial and run-time failover domains are equivalent.

- round-robin, which selects the resource group owner in a round-robin (circular) fashion. This policy can be used for resource groups that can run on any node in the cluster.

If these scripts do not meet your needs, you can create a new failover script using the information in this guide.
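For illustration only, the following sketch shows the general shape of such a script. It assumes, as a simplification, that the initial failover domain arrives as command-line arguments and that the run-time failover domain is written to standard output; the actual script interface used by ha_fsd is defined in the Linux FailSafe Programmer's Guide.

    #!/bin/sh
    #
    # Sketch of an "ordered"-style failover policy script.
    # Assumptions (not the real FailSafe interface): the initial failover
    # domain is passed as arguments, and the run-time failover domain is
    # printed to standard output in priority order.

    if [ $# -eq 0 ]; then
        echo "usage: $0 node ..." >&2
        exit 1
    fi

    # The ordered policy never changes the initial domain, so the run-time
    # failover domain is simply the input list, unchanged.
    echo "$@"
    exit 0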
Action Scripts

The action scripts are the set of scripts that
determine how a resource is started, monitored, and stopped. There must be
a set of action scripts specified for each resource type.

The following is the complete set of action scripts that can be specified
for each resource type:

- exclusive, which verifies that a resource is not already running
- start, which starts a resource
- stop, which stops a resource
- monitor, which monitors a resource
- restart, which restarts a resource on the same server after a monitoring failure occurs

The release includes action scripts for predefined resource types. If
these scripts fit the resource type that you want to make highly available,
you can reuse them by copying them and modifying them as needed. If none fits,
you can create additional action scripts by using the instructions in the
Linux FailSafe Programmer's Guide.
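As an illustration of the kind of script involved, the following is a minimal sketch of a monitor action script for a hypothetical resource type that wraps a single daemon. The daemon name and the exit conventions (0 for healthy, non-zero for a monitoring failure) are assumptions made for this sketch; the real script interface and helper library are documented in the Linux FailSafe Programmer's Guide.

    #!/bin/sh
    #
    # Sketch of a monitor action script for a hypothetical resource type.
    # Assumptions: the resource name is passed as the first argument,
    # "mydaemon" is a placeholder process name, and exit 0 means healthy.

    RESOURCE="$1"

    # Consider the resource healthy if its daemon is running.
    if pidof mydaemon >/dev/null 2>&1; then
        exit 0      # healthy; SRM keeps monitoring
    else
        exit 1      # monitoring failure; FailSafe may restart or fail over
    fi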
Additional Linux FailSafe Features

Linux FailSafe provides the following features to increase the flexibility and ease of operation of a highly available system:

- Dynamic management
- Fine-grain failover
- Local restarts

These features are summarized in the following sections.

Dynamic Management

Linux FailSafe allows you to perform a variety of administrative tasks
while the system is running:

- Dynamically managed application monitoring: Linux FailSafe allows you to turn monitoring of an application on and
off while other highly available applications continue to run. This allows
you to perform online application upgrades without bringing down the Linux
FailSafe system.

- Dynamically managed Linux FailSafe resources: Linux FailSafe allows you to add resources while the Linux FailSafe
system is online.

- Dynamically managed Linux FailSafe upgrades: Linux FailSafe allows you to upgrade Linux FailSafe software on one
node at a time without taking down the entire Linux FailSafe cluster.

Fine-Grain Failover

Using Linux FailSafe, you can specify fine-grain failover
. Fine-grain failover is a process in which a specific resource
group is failed over from one node to another node while other resource groups
continue to run on the first node, where possible. Fine-grain failover is
possible in Linux FailSafe because the unit of failover is the resource group,
and not the entire node.

Local Restarts

Linux FailSafe allows you to fail over a resource group onto the same
node. This feature enables you to configure a single-node system, where backup
for a particular application is provided on the same machine, if possible.
It also enables you to indicate that a specified number of local restarts
be attempted before the resource group fails over to a different node.

Linux FailSafe Administration

You can perform all Linux FailSafe administrative tasks by means of
the Linux FailSafe Cluster Manager Graphical User Interface (GUI). The Linux
FailSafe GUI provides a guided interface to configure, administer, and monitor
a Linux FailSafe-controlled highly available cluster. The Linux FailSafe GUI
also provides screen-by-screen help text.

If you wish, you can perform Linux FailSafe administrative tasks directly
by means of the Linux FailSafe Cluster Manager CLI, which provides a command-line
interface for the administration tasks.

For information on Linux FailSafe Cluster Manager tools, see .
For information on Linux FailSafe configuration and administration tasks,
see , and .
Hardware Components of a Linux FailSafe Cluster

The figure Sample Linux FailSafe System Components shows an example of Linux FailSafe hardware components, in this case for a two-node system.

The hardware components of the Linux FailSafe system are as follows:

- Up to eight Linux nodes

- Two or more interfaces on each node to control networks (Ethernet, FDDI, or any other available network interface)

At least two network interfaces on each node are required for the control network heartbeat connection, by which each node monitors the state of other nodes. The Linux FailSafe software also uses this connection to pass control messages between nodes. These interfaces have distinct IP addresses.

- A mechanism for remote reset of nodes

A reset ensures that the failed node is not using the shared disks when the replacement node takes them over.

- Disk storage and SCSI bus shared by the nodes in the cluster

The nodes in the Linux FailSafe system can share dual-hosted disk storage
over a shared fast and wide SCSI bus where this is supported by the SCSI controller
and Linux driver. Note that few Linux drivers are currently known to implement this correctly.
Please check hardware compatibility lists if this is a configuration you
plan to use. Fibre Channel solutions should universally support this.

The Linux FailSafe system is designed to survive a single point of failure.
Therefore, when a system component fails, it must be restarted, repaired,
or replaced as soon as possible to avoid the possibility of two or more failed
components.

Linux FailSafe Disk Connections

A Linux FailSafe system supports the following disk connections:

- RAID support
  - Single controller or dual controllers
  - Single or dual hubs
  - Single or dual pathing

- JBOD support
  - Single or dual vaults
  - Single or dual hubs

- Network-mirrored support
  - Clustered filesystems such as GFS
  - Network-mirrored block devices, such as with DRBD

Network-mirrored devices are not discussed in the examples within this
guide. However, the Linux FailSafe configuration items that are set for shared
storage apply equally to network-mirrored storage.

SCSI disks can be connected to two machines only. Fibre Channel disks
can be connected to multiple machines.

Linux FailSafe Supported Configurations

Linux FailSafe supports the following highly available configurations:

- Basic two-node configuration
- Star configuration of multiple primary nodes and one backup node
- Ring configuration

You can use the following reset models when configuring a Linux FailSafe
system:

- Server-to-server. Each server is directly connected to another for reset. This connection may be unidirectional.

- Network. Each server can reset any other by sending a signal over the control network to a multiplexer.

The following sections describe the different Linux FailSafe
configurations.

Basic Two-Node Configuration

In a basic two-node configuration, the following arrangements are possible:

- All highly available services run on one node. The other node
is the backup node. After failover, the services run on the backup node. In
this case, the backup node is a hot standby for failover purposes only. The
backup node can run other applications that are not highly available services.

- Highly available services run concurrently on both nodes.
For each service, the other node serves as a backup node. For example, both
nodes can be exporting different NFS filesystems. If a failover occurs, one
node then exports all of the NFS filesystems.
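As an illustration of the second arrangement, each node might export a different filesystem in /etc/exports; the paths and options below are examples only, not part of a FailSafe configuration. After a failover, the surviving node exports both filesystems, as described above.

    # /etc/exports on node1 (example)
    /export/accounts   *(rw,sync)

    # /etc/exports on node2 (example)
    /export/projects   *(rw,sync)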
Highly Available Resources

This section discusses the highly available resources that are provided on a Linux FailSafe system.

Nodes

If a node crashes or hangs (for example, due to a parity error or bus
error), the Linux FailSafe software detects this. A different node, determined
by the failover policy, takes over the failed node's services after resetting
the failed node.

If a node fails, the interfaces, access to storage, and services also
become unavailable. See the succeeding sections for descriptions of how the
Linux FailSafe system handles or eliminates these points of failure.

Network Interfaces and IP Addresses

Clients access the highly available services provided by the Linux
FailSafe cluster using IP addresses. Each highly available service can use
multiple IP addresses. The IP addresses are not tied to a particular highly
available service; they can be shared by all the highly available services
in the cluster.

Linux FailSafe uses the IP aliasing mechanism to support multiple IP
addresses on a single network interface. Clients can use a highly available
service that uses multiple IP addresses even when there is only one network
interface in the server node.

The IP aliasing mechanism allows a Linux FailSafe configuration that
has a node with multiple network interfaces to be backed up by a node with
a single network interface. IP addresses configured on multiple network interfaces
are moved to the single interface on the other node in case of a failure.
Linux FailSafe requires that each network interface in a cluster have
an IP address that does not fail over. These IP addresses, called
fixed IP addresses, are used to monitor network interfaces. Each
fixed IP address must be configured to a network interface at system boot
time. All other IP addresses in the cluster are configured as highly available IP addresses.

Highly available IP addresses are configured on a network interface.
During failover and recovery processes they are moved to another network interface
in the other node by Linux FailSafe. Highly available IP addresses are specified
when you configure the Linux FailSafe system. Linux FailSafe uses the
ifconfig command to configure an IP address on a network interface
and to move IP addresses from one interface to another.
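Conceptually, the operations look like the following ifconfig commands; the interface name, alias number, and addresses are examples only, and Linux FailSafe issues the equivalent commands itself rather than requiring the administrator to run them.

    # Bring up a highly available IP address as an alias on this node
    # (example values only).
    ifconfig eth0:1 192.168.50.1 netmask 255.255.255.0 up

    # Remove the alias when the address moves to another interface or node.
    ifconfig eth0:1 down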
In some networking implementations, IP addresses cannot be moved from one interface to another by using only the ifconfig command.
Linux FailSafe uses re-MACing (MAC address
impersonation) to support these networking implementations. Re-MACing
moves the physical (MAC) address of a network interface to another interface.
It is done by using the macconfig command. Re-MACing is
done in addition to the standard ifconfig process that
Linux FailSafe uses to move IP addresses. To do re-MACing in Linux FailSafe,
a resource of type MAC_Address is used.

Re-MACing can be used only on Ethernet networks. It cannot be used on
FDDI networks.

Re-MACing is required when packets called gratuitous ARP packets are
not passed through the network. These packets are generated automatically
when an IP address is added to an interface (as in a failover process). They
announce a new mapping of an IP address to MAC address. This tells clients
on the local subnet that a particular interface now has a particular IP address.
Clients then update their internal ARP caches with the new MAC address for
the IP address. (The IP address just moved from interface to interface.) When
gratuitous ARP packets are not passed through the network, the internal ARP
caches of subnet clients cannot be updated. In these cases, re-MACing is used.
This moves the MAC address of the original interface to the new interface.
Thus, both the IP address and the MAC address are moved to the new interface
and the internal ARP caches of clients do not need updating.
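Outside of FailSafe, the effect of moving a MAC address can be illustrated with the standard ip tool; this is only an illustration of the concept, not the macconfig-based mechanism that Linux FailSafe itself uses, and the interface name and address are examples.

    # Illustration only: assign the original interface's MAC address
    # (example value) to the backup interface using the standard ip tool.
    ip link set dev eth1 down
    ip link set dev eth1 address 02:a0:c9:12:34:56
    ip link set dev eth1 up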
Re-MACing is not done by default; you must specify that it be done for each pair of primary and secondary interfaces that requires it. A procedure
in the section describes how you can determine
whether re-MACing is required. In general, routers and PC/NFS clients may require re-MACing of interfaces.

A side effect of re-MACing is that the original MAC address of an interface
that has received a new MAC address is no longer available for use. Because
of this, each network interface has to be backed up by a dedicated backup
interface. This backup interface cannot be used by clients as a primary interface.
(After a failover to this interface, packets sent to the original MAC address
are ignored by every node on the network.) Each backup interface backs up
only one network interface.

Disks

The Linux FailSafe cluster can include shared SCSI-based storage in
the form of individual disks, RAID systems, or Fibre Channel storage systems.

With mirrored volumes on
the disks in a RAID or Fibre Channel system, the device system should provide
redundancy. No participation of the Linux FailSafe system software is required
for a disk failure. If a disk controller fails, the Linux FailSafe system
software initiates the failover process.

The figure Disk Storage Failover on a Two-Node System shows disk storage takeover on
a two-node system. The surviving node takes over the shared disks and recovers
the logical volumes and filesystems on the disks. This process is expedited
by a filesystem such as ReiserFS or XFS, because of journaling technology
that does not require the use of the fsck command for filesystem
consistency checking.

Highly Available Applications

Each application has a primary node and up to seven additional nodes
that you can use as backup nodes, according to the failover policy you define.
The primary node is the node on which the application runs when Linux FailSafe
is in normal state. When a failure of any highly available
resources or highly available application is detected by Linux FailSafe software,
all highly available resources in the affected resource group on the failed
node are failed over to a different node and the highly available applications
on the failed node are stopped. When these operations are complete, the highly
available applications are started on the backup node.

All information about highly available applications, including the primary
node, components of the resource group, and failover policy for the application
and monitoring, is specified when you configure your Linux FailSafe system
with the Cluster Manager GUI or with the Cluster Manager CLI. Information
on configuring the system is provided in .
Monitoring scripts detect the failure of a highly available application.

The Linux FailSafe software provides a framework for making applications
highly available services. By writing scripts and configuring the system in
accordance with those scripts, you can turn client/server applications into
highly available applications. For information, see the Linux
FailSafe Programmer's Guide.

Failover and Recovery Processes

When a failure is detected on one node (the
node has crashed, hung, or been shut down, or a highly available service is
no longer operating), a different node performs a failover of the highly available
services that are being provided on the node with the failure (called the
failed node). Failover allows all of the highly available services,
including those provided by the failed node, to remain available within the
cluster.

A failure in a highly available service can be detected by Linux FailSafe
processes running on another node. Depending on which node detects the failure,
the sequence of actions following the failure is different.

If the failure is detected by the Linux FailSafe software running on the same node, the failed node performs these operations:

- Stops the highly available resource group running on the node
- Moves the highly available resource group to a different node, according to the defined failover policy for the resource group

- Sends a message to the node that will take over the services to start providing all resource group services previously provided by the failed node

When it receives the message, the node that is taking over the resource
group performs these operations:

- Transfers ownership of the resource group from the failed node to itself

- Starts offering the resource group services that were running on the failed node

If the failure is detected by Linux FailSafe software running on a different
node, the node detecting the failure performs these operations:

- Using the serial connection between the nodes, reboots the failed node to prevent corruption of data

- Transfers ownership of the resource group from the failed node to the other nodes in the cluster, based on the resource group failover policy

- Starts offering the resource group services that were running on the failed node

When a failed node comes back up, whether the node automatically starts
to provide highly available services again depends on the failover policy
you define. For information on defining failover policies, see .
Normally, a node that experiences a failure automatically reboots and
resumes providing highly available services. This scenario works well for
transient errors (as well as for planned outages for equipment and software
upgrades). However, if there are persistent errors, an automatic reboot can cause a cycle of recovery and immediate failover. To prevent this, the Linux FailSafe
software checks how long the rebooted node has been up since the last time
it was started. If the interval is less than five minutes (by default), the
Linux FailSafe software automatically disables Linux FailSafe from booting
on the failed node and does not start up the Linux FailSafe software on this
node. It also writes error messages to /var/log/failsafe
and to the appropriate log file.
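For illustration, an administrator can check how long a rebooted node has been up and look for recent FailSafe log activity with standard commands; only the log directory path comes from this guide, the rest is generic.

    # How long has this node been up? FailSafe disables itself if the node
    # rebooted less than the configured interval ago (five minutes by default).
    uptime

    # List the Linux FailSafe log files, most recently modified first.
    ls -lt /var/log/failsafe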
Overview of Configuring and Testing a New Linux FailSafe Cluster

After the Linux FailSafe cluster hardware has been installed, follow
this general procedure to configure and test the Linux FailSafe system:

1. Become familiar with Linux FailSafe terms by reviewing this chapter.

2. Plan the configuration of highly available applications and services on the cluster using .

3. Perform various administrative tasks, including the installation of prerequisite software, that are required by Linux FailSafe, as described in .

4. Define the Linux FailSafe configuration as explained in .

5. Test the Linux FailSafe system in three phases: test individual components prior to starting Linux FailSafe software, test normal operation of the Linux FailSafe system, and simulate failures to test the operation of the system after a failure occurs.

Linux FailSafe System Software

This section describes the software layers, communication paths, and
cluster configuration database.

Layers

A Linux FailSafe system has the following software layers:

- Plug-ins, which create highly available services. If the
application plug-in you want is not available, you can hire the Silicon Graphics
Global Services group to develop the required software, or you can use the
Linux FailSafe Programmer's Guide to write the software yourself.

- Linux FailSafe base, which includes the ability to define resource groups and failover policies

- High-availability cluster infrastructure that lets you define clusters, resources, and resource types (this consists of the cluster_services installation package)

- Cluster software infrastructure, which lets you do the following:
  - Perform node logging
  - Administer the cluster
  - Define nodes

The cluster software infrastructure consists of the cluster_admin and cluster_control subsystems. The figure Software Layers shows a graphic representation of these layers. The following table describes the layers for Linux FailSafe, which are located in the /usr/lib/failsafe/bin directory.
Contents of /usr/lib/failsafe/bin

Linux FailSafe base (failsafe2 subsystem):
    ha_fsd     Linux FailSafe daemon. Provides the basic component of the Linux FailSafe software.

High-availability cluster infrastructure (cluster_ha subsystem):
    ha_cmsd    Cluster membership daemon. Provides the list of nodes, called node membership, available to the cluster.
    ha_gcd     Group membership daemon. Provides group membership and reliable communication services to Linux FailSafe processes in the presence of failures.
    ha_srmd    System resource manager daemon. Manages resources, resource groups, and resource types. Executes action scripts for resources.
    ha_ifd     Interface agent daemon. Monitors the local node's network interfaces.

Cluster software infrastructure (cluster_admin and cluster_control subsystems):
    cad        Cluster administration daemon (cluster_admin). Provides administration services.
    crsd       Node control daemon (cluster_control). Monitors the serial connection to other nodes. Has the ability to reset other nodes.
    cmond      Daemon that manages all other daemons. This process starts other processes on all nodes in the cluster and restarts them on failures.
    cdbd       Manages the configuration database and keeps each copy in sync on all nodes in the pool.
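As a quick illustration, you can confirm which of the daemons listed above are running on a node with standard process tools; the exact set present depends on the layers installed and on whether HA services have been started.

    # List FailSafe-related daemons running on this node (illustrative).
    ps -ef | grep -E 'ha_fsd|ha_cmsd|ha_gcd|ha_srmd|ha_ifd|cad|crsd|cmond|cdbd' | grep -v grep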
Communication Paths

The following figures show communication paths in Linux FailSafe. Note that they do not represent cmond.

Figure: Read/Write Actions to the Cluster Configuration Database

The second figure shows the communication path for a node that is in the pool but not in a cluster.

Figure: Communication Path for a Node that is Not in a Cluster
Conditions Under Which Action Scripts are Executed

Action scripts are executed under the following conditions:

- exclusive: the resource group is made online by the user or HA processes are started
- start: the resource group is made online by the user, HA processes are started, or there is a resource group failover
- stop: the resource group is made offline, HA processes are stopped, the resource group fails over, or the node is shut down
- monitor: the resource group is online
- restart: the monitor script fails

When Does FailSafe Execute Action and Failover Scripts
The order of execution is as follows:

1. Linux FailSafe is started, usually at node boot or manually, and reads the resource group information from the cluster configuration database.

2. Linux FailSafe asks the system resource manager (SRM) to run exclusive scripts for all resource groups that are in the Online ready state.

3. SRM returns one of the following states for each resource group:
   - running
   - partially running
   - not running

4. If a resource group has a state of not running on a node where HA services have been started, the following occurs:
   - Linux FailSafe runs the failover policy script associated with the resource group. The failover policy script takes the list of nodes that are capable of running the resource group (the failover domain) as a parameter.
   - The failover policy script returns an ordered list of nodes, in descending order of priority (the run-time failover domain), where the resource group can be placed.
   - Linux FailSafe sends a request to SRM to move the resource group to the first node in the run-time failover domain.
   - SRM executes the start action script for all resources in the resource group:
     - If the start script fails, the resource group is marked online on that node with an srmd executable error.
     - If the start script is successful, SRM automatically starts monitoring those resources. After the specified start monitoring time passes, SRM executes the monitor action script for each resource in the resource group.

5. If the state of the resource group is running or partially running on only one node in the cluster, Linux FailSafe runs the associated failover policy script:
   - If the highest priority node is the same node where the resource group is running or partially running, the resource group is made online on that node. In the partially running case, Linux FailSafe asks SRM to execute start scripts for the resources in the resource group that are not running.
   - If the highest priority node is another node in the cluster, Linux FailSafe asks SRM to execute stop action scripts for the resources in the resource group, and then makes the resource group online on the highest priority node in the cluster.

6. If the state of the resource group is running or partially running on multiple nodes in the cluster, the resource group is marked with an exclusivity error. These resource groups require operator intervention to become online in the cluster.

The figure Message Paths for Action Scripts and Failover Policy Scripts shows the message paths for action scripts and failover policy scripts.

Components

The cluster configuration database is a key component of Linux FailSafe
software. It contains all information about the following:

- Resources
- Resource types
- Resource groups
- Failover policies
- Nodes
- Clusters

The cluster configuration database daemon (cdbd)
maintains identical databases on each node in the cluster.

The following are the contents of the failsafe directories under the
/usr/lib and /var hierarchies:

/var/run/failsafe/comm/
    Directory that contains files that communicate between various daemons.

/usr/lib/failsafe/common_scripts/
    Directory that contains the script library (the common functions that may be used in action scripts).

/var/log/failsafe/
    Directory that contains the logs of all scripts and daemons executed by Linux FailSafe. The outputs and errors from the commands within the scripts are logged in the script_nodename file.

/usr/lib/failsafe/policies/
    Directory that contains the failover scripts used for resource groups.

/usr/lib/failsafe/resource_types/template
    Directory that contains the template action scripts.

/usr/lib/failsafe/resource_types/rt_name
    Directory that contains the action scripts for the rt_name resource type, as illustrated below. For example, /usr/lib/failsafe/resource_types/filesystem.

resource_types/rt_name/exclusive
    Script that verifies that a resource of this resource type is not already running. For example, resource_types/filesystem/exclusive.

resource_types/rt_name/monitor
    Script that monitors a resource of this resource type.

resource_types/rt_name/restart
    Script that restarts a resource of this resource type on the same node after a monitoring failure.

resource_types/rt_name/start
    Script that starts a resource of this resource type.

resource_types/rt_name/stop
    Script that stops a resource of this resource type.
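For example, listing the directory for the filesystem resource type mentioned above would show one script per action; the output below is illustrative.

    # Illustrative listing of the action scripts for the filesystem resource type.
    ls /usr/lib/failsafe/resource_types/filesystem
    exclusive  monitor  restart  start  stop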
The following table shows the administrative commands available for use in scripts.
Administrative Commands for Use in Scripts

    Command          Purpose
    ha_cilog         Logs messages to the script_nodename log files.
    ha_execute_lock  Executes a command with a file lock. This allows command execution to be serialized.
    ha_exec2         Executes a command and retries the command on failure or timeout.
    ha_filelock      Locks a file.
    ha_fileunlock    Unlocks a file.
    ha_ifdadmin      Communicates with the ha_ifd network interface agent daemon.
    ha_http_ping2    Checks whether a web server is running.
    ha_macconfig2    Displays or modifies the MAC addresses of a network interface.