This section discusses the highly available resources that are provided on a Linux FailSafe system.
If a node crashes or hangs (for example, due to a parity error or bus error), the Linux FailSafe software detects this. A different node, determined by the failover policy, takes over the failed node's services after resetting the failed node.
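The takeover decision can be sketched in Python. This is a minimal illustration, not the Linux FailSafe implementation. The ordered `failover_order` list stands in for a failover policy. The `is_alive`, `reset_node`, and `start_services` callbacks are hypothetical placeholders for membership state, node reset, and service startup.

```python
from typing import Callable, Optional, Sequence


def select_takeover_node(
    failed: str,
    failover_order: Sequence[str],
    is_alive: Callable[[str], bool],
) -> Optional[str]:
    """Return the first live node in failover_order other than the failed node.

    failover_order is a hypothetical ordered node list standing in for a
    failover policy; it is not the Linux FailSafe policy format.
    """
    for node in failover_order:
        if node != failed and is_alive(node):
            return node
    return None  # no eligible node, so the services stay offline


def handle_node_failure(
    failed: str,
    failover_order: Sequence[str],
    is_alive: Callable[[str], bool],
    reset_node: Callable[[str], None],
    start_services: Callable[[str], None],
) -> Optional[str]:
    """Reset the failed node, then start its services on the chosen node."""
    target = select_takeover_node(failed, failover_order, is_alive)
    if target is None:
        return None
    # Reset first so the failed node cannot keep using shared resources
    # while another node takes over its services.
    reset_node(failed)
    start_services(target)
    return target


alive = {"node1": False, "node2": True, "node3": True}
log = []
chosen = handle_node_failure(
    "node1",
    ["node1", "node2", "node3"],
    lambda n: alive.get(n, False),
    lambda n: log.append(("reset", n)),
    lambda n: log.append(("start", n)),
)
print(chosen, log)  # node2 [('reset', 'node1'), ('start', 'node2')]
```

Keeping the reset and start actions as callbacks separates the policy decision (which node) from the mechanics (how a node is reset or a service is started).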
If a node fails, its network interfaces, its access to storage, and the services it provides also become unavailable. The following sections describe how the Linux FailSafe system handles or eliminates each of these points of failure.
Clients access the highly available services provided by the Linux FailSafe cluster using IP addresses. Each highly available service can use multiple IP addresses. The IP addresses are not tied to a particular highly available service; they can be shared by all the highly available services in the cluster.
Linux FailSafe uses the IP aliasing mechanism to support multiple IP addresses on a single network interface. Clients can use a highly available service that uses multiple IP addresses even when there is only one network interface in the server node.
The IP aliasing mechanism allows a Linux FailSafe configuration that has a node with multiple network interfaces to be backed up by a node with a single network interface. IP addresses configured on multiple network interfaces are moved to the single interface on the other node in case of a failure.
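As an illustration of IP aliasing, the following Python sketch builds Linux `ifconfig` alias commands (`eth0:1`, `eth0:2`, and so on) that place several addresses on one physical interface. This includes addresses gathered from several interfaces of a failed node. The sketch only generates command strings, and several details are assumptions: alias numbering starts at 1, all addresses share a single netmask, and the example addresses are illustrative. Linux FailSafe's own scripts may do this differently.

```python
from typing import Dict, List


def alias_commands(interface: str, addresses: List[str], netmask: str) -> List[str]:
    """Put each address on its own alias (interface:1, interface:2, ...).

    Assumes the interface has no aliases yet, so numbering starts at 1.
    """
    return [
        f"ifconfig {interface}:{n} {addr} netmask {netmask} up"
        for n, addr in enumerate(addresses, start=1)
    ]


def consolidate_on_backup(
    failed_node_ips: Dict[str, List[str]], backup_interface: str, netmask: str
) -> List[str]:
    """Gather the HA addresses from every interface of the failed node
    and configure all of them on one interface of the backup node."""
    addresses = [
        addr for iface in sorted(failed_node_ips) for addr in failed_node_ips[iface]
    ]
    return alias_commands(backup_interface, addresses, netmask)


for cmd in consolidate_on_backup(
    {"eth0": ["192.0.2.10"], "eth1": ["192.0.2.20", "192.0.2.21"]},
    "eth0",
    "255.255.255.0",
):
    print(cmd)
# ifconfig eth0:1 192.0.2.10 netmask 255.255.255.0 up
# ifconfig eth0:2 192.0.2.20 netmask 255.255.255.0 up
# ifconfig eth0:3 192.0.2.21 netmask 255.255.255.0 up
```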
Linux FailSafe requires that each network interface in a cluster have an IP address that does not fail over. These addresses, called fixed IP addresses, are used to monitor the network interfaces. Each fixed IP address must be configured on its network interface at system boot time. All other IP addresses in the cluster are configured as highly available IP addresses.
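The split between fixed and highly available addresses can be modeled as a small data structure with a consistency check. This Python sketch is hypothetical. The `Interface` and `ClusterAddresses` classes and the checks in `validate` are illustrative only and do not reflect the Linux FailSafe configuration schema.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Interface:
    node: str
    name: str
    fixed_ip: str  # configured at boot, never fails over, used for monitoring


@dataclass
class ClusterAddresses:
    interfaces: List[Interface]
    ha_ips: Set[str] = field(default_factory=set)  # addresses that can fail over

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the layout is consistent."""
        problems = []
        for iface in self.interfaces:
            if not iface.fixed_ip:
                problems.append(f"{iface.node}:{iface.name} has no fixed IP address")
        fixed = [i.fixed_ip for i in self.interfaces if i.fixed_ip]
        if len(fixed) != len(set(fixed)):
            problems.append("fixed IP addresses must be unique")
        both = set(fixed) & self.ha_ips
        if both:
            problems.append(
                f"addresses cannot be both fixed and highly available: {sorted(both)}"
            )
        return problems
```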
Highly available IP addresses are configured on a network interface. During failover and recovery, Linux FailSafe moves them to a network interface on another node. You specify the highly available IP addresses when you configure the Linux FailSafe system. Linux FailSafe uses the ifconfig command both to configure an IP address on a network interface and to move IP addresses from one interface to another.
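A single address move can be sketched as an ordered sequence of `ifconfig` operations. Several parts of this hypothetical Python helper are assumptions: the alias names, the `source_node_reset` flag, and the ordering policy. If the failed node has already been reset, there is nothing to remove on it. In a controlled move (for example, during recovery), the address is released on the old interface before it is brought up on the new one.

```python
from typing import List, Tuple


def move_ip_plan(
    ip: str,
    netmask: str,
    src_alias: str,
    dst_alias: str,
    source_node_reset: bool,
) -> List[Tuple[str, str]]:
    """Return (where, command) steps for moving one HA address.

    src_alias and dst_alias are alias names such as "eth0:1" (assumed naming).
    """
    plan: List[Tuple[str, str]] = []
    if not source_node_reset:
        # Release the address first so it is never active on two interfaces.
        plan.append(("source", f"ifconfig {src_alias} down"))
    plan.append(("target", f"ifconfig {dst_alias} {ip} netmask {netmask} up"))
    return plan
```

For a failover after the failed node has been reset, `move_ip_plan("192.0.2.10", "255.255.255.0", "eth0:1", "eth1:1", True)` yields only the target-side command.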
In some networking implementations, IP addresses cannot be moved from one interface to another by using only the ifconfig command. Linux FailSafe uses re-MACing (MAC address impersonation) to support these implementations. Re-MACing moves the physical (MAC) address of a network interface to another interface; it is done with the macconfig command. Re-MACing is performed in addition to the standard ifconfig process that Linux FailSafe uses to move IP addresses. To configure re-MACing in Linux FailSafe, you use a resource of type MAC_Address.
Note: Re-MACing can be used only on Ethernet networks. It cannot be used on FDDI networks.
Re-MACing is required when the network does not pass gratuitous ARP packets. These packets are generated automatically when an IP address is added to an interface (as in a failover process) and announce a new mapping of an IP address to a MAC address. They tell clients on the local subnet that a particular interface now has a particular IP address. The clients then update their internal ARP caches with the new MAC address for that IP address, which has just moved from one interface to another. When gratuitous ARP packets are not passed through the network, the ARP caches of clients on the subnet are not updated, and the clients keep sending packets for that IP address to the old MAC address. Re-MACing solves this by also moving the MAC address of the original interface to the new interface. Because both the IP address and the MAC address move to the new interface, the clients' ARP caches do not need to be updated.
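A toy Python model can show how gratuitous ARP and re-MACing affect client ARP caches. The `Subnet` and `Nic` classes and the single shared client ARP cache are simplifications invented for this illustration. The sketch does not send real ARP packets.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Nic:
    mac: str


@dataclass
class Subnet:
    passes_gratuitous_arp: bool
    client_arp_cache: Dict[str, str] = field(default_factory=dict)  # IP -> MAC

    def add_ip(self, ip: str, nic: Nic) -> None:
        """Adding an IP address to an interface emits a gratuitous ARP."""
        if self.passes_gratuitous_arp:
            self.client_arp_cache[ip] = nic.mac  # clients learn the new mapping


def fail_over(subnet: Subnet, ip: str, failed: Nic, backup: Nic, remac: bool) -> None:
    if remac:
        # Re-MACing: the backup interface takes over the failed interface's MAC.
        backup.mac = failed.mac
    subnet.add_ip(ip, backup)


def clients_reach_backup(subnet: Subnet, ip: str, backup: Nic) -> bool:
    return subnet.client_arp_cache.get(ip) == backup.mac


for passes_arp, remac in [(True, False), (False, False), (False, True)]:
    subnet = Subnet(passes_gratuitous_arp=passes_arp)
    primary, spare = Nic("02:00:00:00:00:01"), Nic("02:00:00:00:00:02")
    subnet.client_arp_cache["192.0.2.10"] = primary.mac  # learned before the failure
    fail_over(subnet, "192.0.2.10", primary, spare, remac)
    print(passes_arp, remac, clients_reach_backup(subnet, "192.0.2.10", spare))
# True False True
# False False False
# False True True
```

The middle case is the one re-MACing exists for. Without gratuitous ARP, clients still map the address to the old MAC address, so they cannot reach the backup interface until it takes over that MAC address.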
Re-MACing is not done by default; you must specify it for each pair of primary and secondary interfaces that requires it. The procedure in Section 2.5.1 describes how to determine whether re-MACing is required. In general, interfaces that serve routers or PC/NFS clients may require re-MACing.
A side effect of re-MACing is that an interface that has received a new MAC address can no longer use its original MAC address. For this reason, each network interface must be backed up by a dedicated backup interface, which clients cannot use as a primary interface. After a failover to the backup interface, packets sent to the backup interface's original MAC address are ignored by every node on the network. Each backup interface backs up only one network interface.
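The one-to-one rule for re-MACed interfaces can be checked mechanically. This Python sketch assumes a hypothetical mapping from each primary interface to its dedicated backup, with interfaces named `node:iface`. It reports two kinds of problems: a backup shared by two primaries, and a backup that is also used as a primary interface.

```python
from typing import Dict, List, Set


def check_remac_pairs(pairs: Dict[str, str], other_primaries: Set[str]) -> List[str]:
    """pairs maps each primary interface to its dedicated backup interface.

    other_primaries holds client-facing interfaces that are not re-MACed.
    """
    problems: List[str] = []
    primaries = set(pairs) | other_primaries
    backed_up_by: Dict[str, str] = {}
    for primary, backup in pairs.items():
        if backup in backed_up_by:
            problems.append(f"{backup} backs up both {backed_up_by[backup]} and {primary}")
        else:
            backed_up_by[backup] = primary
        if backup in primaries:
            problems.append(f"{backup} is also a primary interface used by clients")
    return problems
```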
The Linux FailSafe cluster can include shared SCSI-based storage in the form of individual disks, RAID systems, or Fibre Channel storage systems.
When the disks in a RAID or Fibre Channel system are configured as mirrored volumes, the storage system itself should provide redundancy, and the Linux FailSafe system software does not need to take part in handling a disk failure. If a disk controller fails, however, the Linux FailSafe system software initiates the failover process.
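A small decision function can summarize how responsibility is divided between the storage system and Linux FailSafe. This Python sketch covers only the two cases the text names. The `StorageFault` enum and the returned strings are hypothetical. A failure of an unmirrored disk is reported as outside the sketch rather than guessed at.

```python
from enum import Enum


class StorageFault(Enum):
    DISK = "disk"
    DISK_CONTROLLER = "disk controller"


def who_handles(fault: StorageFault, mirrored: bool) -> str:
    """Say which layer responds to a storage fault, per the two cases above."""
    if fault is StorageFault.DISK_CONTROLLER:
        return "Linux FailSafe initiates failover"
    if mirrored:
        return "storage system (mirror); no FailSafe action"
    return "not covered by this sketch"
```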
Figure 1-2 shows disk storage takeover on a two-node system. The surviving node takes over the shared disks and recovers the logical volumes and filesystems on them. A journaling filesystem such as ReiserFS or XFS speeds up this process because it does not require the fsck command to check filesystem consistency.
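The takeover sequence can be outlined as an ordered list of steps. This hypothetical Python sketch makes several assumptions: the step strings, the `(device, mountpoint, fstype)` tuples, and the example device names are all illustrative. The reset step follows the failover behavior described at the start of this section. Only the two journaling filesystems named in the text skip the `fsck` step. Any other journaling filesystem would need to be added to `JOURNALING_FS`.

```python
from typing import List, Tuple

# Journaling filesystems named in the text; extend as needed.
JOURNALING_FS = {"reiserfs", "xfs"}


def takeover_steps(failed_node: str, filesystems: List[Tuple[str, str, str]]) -> List[str]:
    """Order the steps a surviving node would follow to take over shared storage.

    filesystems holds (device, mountpoint, fstype) tuples.
    """
    steps = [
        f"reset {failed_node}",
        "take over the shared disks",
        "recover the logical volumes",
    ]
    for device, mountpoint, fstype in filesystems:
        if fstype not in JOURNALING_FS:
            # Treated as non-journaling: run a full consistency check before mounting.
            steps.append(f"fsck -y {device}")
        steps.append(f"mount -t {fstype} {device} {mountpoint}")
    return steps
```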