Writing the Action Scripts and Adding Monitoring Agents

This chapter provides information about writing the action scripts required to make an application highly available and about adding monitoring agents.

Set of Action Scripts

Multiple instances of scripts may be executed at the same time. For more information, see "Multiple Instances of Script Executed at the Same Time."

The following set of action scripts can be provided for each resource:
- exclusive, which verifies that the resource is not already running
- start, which starts the resource
- stop, which stops the resource
- monitor, which monitors the resource
- restart, which restarts the resource on the same node when a monitoring failure occurs

The start, stop, and exclusive scripts are required for every resource type.

The start and stop scripts must be idempotent; that is, an action requested multiple times in succession should continue to return success and should have no side effects. For example, if the start script is run for a resource that is already started, the script must not return an error.

A monitor script is required, but if you wish it may contain only a return-success function. A restart script is required if the application must have a restart capability on the same node in case of failure. However, the restart script may contain only a return-success function.

Understanding the Execution of Action Scripts

Before you can write a new action script, you must understand how action scripts are executed.

Multiple Instances of Script Executed at the Same Time

Multiple instances of the same script may be executed at the same time.
To avoid problems this may cause, you can use the ha_filelock and ha_execute_lock commands to achieve sequential execution of commands in different instances of the same script.

For example, consider a script that modifies a configuration file to start a new application instance. Multiple instances of the script modifying the file simultaneously could cause file corruption and data loss. The start script for the application should use ha_execute_lock when executing the modification script to ensure correct configuration file modification. Assuming the script is named modify_configuration_file, the start script would contain a statement similar to the following:

${HA_CMDSPATH}/ha_execute_lock 30 ${HA_SCRIPTTMPDIR}/lock.volume_assemble "modify_configuration_file"
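Outside of FailSafe, the same serialization pattern can be sketched with the util-linux flock(1) command. This is only an illustration of the locking idiom, not the ha_execute_lock implementation; the lock-file path, the demonstration config file, and the function name are invented for this example.

```shell
#!/bin/sh
# Illustrative sketch: serialize a critical section across concurrent
# script instances using flock(1). This is a stand-in for ha_execute_lock,
# not its implementation; paths below are arbitrary.

LOCKFILE=/tmp/lock.modify_configuration_file

modify_configuration_file()
{
    # placeholder for the real configuration-file edit
    echo "instance $$ editing config" >> /tmp/demo_config
}

# -w 30: give up if the lock cannot be obtained within 30 seconds,
# mirroring the timeout argument of ha_execute_lock.
(
    flock -w 30 9 || exit 1
    modify_configuration_file
) 9>"$LOCKFILE"
```

Because every instance must acquire the lock on the same file before editing, concurrent runs are forced into sequential execution of the critical section.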
The ha_execute_lock command takes three arguments:

- Number of seconds before the command times out waiting for the file lock
- File to be used for locking
- Command to be executed

The ha_execute_lock command tries to obtain a lock on the file every second for timeout seconds. After obtaining a lock on the file, it executes the command argument. On command completion, it releases the lock on the file.

Differences between the exclusive and monitor Scripts

Although the same check can be used in the monitor and exclusive action scripts, the two scripts are used for different purposes. The following table summarizes the differences between them.
Differences Between the monitor and exclusive Action Scripts

exclusive: Executed on all nodes in the cluster.
monitor: Executed only on the node where the resource group (which contains the resource) is online.

exclusive: Executed before the resource is started in the cluster.
monitor: Executed when the resource is online in the cluster. (The monitor script could degrade the services provided by the HA server. Therefore, the check performed by the monitor script should be lightweight and less time-consuming than the check performed by the exclusive script.)

exclusive: Executed only once, before the resource group is made online in the cluster.
monitor: Executed periodically.

exclusive: Failure will result in the resource group not becoming online in the cluster.
monitor: Failure will cause a resource group failover to another node or a restart of the resource on the local node. An error will cause false resource group failovers in the cluster.
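As an illustration of this distinction, here is a sketch (not FailSafe code) of one resource checked both ways. The daemon name mydaemon and its pid file are hypothetical; a real script would use scriptlib functions and report status with ha_write_status_for_resource.

```shell
#!/bin/sh
# Illustrative sketch: the same hypothetical resource checked two ways.

# monitor-style check: cheap enough to run periodically on the node
# where the resource group is online (signal 0 only tests existence).
monitor_check()
{
    _pid=`cat /var/run/mydaemon.pid 2>/dev/null` || return 1
    kill -0 "$_pid" 2>/dev/null
}

# exclusive-style check: run once, on every node, before the resource
# is started, so it may be more thorough (and slower).
exclusive_check()
{
    # the bracket trick keeps grep from matching its own command line
    ps -ef 2>/dev/null | grep "[m]ydaemon" >/dev/null 2>&1
}
```

Both functions answer "is the resource running here?", but only the lightweight one is suitable for periodic use.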
Successful Execution of Action Scripts

The following table shows the state of a resource group after the successful execution of an action script for every resource within the resource group. To view the state of a resource group, use the Cluster Manager graphical user interface (GUI) or the cluster_mgr command.

Successful Action Script Results

Event: Resource group is made online on a node
Action script to execute: start
Resource group state: online

Event: Resource group is made offline on a node
Action script to execute: stop
Resource group state: offline

Event: Online status of the resource group
Action script to execute: exclusive
Resource group state: (no effect)

Event: Normal monitoring of online resource group
Action script to execute: monitor
Resource group state: online

Event: Resource group monitoring failure
Action script to execute: restart
Resource group state: online
Failure of Action Scripts

The following table shows the state of the resource group and the error state when an action script fails.

Failure of an Action Script

Failing action script: exclusive
Resource group state: online
Error state: exclusivity

Failing action script: monitor
Resource group state: online
Error state: monitoring failure

Failing action script: restart
Resource group state: online
Error state: monitoring failure

Failing action script: start
Resource group state: online
Error state: srmd executable error

Failing action script: stop
Resource group state: online
Error state: srmd executable error
Implementing Timeouts and Retrying a Command

You can use the ha_exec2 command to execute action scripts with timeouts. This ensures that the action script completes within the specified time, and it permits proper error messages to be logged on failure or timeout. The retry argument is especially useful in monitor and exclusive action scripts. To retry a command, use the following syntax:

/usr/lib/failsafe/bin/ha_exec2 timeout_in_seconds number_of_retries command_to_be_executed

For example:

${HA_CMDSPATH}/ha_exec2 30 2 "umount /fs"

The above ha_exec2 command executes the umount /fs command line. If the command does not complete within 30 seconds, ha_exec2 kills the umount command and retries it. The ha_exec2 command retries the umount command 2 times if it times out or fails.

For more information, see the ha_exec2 man page.
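The timeout-and-retry pattern that ha_exec2 provides can be approximated in plain shell. The sketch below uses the coreutils timeout(1) command as a stand-in; it is illustrative only and is not the ha_exec2 implementation.

```shell
#!/bin/sh
# Illustrative sketch: approximate ha_exec2-style "kill on timeout,
# then retry" semantics with a small wrapper around timeout(1).

# retry_with_timeout timeout_in_seconds number_of_retries command...
retry_with_timeout()
{
    _timeout=$1; _retries=$2; shift 2
    _attempt=0
    # one initial attempt plus $_retries retries, as ha_exec2 does
    while [ "$_attempt" -le "$_retries" ]; do
        # timeout(1) kills the command if it runs past the limit
        if timeout "$_timeout" "$@"; then
            return 0
        fi
        _attempt=`expr $_attempt + 1`
    done
    return 1
}

# Example (mirrors the ha_exec2 invocation above):
# retry_with_timeout 30 2 umount /fs
```

The wrapper returns 0 as soon as one attempt succeeds, and 1 after the initial attempt and all retries have timed out or failed.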
Sending UNIX Signals

You can use the ha_exec2 command to send UNIX signals to specific processes. A process is identified by its name or its arguments. For example:

${HA_CMDSPATH}/ha_exec2 -s 0 -t "knfsd"

The above command sends signal 0 (which checks whether the process exists) to all processes whose name or arguments match the string knfsd. The command returns 0 on success.

For performance and speed reasons, you should use the ha_exec2 command to check for server processes in the monitor script instead of a ps -ef | grep command-line construction. For more information, see the ha_exec2 man page.
Preparation

Before you can write the action scripts, you must do the following:

- Understand the scriptlib functions described in .
- Familiarize yourself with the script templates provided in the following directory: /usr/lib/failsafe/resource_types/template
- Read the man pages for the following commands: cluster_mgr, cdbd, ha_cilog, ha_cmsd, ha_exec2, ha_fsd, ha_gcd, ha_ifd, ha_ifdadmin, ha_macconfig2, ha_srmd, ha_statd2, haStatus
- Familiarize yourself with the action scripts for other highly available services in /usr/lib/failsafe/resource_types that are similar to the scripts you wish to create.
- Understand how to do the following actions for your application:
  - Verify that the resource is running
  - Verify that the resource can be run
  - Start the resource
  - Stop the resource
  - Check for the server processes
  - Do a simple query as a client and understand the expected response
  - Check for configuration file or directory existence (as needed)
- Determine whether or not monitoring is required (see "Is Monitoring Necessary?"). Even if monitoring is not needed, a monitor script is still required; in this case, it can contain only a return-success function.
- Determine whether a resource type must be added to the cluster configuration database.
- Understand the vendor-supplied startup and shutdown procedures.
- Determine the configuration parameters for the application; these may be used in the action script and should be stored in the CDB.
- Determine whether the resource type can be restarted on its local node, and whether this action makes sense.

Is Monitoring Necessary?
In the following situations, you may not need to perform application monitoring:

- Heartbeat monitoring is sufficient; that is, simply verifying that the node is alive (provided automatically by the base software) determines the health of the highly available service.
- There is no process or resource that can be monitored. For example, the Linux kernel ipchains filtering software performs IP filtering on firewall nodes. Because the filtering is done in the kernel, there is no process or resource to monitor.
- A resource on which the application depends is already monitored. For example, monitoring some client-node resources might best be done by monitoring the file systems, volumes, and network interfaces they use. Because this is already done by the base software, additional monitoring is not required.

Beware that monitoring should be as lightweight as possible so that it does not affect system performance. Also, security issues may make monitoring difficult. If you are unable to provide a monitoring script with appropriate performance and security, consider a monitoring agent; see "Monitoring Agents."
Types of Monitoring

There are two types of monitoring that may be accomplished in a monitor script:

- Is the resource present?
- Is the resource responding?

You can define multiple levels of monitoring within the monitor script, and the administrator can choose the desired level by configuring the resource definition in the cluster configuration database. Ensure that the monitoring level chosen does not affect system performance. For more information, see the Linux FailSafe Administrator's Guide.

What are the Symptoms of Monitoring Failure?

Possible symptoms of failure include the following:

- The resource returns an error code
- The resource returns the wrong result
- The resource does not return quickly enough

How Often Should Monitoring Occur?

You must determine the monitoring interval and timeout values for the monitor script. The timeout must be long enough to guarantee that occasional anomalies do not cause false failovers. It will be useful for you to determine the peak load that the resource may need to sustain.

You must also determine whether the monitor test should execute multiple times so that an application is not declared dead after a single failure. In general, testing more than once before declaring failure is a good idea.

Examples of Testing for Monitoring Failure
The test should be simple and should complete quickly, whether it succeeds or fails. Some examples of tests are as follows:

- For a client/server application that follows a well-defined protocol, the monitor script can make a simple request and verify that the proper response is received.
- For a web server application, the monitor script can request a home page, verify that the connection was made, and ignore the resulting home page.
- For a database, a simple request such as querying a table can be made.
- For NFS, more complicated end-to-end monitoring is required. The test might consist of mounting an exported file system, checking access to the file system with a stat() system call to the root of the file system, and undoing the mount.
- For a resource that writes to a log file, check that the size of the log file is increasing, or use the grep command to check for a particular message.

The following command can be used to determine quickly whether a process exists:

/usr/bin/killall -0 process_name

You can also use the ha_exec2 command to check whether a process is running. The ha_exec2 command differs from killall in that it performs a more exhaustive check on the process name as well as the process arguments, whereas killall searches for the process using the process name only. The command line is as follows:

/usr/lib/failsafe/bin/ha_exec2 -s 0 -t process_name
Do not use the ps command to check on a particular process, because its execution can be too slow.

Script Format

Templates for the action scripts are provided in the following directory:

/usr/lib/failsafe/resource_types/template

The template scripts have the same general format. Following is the order in which the information appears in the script:

- Header information
- Set local variables
- Read resource information
- Exit status
- Perform the basic action of the script, which is the customized area you must provide
- Set global variables
- Verify arguments
- Read input file

Action "scripts" can be of any form, such as a Bourne shell script, a Perl script, or a C language program.

The following sections show examples from the NFS start script. Note that the contents of these examples may not match the latest software.

Header Information

The header information contains comments about the resource type, script type, and resource configuration format. You must modify the code as needed.

Following is the header for the NFS start script:
#!/bin/sh
# **************************************************************************
# * *
# * Copyright (C) 1998 Silicon Graphics, Inc. *
# * *
# * These coded instructions, statements, and computer programs contain *
# * unpublished proprietary information of Silicon Graphics, Inc., and *
# * are protected by Federal copyright law. They may not be disclosed *
# * to third parties or copied or duplicated in any form, in whole or *
# * in part, without the prior written consent of Silicon Graphics, Inc. *
# * *
# **************************************************************************
#ident "$Revision: 1.1 $"
# Resource type: NFS
# Start script NFS
#
# Test resource configuration information is present in the database in
# the following format
#
# resource-type.NFS

Set Local Variables

The set_local_variables() section of the script defines all of the variables that are local to the script, such as temporary file names or database keys. All local variables should use the LOCAL_ prefix. You must modify the code as needed.

Following is the set_local_variables() section from the NFS start script:

set_local_variables()
{
LOCAL_TEST_KEY=NFS
}

Read Resource Information
The get_xxx_info() function, such as get_nfs_info(), reads the resource information from the cluster configuration database. $1 is the test resource name. If the operation is successful, the function returns 0; if the operation fails, it returns 1. The information is returned in the HA_STRING variable. For more information about HA_STRING, see .

Following is the get_nfs_info() section from the NFS start script:

get_nfs_info ()
{
ha_get_info ${LOCAL_TEST_KEY} $1
if [ $? -ne 0 ]; then
return 1;
else
return 0;
fi
}

If you wish to get resource dependency information, you can call ha_get_info with a third argument of any value. The resource dependency list will be returned in the HA_STRING variable.

Exit Status

In the exit_script() function, $1 contains the exit_status value. If cleanup actions are required, such as the removal of temporary files that were created as part of the process, place them before the exit line.

Following is the exit_script() section from the NFS start script:

exit_script()
{
exit $1;
}

If you call the exit_script function prior to normal termination, it should be preceded by a call to the ha_write_status_for_resource function, and you should use the same return code that is logged to the output file. For more information, see .

Basic Action

This area of the script is the portion you must customize. The templates provide a minimal framework.

Following is the framework for the basic action from the start template:

start_template()
{
# for all template resources passed as parameter
for TEMPLATE in $HA_RES_NAMES
do
#HA_CMD="command to start $TEMPLATE resource on the local machine
";
#ha_execute_cmd "string to describe the command being executed
";
ha_write_status_for_resource $TEMPLATE $HA_SUCCESS;
done
}

When testing the script, you can obtain debugging information by adding the shell command set -x to this section. For examples of this area, see "Examples of Action Scripts."

Set Global Variables

The following lines set all of the global and local variables and store the resource names in $HA_RES_NAMES.

Following is the set_global_variables() function from the NFS start script:

set_global_variables()
{
HA_DIR=/usr/lib/failsafe
COMMON_LIB=${HA_DIR}/common_scripts/scriptlib
# Execute the common library file
. $COMMON_LIB
ha_set_global_defs;
}

Verify Arguments

The ha_check_args() function verifies the arguments and stores them in the $HA_INFILE and $HA_OUTFILE variables. It returns 1 on error and 0 on success.

Following is the section from the NFS start script that calls ha_check_args:

ha_check_args $*;
if [ $? -ne 0 ]; then
exit $HA_INVAL_ARGS;
fi

Read Input File

The ha_read_infile() function reads the input file and stores the resource names in the $HA_RES_NAMES variable.

Following is the ha_read_infile() function from the common library file scriptlib:

ha_read_infile()
{
HA_RES_NAMES="";
for HA_RESOURCE in `cat ${HA_INFILE}`
do
HA_TMP="${HA_RES_NAMES} ${HA_RESOURCE}";
HA_RES_NAMES=${HA_TMP};
done
}

Complete the Action

Located at the bottom of the script file are the lines that perform the actual work of the requested action, using the prior sections and the provided tools. The results are written as output to $HA_OUTFILE:

action_resourcetype;
exit_script $HA_SUCCESS

Following is the completion from the NFS start script:

start_nfs;
exit_script $HA_SUCCESS;

Steps in Writing a Script

Multiple copies of action scripts can execute at the same time. Therefore,
all temporary file names must be unique within the storage space used. Often
adding a script.$$ suffix to the name is sufficient. If multiple nodes share a temporary directory, you will also want to incorporate a host identifier to ensure uniqueness. Another method is to use the resource name, because it must be unique within the cluster.

For each script, you must do the following:

- Get the required variables
- Check the variables
- Perform the action
- Check the action

The start and stop scripts are required to be idempotent; that is, they have the appearance of being run once but can in fact be run multiple times. For example, if the start script is run for a resource that is already started, the script must not return an error.

All action scripts must return the status to the /var/log/failsafe/script_nodename file.
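The requirements above (idempotent start, unique temporary names, status reporting) can be combined in a skeleton like the following. The resource name, marker file, and status-file handling are invented for this example; a real script would use the scriptlib functions and ha_write_status_for_resource.

```shell
#!/bin/sh
# Illustrative skeleton of an idempotent start action (not actual
# FailSafe code); the paths and names are hypothetical.

HA_SCRIPTTMPDIR=${HA_SCRIPTTMPDIR:-/tmp}
resource=myresource

# Temporary name made unique per instance ($$) and per host, in case
# several nodes share the temporary directory.
TMPFILE=${HA_SCRIPTTMPDIR}/start.`hostname`.$$

start_resource()
{
    # Idempotency: if the resource is already started, return success
    # instead of an error.
    if [ -f /tmp/${resource}.started ]; then
        return 0
    fi
    touch /tmp/${resource}.started
}

start_resource
# a real script would record this status with
# ha_write_status_for_resource and finish via exit_script
echo "$resource $?" > "$TMPFILE"
```

Running the skeleton twice succeeds both times, which is exactly the behavior the start script must exhibit.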
Examples of Action Scripts

The following sections use portions of the NFS scripts as examples. The examples in this guide may not exactly match the released system.

start Script

The NFS start script does the following:

- Creates a resource-specific NFS status directory.
- Exports the specified export-point with the specified export-options.

Following is a section from the NFS start script:
# Start the resource on the local machine.
# Return HA_SUCCESS if the resource has been successfully started on the local
# machine and HA_CMD_FAILED otherwise.
#
start_nfs()
{
${HA_DBGLOG} "Entry: start_nfs()";
# for all nfs resources passed as parameter
for resource in ${HA_RES_NAMES}
do
NFSFILEDIR=${HA_SCRIPTTMPDIR}/${LOCAL_TEST_KEY}$resource
HA_CMD="mkdir -p $NFSFILEDIR";
ha_execute_cmd "creating nfs status file directory";
if [ $? -ne 0 ]; then
${HA_LOG} "Failed to create ${NFSFILEDIR} directory";
ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
exit_script $HA_NOCFGINFO
fi
get_nfs_info $resource
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: $resource parameters not present in CDB";
ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
exit_script ${HA_NOCFGINFO};
fi
ha_get_field "${HA_STRING}" export-info
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: export-info not present in CDB for resource $resource";
ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
exit_script ${HA_NOCFGINFO};
fi
export_opts="$HA_FIELD_VALUE"
ha_get_field "${HA_STRING}" filesystem
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: filesystem-info not present in CDB for resource $resource";
ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
exit_script ${HA_NOCFGINFO};
fi
filesystem="$HA_FIELD_VALUE"
# Before we try and export the NFS resource, make sure
# filesystem is mounted.
HA_CMD="grep $filesystem /etc/mtab > /dev/null 2>&1";
ha_execute_cmd "check if the filesystem $filesystem is mounted";
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: filesystem $filesystem not mounted";
ha_write_status_for_resource ${resource} ${HA_CMD_FAILED};
exit_script ${HA_CMD_FAILED};
fi
# Now do the job: export the new directory
# Note: the export_dir command will check whether this directory
# is already exported or not.
HA_CMD="export_dir ${resource} ${export_opts}";
ha_execute_cmd "export $resource directories to NFS clients";
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: could not export resource ${resource}"
ha_write_status_for_resource ${resource} ${HA_CMD_FAILED};
exit_script ${HA_CMD_FAILED};
else
ha_write_status_for_resource ${resource} ${HA_SUCCESS};
fi
done
}

stop Script

The NFS stop script does the following:

- Unexports the specified export-point.
- Removes the NFS status directory.

Following is an example from the NFS stop script:
# Stop the nfs resource on the local machine.
# Return HA_SUCCESS if the resource has been successfully stopped on the local
# machine and HA_CMD_FAILED otherwise.
#
stop_nfs()
{
${HA_DBGLOG} "Entry: stop_nfs()";
# for all nfs resources passed as parameter
for resource in ${HA_RES_NAMES}
do
get_nfs_info ${resource}
if [ $? -ne 0 ]; then
# NFS resource information not available.
${HA_LOG} "NFS: $resource parameters not present in CDB";
ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
exit_script ${HA_NOCFGINFO};
fi
ha_get_field "${HA_STRING}" export-info
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: export-info not present in CDB for resource $resource";
ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
exit_script ${HA_NOCFGINFO};
fi
export_opts="$HA_FIELD_VALUE"
# Unexport the directory
HA_CMD="unexport_dir ${resource}"
ha_execute_cmd "unexport ${resource} directory to NFS clients"
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: Failed to unexport resource ${resource}"
ha_write_status_for_resource ${resource} ${HA_CMD_FAILED}
fi
ha_write_status_for_resource ${resource} ${HA_SUCCESS}
done
}

monitor Script

The NFS monitor script does the following:

- Verifies that the file system is mounted at the correct mount point.
- Requests the status of the exported file system.
- Checks the export-point.
- Requests NFS statistics and (based on the results) makes a Remote Procedure Call (RPC) to NFS as needed.

Following is an example from the NFS monitor script:
# Check if the nfs resource is allocated in the local node
# This check must be lightweight and less intrusive compared to
# exclusive check. This check is done when the resource has been
# allocated in the local node.
# Return HA_SUCCESS if the resource is running in the local node
# and HA_CMD_FAILED if the resource is not running in the local node
# The list of the resources passed as input is in variable
# $HA_RES_NAMES
#
monitor_nfs()
{
${HA_DBGLOG} "Entry: monitor_nfs()";
for resource in ${HA_RES_NAMES}
do
get_nfs_info ${resource}
if [ $? -ne 0 ]; then
# No resource information available.
${HA_LOG} "NFS: ${resource} parameters not present in CDB";
ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
exit_script ${HA_NOCFGINFO};
fi
ha_get_field "${HA_STRING}" filesystem
if [ $? -ne 0 ]; then
# filesystem not available.
${HA_LOG} "NFS: filesystem not present in CDB for resource $resource";
ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
exit_script ${HA_NOCFGINFO};
fi
fs="$HA_FIELD_VALUE";
# Check to see if the filesystem is mounted
HA_CMD="mount | grep ${fs} >/dev/null 2>&1"
ha_execute_cmd "check to see if $fs is mounted"
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: ${fs} not mounted";
ha_write_status_for_resource ${resource} ${HA_CMD_FAILED};
exit_script $HA_CMD_FAILED;
fi
# stat the filesystem
HA_CMD="fs_stat -r ${resource} >/dev/null 2>&1";
ha_execute_cmd "stat mount point $resource"
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: cannot stat ${resource} NFS export point";
ha_write_status_for_resource ${resource} ${HA_CMD_FAILED};
exit_script $HA_CMD_FAILED;
fi
# check the filesystem is exported
showmount -e | grep "${resource} " >/dev/null 2>&1
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: failed to find ${resource} in exported filesystem list:-"
${HA_LOG} "`showmount -e`"
ha_write_status_for_resource ${resource} ${HA_CMD_FAILED}
exit_script ${HA_CMD_FAILED}
fi
# check the NFS daemon is still alive and responding
exec_rpcinfo;
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: exec_rpcinfo failed";
ha_write_status_for_resource ${resource} ${HA_CMD_FAILED}
exit_script $HA_CMD_FAILED
fi
# Check the stats ?
# To Be Done... but there is no nfsstat command
# for the user space NFS daemon.
ha_write_status_for_resource $resource $HA_SUCCESS;
done
}

exclusive Script

The NFS exclusive script determines whether the file system is already exported. The check made by an exclusive script can be more expensive than a monitor check. Linux FailSafe uses this script to determine whether resources are running on a node in the cluster, and thereby to prevent starting resources on multiple nodes in the cluster.

Following is an example from the NFS exclusive script:
# Check if the nfs resource is running in the local node. This check can be
# more intrusive than the monitor check. This check is used to determine
# if the resource has to be started on a machine in the cluster.
# Return HA_NOT_RUNNING if the resource is not running in the local node
# and HA_RUNNING if the resource is running in the local node
# The list of nfs resources passed as input is in variable
# $HA_RES_NAMES
#
exclusive_nfs()
{
${HA_DBGLOG} "Entry: exclusive_nfs()";
# for all resources passed as parameter
for resource in ${HA_RES_NAMES}
do
get_nfs_info $resource
if [ $? -ne 0 ]; then
# No resource information available
${HA_LOG} "NFS: $resource parameters not present in CDB";
ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
exit_script ${HA_NOCFGINFO};
fi
# Check if resource is already exported by the NFS server
showmount -e | grep "${resource} " >/dev/null 2>&1
if [ $? -eq 0 ];then
ha_write_status_for_resource ${resource} ${HA_RUNNING};
ha_print_exclusive_status ${resource} ${HA_RUNNING};
else
ha_write_status_for_resource ${resource} ${HA_NOT_RUNNING};
ha_print_exclusive_status ${resource} ${HA_NOT_RUNNING};
fi
done
}

restart Script

The NFS restart script exports the specified export-point with the specified export-options.

Following is an example from the restart script for NFS:

# Restart nfs resource
# Return HA_SUCCESS if nfs resource failed over successfully or
# return HA_CMD_FAILED if nfs resource could not be failed over locally.
# The list of nfs resources passed as input is in variable
# $HA_RES_NAMES
#
restart_nfs()
{
${HA_DBGLOG} "Entry: restart_nfs()";
# for all nfs resources passed as parameter
for resource in ${HA_RES_NAMES}
do
get_nfs_info $resource
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: $resource parameters not present in CDB";
ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
exit_script ${HA_NOCFGINFO};
fi
ha_get_field "${HA_STRING}" export-info
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: export-info not present in CDB for resource $resource";
ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
exit_script ${HA_NOCFGINFO};
fi
export_opts="$HA_FIELD_VALUE"
# Note: the export_dir command will check whether this directory
# is already exported or not.
HA_CMD="export_dir ${resource} ${export_opts}";
ha_execute_cmd "export $resource directories to NFS clients";
if [ $? -ne 0 ]; then
${HA_LOG} "NFS: could not export resource ${resource}"
ha_write_status_for_resource ${resource} ${HA_CMD_FAILED};
exit_script ${HA_CMD_FAILED};
else
ha_write_status_for_resource ${resource} ${HA_SUCCESS};
fi
done
}

Monitoring Agents

If resources cannot be monitored using a lightweight check, you should use a monitoring agent. The monitor action script contacts the monitoring agent to determine the status of the resource on the node. The monitoring agent in turn periodically monitors the resource.

[Figure: Monitoring Process]

Monitoring agents are useful for monitoring database resources. In databases, creating the database connection is costly and time-consuming. The monitoring agent maintains connections to the database, and it queries the database using the connection in response to the monitor action script request.

Monitoring agents are independent processes and can be started by the
cmond process, although this is not required. For example, if a
monitoring agent must be started when activating highly available services
on a node, information about that agent can be added to the cmond configuration on that node. The cmond configuration is located in the /etc/failsafe/cmon_process_groups directory. Information about different agents should go into different files. The name of the file is not relevant to the activate/deactivate procedure.

If a monitoring agent exits or aborts, cmond will
automatically restart the monitoring agent. This prevents monitor
action script failures due to monitoring agent failures.

For example, the /etc/failsafe/cmon_process_groups/ip_addresses file contains information about the ha_ifd process that monitors network interfaces. It contains the following, where ACTIONS represents what cmond can perform on the agents (which will be the same for all scripts):
TYPE = cluster_agent
PROCS = ha_ifd
ACTIONS = start stop restart attach detach
AUTOACTION = attach

If you create a new monitoring agent, you must also create a corresponding file in the /etc/failsafe/cmon_process_groups directory that contains similar information about the new agent. To do this, you can copy the ip_addresses file and modify the PROCS line to list the executables that constitute your new agent. These processes must be located in the /usr/lib/failsafe/bin directory. You should not modify the other configuration lines (TYPE, ACTIONS, and AUTOACTION).

Suppose you need to add a new agent called newagent
that consists of processes ha_x and ha_y.
The configuration information for this agent will be located in the
/etc/failsafe/cmon_process_groups/newagent file,
which will contain the following:

TYPE = cluster_agent
PROCS = ha_x ha_y
ACTIONS = start stop restart attach detach
AUTOACTION = attach

In this case, the software will expect two executables, /usr/lib/failsafe/bin/ha_x and /usr/lib/failsafe/bin/ha_y, to be present.