Testing Scripts
This chapter describes how to test action scripts without running Linux
FailSafe. It also provides tips on how to debug problems that you may encounter.
Parameters are passed to the action scripts as an input file and an
output file. Each line of the input file contains a resource name; each line
of the output file contains a resource name and the resulting status for that
resource.
General Testing and Debugging Techniques
Some general testing and debugging techniques you can use during testing
are as follows:
To get debugging information, add the following line to the main
function of each of your scripts:
set -x
To check that an application is running on a node, you may
be able to use a command provided by the application.
Another way to check that an application is running on a node
is to enter this command on that node:
# ps -ef | grep application
Here, application is the name (or a portion of
the name) of the executable for the application.
To show the status of a resource, use the following
cluster_mgr command:
cmgr> set cluster clustername
cmgr> show status of resource resourcename of resource_type typename
To show the status of a node, use the following
cluster_mgr command:
cmgr> show status of node nodename
To show the status of a resource group, use the following
cluster_mgr command:
cmgr> show status of resource_group rgname in cluster cname
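The ps check in the list above can be wrapped in a small shell function. The following is a minimal sketch and is not part of Linux FailSafe: app_is_running is a hypothetical helper name, and the bracket trick assumes that the pattern begins with an ordinary character such as a letter or a slash.

```shell
#!/bin/sh
# Sketch: succeed if any process command line matches the given pattern.
# app_is_running is a hypothetical helper, not a Linux FailSafe function.
app_is_running() {
    pattern=$1
    # Put the first character in brackets ("httpd" becomes "[h]ttpd") so
    # that the grep process does not match its own command line.
    first=$(printf '%s' "$pattern" | cut -c1)
    rest=$(printf '%s' "$pattern" | cut -c2-)
    ps -ef | grep "[$first]$rest" > /dev/null 2>&1
}

# Example use (the application name is illustrative only):
#   if app_is_running httpd; then echo "running"; else echo "not running"; fi
```

The remainder of the pattern is still treated as a regular expression by grep, so a plain executable name is the safest argument.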
Debugging Notes
The exclusive script returns an error when
the resource is running on the local node. If the resource really is running
on that node, the error is expected and does not indicate a bug in the
exclusive action script.
If the resource group does not come online on the primary
node, the cause may be a start script error or a monitor script error
on that node.
The nature of the failure is recorded in the srmd logs
on the primary node.
If the action script failure status is timeout,
increase the resource type timeout for that action. In the case of
the monitor script, the check can instead be made more lightweight.
The resource type action script timeouts apply per resource.
If an action is performed on two resources, the effective script timeout is
twice the configured resource type action timeout.
If the resource group has a configuration error, check the
srmd logs on the primary node for errors.
Messages that action scripts log with the ${HA_LOG} and
${HA_DBGLOG} macros are written to the
/var/log/failsafe/script_nodename file
on each node in the cluster.
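The per-resource timeout note above comes down to simple arithmetic. The following sketch illustrates it under the assumption that the input file lists one resource per line; count_resources and effective_timeout are hypothetical helper names, and the timeout values are examples rather than FailSafe defaults.

```shell
#!/bin/sh
# Sketch: estimate the total time allowed for an action script.
# Both helpers are hypothetical and not part of Linux FailSafe.

# Count the non-empty lines (resource names) in an input file.
count_resources() {
    grep -c . "$1"
}

# Multiply the configured per-resource timeout by the number of resources.
effective_timeout() {
    per_resource_timeout=$1
    resource_count=$2
    echo $((per_resource_timeout * resource_count))
}

# Example: a 40-second action timeout applied to two resources.
effective_timeout 40 2    # prints 80
```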
Testing an Action Script
To test an action script, do the following:
Create an input file, such as /tmp/input,
that contains the expected resource names. For example, to create a file that
contains the resource named /disk1, do the following:
# echo "/disk1" > /tmp/input
Create an input parameter file, such as /tmp/ipparamfile, as follows:
# echo "ClusterName web-cluster" > /tmp/ipparamfile
Execute the action script as follows:
# ./start /tmp/input /tmp/output /tmp/ipparamfile
The use of the input parameter file is optional.
Change the log level from HA_NORMLVL to
HA_DBGLVL to allow messages written with HA_DBGLOG
to be printed by adding the following line after the set_global_variables
statement in your script:
HA_CURRENT_LOGLEVEL=$HA_DBGLVL
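The manual steps above can be collected into a small wrapper for repeated test runs. This is a sketch only: run_action is a hypothetical helper name, it uses temporary files instead of /tmp/input and /tmp/output, and it omits the optional input parameter file.

```shell
#!/bin/sh
# Sketch: run an action script by hand against a list of resource names.
# run_action is a hypothetical wrapper, not part of Linux FailSafe.
run_action() {
    script=$1
    shift
    workdir=$(mktemp -d) || return 1
    input=$workdir/input
    output=$workdir/output

    # Write one resource name per line to the input file.
    printf '%s\n' "$@" > "$input"

    # Invoke the script with the input and output file names.
    "$script" "$input" "$output"
    status=$?

    echo "exit status: $status"
    echo "output file:"
    cat "$output" 2>/dev/null

    rm -rf "$workdir"
    return "$status"
}

# Example use:
#   run_action ./start /disk1
```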
The output file will contain one of the following return values for
the start, stop, monitor,
and restart scripts:
HA_SUCCESS=0
HA_INVAL_ARGS=1
HA_CMD_FAILED=2
HA_NOTSUPPORTED=3
HA_NOCFGINFO=4
The output file will contain one of the following return values for
the exclusive script:
HA_NOT_RUNNING=0
HA_RUNNING=2
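When reading output files by hand, it can help to translate the numeric values listed above back into their symbolic names. The following sketch does that; status_name is a hypothetical helper, and the mapping is taken only from the two lists above.

```shell
#!/bin/sh
# Sketch: map a numeric status to the symbolic name listed above.
# status_name is a hypothetical helper, not part of Linux FailSafe.
# The first argument is "exclusive" for the exclusive script; any other
# value selects the start, stop, monitor, and restart mapping.
status_name() {
    kind=$1
    code=$2
    if [ "$kind" = "exclusive" ]; then
        case $code in
            0) echo HA_NOT_RUNNING ;;
            2) echo HA_RUNNING ;;
            *) echo "UNKNOWN($code)" ;;
        esac
    else
        case $code in
            0) echo HA_SUCCESS ;;
            1) echo HA_INVAL_ARGS ;;
            2) echo HA_CMD_FAILED ;;
            3) echo HA_NOTSUPPORTED ;;
            4) echo HA_NOCFGINFO ;;
            *) echo "UNKNOWN($code)" ;;
        esac
    fi
}

# Example use:
#   status_name monitor 2      prints HA_CMD_FAILED
#   status_name exclusive 2    prints HA_RUNNING
```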
If you call the exit_script
function prior to normal termination, it should be preceded by the
ha_write_status_for_resource function and you should use the same
return code that is logged to the output file.
Suppose you have a resource named /disk1 and the
following files:
The syntax for the input file is: <resourcename>
The syntax for the output file is: <resourcename> <status>
The following example shows:
The exit status of the action script is 2
The exit status of the resource is 2
The use of anonymous indicates that the script was
run manually. When the script is run by Linux FailSafe, the full path to the
script name is displayed.
# echo "/disk1" > /tmp/ipfile
# ./monitor /tmp/ipfile /tmp/opfile /tmp/ipparamfile
# echo $?
2
# cat /tmp/opfile
/disk1 2
# tail /var/log/failsafe/script_heb1
Tue Aug 25 11:32:57.437 <anonymous script 23787:0 Unknown:0> ./monitor:
./monitor called with /tmp/ipfile and /tmp/opfile
Tue Aug 25 11:32:58.118 <anonymous script 24556:0 Unknown:0> ./monitor:
check to see if /disk1 is mounted on /disk1
Tue Aug 25 11:32:58.433 <anonymous script 23811:0 Unknown:0> ./monitor:
/bin/mount | grep /disk1 | grep /disk1 >> /dev/null 2>&1 exited with
status 0
Tue Aug 25 11:32:58.665 <anonymous script 24124:0 Unknown:0> ./monitor:
stat mount point /disk1
Tue Aug 25 11:32:58.969 <anonymous script 23525:0 Unknown:0> ./monitor:
/bin/stat /disk1 exited with status 0
Tue Aug 25 11:32:59.258 <anonymous script 24431:0 Unknown:0> ./monitor:
check the filesystem /disk1 is exported
Tue Aug 25 11:32:59.610 <anonymous script 6982:0 Unknown:0> ./monitor:
Tue Aug 25 11:32:59.917 <anonymous script 24040:0 Unknown:0> ./monitor:
awk '{print \$1}' /var/run/failasafe/tmp/exportfs.23762 | grep /disk1 exited
with status 1
Tue Aug 25 11:33:00.131 <anonymous script 24418:0 Unknown:0> ./monitor:
echo failed to find /disk1 in exported filesystem list:-
Tue Aug 25 11:33:00.340 <anonymous script 24236:0 Unknown:0> ./monitor:
echo /disk2
For additional information about a script's processing, see the
/var/log/failsafe/script_nodename file.
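After a manual run like the one above, you may want to confirm that the script wrote a status for every resource it was given. The following sketch compares the two files, assuming the input and output syntax described earlier; check_output_file is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify that every resource named in the input file has a
# numeric status in the output file.
# check_output_file is a hypothetical helper, not part of Linux FailSafe.
check_output_file() {
    input=$1
    output=$2
    rc=0
    while IFS= read -r resource; do
        # Skip blank lines in the input file.
        [ -n "$resource" ] || continue
        # Take the status from the first output line for this resource.
        status=$(awk -v r="$resource" '$1 == r { print $2; exit }' "$output")
        case $status in
            '')       echo "$resource: missing from output file"; rc=1 ;;
            *[!0-9]*) echo "$resource: non-numeric status '$status'"; rc=1 ;;
            *)        echo "$resource: status $status" ;;
        esac
    done < "$input"
    return $rc
}

# Example use:
#   check_output_file /tmp/ipfile /tmp/opfile
```

The function returns a nonzero status if any resource is missing or has a malformed status, so it can be used directly in a test loop.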
Special Testing Considerations for the monitor Script
The monitor script tests the liveness of applications and resources.
The best way to test it is to induce a failure, run the script, and check
if this failure is detected by the script; then repeat the process for another
failure.
Use this checklist for testing a monitor script:
Verify that the script detects failure of the application
successfully.
Verify that the script always exits with a return value.
Verify that the script does not contain commands that can
hang (such as using DNS for name resolution) or those that continue forever,
such as ping.
Verify that the script completes before the time-out value
specified in the configuration file.
Verify that the script's return codes are correct.
During testing, measure the time it takes for a script to complete and
adjust the monitoring times in your script accordingly. To get a good estimate
of the time required for the script to execute, run it under different system
load conditions.
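One way to measure completion time and guard against a hanging script during testing is sketched below. It assumes that the GNU coreutils timeout command is installed; timed_run is a hypothetical helper name, and the 60-second limit in the example is illustrative rather than a FailSafe default.

```shell
#!/bin/sh
# Sketch: run a command under a time limit and report how long it took.
# timed_run is a hypothetical helper, not part of Linux FailSafe.
# Assumes the GNU coreutils "timeout" command is available.
timed_run() {
    limit=$1
    shift
    start=$(date +%s)
    # timeout stops the command if it runs longer than $limit seconds.
    timeout "$limit" "$@"
    status=$?
    end=$(date +%s)
    # Report on stderr so the command's own output is left untouched.
    echo "elapsed: $((end - start))s, exit status: $status" >&2
    return "$status"
}

# Example use:
#   timed_run 60 ./monitor /tmp/ipfile /tmp/opfile
```

With GNU timeout, an exit status of 124 indicates that the limit was reached. Repeating the run while the system is under load gives a more realistic figure than a single run on an idle node.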