Testing Scripts

This chapter describes how to test action scripts without running Linux FailSafe. It also provides tips on how to debug problems that you may encounter.

Parameters are passed to the action scripts as both input files and output files. Each line of the input file contains the resource name; the output file contains the resource name and the script exit status.

General Testing and Debugging Techniques

Some general testing and debugging techniques you can use during testing are as follows:

- To get debugging information, add the following line to the main function of each of your scripts:

      set -x

- To check that an application is running on a node, you may be able to use a command provided by the application. Another way to check is to enter this command on that node:

      # ps -ef | grep application

  Here, application is the name (or a portion of the name) of the executable for the application.

- To show the status of a resource, use the following cluster_mgr commands:

      cmgr> set cluster clustername
      cmgr> show status of resource resourcename of resource_type typename

- To show the status of a node, use the following cluster_mgr command:

      cmgr> show status of node nodename

- To show the status of a resource group, use the following cluster_mgr command:

      cmgr> show status of resource_group rgname in cluster cname

Debugging Notes

- The exclusive script returns an error when the resource is running on the local node. If the resource is actually running on that node, the error does not indicate a bug in the exclusive action script.

- If the resource group does not come online on the primary node, the cause may be a start script error or a monitor script error on the primary node. The nature of the failure can be seen in the srmd logs of the primary node.
- If the action script failure status is timeout, increase the resource type timeout for that action. In the case of the monitor script, the check can also be made more lightweight.

- The resource type action script timeouts apply per resource. If an action is performed on two resources, the script timeout is twice the configured resource type action timeout.

- If the resource group has a configuration error, check the srmd logs on the primary node for errors.

- Action scripts that use the ${HA_LOG} and ${HA_DBGLOG} macros to log messages write them to the /var/log/failsafe/script_nodename file on each node in the cluster.

Testing an Action Script

To test an action script, do the following:

1. Create an input file, such as /tmp/input, that contains the expected resource names. For example, to create a file that contains the resource named /disk1, do the following:

       # echo "/disk1" > /tmp/input

2. Create an input parameter file, such as /tmp/ipparamfile, as follows:

       # echo "ClusterName web-cluster" > /tmp/ipparamfile

3. Execute the action script as follows:

       # ./start /tmp/input /tmp/output /tmp/ipparamfile

   The use of the input parameter file is optional.

4. Change the log level from HA_NORMLVL to HA_DBGLVL so that messages written with HA_DBGLOG are printed. To do this, add the following line after the set_global_variables statement in your script:

       HA_CURRENT_LOGLEVEL=$HA_DBGLVL

The output file will contain one of the following return values for the start, stop, monitor, and restart scripts:

    HA_SUCCESS=0
    HA_INVAL_ARGS=1
    HA_CMD_FAILED=2
    HA_NOTSUPPORTED=3
    HA_NOCFGINFO=4

The output file will contain one of the following return values for the exclusive script:

    HA_NOT_RUNNING=0
    HA_RUNNING=2

If you call the exit_script function prior to normal termination, precede it with a call to the ha_write_status_for_resource function, and use the same return code that is logged to the output file.
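When reading an output file by hand, it can help to translate the numeric status back into its symbolic name. The following is a minimal sketch, not part of Linux FailSafe: the function names status_name, exclusive_status_name, and decode_output are hypothetical, and the numeric values are simply those listed above.

```shell
#!/bin/sh
# Hypothetical helpers for reading action script output files.
# The numeric values are the return values listed in this chapter.

# Name of a status written by a start, stop, monitor, or restart script.
status_name() {
    case "$1" in
        0) echo HA_SUCCESS ;;
        1) echo HA_INVAL_ARGS ;;
        2) echo HA_CMD_FAILED ;;
        3) echo HA_NOTSUPPORTED ;;
        4) echo HA_NOCFGINFO ;;
        *) echo UNKNOWN ;;
    esac
}

# Name of a status written by an exclusive script.
exclusive_status_name() {
    case "$1" in
        0) echo HA_NOT_RUNNING ;;
        2) echo HA_RUNNING ;;
        *) echo UNKNOWN ;;
    esac
}

# Print each "<resourcename> <status>" line of the output file given as $1,
# followed by the decoded status name in parentheses.
decode_output() {
    while read -r resource status; do
        printf '%s %s (%s)\n' "$resource" "$status" "$(status_name "$status")"
    done < "$1"
}
```

For example, after running a start script as described above, decode_output /tmp/output prints one decoded line per resource.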
Suppose you have a resource named /disk1 and the following files:

- The syntax for the input file is: <resourcename>
- The syntax for the output file is: <resourcename> <status>

The following example shows:

- The exit status of the action script is 1
- The exit status of the resource is 2

The use of anonymous indicates that the script was run manually. When the script is run by Linux FailSafe, the full path to the script name is displayed.

    # echo "/disk1" > /tmp/ipfile
    # ./monitor /tmp/ipfile /tmp/opfile /tmp/ipparamfile
    # echo $?
    2
    # cat /tmp/opfile
    /disk1 2
    # tail /var/log/failsafe/script_heb1
    Tue Aug 25 11:32:57.437 <anonymous script 23787:0 Unknown:0> ./monitor: ./monitor called with /tmp/ipfile and /tmp/opfile
    Tue Aug 25 11:32:58.118 <anonymous script 24556:0 Unknown:0> ./monitor: check to see if /disk1 is mounted on /disk1
    Tue Aug 25 11:32:58.433 <anonymous script 23811:0 Unknown:0> ./monitor: /bin/mount | grep /disk1 | grep /disk1 >> /dev/null 2>&1 exited with status 0
    Tue Aug 25 11:32:58.665 <anonymous script 24124:0 Unknown:0> ./monitor: stat mount point /disk1
    Tue Aug 25 11:32:58.969 <anonymous script 23525:0 Unknown:0> ./monitor: /bin/stat /disk1 exited with status 0
    Tue Aug 25 11:32:59.258 <anonymous script 24431:0 Unknown:0> ./monitor: check the filesystem /disk1 is exported
    Tue Aug 25 11:32:59.610 <anonymous script 6982:0 Unknown:0> ./monitor:
    Tue Aug 25 11:32:59.917 <anonymous script 24040:0 Unknown:0> ./monitor: awk '{print \$1}' /var/run/failasafe/tmp/exportfs.23762 | grep /disk1 exited with status 1
    Tue Aug 25 11:33:00.131 <anonymous script 24418:0 Unknown:0> ./monitor: echo failed to find /disk1 in exported filesystem list:-
    Tue Aug 25 11:33:00.340 <anonymous script 24236:0 Unknown:0> ./monitor: echo /disk2

For additional information about a script's processing, see the /var/log/failsafe/script_nodename file.
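The manual steps of writing an input file, running the script, and inspecting both the exit status and the output file can be wrapped in a small helper. This is only a sketch under stated assumptions: run_action is a hypothetical name, the helper passes just the input and output files (the optional input parameter file is omitted), and it handles a single resource.

```shell
#!/bin/sh
# run_action SCRIPT RESOURCE
# Hypothetical test helper: runs one action script against one resource and
# reports the script exit status and the status recorded for each resource.
run_action() {
    script=$1
    resource=$2
    workdir=$(mktemp -d) || return 1

    # The input file holds one resource name per line.
    echo "$resource" > "$workdir/ipfile"

    "$script" "$workdir/ipfile" "$workdir/opfile"
    rc=$?
    echo "script exit status: $rc"

    if [ -f "$workdir/opfile" ]; then
        # Each output line has the form: <resourcename> <status>
        while read -r name status; do
            echo "resource $name status: $status"
        done < "$workdir/opfile"
    else
        echo "no output file written"
    fi

    rm -rf "$workdir"
    return "$rc"
}
```

For example, run_action ./monitor /disk1 prints the script exit status followed by one line per resource found in the output file, and returns the script's own exit status.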
Special Testing Considerations for the monitor Script

The monitor script tests the liveness of applications and resources. The best way to test it is to induce a failure, run the script, and check whether the script detects the failure; then repeat the process for another failure.

Use this checklist for testing a monitor script:

- Verify that the script detects failure of the application successfully.
- Verify that the script always exits with a return value.
- Verify that the script does not contain commands that can hang (such as those that use DNS for name resolution) or commands that continue forever (such as ping).
- Verify that the script completes before the time-out value specified in the configuration file.
- Verify that the script's return codes are correct.

During testing, measure the time it takes for a script to complete and adjust the monitoring times in your script accordingly. To get a good estimate of the time required for the script to execute, run it under different system load conditions.
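One way to measure completion time and catch a hanging script is to run the monitor script repeatedly under a time limit. The sketch below is illustrative only: time_monitor is a hypothetical helper, it assumes the GNU coreutils timeout command and a date command that supports +%s, and the limit you pass should match the time-out configured for the resource type.

```shell
#!/bin/sh
# time_monitor SCRIPT INFILE OUTFILE LIMIT RUNS
# Hypothetical helper: runs a monitor script RUNS times, each under a limit
# of LIMIT seconds, and prints the exit status and elapsed time of each run.
# Returns 1 as soon as a run exceeds the limit.
time_monitor() {
    script=$1
    infile=$2
    outfile=$3
    limit=$4
    runs=$5

    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        timeout "$limit" "$script" "$infile" "$outfile"
        rc=$?
        end=$(date +%s)

        # GNU timeout exits with status 124 when it had to stop the command.
        if [ "$rc" -eq 124 ]; then
            echo "run $i: exceeded ${limit}s limit"
            return 1
        fi

        echo "run $i: exit status $rc, $((end - start))s elapsed"
        i=$((i + 1))
    done
}
```

For example, time_monitor ./monitor /tmp/ipfile /tmp/opfile 30 5 runs the script five times with a 30-second limit. Repeating the measurement while the system is under different loads gives a more realistic estimate, and the elapsed times are whole seconds only.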