File: [Development] / projects / failsafe / FailSafe-books / LnxFailSafe_PG / install.sgml

Revision 1.1, Wed Nov 29 22:01:12 2000 UTC by vasa
Branch: MAIN
CVS Tags: HEAD

New documentation files for the Programmers' Guide.

<!-- Fragment document type declaration subset:
ArborText, Inc., 1988-1997, v.4001
<!DOCTYPE SET PUBLIC "-//Davenport//DTD DocBook V3.0//EN" [
<!ENTITY scriptlib.sgml SYSTEM "scriptlib.sgml">
<!ENTITY scriptlibapp.sgml SYSTEM "scriptlibapp.sgml">
<!ENTITY startgui.sgml SYSTEM "startgui.sgml">
<!ENTITY preface.sgml SYSTEM "preface.sgml">
<!ENTITY overview.sgml SYSTEM "overview.sgml">
<!ENTITY action.sgml SYSTEM "action.sgml">
<!ENTITY failover.sgml SYSTEM "failover.sgml">
<!ENTITY database.sgml SYSTEM "database.sgml">
<!ENTITY gloss.sgml SYSTEM "gloss.sgml">
<!ENTITY index.sgml SYSTEM "index.sgml">
<!ENTITY monitor SYSTEM "figures/monitor.eps" NDATA eps>
<!ENTITY resource.ai SYSTEM "figures/resource.ai.eps" NDATA eps>
<!ENTITY optional.ai SYSTEM "figures/optional.ai.eps" NDATA eps>
<!ENTITY manager.ai SYSTEM "figures/manager.ai.eps" NDATA eps>
<!ENTITY depend.ai SYSTEM "figures/depend.ai.eps" NDATA eps>
<!ENTITY type.ai SYSTEM "figures/type.ai.eps" NDATA eps>
<!ENTITY attrib.ai SYSTEM "figures/attrib.ai.eps" NDATA eps>
<!ENTITY action.ai SYSTEM "figures/action.ai.eps" NDATA eps>
<!ENTITY star.configuration SYSTEM "figures/star.configuration.eps" NDATA eps>
<!ENTITY n.plus.2.configuration SYSTEM "figures/n.plus.2.configuration.eps" NDATA eps>
<!ENTITY square.configuration SYSTEM "figures/square.configuration.eps" NDATA eps>
]>
-->
<chapter id="LE96600-PARENT">
<title id="LE96600-TITLE">Testing Scripts</title>
<para>This chapter describes how to test action scripts without running Linux
FailSafe. It also provides tips on debugging problems that you may encounter.<note>
<para>Action scripts take their parameters from an input file and report their
results in an output file. Each line of the input file contains a resource name;
each line of the output file contains a resource name and the script exit status.</para>
</note></para>
<sect1>
<title id="LE42431-TITLE">General Testing and Debugging Techniques</title>
<para>Some general testing and debugging techniques you can use during testing
are as follows:<indexterm id="ITinstall-0"><primary>script testing</primary>
<secondary>techniques</secondary></indexterm> <indexterm id="ITinstall-1">
<primary>testing scripts. <literal>See</literal>  script testing</primary>
</indexterm></para>
<itemizedlist>
<listitem><para>To get debugging information, add the following line to
the main function of each of your scripts:<indexterm id="ITinstall-2">
<primary>debugging information in action scripts</primary></indexterm></para>
<programlisting>set -x</programlisting>
</listitem>
<listitem><para>To check that an application is running on a node, you may
be able to use a command provided by the application.</para>
</listitem>
<listitem><para>Another way to check that an application is running on a node
is to enter the following command on that node:</para>
<programlisting># <userinput>ps -ef | grep </userinput> <replaceable>application
</replaceable> </programlisting>
<para><replaceable>application</replaceable> is the name (or a portion of
the name) of the executable for the application.</para>
</listitem>
<listitem><para>To show the status of a resource, use the following <literal>
cluster_mgr</literal> command:</para>
<programlisting>cmgr> <userinput>set cluster </userinput><replaceable>clustername
</replaceable>
cmgr> <userinput>show status of resource </userinput><replaceable>resourcename
</replaceable><userinput> of resource_type </userinput><replaceable>typename
</replaceable></programlisting>
</listitem>
<listitem><para>To show the status of a node, use the following <literal>
cluster_mgr</literal> command:<indexterm id="ITinstall-3"><primary>status
of a node</primary></indexterm> <indexterm id="ITinstall-4"><primary>node
status</primary></indexterm></para>
<programlisting>cmgr> <userinput>show status of node </userinput><replaceable>
nodename</replaceable></programlisting>
</listitem>
<listitem><para>To show the status of a resource group, use the following <literal>
cluster_mgr</literal> command:</para>
<programlisting>cmgr> <userinput>show status of resource_group </userinput><replaceable>
rgname</replaceable><userinput>&ensp;in cluster </userinput><replaceable>
cname</replaceable></programlisting>
</listitem>
</itemizedlist>
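<para>As a minimal sketch of the first technique, assume a script that is organized
around a <literal>main</literal> function. The function body below is a placeholder
and is not taken from a real action script; only the position of <literal>set -x</literal>
matters. The shell writes the trace to standard error.</para>
<programlisting>#!/bin/sh
# Placeholder script body; only the position of "set -x" matters here.
main() {
    set -x                      # trace every command from this point on
    resource="/disk1"
    echo "checking $resource"
    set +x                      # stop tracing (optional)
}

main "$@"</programlisting>
<para>Because the trace goes to standard error, you can capture it separately
when you run a script by hand, for example by appending <literal>2> /tmp/start.trace</literal>
to the command line. The file name <filename>/tmp/start.trace</filename> is only
an illustration.</para>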
</sect1>
<sect1 id="Z943900191lhj">
<title>Debugging Notes</title>
<para><itemizedlist>
<listitem><para>The <literal>exclusive</literal> script returns an error when
the resource is running on the local node. If the resource really is running
on that node, the error is expected and does not indicate a bug in the <literal>exclusive</literal> action script.</para>
</listitem>
<listitem><para>If the resource group does not come online on the primary
node, the cause may be a <literal>start</literal> script error or a <literal>monitor</literal>
script error on the primary node.
The nature of the failure is recorded in the <literal>srmd</literal> logs
of the primary node.</para>
</listitem>
<listitem><para>If the action script failure status is <literal>timeout</literal>,
increase the resource type timeout for that action. For
the <literal>monitor</literal> script, another option is to make the check more lightweight.
</para>
</listitem>
<listitem><para>The resource type action script timeouts apply per resource.
If an action is performed on two resources, the script timeout is therefore twice
the configured resource type action timeout.</para>
</listitem>
<listitem><para>If the resource group has a configuration error, check the <literal>
srmd</literal> logs on the primary node for errors.</para>
</listitem>
<listitem><para>Messages that action scripts log with the <literal>${HA_LOG}</literal> and <literal>
${HA_DBGLOG}</literal> macros are written to the <filename>
/var/log/failsafe/script_<replaceable>nodename</replaceable></filename> file
on each node in the cluster.</para>
</listitem>
</itemizedlist></para>
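<para>The per-resource timeout rule can be checked with simple arithmetic. The
following sketch uses made-up values; neither number comes from a real
configuration.</para>
<programlisting>#!/bin/sh
# Illustrative values only; neither number comes from a real configuration.
action_timeout=40      # configured resource type action timeout, in seconds
resource_count=2       # resources handled by one invocation of the script

# The timeout applies per resource, so the script as a whole is allowed
# the configured timeout multiplied by the number of resources.
script_timeout=$((action_timeout * resource_count))
echo "effective script timeout: ${script_timeout} seconds"</programlisting>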
</sect1>
<sect1>
<title>Testing an Action Script</title>
<para><indexterm id="ITinstall-5"><primary>action scripts</primary><secondary>
testing</secondary></indexterm> <indexterm id="ITinstall-6"><primary>script
testing</primary><secondary>action scripts</secondary></indexterm>To test
an action script, do the following:</para>
<orderedlist>
<listitem><para>Create an input file, such as <literal>/tmp/input</literal>,
that contains the expected resource names, one per line. For example, to create a file that
contains the resource named <literal>/disk1</literal>, do the following:</para>
<programlisting># <userinput>echo "/disk1" > /tmp/input</userinput></programlisting>
</listitem>
<listitem><para>Create an input parameter file, such as <filename>/tmp/ipparamfile
</filename>, as follows:</para>
<programlisting># <userinput>echo "ClusterName web-cluster" > /tmp/ipparamfile
</userinput></programlisting>
</listitem>
<listitem><para>Execute the action script as follows:</para>
<programlisting># <userinput>./start /tmp/input /tmp/output /tmp/ipparamfile
</userinput></programlisting>
<note>
<para>The use of the input parameter file is optional.</para>
</note>
</listitem>
<listitem><para>To print messages written with <literal>HA_DBGLOG</literal>,
change the log level from <literal>HA_NORMLVL</literal> to <literal>HA_DBGLVL</literal>
by adding the following line after the <literal>set_global_variables</literal>
statement in your script:<programlisting>HA_CURRENT_LOGLEVEL=$HA_DBGLVL
</programlisting></para>
</listitem>
</orderedlist>
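<para>The first three steps can be collected into a small test harness. The
following is a sketch under stated assumptions: the stub script that it generates
is hypothetical and only mimics the file-based interface (it reports status 0 for
every resource), and the temporary directory stands in for the <filename>/tmp</filename>
files used in the steps.</para>
<programlisting>#!/bin/sh
# Sketch of a manual test run. The stub below stands in for a real action
# script; it is hypothetical and only mimics the file-based interface.
workdir=$(mktemp -d)
input=$workdir/input
output=$workdir/output
params=$workdir/ipparamfile
script=$workdir/start

# Stub action script: report status 0 for every resource in the input file.
printf '%s\n' \
    '#!/bin/sh' \
    'cat "$1" | while read -r resource; do' \
    '    echo "$resource 0"' \
    'done > "$2"' \
    'exit 0' > "$script"

# Steps 1 and 2: create the input file and the input parameter file.
echo "/disk1" > "$input"
echo "ClusterName web-cluster" > "$params"

# Step 3: run the script, then show its exit status and the output file.
sh "$script" "$input" "$output" "$params"
status=$?
result=$(cat "$output")
echo "exit status: $status"
echo "output file: $result"

rm -rf "$workdir"</programlisting>
<para>To exercise a real action script, replace the stub with the path to that
script and keep the rest of the harness unchanged.</para>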
<para>The output file will contain one of the following return values for
the <literal>start</literal>, <literal>stop</literal>,  <literal>monitor</literal>,
and <literal>restart</literal> scripts:</para>
<programlisting>HA_SUCCESS=0
HA_INVAL_ARGS=1
HA_CMD_FAILED=2
HA_NOTSUPPORTED=3
HA_NOCFGINFO=4</programlisting>
<para>The output file will contain one of the following return values for
the <literal>exclusive</literal> script:</para>
<programlisting>HA_NOT_RUNNING=0
HA_RUNNING=2</programlisting>
<note>
<para><indexterm id="ITinstall-7"><primary><literal>exit_script()</literal>
 function</primary></indexterm>If you call the <command>exit_script</command>
function before normal termination, call the <command>ha_write_status_for_resource</command>
function first, and use the same
return code that is logged to the output file.</para>
</note>
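<para>When reading an output file by hand, it can help to translate the numeric
values listed above back into their names. The following helper is a sketch;
the function name <literal>status_name</literal> is hypothetical and is not
part of Linux FailSafe. It takes the action name as its first argument because
the <literal>exclusive</literal> script uses a different set of values.</para>
<programlisting>#!/bin/sh
# Hypothetical helper: translate a numeric status into the symbolic name
# listed above. The first argument is the action script name, because the
# exclusive script uses a different set of values.
status_name() {
    action=$1
    code=$2
    if [ "$action" = "exclusive" ]; then
        case $code in
            0) echo HA_NOT_RUNNING ;;
            2) echo HA_RUNNING ;;
            *) echo "unknown ($code)" ;;
        esac
    else
        case $code in
            0) echo HA_SUCCESS ;;
            1) echo HA_INVAL_ARGS ;;
            2) echo HA_CMD_FAILED ;;
            3) echo HA_NOTSUPPORTED ;;
            4) echo HA_NOCFGINFO ;;
            *) echo "unknown ($code)" ;;
        esac
    fi
}

status_name monitor 2       # prints HA_CMD_FAILED
status_name exclusive 2     # prints HA_RUNNING</programlisting>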
<para>Suppose you have a resource named <literal>/disk1</literal> and
input and output files with the following syntax:</para>
<itemizedlist>
<listitem><para>The syntax for the input file is: <replaceable>&lt;resourcename>
</replaceable></para>
</listitem>
<listitem><para>The syntax for the output file is: <replaceable>&lt;resourcename>
&lt;status></replaceable></para>
</listitem>
</itemizedlist>
<para>The following example shows:</para>
<itemizedlist>
<listitem><para>The exit status of the action script is 2</para>
</listitem>
<listitem><para>The exit status of the resource is 2</para>
<note>
<para>The word <literal>anonymous</literal> in the log entries indicates that the script was
run manually. When the script is run by Linux FailSafe, the full path to the
script name is displayed.</para>
</note>
<programlisting># <userinput>echo "/disk1" > /tmp/ipfile</userinput>
# <userinput>./monitor /tmp/ipfile /tmp/opfile /tmp/ipparamfile</userinput>
# <userinput>echo $?</userinput>
2
# <userinput>cat /tmp/opfile</userinput>
/disk1 2
# <userinput>tail /var/log/failsafe/script_heb1</userinput>
Tue Aug 25 11:32:57.437 &lt;anonymous script 23787:0 Unknown:0> ./monitor:
./monitor called with /tmp/ipfile and /tmp/opfile
Tue Aug 25 11:32:58.118 &lt;anonymous script 24556:0 Unknown:0> ./monitor:
check to see if /disk1 is mounted on /disk1
Tue Aug 25 11:32:58.433 &lt;anonymous script 23811:0 Unknown:0> ./monitor:
/bin/mount | grep /disk1 | grep /disk1 >> /dev/null 2>&amp;1 exited with
status 0
Tue Aug 25 11:32:58.665 &lt;anonymous script 24124:0 Unknown:0> ./monitor:
stat mount point /disk1
Tue Aug 25 11:32:58.969 &lt;anonymous script 23525:0 Unknown:0> ./monitor:
/bin/stat /disk1 exited with status 0
Tue Aug 25 11:32:59.258 &lt;anonymous script 24431:0 Unknown:0> ./monitor:
check the filesystem /disk1 is exported
Tue Aug 25 11:32:59.610 &lt;anonymous script 6982:0 Unknown:0> ./monitor:
Tue Aug 25 11:32:59.917 &lt;anonymous script 24040:0 Unknown:0> ./monitor:
awk '{print \$1}' /var/run/failasafe/tmp/exportfs.23762 | grep /disk1 exited
with status 1
Tue Aug 25 11:33:00.131 &lt;anonymous script 24418:0 Unknown:0> ./monitor:
echo failed to find /disk1 in exported filesystem list:-
Tue Aug 25 11:33:00.340 &lt;anonymous script 24236:0 Unknown:0> ./monitor:
echo /disk2</programlisting>
</listitem>
</itemizedlist>
<para>For additional information about a script's processing, see the <?Pub _nolinebreak><filename>
/var/log/failsafe/script_<replaceable>nodename</replaceable></filename><?Pub /_nolinebreak><?Pub Caret> file.
</para>
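<para>When an output file lists several resources, a short filter can pick out
the ones whose status is not 0. The following is a sketch: the function name <literal>
report_failures</literal> is hypothetical, the sample contents are made up, and
the filter assumes one resource name and status per line with no spaces in the
resource name. It does not apply to the <literal>exclusive</literal> script,
where a nonzero status means that the resource is running.</para>
<programlisting>#!/bin/sh
# Hypothetical helper: list the resources in an output file whose status
# is not 0. Assumes one "resourcename status" pair per line and resource
# names without embedded spaces.
report_failures() {
    awk '$2 != 0 { print $1 " has status " $2; bad = 1 }
         END { exit bad + 0 }' "$1"
}

# Demonstration with made-up contents.
opfile=$(mktemp)
printf '%s\n' "/disk1 2" "/disk3 0" > "$opfile"
if report_failures "$opfile"; then
    echo "all resources report status 0"
else
    echo "at least one resource reports a nonzero status"
fi
rm -f "$opfile"</programlisting>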
</sect1>
<sect1>
<title>Special Testing Considerations for the <literal>monitor</literal> Script
</title>
<para><indexterm id="ITinstall-8"><primary>script testing</primary><secondary>
monitoring script considerations</secondary></indexterm> <indexterm id="ITinstall-9">
<primary>monitoring</primary><secondary>script testing</secondary></indexterm>The <literal>
monitor</literal> script tests the liveness of applications and resources.
The best way to test it is to induce a failure, run the script, and check
whether the script detects the failure; then repeat the process for each
other failure that the script is expected to detect.</para>
<para>Use this checklist for testing a <literal>monitor</literal> script:
</para>
<itemizedlist>
<listitem><para>Verify that the script detects failure of the application
successfully.</para>
</listitem>
<listitem><para>Verify that the script always exits with a return value.</para>
</listitem>
<listitem><para>Verify that the script does not contain commands that can
hang, such as commands that depend on DNS for name resolution, or commands
that can run indefinitely, such as <literal>ping</literal>.</para>
</listitem>
<listitem><para>Verify that the script completes before the time-out value
specified in the configuration file.</para>
</listitem>
<listitem><para>Verify that the script's return codes are correct.</para>
</listitem>
</itemizedlist>
<para>During testing, measure the time it takes for a script to complete and
adjust the monitoring times in your script accordingly. To get a good estimate
of the time required for the script to execute, run it under different system
load conditions.</para>
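<para>One way to take these measurements is to time several runs and compare the
slowest run with the timeout. The following is a sketch: the default command
(<literal>true</literal>), the number of runs, and the 30-second timeout are
placeholders, and <literal>TIMEOUT_SECS</literal> is a hypothetical environment
variable used only by this example. The timing has a resolution of one second.</para>
<programlisting>#!/bin/sh
# Sketch: time several runs of a command and compare the slowest run with
# a timeout. The default command and the timeout value are placeholders.
timeout_secs=${TIMEOUT_SECS:-30}
runs=3
slowest=0

# With no arguments, time the "true" command so the sketch still runs.
[ "$#" -gt 0 ] || set -- true

i=1
while [ "$i" -le "$runs" ]; do
    begin=$(date +%s)
    "$@" > /dev/null            # for example: ./monitor /tmp/input /tmp/output
    finish=$(date +%s)
    elapsed=$((finish - begin))
    echo "run $i: ${elapsed}s"
    if [ "$elapsed" -gt "$slowest" ]; then
        slowest=$elapsed
    fi
    i=$((i + 1))
done

if [ "$slowest" -ge "$timeout_secs" ]; then
    echo "slowest run (${slowest}s) reaches the ${timeout_secs}s timeout"
else
    echo "slowest run (${slowest}s) is within the ${timeout_secs}s timeout"
fi</programlisting>
<para>Saved under a name of your choosing, the sketch takes the command to time
as its arguments. Run it once on an idle system and again while the system is
under load to see how much the completion time varies.</para>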
</sect1>
</chapter>
<?Pub *0000012521>