File: [Development] / projects / failsafe / FailSafe-books / LnxFailSafe_PG / action.sgml

Revision 1.1, Wed Nov 29 22:01:12 2000 UTC by vasa
Branch: MAIN
CVS Tags: HEAD

New documentation files for the Programmers' Guide.

<!-- Fragment document type declaration subset:
ArborText, Inc., 1988-1997, v.4001
<!DOCTYPE SET PUBLIC "-//Davenport//DTD DocBook V3.0//EN" [
<!ENTITY scriptlib.sgml SYSTEM "scriptlib.sgml">
<!ENTITY scriptlibapp.sgml SYSTEM "scriptlibapp.sgml">
<!ENTITY startgui.sgml SYSTEM "startgui.sgml">
<!ENTITY preface.sgml SYSTEM "preface.sgml">
<!ENTITY overview.sgml SYSTEM "overview.sgml">
<!ENTITY failover.sgml SYSTEM "failover.sgml">
<!ENTITY database.sgml SYSTEM "database.sgml">
<!ENTITY install.sgml SYSTEM "install.sgml">
<!ENTITY gloss.sgml SYSTEM "gloss.sgml">
<!ENTITY index.sgml SYSTEM "index.sgml">
<!ENTITY monitor SYSTEM "figures/monitor.eps" NDATA eps>
<!ENTITY resource.ai SYSTEM "figures/resource.ai.eps" NDATA eps>
<!ENTITY optional.ai SYSTEM "figures/optional.ai.eps" NDATA eps>
<!ENTITY manager.ai SYSTEM "figures/manager.ai.eps" NDATA eps>
<!ENTITY depend.ai SYSTEM "figures/depend.ai.eps" NDATA eps>
<!ENTITY type.ai SYSTEM "figures/type.ai.eps" NDATA eps>
<!ENTITY attrib.ai SYSTEM "figures/attrib.ai.eps" NDATA eps>
<!ENTITY action.ai SYSTEM "figures/action.ai.eps" NDATA eps>
<!ENTITY star.configuration SYSTEM "figures/star.configuration.eps" NDATA eps>
<!ENTITY n.plus.2.configuration SYSTEM "figures/n.plus.2.configuration.eps" NDATA eps>
<!ENTITY square.configuration SYSTEM "figures/square.configuration.eps" NDATA eps>
]>
-->
<chapter id="LE77672-PARENT">
<title id="LE77672-TITLE">Writing the Action Scripts and Adding Monitoring
Agents</title>
<para>This chapter provides information about writing the action scripts required
to make an application highly available and how to add monitoring agents.
It discusses the following topics:</para>
<itemizedlist>
<listitem><para><xref linkend="Z942786554lhj"></para>
</listitem>
<listitem><para><xref linkend="Z942787505lhj"></para>
</listitem>
<listitem><para><xref linkend="Z942786569lhj"></para>
</listitem>
<listitem><para><xref linkend="Z942786582lhj"></para>
</listitem>
<listitem><para><xref linkend="Z942786601lhj"></para>
</listitem>
<listitem><para><xref linkend="LE49536-PARENT"></para>
</listitem>
<listitem><para><xref linkend="Z942786646lhj"></para>
</listitem>
</itemizedlist>
<sect1 id="Z942786554lhj">
<title>Set of Action Scripts</title>
<caution>
<para>Multiple instances of scripts may be executed at the same time. For
more information, see <xref linkend="Z942787505lhj">.</para>
</caution>
<para>The following set of action scripts can be provided for each resource:<indexterm
id="ITaction-0"><primary>action scripts</primary><secondary>set of scripts
</secondary></indexterm></para>
<itemizedlist>
<listitem><para><literal>exclusive</literal>, which verifies that the resource
is not already running<indexterm><primary>exclusive script</primary><secondary>
definition</secondary></indexterm></para>
</listitem>
<listitem><para><literal>start</literal>, which starts the resource<indexterm>
<primary>start script</primary><secondary>definition</secondary></indexterm></para>
</listitem>
<listitem><para><literal>stop</literal>, which stops the resource<indexterm>
<primary>stop script</primary><secondary>definition</secondary></indexterm></para>
</listitem>
<listitem><para><literal>monitor</literal>, which monitors the resource<indexterm>
<primary>monitor script</primary><secondary>definition</secondary></indexterm></para>
</listitem>
<listitem><para><literal>restart</literal>, which restarts the resource on
the same node when a monitoring failure occurs<indexterm><primary>restart
script</primary><secondary>definition</secondary></indexterm></para>
</listitem>
</itemizedlist>
<para>The <literal>start</literal>, <literal>stop</literal>, and <literal>
exclusive</literal> scripts are required for every resource type.</para>
<note>
<para>The <literal>start</literal> and <literal>stop</literal> scripts must
be <firstterm>idempotent</firstterm>; that is, an action requested multiple
times successively should continue to return success, and should have no side-effects.
 For example, if the <literal>start</literal> script is run for a resource
that is already started, the script must not return an error.</para>
</note>
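<para>As a minimal sketch of idempotent behavior (not taken from the FailSafe
templates), the following Bourne shell functions treat &ldquo;already started&rdquo;
and &ldquo;already stopped&rdquo; as success. The PID file location, the function
names, and the <command>sleep</command> stand-in for the application are all
assumptions made for illustration.</para>

```shell
#!/bin/sh
# Hypothetical PID file; a real script would use a path suited to the resource.
PIDFILE="${TMPDIR:-/tmp}/demo_resource.$$.pid"

# Succeeds when the PID file names a live process.
is_running()
{
    [ -f "$PIDFILE" ] || return 1
    kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# Start the resource; a repeated start is a no-op that still returns 0.
start_resource()
{
    if is_running; then
        return 0
    fi
    sleep 300 >/dev/null 2>&1 &     # stand-in for the real application
    echo $! > "$PIDFILE"
    return 0
}

# Stop the resource; stopping an already stopped resource also returns 0.
stop_resource()
{
    if is_running; then
        kill "$(cat "$PIDFILE")"
    fi
    rm -f "$PIDFILE"
    return 0
}
```

<para>Calling <literal>start_resource</literal> twice in a row leaves a single
process running, and calling <literal>stop_resource</literal> twice returns
success both times.</para>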
<para>A <literal>monitor</literal> script is also required, but it may contain
only a return-success function if no monitoring is needed. A <literal>restart</literal>
script is required if the application must be restartable on the same node
after a failure; it too may contain only a return-success function.
<indexterm id="ITaction-7"><primary>action
scripts</primary><secondary>required</secondary></indexterm> <indexterm id="ITaction-8">
<primary>action scripts</primary><secondary>optional</secondary></indexterm></para>
</sect1>
<sect1 id="Z942787505lhj">
<title>Understanding the Execution of Action Scripts</title>
<para>Before you can write a new action script, you must understand how action
scripts are executed. This section covers the following topics:<itemizedlist>
<listitem><para><xref linkend="Z944249968lhj"></para>
</listitem>
<listitem><para><xref linkend="Z943309055lhj"></para>
</listitem>
<listitem><para><xref linkend="Z942863078lhj"></para>
</listitem>
<listitem><para><xref linkend="Z944596365smg"></para>
</listitem>
<listitem><para><xref linkend="Z944596427smg"></para>
</listitem>
<listitem><para><xref linkend="Z944596453smg"></para>
</listitem>
</itemizedlist></para>
<sect2 id="Z944249968lhj">
<title id="Z942863046lhj">Multiple Instances of Script Executed at the Same
Time</title>
<para>Multiple instances of the same script may be executed at the same time.
To avoid problems this may cause, you can use the <literal>ha_filelock</literal>
and <literal>ha_execute_lock</literal> commands to achieve sequential execution
of commands in different instances of the same script.</para>
<para>For example, consider a script which modifies a configuration file to
start a new application instance.  Multiple instances of the script modifying
the file simultaneously could cause file corruption and data loss.  The start
script for the application should use <literal>ha_execute_lock</literal> when
executing the modification script to ensure correct configuration file modification.
</para>
<para>Assuming the script is named <literal>modify_configuration_file</literal>,
the start script would contain a statement similar to the following:</para>
<programlisting>${HA_CMDSPATH}/ha_execute_lock 30 \
    ${HA_SCRIPTTMPDIR}/lock.volume_assemble "modify_configuration_file"
</programlisting>
<para>The <literal>ha_execute_lock</literal> command takes three arguments, in this order:<itemizedlist>
<listitem><para>Number of seconds before the command times out waiting for
the file lock</para>
</listitem>
<listitem><para>File to be used for locking</para>
</listitem>
<listitem><para>Command to be executed</para>
</listitem>
</itemizedlist></para>
<para>The <command>ha_execute_lock</command> command tries to obtain a lock
on the file once every second for up to <replaceable>timeout</replaceable> seconds.
After obtaining the lock, it executes the command argument. When the command
completes, it releases the lock on the file.</para>
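<para>The retry-until-timeout behavior described above can be sketched in plain
Bourne shell. This is an illustration only, not the implementation of
<command>ha_execute_lock</command>: it assumes an atomically created directory
as the lock, and the function name <literal>run_with_lock</literal> is hypothetical.</para>

```shell
#!/bin/sh
# run_with_lock TIMEOUT LOCKDIR COMMAND
# Try once per second, for up to TIMEOUT seconds, to create LOCKDIR.
# mkdir is atomic, so only one caller can succeed at a time. Run COMMAND
# while holding the lock, then release it and return COMMAND's status.
# Returns 1 if the lock is never obtained.
run_with_lock()
{
    rwl_timeout=$1
    rwl_lockdir=$2
    rwl_command=$3
    rwl_waited=0

    until mkdir "$rwl_lockdir" 2>/dev/null; do
        if [ "$rwl_waited" -ge "$rwl_timeout" ]; then
            return 1
        fi
        sleep 1
        rwl_waited=$((rwl_waited + 1))
    done

    if eval "$rwl_command"; then
        rwl_status=0
    else
        rwl_status=$?
    fi
    rmdir "$rwl_lockdir"
    return "$rwl_status"
}
```

<para>For example, <literal>run_with_lock 30 /tmp/lock.demo "modify_configuration_file"</literal>
would serialize concurrent callers that use the same lock directory; the path
shown is a placeholder.</para>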
</sect2>
<sect2 id="Z943309055lhj">
<title>Differences Between the <literal>exclusive</literal> and <literal>monitor</literal>
Scripts</title>
<para>Although the same check can be used in <literal>monitor</literal> and <literal>
exclusive</literal> action scripts, they are used for different purposes. <xref
linkend="Z943038525lhj-PARENT"> summarizes the differences between the scripts.
</para>
<table frame="topbot" pgwide="1" id="Z943038525lhj-PARENT">
<title id="Z943038525lhj">Differences Between the <literal>monitor</literal>
and <literal>exclusive</literal> Action Scripts</title>
<tgroup cols="2" colsep="0" rowsep="0">
<colspec colwidth="198*">
<colspec colwidth="198*">
<thead valign="bottom">
<row rowsep="1"><entry align="left" valign="bottom"><para><literal>exclusive
</literal></para></entry><entry align="left" valign="bottom"><para><literal>
monitor</literal></para></entry></row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><para>Executed on all nodes in the cluster.
</para></entry>
<entry align="left" valign="top"><para>Executed only on the node where the
resource group (which contains the resource) is online.</para></entry>
</row>
<row>
<entry align="left" valign="top"><para>Executed before the resource is started
in the cluster.</para></entry>
<entry align="left" valign="top"><para>Executed when the resource is online
in the cluster. Because the <literal>monitor</literal> script could degrade the
services provided by the HA server, its check should be lightweight and less
time consuming than the check performed by the <literal>exclusive</literal>
script.</para></entry>
</row>
<row>
<entry align="left" valign="top"><para>Executed only once before the resource
group is made online in the cluster.</para></entry>
<entry align="left" valign="top"><para>Executed periodically.</para></entry>
</row>
<row>
<entry align="left" valign="top"><para>Failure prevents the resource group
from coming online in the cluster.</para></entry>
<entry align="left" valign="top"><para>Failure causes a resource group
failover to another node or a restart of the resource on the local node. An
error in the script will therefore cause false resource group failovers in the cluster.</para></entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2 id="Z942863078lhj">
<title>Successful Execution of Action Scripts</title>
<para><xref linkend="LE81926-PARENT"> shows the state of a resource group
after the successful execution of an action script for every resource within
a resource group. To view the state of a resource group, use the Cluster Manager
graphical user interface (GUI) or the <literal>cluster_mgr</literal> command.
</para>
<table frame="topbot" pgwide="1" id="LE81926-PARENT"><indexterm id="ITaction-9">
<primary>action scripts</primary><secondary>successful execution results</secondary>
</indexterm>
<title id="LE81926-TITLE">Successful Action Script Results </title>
<tgroup cols="3" colsep="0" rowsep="0">
<colspec colwidth="181*">
<colspec colwidth="113*">
<colspec colwidth="102*">
<thead>
<row rowsep="1"><entry align="left" valign="bottom"><para>Event</para></entry>
<entry align="left" valign="bottom"><para>Action Script to Execute</para></entry>
<entry align="left" valign="bottom"><para>Resource Group State</para></entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><para>Resource group is made online on a
node</para></entry>
<entry align="left" valign="top"><para><literal>start</literal></para></entry>
<entry align="left" valign="top"><para><literal>online</literal><indexterm
id="ITaction-10"><primary>resource group</primary><secondary>states</secondary>
</indexterm></para></entry>
</row>
<row>
<entry align="left" valign="top"><para>Resource group is made offline on a
node</para></entry>
<entry align="left" valign="top"><para><literal>stop</literal></para></entry>
<entry align="left" valign="top"><para><literal>offline</literal></para></entry>
</row>
<row>
<entry align="left" valign="top"><para>Online status of the resource group
</para></entry>
<entry align="left" valign="top"><para><literal>exclusive</literal></para></entry>
<entry align="left" valign="top"><para>(No effect)</para></entry>
</row>
<row>
<entry align="left" valign="top"><para>Normal monitoring of online resource
group</para></entry>
<entry align="left" valign="top"><para><literal>monitor</literal></para></entry>
<entry align="left" valign="top"><para><literal>online</literal></para></entry>
</row>
<row>
<entry align="left" valign="top"><para>Resource group monitoring failure</para></entry>
<entry align="left" valign="top"><para><literal>restart</literal></para></entry>
<entry align="left" valign="top"><para><literal>online</literal></para></entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2 id="Z944596365smg">
<title id="Z942863094lhj">Failure of Action Scripts</title>
<para><xref linkend="LE53980-PARENT"> shows the state of the resource group
and the error state when an action script fails.</para>
<table frame="topbot" id="LE53980-PARENT"><indexterm id="ITaction-11"><primary>
action scripts</primary><secondary>failure of</secondary></indexterm>
<title id="LE53980-TITLE">Failure of an Action Script</title>
<tgroup cols="3" colsep="0" rowsep="0">
<colspec colwidth="132*">
<colspec colwidth="132*">
<colspec colwidth="132*">
<thead>
<row rowsep="1"><entry align="left" valign="bottom"><para>Failing Action Script
</para></entry><entry align="left" valign="bottom"><para>Resource Group State
</para></entry><entry align="left" valign="bottom"><para>Error State</para></entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><para><literal>exclusive</literal></para></entry>
<entry align="left" valign="top"><para><literal>online</literal></para></entry>
<entry align="left" valign="top"><para><literal>exclusivity</literal></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>monitor</literal></para></entry>
<entry align="left" valign="top"><para><literal>online</literal></para></entry>
<entry align="left" valign="top"><para><literal>monitoring failure</literal></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>restart</literal></para></entry>
<entry align="left" valign="top"><para><literal>online</literal></para></entry>
<entry align="left" valign="top"><para><literal>monitoring failure</literal></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>start</literal></para></entry>
<entry align="left" valign="top"><para><literal>online</literal></para></entry>
<entry align="left" valign="top"><para><literal>srmd executable error</literal></para></entry>
</row>
<row>
<entry align="left" valign="top"><para><literal>stop</literal></para></entry>
<entry align="left" valign="top"><para><literal>online</literal></para></entry>
<entry align="left" valign="top"><para><literal>srmd executable error</literal></para></entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2 id="Z944596427smg">
<title id="Z942863106lhj">Implementing Timeouts and Retrying a Command</title>
<para>You can use the <command>ha_exec2</command> command to execute commands
in action scripts with a timeout. This allows the action script to complete within
the specified time, and permits proper error messages to be logged on failure
or timeout. The <replaceable>number_of_retries</replaceable> argument is especially useful
in <literal>monitor</literal> and <literal>exclusive</literal> action scripts.
</para>
<para>To retry a command, use the following syntax:<programlisting>/usr/lib/failsafe/bin/ha_exec2 <replaceable>timeout_in_seconds number_of_retries command_to_be_executed</replaceable></programlisting></para>
<para>For example:<programlisting>${HA_CMDSPATH}/ha_exec2 30 2 "umount /fs"
</programlisting></para>
<para>The above <command>ha_exec2</command> command executes the <literal>umount /fs</literal>
command line. If <command>umount</command> does not complete within
30 seconds, <command>ha_exec2</command> kills it and tries again. The
<command>umount</command> command is retried up to two times if it times out
or fails.</para>
<para>For more information, see the <command>ha_exec2</command> man page.
</para>
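<para>Where <command>ha_exec2</command> is not available for experimentation, the
timeout-and-retry behavior can be approximated with the GNU coreutils
<command>timeout</command> command. The sketch below is a stand-in under stated
assumptions, not the <command>ha_exec2</command> implementation: the function
name <literal>exec_with_retry</literal> is hypothetical, and it treats the retry
count as the number of additional attempts after the first.</para>

```shell
#!/bin/sh
# exec_with_retry TIMEOUT RETRIES COMMAND
# Run COMMAND under a TIMEOUT-second limit. If it fails or times out,
# try again up to RETRIES more times. Returns 0 on the first success,
# otherwise the exit status of the last attempt.
exec_with_retry()
{
    ewr_timeout=$1
    ewr_retries=$2
    ewr_command=$3
    ewr_attempt=0

    while :; do
        if timeout "$ewr_timeout" sh -c "$ewr_command"; then
            return 0
        else
            ewr_status=$?
        fi
        if [ "$ewr_attempt" -ge "$ewr_retries" ]; then
            return "$ewr_status"
        fi
        ewr_attempt=$((ewr_attempt + 1))
    done
}
```

<para>With this sketch, <literal>exec_with_retry 30 2 "umount /fs"</literal> makes
at most three attempts, each limited to 30 seconds.</para>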
</sect2>
<sect2 id="Z944596453smg">
<title id="Z942863135lhj">Sending UNIX Signals</title>
<para>You can use the <command>ha_exec2</command> command to send UNIX signals
to specific processes. A process is identified by its name or its arguments.
</para>
<para>For example:<programlisting>${HA_CMDSPATH}/ha_exec2 -s 0 -t "knfsd"
</programlisting></para>
<para>The above command sends signal 0 (which checks whether the process exists) to all
processes whose name or arguments match the string <literal>knfsd</literal>.
The command returns 0 on success.</para>
<para>For performance reasons, use the <command>ha_exec2</command> command to check for
server processes in the <literal>monitor</literal> script instead of
a <literal>ps -ef | grep</literal> command line construction.</para>
<para>For more information, see the <command>ha_exec2</command> man page.
</para>
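<para>Signal 0 is a standard way to test for a process without disturbing it.
The sketch below shows the same idea with the shell's <command>kill</command>
command for a single known process ID. It does not reproduce the name and
argument matching of <command>ha_exec2</command>, and the function name
<literal>pid_alive</literal> is hypothetical.</para>

```shell
#!/bin/sh
# pid_alive PID
# Signal 0 delivers nothing; kill only reports whether the process
# exists and may be signalled by the caller.
pid_alive()
{
    kill -0 "$1" 2>/dev/null
}
```

<para>For example, <literal>pid_alive $$</literal> succeeds because the calling
shell itself exists.</para>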
</sect2>
</sect1>
<sect1 id="Z942786569lhj">
<title>Preparation</title>
<para>Before you can write the action scripts, you must do the following:<indexterm
id="ITaction-12"><primary>action scripts</primary><secondary>preparation for
writing scripts</secondary></indexterm></para>
<itemizedlist>
<listitem><para>Understand the <literal>scriptlib</literal> functions described
in <xref linkend="Z944252972lhj">.</para>
</listitem>
<listitem><para>Familiarize yourself with the script templates provided in
the following directory:  <?Pub _nolinebreak><filename>/usr/lib/failsafe/resource_types/template
</filename><?Pub /_nolinebreak> <indexterm id="ITaction-13"><primary>action
scripts</primary><secondary>templates</secondary></indexterm> <indexterm id="ITaction-14">
<primary>templates</primary><secondary>action scripts</secondary></indexterm></para>
</listitem>
<listitem><para>Read the man pages for the following commands:<itemizedlist>
<listitem><para><command>cluster_mgr</command></para>
</listitem>
<listitem><para><command>cdbd</command></para>
</listitem>
<listitem><para><command>ha_cilog</command></para>
</listitem>
<listitem><para><command>ha_cmsd</command></para>
</listitem>
<listitem><para><command>ha_exec2</command></para>
</listitem>
<listitem><para><command>ha_fsd</command></para>
</listitem>
<listitem><para><command>ha_gcd</command></para>
</listitem>
<listitem><para><command>ha_ifd</command></para>
</listitem>
<listitem><para><command>ha_ifdadmin</command></para>
</listitem>
<listitem><para><command>ha_macconfig2</command></para>
</listitem>
<listitem><para><command>ha_srmd</command></para>
</listitem>
<listitem><para><command>ha_statd2</command></para>
</listitem>
<listitem><para><command>haStatus</command></para>
</listitem>
</itemizedlist></para>
</listitem>
<listitem><para>Familiarize yourself with the action scripts for other highly
available services in <?Pub _nolinebreak><filename>/usr/lib/failsafe/resource_types
</filename><?Pub /_nolinebreak><indexterm id="ITaction-15"><primary>action
scripts</primary><secondary>resource types provided</secondary></indexterm>
that are similar to the scripts you wish to create.</para>
</listitem>
<listitem><para>Understand how to do the following actions for your application:
</para>
<itemizedlist>
<listitem><para>Verify that the resource is running</para>
</listitem>
<listitem><para>Verify that the resource can be run</para>
</listitem>
<listitem><para>Start the resource</para>
</listitem>
<listitem><para>Stop the resource</para>
</listitem>
<listitem><para>Check for the server processes</para>
</listitem>
<listitem><para>Do a simple query as a client and understand the expected
response</para>
</listitem>
<listitem><para>Check for configuration file or directory existence (as needed)
</para>
</listitem>
</itemizedlist>
</listitem>
<listitem><para>Determine whether or not monitoring is required (see <xref
linkend="LE54960-PARENT">). However, even if monitoring is not needed, a <literal>
monitor</literal> script is still required; in this case, it can contain only
a return-success function.</para>
</listitem>
<listitem><para>Determine if a resource type must be added to the cluster
configuration database.</para>
</listitem>
<listitem><para>Understand the vendor-supplied startup and shutdown procedures.
</para>
</listitem>
<listitem><para>Determine the configuration parameters for the application;
these may be used in the action script and should be stored in the CDB. </para>
</listitem>
<listitem><para>Determine whether the resource type can be restarted in its
local node, and whether this action makes sense.</para>
</listitem>
</itemizedlist>
<sect2 id="LE54960-PARENT">
<title id="LE54960-TITLE">Is Monitoring Necessary?</title>
<para><indexterm id="ITaction-16"><primary>monitoring</primary><secondary>
necessity of</secondary></indexterm>In the following situations, you may not
need to perform application monitoring:<indexterm id="ITaction-17"><primary>
action scripts</primary><secondary>monitoring</secondary><tertiary>necessity
of</tertiary></indexterm> <indexterm id="ITaction-18"><primary>monitoring
</primary><secondary>types</secondary></indexterm></para>
<itemizedlist>
<listitem><para>Heartbeat monitoring is sufficient; that is, simply verifying
that the node is alive (provided automatically by the base software) determines
the health of the highly available service.</para>
</listitem>
<listitem><para>There is no process or resource that can be monitored. For
example, the Linux kernel ipchains filtering software performs IP filtering
on firewall nodes. Because the filtering is done in the kernel, there is no
process or resource to monitor.</para>
</listitem>
<listitem><para>A resource on which the application depends is already monitored.
For example, monitoring some client-node resources might best be done by monitoring
the file systems, volumes, and network interfaces they use. Because this is
already done by the base software, additional monitoring is not required.
</para>
<caution>
<para>Beware that monitoring should be as lightweight as possible so that
it does not affect system performance. Also, security issues may make monitoring
difficult. If you are unable to provide a monitoring script with appropriate
performance and security, consider a monitoring agent; see <xref linkend="Z942786646lhj">. 
</para>
</caution>
</listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Types of Monitoring</title>
<para>There are two types of monitoring that may be accomplished in a <literal>
monitor</literal> script:<indexterm id="ITaction-19"><primary>action scripts
</primary><secondary>monitoring</secondary><tertiary>types</tertiary></indexterm></para>
<itemizedlist>
<listitem><para>Is the resource present?</para>
</listitem>
<listitem><para>Is the resource responding?</para>
</listitem>
</itemizedlist>
<para>You can define multiple levels of monitoring within the monitor script,
and the administrator can choose the desired level by configuring the resource
definition in the cluster configuration database. Ensure that the monitoring
level chosen does not affect system performance. For more information, see
the <citetitle>Linux FailSafe Administrator's Guide</citetitle>.</para>
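<para>One way to structure multiple monitoring levels is a <literal>case</literal>
statement keyed on a level value. In this sketch the level numbers, the function
names, and the <literal>DEMO_RESOURCE_DIR</literal> variable are hypothetical;
how the level is read from the cluster configuration database is not shown.</para>

```shell
#!/bin/sh
# Hypothetical checks; replace the bodies with tests for the real resource.
check_present()    { [ -d "${DEMO_RESOURCE_DIR:-/}" ]; }
check_responding() { ls "${DEMO_RESOURCE_DIR:-/}" >/dev/null 2>&1; }

# monitor_resource LEVEL
# Level 1: is the resource present?  Level 2: is it also responding?
monitor_resource()
{
    case "$1" in
        1)
            check_present
            ;;
        2)
            if check_present; then
                check_responding
            else
                return 1
            fi
            ;;
        *)
            echo "unknown monitoring level: $1" >&2
            return 2
            ;;
    esac
}
```

<para>Keeping the cheaper presence check as the lowest level lets the administrator
trade thoroughness against load.</para>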
</sect2>
<sect2>
<title>What are the Symptoms of Monitoring Failure?</title>
<para><indexterm id="ITaction-20"><primary>monitoring</primary><secondary>
failure</secondary></indexterm>Possible symptoms of failure include the following:
</para>
<itemizedlist>
<listitem><para>The resource returns an error code</para>
</listitem>
<listitem><para>The resource returns the wrong result</para>
</listitem>
<listitem><para>The resource does not return quickly enough</para>
</listitem>
</itemizedlist>
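<para>The three symptoms can be told apart in a single probe. This sketch assumes
the GNU coreutils <command>timeout</command> command, which exits with status 124
when the time limit is reached; the function name <literal>probe_resource</literal>
and its arguments are hypothetical.</para>

```shell
#!/bin/sh
# probe_resource TIMEOUT EXPECTED COMMAND
# Prints one of: ok, timeout, error, wrong-result.
probe_resource()
{
    if pr_output=$(timeout "$1" sh -c "$3" 2>/dev/null); then
        pr_status=0
    else
        pr_status=$?
    fi

    if [ "$pr_status" -eq 124 ]; then
        echo "timeout"          # did not return quickly enough
    elif [ "$pr_status" -ne 0 ]; then
        echo "error"            # returned an error code
    elif [ "$pr_output" != "$2" ]; then
        echo "wrong-result"     # returned, but with the wrong answer
    else
        echo "ok"
    fi
}
```

<para>A <literal>monitor</literal> script built on this idea would report failure
for any outcome other than <literal>ok</literal>.</para>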
</sect2>
<sect2>
<title>How Often Should Monitoring Occur?</title>
<para>You must determine the monitoring interval and time-out values for the
<literal>monitor</literal> script. The time-out must be long enough to guarantee that
occasional anomalies do not cause false failovers. It is also useful
to determine the peak load that the resource may need to sustain.  <indexterm
id="ITaction-21"><primary>action scripts</primary><secondary>monitoring</secondary>
<tertiary>frequency</tertiary></indexterm> <indexterm id="ITaction-22"><primary>
monitoring</primary><secondary>frequency</secondary></indexterm></para>
<para>You must also determine if the <literal>monitor</literal> test should
execute multiple times so that an application is not declared dead after a
single failure. In general, testing more than once before declaring failure
is a good idea.</para>
</sect2>
<sect2>
<title>Examples of Testing for Monitoring Failure</title>
<para><indexterm id="ITaction-23"><primary>action scripts</primary><secondary>
monitoring</secondary><tertiary>testing examples</tertiary></indexterm> <indexterm
id="ITaction-24"><primary>monitoring</primary><secondary>testing examples
</secondary></indexterm>The test should be simple and should complete quickly,
whether it succeeds or fails. Some examples of tests are as follows: </para>
<itemizedlist>
<listitem><para>For a client/server application that follows a well-defined
protocol, the <literal>monitor</literal> script can make a simple request
and verify that the proper response is received. </para>
</listitem>
<listitem><para>For a web server application, the <literal>monitor</literal>
script can request a home page, verify that the connection was made, and ignore
the resulting home page.</para>
</listitem>
<listitem><para>For a database, a simple request such as querying a table
can be made.</para>
</listitem>
<listitem><para>For NFS, more complicated end-to-end monitoring is required.
The test might consist of mounting an exported file system, checking access
to the file system with a <literal>stat()</literal> system call to the root
of the file system, and undoing the mount.</para>
</listitem>
<listitem><para>For a resource that writes to a log file, check that the size
of the log file is increasing or use the <command>grep</command> command to
check for a particular message.</para>
</listitem>
<listitem><para>The following command can be used to determine quickly whether
a process exists:</para>
<programlisting>/usr/bin/killall -0 <replaceable>process_name</replaceable></programlisting>
<para>You can also use the <command>ha_exec2</command> command to check if
a process is running.</para>
<para>The <command>ha_exec2</command> command differs from <command>killall</command>
in that it performs a more exhaustive check, matching on the process name
as well as the process arguments, whereas <command>killall</command> searches for the
process using the process name only. The command line is as follows:</para>
<literallayout>/usr/lib/failsafe/bin/ha_exec2 -s 0 -t <replaceable>process_name</replaceable></literallayout>
<note>
<para>Do not use the <literal>ps</literal> command to check on a particular
process because its execution can be too slow.</para>
</note>
</listitem>
</itemizedlist>
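<para>The log file test from the list above could be sketched as follows. The
function name <literal>log_is_growing</literal> and the state file used to
remember the previous size are hypothetical.</para>

```shell
#!/bin/sh
# log_is_growing LOGFILE STATEFILE
# Succeeds if LOGFILE is larger than it was at the previous call.
# The previous size is remembered in STATEFILE; the first call only
# records the size and reports success.
log_is_growing()
{
    [ -f "$1" ] || return 1
    lig_size=$(wc -c "$1" | awk '{ print $1 }')
    if [ -f "$2" ]; then
        lig_previous=$(cat "$2")
    else
        lig_previous=-1
    fi
    echo "$lig_size" > "$2"
    [ "$lig_size" -gt "$lig_previous" ]
}
```

<para>A rotated or truncated log would be reported as not growing, so a real
script would need to handle that case separately.</para>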
</sect2>
</sect1>
<sect1 id="Z942786582lhj">
<title>Script Format</title>
<para>Templates for the action scripts are provided in the following directory:<indexterm
id="ITaction-25"><primary>scripts. <literal>See</literal>  action scripts
or failover script</primary></indexterm> <indexterm id="ITaction-26"><primary>
action scripts</primary><secondary>format</secondary><tertiary>overview</tertiary>
</indexterm></para>
<literallayout><filename>/usr/lib/failsafe/resource_types/template</filename></literallayout>
<para>The template scripts have the same general format.  Following is the
order in which the information appears in the script:</para>
<itemizedlist>
<listitem><para>Header information</para>
</listitem>
<listitem><para>Set local variables</para>
</listitem>
<listitem><para>Read resource information</para>
</listitem>
<listitem><para>Exit status</para>
</listitem>
<listitem><para>Perform the basic action of the script, which is the customized
area you must provide</para>
</listitem>
<listitem><para>Set global variables</para>
</listitem>
<listitem><para>Verify arguments</para>
</listitem>
<listitem><para>Read input file </para>
<note>
<para>Action &ldquo;scripts&rdquo; can take any form, such as a Bourne shell
script, a Perl script, or a C&nbsp;language program.</para>
</note>
</listitem>
</itemizedlist>
<para>The following sections show an example from the NFS <literal>start</literal>
script. Note that the contents of these examples may not match the latest
software.</para>
<sect2>
<title>Header Information</title>
<para>The header information contains comments about the resource type, script
type, and resource configuration format. You must modify the code as needed.<indexterm
id="ITaction-27"><primary>action scripts</primary><secondary>format</secondary>
<tertiary>header</tertiary></indexterm></para>
<para>Following is the header for the NFS <literal>start</literal> script:
</para>
<programlisting>#!/bin/sh

# **************************************************************************
# *                                                                        *
# *                  Copyright (C) 1998 Silicon Graphics, Inc.             *
# *                                                                        *
# *  These coded instructions, statements, and computer programs  contain  *
# *  unpublished  proprietary  information of Silicon Graphics, Inc., and  *
# *  are protected by Federal copyright law.  They  may  not be disclosed  *
# *  to  third  parties  or copied or duplicated in any form, in whole or  *
# *  in part, without the prior written consent of Silicon Graphics, Inc.  *
# *                                                                        *
# **************************************************************************

#ident "$Revision: 1.1 $"

# Resource type: NFS
# Start script NFS

#
# Test resource configuration information is present in the database in
# the following format
#
# resource-type.NFS</programlisting>
</sect2>
<sect2>
<title>Set Local Variables</title>
<para><indexterm id="ITaction-28"><primary><literal>set_local_variables()
</literal>  section of an action script</primary></indexterm> <indexterm id="ITaction-29">
<primary>action scripts</primary><secondary>format</secondary><tertiary>read
resource information</tertiary></indexterm>The <command>set_local_variables()
</command> section of the script defines all of the variables that are local
to the script, such as temporary file names or database keys. All local variables
should use the <literal>LOCAL_</literal> prefix. You must modify the code
as needed.<indexterm id="ITaction-30"><primary>action scripts</primary><secondary>
format</secondary><tertiary>set local variables</tertiary></indexterm></para>
<para>Following is the <command>set_local_variables()</command> section from
the NFS <literal>start</literal> script:</para>
<programlisting>set_local_variables()
{
    LOCAL_TEST_KEY=NFS
}</programlisting>
</sect2>
<sect2>
<title>Read Resource Information</title>
<para>The<indexterm id="ITaction-31"><primary><literal>get_xxx_info()</literal>
 function</primary></indexterm> &ensp;<command>get_</command><replaceable>
xxx</replaceable><command>_info()</command> function, such as <command>get_nfs_info()
</command>, reads the resource information from the cluster configuration
database. <literal>$1</literal> is the test resource name. If the operation
is successful, a value of 0 is returned; if the operation fails, 1 is returned.<indexterm
id="ITaction-32"><primary>action scripts</primary><secondary>format</secondary>
<tertiary>read resource information</tertiary></indexterm> <indexterm id="ITaction-33">
<primary>resource information</primary><secondary>read into an action script
</secondary></indexterm></para>
<para>The information is returned in the <literal>HA_STRING</literal> variable.
For more information about <literal>HA_STRING</literal>, see <xref linkend="Z944252972lhj">.
</para>
<para>Following is the <command>get_nfs_info()</command> section from the
NFS <literal>start</literal> script:</para>
<programlisting>get_nfs_info ()
{
    ha_get_info ${LOCAL_TEST_KEY} $1
    if [ $? -ne 0 ]; then
        return 1;
    else
        return 0;
    fi
}</programlisting>
<para>If you wish to get resource dependency information, you can call <literal>
ha_get_info</literal> with a third argument of any value. The resource dependency
 list will be returned in the <literal>HA_STRING</literal> variable. </para>
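<para>The following is a minimal sketch of a companion function that requests
the dependency list. The function name <literal>get_nfs_dependencies</literal>,
the third-argument value <literal>deps</literal>, and the resource name are
illustrative assumptions. The <literal>ha_get_info</literal> stub exists only
so that the fragment runs outside a cluster; in a real action script the function
is supplied by <literal>scriptlib</literal> and its output will differ.</para>
<programlisting># Stand-in for the scriptlib function so this sketch runs outside a
# cluster. The real ha_get_info reads the cluster configuration database.
ha_get_info()
{
    if [ $# -ge 3 ]; then
        HA_STRING="(dependency list placeholder for $2)"
    else
        HA_STRING="(resource information placeholder for $2)"
    fi
    return 0
}

LOCAL_TEST_KEY=NFS

# Hypothetical helper: $1 is the resource name. The third argument to
# ha_get_info may have any value; "deps" is used here for readability.
get_nfs_dependencies()
{
    ha_get_info ${LOCAL_TEST_KEY} "$1" deps
    return $?
}

get_nfs_dependencies /hafs1
echo "status=$? HA_STRING=${HA_STRING}"</programlisting>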
</sect2>
<sect2>
<title>Exit Status</title>
<para>In the<indexterm id="ITaction-34"><primary><literal>exit_script()</literal>
 function</primary></indexterm> <command>exit_script()</command> function, <literal>
$1</literal> contains the <command>exit_status</command> value. <indexterm
id="ITaction-35"><primary><literal>exit_status</literal>  value</primary>
</indexterm>If cleanup actions are required, such as the removal of temporary
files that were created as part of the process, place them before the <literal>
exit</literal> line.<indexterm id="ITaction-36"><primary>action scripts</primary>
<secondary>format</secondary><tertiary>exit status</tertiary></indexterm> <indexterm
id="ITaction-37"><primary>exit status in action scripts</primary></indexterm></para>
<para>Following is the <command>exit_script()</command> section from the NFS <literal>
start</literal> script:</para>
<programlisting>exit_script()
{
    exit $1;
}</programlisting>
<note>
<para>If you call the <command>exit_script</command> function prior to normal
termination, precede the call with the <command>ha_write_status_for_resource
</command> function and use the same return code that is logged
to the output file. For more information, see <xref linkend="Z944252972lhj">. <indexterm
id="ITaction-38"><primary><literal>ha_write_status_for_resource</literal>
 function</primary></indexterm></para>
</note>
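<para>The early-exit pattern described in the note might be wrapped in a small
helper, as sketched below. The helper name <literal>fail_resource</literal>
is hypothetical. The value assigned to <literal>HA_CMD_FAILED</literal>, the
output file name, and the output line format are placeholders chosen so that the
fragment runs on its own; in a real action script they come from <literal>
scriptlib</literal>.</para>
<programlisting># Placeholder value, file name, and stand-in functions so that this
# sketch runs on its own. In a real action script they come from scriptlib.
HA_CMD_FAILED=2
HA_OUTFILE=${TMPDIR:-/tmp}/sketch_outfile.$$

ha_write_status_for_resource()
{
    echo "$1 $2" >> "${HA_OUTFILE}"
}

exit_script()
{
    exit $1;
}

# Hypothetical helper: record the status for resource $1, then exit
# with the same code ($2) that was written to the output file.
fail_resource()
{
    ha_write_status_for_resource "$1" "$2"
    exit_script "$2"
}

# Run in a subshell so that the exit does not end this demonstration.
rc=0
( fail_resource /hafs1 ${HA_CMD_FAILED} ) || rc=$?
echo "exit code: ${rc}, output file: `cat ${HA_OUTFILE}`"
rm -f "${HA_OUTFILE}"</programlisting>
<para>Keeping the status write and the exit in one place makes it harder to
exit with a code that differs from the one recorded for the resource.</para>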
</sect2>
<sect2>
<title>Basic Action</title>
<para>This area of the script is the portion you must customize. The templates
provide a minimal framework. <indexterm id="ITaction-39"><primary>action scripts
</primary><secondary>format</secondary><tertiary>basic action</tertiary></indexterm></para>
<para>Following is the framework for the basic action from the <filename>
start</filename> template:</para>
<programlisting>start_template()
{
# for all template resources passed as parameter
for TEMPLATE in $HA_RES_NAMES
do
    #HA_CMD="<replaceable>command to start $TEMPLATE resource on the local machine
</replaceable>";

    #ha_execute_cmd "<replaceable>string to describe the command being executed
</replaceable>";

    ha_write_status_for_resource $TEMPLATE $HA_SUCCESS;
done
}</programlisting>
<note>
<para>When testing the script, you can obtain debugging information by adding
the shell command <command>set -x</command> to this section.</para>
</note>
<para>For examples of this area, see <xref linkend="LE49536-PARENT">.</para>
</sect2>
<sect2>
<title>Set Global Variables</title>
<para>The following lines set all of the global and local variables and store
the resource names in <literal>$HA_RES_NAMES</literal>. <indexterm id="ITaction-40">
<primary>action scripts</primary><secondary>format</secondary><tertiary>set
global variables</tertiary></indexterm> <indexterm id="ITaction-41"><primary>
global variables</primary></indexterm></para>
<para><indexterm id="ITaction-42"><primary><literal>set_global_variables()
</literal>  function</primary></indexterm>Following is the <command>set_global_variables()
</command> function from the NFS <filename>start</filename> script:</para>
<programlisting>set_global_variables()
{
    HA_DIR=/usr/lib/failsafe
    COMMON_LIB=${HA_DIR}/common_scripts/scriptlib
&nbsp;
    # Execute the common library file
    . $COMMON_LIB
&nbsp;
    ha_set_global_defs;
}</programlisting>
</sect2>
<sect2>
<title>Verify Arguments</title>
<para>The<indexterm id="ITaction-43"><primary><literal>ha_check_args()</literal>
 function</primary></indexterm> <command>ha_check_args()</command> function
verifies the arguments and stores them in the <literal>$HA_INFILE</literal>
and <literal>$HA_OUTFILE</literal> variables. It returns 1 on error and 0
on success.<indexterm id="ITaction-44"><primary>action scripts</primary><secondary>
format</secondary><tertiary>verify arguments</tertiary></indexterm></para>
<para>Following is the section from the NFS <literal>start</literal> script
that calls <literal>ha_check_args</literal>:</para>
<programlisting>ha_check_args $*;
if [ $? -ne 0 ]; then
    exit $HA_INVAL_ARGS;
fi</programlisting>
</sect2>
<sect2>
<title>Read Input File</title>
<para><indexterm id="ITaction-45"><primary><literal>ha_read_infile()</literal>
 function</primary></indexterm>The <command>ha_read_infile()</command> function
reads the input file and stores the resource names in the <literal>$HA_RES_NAMES
</literal> variable.<indexterm id="ITaction-46"><primary>action scripts</primary>
<secondary>format</secondary><tertiary>read input file</tertiary></indexterm></para>
<para>Following is the <command>ha_read_infile()</command> function from the
common library file <literal>scriptlib</literal>:</para>
<programlisting>ha_read_infile()
{
    HA_RES_NAMES="";
&nbsp;
    for HA_RESOURCE in `cat ${HA_INFILE}`
    do
        HA_TMP="${HA_RES_NAMES} ${HA_RESOURCE}";
        HA_RES_NAMES=${HA_TMP};
    done
}</programlisting>
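<para>To see how the loop behaves, the sketch below runs a copy of the function
against a temporary file. The file name, the one-name-per-line layout, and the
resource names are made up for the demonstration; in a real action script <literal>
HA_INFILE</literal> is set by <literal>ha_check_args</literal>.</para>
<programlisting># Copy of the function shown above, for a self-contained demonstration.
ha_read_infile()
{
    HA_RES_NAMES="";

    for HA_RESOURCE in `cat ${HA_INFILE}`
    do
        HA_TMP="${HA_RES_NAMES} ${HA_RESOURCE}";
        HA_RES_NAMES=${HA_TMP};
    done
}

# Hypothetical input file with one resource name per line.
HA_INFILE=${TMPDIR:-/tmp}/sketch_infile.$$
printf '%s\n' /hafs1 /hafs2 > "${HA_INFILE}"

ha_read_infile
rm -f "${HA_INFILE}"

# Each name is appended after a space, so the list begins with a space.
# An unquoted expansion in a for loop splits it back into names.
for resource in ${HA_RES_NAMES}
do
    echo "resource: ${resource}"
done</programlisting>
<para>Because the names are split on white space, this approach assumes that
resource names themselves contain no spaces.</para>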
</sect2>
<sect2>
<title>Complete the Action</title>
<para>The lines at the bottom of the script file perform
the actual work of the requested action, using the functions defined in the prior sections and the provided
tools. The results are written to <literal>$HA_OUTFILE</literal>:<indexterm
id="ITaction-47"><primary>action scripts</primary><secondary>format</secondary>
<tertiary>completion</tertiary></indexterm></para>
<programlisting><replaceable>action</replaceable>_<replaceable>resourcetype
</replaceable>;
&nbsp;
exit_script $HA_SUCCESS</programlisting>
<para>Following is the completion from the NFS <filename>start</filename>
script:</para>
<programlisting>start_nfs;
&nbsp;
exit_script $HA_SUCCESS;</programlisting>
</sect2>
</sect1>
<sect1 id="Z942786601lhj">
<title>Steps in Writing a Script</title>
<caution>
<para>Multiple copies of action scripts can execute at the same time. Therefore,
all temporary file names must be unique within the storage space used. Often,
adding a <indexterm id="ITaction-48"><primary>script.$$ suffix</primary></indexterm> <literal>
script.$$</literal> suffix to the name is sufficient. If multiple nodes share a temporary
directory, you should also incorporate a host identifier to ensure uniqueness.
Another method is to use the resource name, because it must be unique within the
cluster.<indexterm id="ITaction-49"><primary>action scripts</primary><secondary>
writing steps</secondary></indexterm></para>
</caution>
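<para>One way to combine these suggestions is sketched below. The helper name <literal>
make_tmp_name</literal>, the <literal>LOCAL_TMPDIR</literal> variable, the <literal>
nfs</literal> prefix, and the use of <command>uname -n</command> as the host
identifier are assumptions made for illustration, not part of the FailSafe
templates.</para>
<programlisting># LOCAL_TMPDIR is an assumed variable name; any writable directory works.
LOCAL_TMPDIR=${TMPDIR:-/tmp}

# Hypothetical helper: build a temporary file name that is unique per
# resource, per host, and per process ($$). $1 is the resource name.
make_tmp_name()
{
    # Resource names may contain "/" (for example /hafs1), so map each
    # "/" to "_" to keep the result a single file name component.
    safe_res=`echo "$1" | tr '/' '_'`
    echo "${LOCAL_TMPDIR}/nfs.${safe_res}.`uname -n`.$$"
}

LOCAL_TMPFILE=`make_tmp_name /hafs1`
echo "temporary file: ${LOCAL_TMPFILE}"</programlisting>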
<para>For each script, you must do the following:</para>
<itemizedlist>
<listitem><para>Get the required variables</para>
</listitem>
<listitem><para>Check the variables</para>
</listitem>
<listitem><para>Perform the action</para>
</listitem>
<listitem><para>Check the action</para>
<note>
<para>The <literal>start</literal> and <literal>stop</literal> scripts are
required to be <firstterm>idempotent</firstterm>; that is, running them
multiple times must have the same effect as running them once. For example, if the <literal>
start</literal> script is run for a resource that is already started, the
script must not return an error.</para>
</note>
<para>All action scripts must return the status to the <?Pub _nolinebreak><filename>
/var/log/failsafe/script_<replaceable>nodename</replaceable></filename><?Pub /_nolinebreak> file.
</para>
</listitem>
</itemizedlist>
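<para>The four steps, together with the idempotency requirement, can be sketched
as follows. Everything in this fragment is illustrative: the resource is modeled
as a status directory, <literal>start_sketch</literal> and <literal>LOCAL_STATE_DIR
</literal> are hypothetical names, and the status values and the <literal>
ha_write_status_for_resource</literal> stub are placeholders for what <literal>
scriptlib</literal> provides.</para>
<programlisting># Placeholder status values and a stand-in status writer; in a real
# action script these come from scriptlib.
HA_SUCCESS=0
HA_CMD_FAILED=2
ha_write_status_for_resource()
{
    echo "status for $1: $2"
}

# Hypothetical resource: a status directory that must exist once the
# resource is started. LOCAL_STATE_DIR is an assumed name.
LOCAL_STATE_DIR=${TMPDIR:-/tmp}/sketch_state.$$

start_sketch()
{
    # 1. Get the required variables.
    resource="$1"
    dir="${LOCAL_STATE_DIR}/${resource}"

    # 2. Check the variables.
    if [ -z "${resource}" ]; then
        return ${HA_CMD_FAILED}
    fi

    # 3. Perform the action only if it has not already been done, so
    #    that a second run is not an error.
    if [ ! -d "${dir}" ]; then
        mkdir -p "${dir}"
    fi

    # 4. Check the action and report the result.
    if [ -d "${dir}" ]; then
        ha_write_status_for_resource "${resource}" ${HA_SUCCESS}
        return ${HA_SUCCESS}
    fi
    ha_write_status_for_resource "${resource}" ${HA_CMD_FAILED}
    return ${HA_CMD_FAILED}
}

start_sketch res1
start_sketch res1    # second run: already started, still succeeds
rm -rf "${LOCAL_STATE_DIR}"</programlisting>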
</sect1>
<sect1 id="LE49536-PARENT">
<title id="LE49536-TITLE">Examples of Action Scripts</title>
<para>The following sections use portions of the NFS scripts as examples. <indexterm
id="ITaction-50"><primary>action scripts</primary><secondary>examples</secondary>
</indexterm></para>
<note>
<para>The examples in this guide may not exactly match the released system.
</para>
</note>
<sect2>
<title><literal>start</literal> Script</title>
<para>The NFS <literal>start</literal> script does the following:<indexterm
id="ITaction-51"><primary><literal>start</literal>  script</primary><secondary><literal>
example</literal></secondary></indexterm></para>
<orderedlist>
<listitem><para>Creates a resource-specific NFS status directory.</para>
</listitem>
<listitem><para>Exports the specified export-point with the specified
export-options.</para>
</listitem>
</orderedlist>
<para>Following is a section from the NFS <literal>start</literal> script:
</para>
<programlisting># Start the resource on the local machine.
# Return HA_SUCCESS if the resource has been successfully started on the local
# machine and HA_CMD_FAILED otherwise.
#
start_nfs()
{
    ${HA_DBGLOG} "Entry: start_nfs()";

    # for all nfs resources passed as parameter
    for resource in ${HA_RES_NAMES}
    do
        NFSFILEDIR=${HA_SCRIPTTMPDIR}/${LOCAL_TEST_KEY}$resource
        HA_CMD="mkdir -p $NFSFILEDIR";
        ha_execute_cmd "creating nfs status file directory";
        if [ $? -ne 0 ]; then
           ${HA_LOG} "Failed to create ${NFSFILEDIR} directory";
           ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
           exit_script $HA_NOCFGINFO
        fi

        get_nfs_info $resource
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: $resource parameters not present in CDB";
            ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
            exit_script ${HA_NOCFGINFO};
        fi

        ha_get_field "${HA_STRING}" export-info
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: export-info not present in CDB for resource $resource";
            ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
            exit_script ${HA_NOCFGINFO};
        fi
        export_opts="$HA_FIELD_VALUE"

        ha_get_field "${HA_STRING}" filesystem
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: filesystem-info not present in CDB for resource $resource";
            ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
            exit_script ${HA_NOCFGINFO};
        fi
        filesystem="$HA_FIELD_VALUE"

        # Before we try and export the NFS resource, make sure
        # filesystem is mounted.
        HA_CMD="grep $filesystem /etc/mtab > /dev/null 2>&amp;1";
        ha_execute_cmd "check if the filesystem $filesystem is mounted";
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: filesystem $filesystem not mounted";
            ha_write_status_for_resource ${resource}  ${HA_CMD_FAILED};
            exit_script ${HA_CMD_FAILED};
        fi

        # Now do the job: export the new directory
        # Note: the export_dir command will check whether this directory
        # is already exported or not.
        HA_CMD="export_dir ${resource} ${export_opts}";
        ha_execute_cmd "export $resource directories to NFS clients";
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: could not export resource ${resource}"
            ha_write_status_for_resource ${resource} ${HA_CMD_FAILED};
            exit_script ${HA_CMD_FAILED};
        else
            ha_write_status_for_resource ${resource} ${HA_SUCCESS};
        fi

    done
}</programlisting>
</sect2>
<sect2>
<title><literal>stop</literal> Script</title>
<para>The NFS <literal>stop</literal> script does the following:<indexterm
id="ITaction-52"><primary><literal>stop</literal>  script</primary><secondary><literal>
example</literal></secondary></indexterm></para>
<orderedlist>
<listitem><para>Unexports the specified export-point.</para>
</listitem>
<listitem><para>Removes the NFS status directory.</para>
</listitem>
</orderedlist>
<para>Following is an example from the NFS <literal>stop</literal> script:
</para>
<programlisting># Stop the nfs resource on the local machine.
# Return HA_SUCCESS if the resource has been successfully stopped on the local
# machine and HA_CMD_FAILED otherwise.
#
stop_nfs()
{

    ${HA_DBGLOG} "Entry: stop_nfs()";

    # for all nfs resources passed as parameter
    for resource in ${HA_RES_NAMES}
    do
        get_nfs_info ${resource}
        if [ $? -ne 0 ]; then
            # NFS resource information not available.
            ${HA_LOG} "NFS: $resource parameters not present in CDB";
            ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
            exit_script ${HA_NOCFGINFO};
        fi

        ha_get_field "${HA_STRING}" export-info
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: export-info not present in CDB for resource $resource";
            ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
            exit_script ${HA_NOCFGINFO};
        fi
        export_opts="$HA_FIELD_VALUE"


        # Unexport the directory
        HA_CMD="unexport_dir ${resource}"
        ha_execute_cmd "unexport ${resource} directory to NFS clients"
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: Failed to unexport resource ${resource}"
            ha_write_status_for_resource ${resource} ${HA_CMD_FAILED}
            exit_script ${HA_CMD_FAILED}
        else
            ha_write_status_for_resource ${resource} ${HA_SUCCESS}
        fi
    done
}</programlisting>
</sect2>
<sect2><?Pub Dtl>
<title><literal>monitor</literal> Script</title>
<para>The NFS <literal>monitor</literal> script does the following:<indexterm
id="ITaction-54"><primary><literal>monitor</literal>  script</primary><secondary><literal>
example</literal></secondary></indexterm></para>
<orderedlist>
<listitem><para>Verifies that the file system is mounted at the correct mount
point.</para>
</listitem>
<listitem><para>Requests the status of the exported file system.</para>
</listitem>
<listitem><para>Checks the export-point.</para>
</listitem>
<listitem><para>Requests NFS statistics and (based on the results) makes a
Remote Procedure Call (RPC) to NFS as needed.</para>
</listitem>
</orderedlist>
<para>Following is an example from the NFS <literal>monitor</literal> script:
</para>
<programlisting># Check if the nfs resource is allocated in the local node
# This check must be light weight and less intrusive compared to
# exclusive check. This check is done when the resource has been
# allocated in the local node.
# Return HA_SUCCESS if the resource is running in the local node
# and HA_CMD_FAILED if the resource is not running in the local node
# The list of the resources passed as input is in variable
# $HA_RES_NAMES
#
monitor_nfs()
{
    ${HA_DBGLOG} "Entry: monitor_nfs()";

    for resource in ${HA_RES_NAMES}
    do
        get_nfs_info ${resource}
        if [ $? -ne 0 ]; then
            # No resource information available.
            ${HA_LOG} "NFS: ${resource} parameters not present in CDB";
            ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
            exit_script ${HA_NOCFGINFO};
        fi

        ha_get_field "${HA_STRING}" filesystem
        if [ $? -ne 0 ]; then
            # filesystem not available.
            ${HA_LOG} "NFS: filesystem not present in CDB for resource $resource";
            ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
            exit_script ${HA_NOCFGINFO};
        fi
        fs="$HA_FIELD_VALUE";

        # Check to see if the filesystem is mounted
        HA_CMD="mount | grep ${fs} >/dev/null 2>&amp;1"
        ha_execute_cmd "check to see if $fs is mounted"
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: ${fs} not mounted";
            ha_write_status_for_resource ${resource} ${HA_CMD_FAILED};
            exit_script $HA_CMD_FAILED;
        fi

        # stat the filesystem
        HA_CMD="fs_stat -r ${resource} >/dev/null 2>&amp;1";
        ha_execute_cmd "stat mount point $resource"
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: cannot stat ${resource} NFS export point";
            ha_write_status_for_resource ${resource} ${HA_CMD_FAILED};
            exit_script $HA_CMD_FAILED;
        fi

        # check the filesystem is exported
        showmount -e | grep "${resource} " >/dev/null 2>&amp;1
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: failed to find ${resource} in exported filesystem list:-"
            ${HA_LOG} "`showmount -e`"
            ha_write_status_for_resource ${resource} ${HA_CMD_FAILED}
            exit_script ${HA_CMD_FAILED}
        fi

        # check the NFS daemon is still alive and responding
        exec_rpcinfo;
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: exec_rpcinfo failed";
            ha_write_status_for_resource ${resource} ${HA_CMD_FAILED}
            exit_script $HA_CMD_FAILED
        fi

        # Check the stats ?
        # To Be Done... but there is no nfsstat command
        # for the user space NFS daemon.


        ha_write_status_for_resource $resource $HA_SUCCESS;
    done
}</programlisting>
</sect2>
<sect2><?Pub Dtl>
<title><literal>exclusive</literal> Script</title>
<para>The NFS <literal>exclusive</literal> script determines whether the file
system is already exported. The check made by an exclusive script can be more
expensive than a monitor check. Linux FailSafe uses this script to determine
if resources are running on a node in the cluster, and to thereby prevent
starting resources on multiple nodes in the cluster.<indexterm id="ITaction-55">
<primary><literal>exclusive</literal>  script</primary><secondary><literal>
example</literal></secondary></indexterm></para>
<para>Following is an example from the NFS <literal>exclusive</literal> script:
</para>
<programlisting># Check if the nfs resource is running in the local node. This check can
# be more intrusive than the monitor check. This check is used to determine
# if the resource has to be started on a machine in the cluster.
# Return HA_NOT_RUNNING if the resource is not running in the local node
# and HA_RUNNING if the resource is running in the local node
# The list of nfs resources passed as input is in variable
# $HA_RES_NAMES
#
exclusive_nfs()
{

    ${HA_DBGLOG} "Entry: exclusive_nfs()";

    # for all resources passed as parameter
    for resource in ${HA_RES_NAMES}
    do
        get_nfs_info $resource
        if [ $? -ne 0 ]; then
            # No resource information available
            ${HA_LOG} "NFS: $resource parameters not present in CDB";
            ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
            exit_script ${HA_NOCFGINFO};
        fi

        # Check if resource is already exported by the NFS server
        showmount -e | grep "${resource} " >/dev/null 2>&amp;1
        if [ $? -eq 0 ];then
            ha_write_status_for_resource ${resource} ${HA_RUNNING};
            ha_print_exclusive_status ${resource} ${HA_RUNNING};
        else
            ha_write_status_for_resource ${resource} ${HA_NOT_RUNNING};
            ha_print_exclusive_status ${resource} ${HA_NOT_RUNNING};
        fi

    done
}</programlisting>
</sect2>
<sect2>
<title><literal>restart</literal> Script</title>
<para>The NFS <literal>restart</literal> script exports the specified export-point
with the specified export-options.<indexterm id="ITaction-56"><primary><literal>
restart</literal>  script</primary><secondary><literal>example</literal></secondary>
</indexterm></para>
<para>Following is an example from the <literal>restart</literal> script for
NFS:</para>
<programlisting># Restart nfs resource
# Return HA_SUCCESS if nfs resource failed over successfully or
# return HA_CMD_FAILED if nfs resource could not be failed over locally.
# The list of nfs resources passed as input is in variable
# $HA_RES_NAMES
#
restart_nfs()
{
    ${HA_DBGLOG} "Entry: restart_nfs()";

    # for all nfs resources passed as parameter
    for resource in ${HA_RES_NAMES}
    do
        get_nfs_info $resource
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: $resource parameters not present in CDB";
            ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
            exit_script ${HA_NOCFGINFO};
        fi

        ha_get_field "${HA_STRING}" export-info
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: export-info not present in CDB for resource $resource";
            ha_write_status_for_resource ${resource} ${HA_NOCFGINFO};
            exit_script ${HA_NOCFGINFO};
        fi
        export_opts="$HA_FIELD_VALUE"


        # Note: the export_dir command will check whether this directory
        # is already exported or not.
        HA_CMD="export_dir ${resource} ${export_opts}";
        ha_execute_cmd "export $resource directories to NFS clients";
        if [ $? -ne 0 ]; then
            ${HA_LOG} "NFS: could not export resource ${resource}"
            ha_write_status_for_resource ${resource} ${HA_CMD_FAILED};
            exit_script ${HA_CMD_FAILED};
        else
            ha_write_status_for_resource ${resource} ${HA_SUCCESS};
        fi

    done
}</programlisting>
</sect2>
</sect1>
<sect1 id="Z942786646lhj">
<title>Monitoring Agents</title>
<para>If resources cannot be monitored using a lightweight check, you should
use a <firstterm>monitoring agent</firstterm>. The <literal>monitor</literal>
action script contacts the monitoring agent to determine the status of the
resource in the node. The monitoring agent in turn periodically monitors the
resource. <xref linkend="Z943376870lhj"> shows the monitoring process.</para>
<figure id="Z943376870lhj">
<title id="Z943376954lhj">Monitoring Process</title>
<graphic entityref="monitor"></graphic>
</figure>
<para>Monitoring agents are useful for monitoring database resources, because
creating a database connection is costly and time-consuming. The monitoring
agent maintains connections to the database and uses them to query the database
in response to requests from the <literal>monitor</literal> action script.</para>
<para>Monitoring agents are independent processes and can be started by the <literal>
cmond</literal> process, although this is not required. For example, if a
monitoring agent must be started when activating highly available services
on a node, information about that agent can be added to the <literal>cmond
</literal><indexterm id="ITaction-57"><primary><literal>cmond</literal>  process
</primary><secondary><literal>configuration</literal></secondary></indexterm>
configuration on that node. The <literal>cmond</literal> configuration is
located in the <indexterm id="ITaction-58"><primary><literal>/etc/failsafe/cmon_process_groups
</literal>  directory</primary></indexterm> <?Pub _nolinebreak><filename>
/etc/failsafe/cmon_process_groups</filename><?Pub /_nolinebreak> directory.
Information about different agents should go into different files. The name
of the file is not relevant to the activate/deactivate procedure.<indexterm
id="ITaction-59"><primary>monitoring</primary><secondary>agents</secondary>
</indexterm> <indexterm id="ITaction-60"><primary>agents</primary></indexterm></para>
<para>If a monitoring agent exits or aborts, <literal>cmond</literal> will
automatically restart the monitoring agent. This prevents <literal>monitor
</literal> action script failures due to monitoring agent failures.</para>
<para>For example, the <?Pub _nolinebreak><filename>/etc/failsafe/cmon_process_groups/ip_addresses
</filename><?Pub /_nolinebreak> file contains information about the <literal>
ha_ifd</literal> process that monitors network interfaces. It contains the
following, where <literal>ACTIONS</literal> lists the actions that <literal>cmond
</literal> can perform on the agents (which will be the same for all scripts):
</para>
<programlisting>TYPE = cluster_agent
PROCS = ha_ifd
ACTIONS = start stop restart attach detach
AUTOACTION = attach</programlisting>
<para>If you create a new monitoring agent, you must also create a corresponding
file in the <?Pub _nolinebreak><filename>/etc/failsafe/cmon_process_groups
</filename><?Pub /_nolinebreak> directory that contains similar information
about the new agent. To do this, you can copy the <filename>ip_addresses</filename>
file and modify the <literal>PROCS</literal> line to list the executables
that constitute your new agent. These executables must be located in the <filename>
/usr/lib/failsafe/bin</filename> directory. You should not modify the other
configuration lines (<literal>TYPE</literal>, <literal>ACTIONS</literal>,
and <literal>AUTOACTION</literal>).</para>
<para>Suppose you need to add a new agent called <literal>newagent</literal>
that consists of processes <literal>ha_x</literal> and <literal>ha_y</literal>.
 The configuration information for this agent will be located in the  <?Pub _nolinebreak><filename>
/etc/failsafe/cmon_process_groups/newagent</filename><?Pub /_nolinebreak> file,
which will contain the following:</para>
<programlisting>TYPE = cluster_agent
PROCS = ha_x ha_y
ACTIONS = start stop restart attach detach
AUTOACTION = attach</programlisting>
<para>In this case, the software will expect two executables (<?Pub _nolinebreak><filename>
/usr/lib/failsafe/bin/ha_x</filename><?Pub /_nolinebreak> and <?Pub _nolinebreak><filename>
/usr/lib/failsafe/bin/ha_y</filename><?Pub /_nolinebreak>) to be present.
</para>
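<para>Before activating a new agent, it may help to confirm that every executable
named on the <literal>PROCS</literal> line is present. The following sketch is
not part of Linux FailSafe: the function name <literal>check_agent_procs</literal>
and the <literal>CMON_DIR</literal> and <literal>BIN_DIR</literal> variables
are hypothetical, and the parsing assumes the simple <literal>PROCS = </literal><replaceable>
names</replaceable> layout shown above.</para>
<programlisting># Assumed defaults taken from the text; override them to try the sketch
# on a machine without Linux FailSafe installed.
CMON_DIR=${CMON_DIR:-/etc/failsafe/cmon_process_groups}
BIN_DIR=${BIN_DIR:-/usr/lib/failsafe/bin}

# Hypothetical helper: $1 is the agent file name, for example newagent.
# Returns 0 if every executable on the PROCS line exists in BIN_DIR.
check_agent_procs()
{
    # Print only the PROCS line, with the "PROCS =" prefix removed.
    procs=`sed -n 's/^PROCS *= *//p' "${CMON_DIR}/$1"`
    if [ -z "${procs}" ]; then
        echo "$1: no PROCS line found"
        return 1
    fi
    status=0
    for proc in ${procs}
    do
        if [ ! -x "${BIN_DIR}/${proc}" ]; then
            echo "$1: missing executable ${BIN_DIR}/${proc}"
            status=1
        fi
    done
    return ${status}
}</programlisting>
<para>For the example above, <command>check_agent_procs newagent</command>
would report a missing executable for each of <literal>ha_x</literal> and <literal>
ha_y</literal> that is not installed.</para>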
</sect1>
</chapter>