Michael & Danny (& Nathan when the sun starts shining down there :),
Thanks for the ideas, but this seems far more complicated than it should be,
and I don't really know where to put the logic you each recommended.
FYI: I have been writing shell scripts on an as-needed basis for many years,
but this one is at the very edge of my skills.
First, if the "xfs test harness" ran the test scripts in the standard
non-interactive mode, this issue would not come up at all. Unfortunately for
me, it somehow invokes the scripts in interactive mode, and the shell
notification messages I'm having problems with only occur in interactive
shells. Non-interactive shells simply don't print these messages.
If there is a way to have my script run non-interactively, then the whole
problem goes away. (I only figured that out in the last hour or so.)
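The difference is easy to check for the normal case: a non-interactive bash, which is what "bash test_script" gives you instead of ". test_script", prints no job status line when a background job finishes normally. A minimal sketch, assuming bash; run_noninteractive is a hypothetical helper name, not something the xfs harness provides. Inside a shell that is already interactive, the other lever is "set +m": the bash manual ties the "background job completed" line to the -m (job control) option.

```shell
#!/bin/bash
# Sketch, assuming bash. run_noninteractive is a hypothetical helper:
# it hands a command string to a fresh "bash -c", which is never
# interactive, so job control (set -m) is off and no "Done" lines appear.
run_noninteractive() {
    bash -c "$1"
}

# A background job that exits normally prints nothing extra here.
output=$(run_noninteractive 'sleep 1 & wait; echo finished' 2>&1)
echo "$output"    # prints: finished
```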
Setting that possibility aside, I have written the simplest script I could to
show the problem. It is below my signature. (If anyone has a better way to do a
timeout, I'm all ears. I've never written one in shell code before.)
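A watchdog is one common way to build a timeout in shell: start the command in the background, start a second background job that sleeps and then kills the command, and cancel that watchdog if the command finishes first. A minimal sketch, assuming bash; run_with_timeout is a hypothetical helper (not part of the xfs harness), and in the real test the command would be the lvcreate call. Recent GNU coreutils also include a "timeout" command for the same job, if it exists on the systems being tested.

```shell
#!/bin/bash
# Sketch of a watchdog timeout, assuming bash. run_with_timeout is a
# hypothetical helper: run_with_timeout SECONDS COMMAND [ARGS...]
# Returns the command's exit status; if the watchdog had to kill it
# with SIGTERM, bash reports that as 128 + 15 = 143.
run_with_timeout() {
    local secs=$1
    shift

    "$@" &                          # the command being timed (e.g. lvcreate)
    local cmd_pid=$!

    (                               # watchdog subshell
        # If the watchdog is cancelled, stop its sleep too, so nothing lingers.
        trap 'kill "$sleep_pid" 2>/dev/null; exit 0' TERM
        sleep "$secs" &
        sleep_pid=$!
        wait "$sleep_pid"
        kill "$cmd_pid" 2>/dev/null # time is up: stop the command
    ) &
    local watchdog_pid=$!

    wait "$cmd_pid" 2>/dev/null     # the redirect usually hides bash's "Terminated" line
    local status=$?

    kill "$watchdog_pid" 2>/dev/null  # cancel the watchdog if it is still waiting
    wait "$watchdog_pid" 2>/dev/null
    return "$status"
}
```

For example, "run_with_timeout 10 sleep $SIMULATED_SNAPSHOT_DELAY" returns 143 when the watchdog fires. One limitation for the lvcreate case: if the hung process ignores SIGTERM or is stuck inside the kernel, the "wait" on it never returns, so the script still needs another way out, such as the "kill $$" the sample script uses.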
I have resolved the specific dd loop issue by changing it from "while true"
to "while $RUNNING" and resetting the RUNNING variable in my cleanup logic.
Now I only have problems with the timeout subshell I'm creating, but the
problem is very similar.
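The "while $RUNNING" change works because the variable holds the name of a command: with RUNNING=true the loop test runs "true" and the loop continues, and once the cleanup logic sets RUNNING=false the next test runs "false" and the loop ends. A minimal sketch, assuming bash; the dd call is replaced by a counter, and passes and MAX_PASSES are hypothetical names that only make the demo stop on its own.

```shell
#!/bin/bash
# Sketch of the "while $RUNNING" loop, assuming bash.
# RUNNING holds a command name: "true" keeps looping, "false" stops.
RUNNING=true
passes=0          # hypothetical counter standing in for dd's work
MAX_PASSES=3      # hypothetical limit so the demo ends by itself

_cleanup() {
    RUNNING=false # the loop notices this at its next test
}

while $RUNNING; do
    passes=$((passes + 1))          # the real loop runs dd here
    if [ "$passes" -ge "$MAX_PASSES" ]; then
        _cleanup                    # in the real script the trap calls this
    fi
done
echo "stopped after $passes passes" # prints: stopped after 3 passes
```

One caveat: this only works when the loop runs in the same shell that executes the cleanup. A loop started in the background with "&" gets its own copy of RUNNING, so changing the variable in the parent does not stop it; that case needs a kill instead.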
In the script, the first thing you have to do is choose the scenario you are
trying to model, snapshot success or hang, by uncommenting the matching
SIMULATED_SNAPSHOT_DELAY line.
To run the script, use the ". test_script" syntax. This runs it in the current
interactive shell, like the xfs test harness does.
Warning: this script kills your current shell, so each time you want to run
test_script, start a new shell (e.g. by typing "bash") and source it from there.
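Part of why the current shell is affected is how "$$" behaves: in bash it always expands to the pid of the top-level shell, even inside a ( ... ) subshell, so the "kill $$" in the timeout subshell signals the shell that sourced test_script. A minimal sketch showing this, assuming bash 4 or later for BASHPID, which holds the pid of the process actually running the code; outer_pid, inner_dollar and inner_bashpid are just illustration names.

```shell
#!/bin/bash
# Sketch, assuming bash 4+ (for BASHPID).
# $$ keeps the top-level shell's pid even inside a subshell, so
# "kill $$" from the timeout subshell targets the shell that sourced
# the script. BASHPID is the pid of the process running the code.
outer_pid=$$
inner_dollar=$( (echo "$$") )          # $$ as seen inside a ( ... ) subshell
inner_bashpid=$( (echo "$BASHPID") )   # that subshell's own pid

echo "top-level \$\$:       $outer_pid"
echo "\$\$ in subshell:     $inner_dollar"    # same number as the line above
echo "BASHPID in subshell: $inner_bashpid"   # a different number
```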
In my normal shell, a successful run gives output like this:
- Done ( sleep $SIMULATED_SNAPSHOT_DELAY )
cleanup occurs here
With a simulated lockup I get:
+ Done sleep 10
snapshot creation lockup
cleanup occurs here
Unfortunately, the shell notifications that occur inside the xfs test harness
show the pid instead of the job (subshell instance) number.
What I need to do is get rid of those Done messages, or make them consistent,
i.e. without a pid that changes on every invocation.
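Two ways to attack this, sketched under assumptions: reap each killed job right away with "wait" and its stderr thrown away, so bash has nothing left to report, or pass the test's output through a sed filter that swaps the changing pid for a fixed word before it is compared with the expected output. Both function names are hypothetical, and the message layout the filter matches (e.g. "./test_script: line 20: 12345 Terminated  sleep 10") is an assumption about what bash prints for a job killed by a signal; adjust the pattern to the lines the harness actually shows.

```shell
#!/bin/bash
# Sketch, assuming bash. Both function names are hypothetical.

# 1) Suppress: kill a background job of this shell and reap it at once.
#    Reaping it inside "wait ... 2>/dev/null" usually keeps bash from
#    printing a "Terminated" line for it later. Always returns 0.
_quiet_kill() {
    kill "$1" 2>/dev/null
    wait "$1" 2>/dev/null
    return 0
}

# 2) Normalize: replace the pid in front of a job status word with "PID".
#    The message layout matched here is an assumption; check real output.
_filter_job_pids() {
    sed -E 's/[0-9]+ +(Terminated|Killed|Done)/PID \1/'
}

sleep 30 &
_quiet_kill $!     # no job message expected for this sleep

echo './test_script: line 20: 12345 Terminated              sleep 10' |
    _filter_job_pids
# prints: ./test_script: line 20: PID Terminated              sleep 10
```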
Deployment and Integration Specialist
Compaq ASE - Tru64 v4, v5
Compaq Master ASE - SAN Architect
The Norcross Group
==== Sample script to show problem
#Choose one of the below based on whether you are testing snapshot success, or hang
#SIMULATED_SNAPSHOT_DELAY=5 # simulated success
SIMULATED_SNAPSHOT_DELAY=5000 # simulated lockup
status=1 #default to failure
_cleanup()
{
    echo cleanup occurs here
    trap 0 1 2 3 15 # reset the traps so _cleanup only runs once
    exit $status
}
trap "_cleanup" 0 1 2 3 15
# Start of real code
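sleep $SIMULATED_SNAPSHOT_DELAY & # stands in for the lvcreate call
SNAPSHOT_pid=$!
( # timeout subshell
    trap 'kill $TIMERpid 2>/dev/null; exit' 15 # if cancelled, stop the sleep too
    sleep 10 & #This is my timeout for lvcreate to finish
    TIMERpid=$! # Save my pid, so I can be cancelled
    wait $TIMERpid
    echo snapshot creation lockup
    # xfs_freeze -u /scratch # This will allow the lvcreate to run to completion
    kill $SNAPSHOT_pid # For this test script, just kill the sleep, but the
                       # kill has no effect on the real hung process.
    kill $$ # Terminate this whole test ($$ is still the parent shell's pid here)
) &
TIMER_shell_pid=$! # Save the whole subshell's pid, so it can be cancelled
if wait $SNAPSHOT_pid # fails if the timeout had to kill the snapshot
then
    kill $TIMER_shell_pid #cancel the timeout (its trap also stops the sleep)
    echo Snapshot success
    status=0 # success
fi
exit # runs _cleanup through the EXIT trap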