xfs
[Top] [All Lists]

Re: [PATCH 2/2] xfstests 294: test xfs direct IO nested transaction dead

To: Eryu Guan <eguan@xxxxxxxxxx>
Subject: Re: [PATCH 2/2] xfstests 294: test xfs direct IO nested transaction deadlock
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 11 Jan 2013 09:27:03 +1100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <1357835665-15621-2-git-send-email-eguan@xxxxxxxxxx>
References: <1357835665-15621-1-git-send-email-eguan@xxxxxxxxxx> <1357835665-15621-2-git-send-email-eguan@xxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Jan 11, 2013 at 12:34:25AM +0800, Eryu Guan wrote:
> Test case is based on 293 and make xfs with mkfs command
> 
> mkfs.xfs -f -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b
> $SCRATCH_DEV
> 
> Regression test case for commit
> 
> 437a255 xfs: fix direct IO nested transaction deadlock.
> 
> Signed-off-by: Eryu Guan <eguan@xxxxxxxxxx>
> ---
>  294     | 90 
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  294.out |  4 +++
>  group   |  1 +
>  3 files changed, 95 insertions(+)
>  create mode 100755 294
>  create mode 100644 294.out
> 
> diff --git a/294 b/294
> new file mode 100755
> index 0000000..817b877
> --- /dev/null
> +++ b/294
> @@ -0,0 +1,90 @@
> +#! /bin/bash
> +# FS QA Test No. 294
> +#
> +# Test freeze/unfreeze file system randomly under fsstress
> +# Based on 293, regression test for commit:
> +# 437a255 xfs: fix direct IO nested transaction deadlock.
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2013 Red Hat, Inc.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +# creator
> +owner=eguan@xxxxxxxxxx
> +
> +seq=`basename $0`
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1     # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +    cd /
> +    rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common.rc
> +. ./common.filter
> +
> +# real QA test starts here
> +_supported_fs xfs
> +_supported_os IRIX Linux
> +
> +_require_scratch
> +_require_freeze
> +
> +_scratch_unmount >/dev/null 2>&1

SCRATCH_DEV is already unmounted at this point - it it was mounted
the _require_scratch unmounts it....

> +_scratch_mkfs_xfs -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b 
> >/dev/null 2>&1
> +_scratch_mount >/dev/null 2>&1
> +
> +TESTDIR="$SCRATCH_MNT/testdir"
> +mkdir -p $TESTDIR
> +rm -f $seq.full

YOu need to do this before making the filesystem - scratch_mkfs
writes the mkfs output to $seq.full.

> +
> +echo "Start fsstress" | tee -a $seq.full

No need to echo stuff like this to the output files.

> +$FSSTRESS_PROG -d $TESTDIR -n 100 -p 1000 $FSSTRESS_AVOID >/dev/null 2>&1 &
> +FSSTRESS_PID=$!
> +sleep 1

Why the sleep?

> +
> +# Freeze/unfreeze file system randomly
> +NR_FSSTRESS=`ps -eo ppid | grep -c $FSSTRESS_PID`

Is the sleep there so that fsstress has created some of it's threads
by this point?

> +echo "Start freeze/unfreeze randomly" | tee -a $seq.full
> +while [ $NR_FSSTRESS -gt 0 ];do
> +     TIMEOUT=`expr $RANDOM % 5`
> +     sleep $TIMEOUT
> +     echo "* Freeze file system after sleeping $TIMEOUT seconds" >>$seq.full
> +     xfs_freeze -f $SCRATCH_MNT
> +     if [ $? -ne 0 ];then
> +             echo " - Error: freeze filesystem failed"
> +     fi
> +     TIMEOUT=`expr $RANDOM % 3`
> +     sleep $TIMEOUT
> +     echo "* Unfreeze file system after sleeping $TIMEOUT seconds" 
> >>$seq.full
> +     xfs_freeze -u $SCRATCH_MNT
> +     if [ $? -ne 0 ];then
> +             echo " - Error: unfreeze filesystem failed"
> +     fi
> +     NR_FSSTRESS=`ps -eo ppid | grep -c $FSSTRESS_PID`
> +done
> +echo "Test done" | tee -a $seq.full
> +
> +_check_scratch_fs
> +status=0
> +exit
> diff --git a/294.out b/294.out
> new file mode 100644
> index 0000000..2ccaa6e
> --- /dev/null
> +++ b/294.out
> @@ -0,0 +1,4 @@
> +QA output created by 294
> +Start fsstress
> +Start freeze/unfreeze randomly
> +Test done
> diff --git a/group b/group
> index f9c697a..0c6d158 100644
> --- a/group
> +++ b/group
> @@ -412,3 +412,4 @@ deprecated
>  291 repair
>  292 auto mkfs quick
>  293 auto freeze dangerous
> +294 auto freeze dangerous

It's not a dangerous test - most kernels will not hang on it.

I suspect that auto is also questionable - what's the run time of
the test? If it's anything more than 5 minutes, then it's too long
for the auto group. Given that the bug it showed up triggered in a
few seconds on the first 2-3 freezes, perhaps you should run a fixed
number of freeze/unfreeze cycles rather than waiting for fsstress
processes to do all their ops? e.g. run 10-20 freeze/thaw cycles,
then kill all the fsstress processes...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>