xfs
[Top] [All Lists]

Re: [PATCH 2/2] xfstests 294: test xfs direct IO nested transaction dead

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 2/2] xfstests 294: test xfs direct IO nested transaction deadlock
From: Eryu Guan <eguan@xxxxxxxxxx>
Date: Fri, 11 Jan 2013 13:25:06 +0800
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20130110222702.GJ3120@dastard>
References: <1357835665-15621-1-git-send-email-eguan@xxxxxxxxxx> <1357835665-15621-2-git-send-email-eguan@xxxxxxxxxx> <20130110222702.GJ3120@dastard>
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Jan 11, 2013 at 09:27:03AM +1100, Dave Chinner wrote:
> On Fri, Jan 11, 2013 at 12:34:25AM +0800, Eryu Guan wrote:
> > Test case is based on 293 and make xfs with mkfs command
> > 
> > mkfs.xfs -f -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b
> > $SCRATCH_DEV
> > 
> > Regression test case for commit
> > 
> > 437a255 xfs: fix direct IO nested transaction deadlock.
> > 
> > Signed-off-by: Eryu Guan <eguan@xxxxxxxxxx>
> > ---
> >  294     | 90 
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  294.out |  4 +++
> >  group   |  1 +
> >  3 files changed, 95 insertions(+)
> >  create mode 100755 294
> >  create mode 100644 294.out
> > 
> > diff --git a/294 b/294
> > new file mode 100755
> > index 0000000..817b877
> > --- /dev/null
> > +++ b/294
> > @@ -0,0 +1,90 @@
> > +#! /bin/bash
> > +# FS QA Test No. 294
> > +#
> > +# Test freeze/unfreeze file system randomly under fsstress
> > +# Based on 293, regression test for commit:
> > +# 437a255 xfs: fix direct IO nested transaction deadlock.
> > +#
> > +#-----------------------------------------------------------------------
> > +# Copyright (c) 2013 Red Hat, Inc.  All Rights Reserved.
> > +#
> > +# This program is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU General Public License as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it would be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write the Free Software Foundation,
> > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > +#-----------------------------------------------------------------------
> > +#
> > +# creator
> > +owner=eguan@xxxxxxxxxx
> > +
> > +seq=`basename $0`
> > +echo "QA output created by $seq"
> > +
> > +here=`pwd`
> > +tmp=/tmp/$$
> > +status=1   # failure is the default!
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +_cleanup()
> > +{
> > +    cd /
> > +    rm -f $tmp.*
> > +}
> > +
> > +# get standard environment, filters and checks
> > +. ./common.rc
> > +. ./common.filter
> > +
> > +# real QA test starts here
> > +_supported_fs xfs
> > +_supported_os IRIX Linux
> > +
> > +_require_scratch
> > +_require_freeze
> > +
> > +_scratch_unmount >/dev/null 2>&1
> 
> SCRATCH_DEV is already unmounted at this point - it it was mounted
> the _require_scratch unmounts it....
> 
> > +_scratch_mkfs_xfs -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b 
> > >/dev/null 2>&1
> > +_scratch_mount >/dev/null 2>&1
> > +
> > +TESTDIR="$SCRATCH_MNT/testdir"
> > +mkdir -p $TESTDIR
> > +rm -f $seq.full
> 
> YOu need to do this before making the filesystem - scratch_mkfs
> writes the mkfs output to $seq.full.
> 
> > +
> > +echo "Start fsstress" | tee -a $seq.full
> 
> No need to echo stuff like this to the output files.

Thanks for the review! I'll update in next version.

> 
> > +$FSSTRESS_PROG -d $TESTDIR -n 100 -p 1000 $FSSTRESS_AVOID >/dev/null 2>&1 &
> > +FSSTRESS_PID=$!
> > +sleep 1
> 
> Why the sleep?
> 
> > +
> > +# Freeze/unfreeze file system randomly
> > +NR_FSSTRESS=`ps -eo ppid | grep -c $FSSTRESS_PID`
> 
> Is the sleep there so that fsstress has created some of it's threads
> by this point?

The sleep seems redundant, I'll remove it.

> 
> > +echo "Start freeze/unfreeze randomly" | tee -a $seq.full
> > +while [ $NR_FSSTRESS -gt 0 ];do
> > +   TIMEOUT=`expr $RANDOM % 5`
> > +   sleep $TIMEOUT
> > +   echo "* Freeze file system after sleeping $TIMEOUT seconds" >>$seq.full
> > +   xfs_freeze -f $SCRATCH_MNT
> > +   if [ $? -ne 0 ];then
> > +           echo " - Error: freeze filesystem failed"
> > +   fi
> > +   TIMEOUT=`expr $RANDOM % 3`
> > +   sleep $TIMEOUT
> > +   echo "* Unfreeze file system after sleeping $TIMEOUT seconds" 
> > >>$seq.full
> > +   xfs_freeze -u $SCRATCH_MNT
> > +   if [ $? -ne 0 ];then
> > +           echo " - Error: unfreeze filesystem failed"
> > +   fi
> > +   NR_FSSTRESS=`ps -eo ppid | grep -c $FSSTRESS_PID`
> > +done
> > +echo "Test done" | tee -a $seq.full
> > +
> > +_check_scratch_fs
> > +status=0
> > +exit
> > diff --git a/294.out b/294.out
> > new file mode 100644
> > index 0000000..2ccaa6e
> > --- /dev/null
> > +++ b/294.out
> > @@ -0,0 +1,4 @@
> > +QA output created by 294
> > +Start fsstress
> > +Start freeze/unfreeze randomly
> > +Test done
> > diff --git a/group b/group
> > index f9c697a..0c6d158 100644
> > --- a/group
> > +++ b/group
> > @@ -412,3 +412,4 @@ deprecated
> >  291 repair
> >  292 auto mkfs quick
> >  293 auto freeze dangerous
> > +294 auto freeze dangerous
> 
> It's not a dangerous test - most kernels will not hang on it.

I see the description for dangerous group is "Tests which may oops or
hang", so I put it in the group. If you think it's not proper here I can
remove the dangerous group.

> 
> I suspect that auto is also questionable - what's the run time of
> the test? If it's anything more than 5 minutes, then it's too long
> for the auto group. Given that the bug it showed up triggered in a
> few seconds on the first 2-3 freezes, perhaps you should run a fixed
> number of freeze/unfreeze cycles rather than waiting for fsstress
> processes to do all their ops? e.g. run 10-20 freeze/thaw cycles,
> then kill all the fsstress processes...

I think auto group is for tests that can be run automatically, it's not
about time - I guess that's what the quick group is for. The description
for auto is "tests to be run as part of nightly qa". So I think it's
proper here.

I tested it on two hosts:
A - Intel Xeon E5-2630 CPU, 12 logical processors, 32G memory
B - Intel Core2 Quand CPU, 4 logical processors, 4G memory

And the run time and run cycles are:
A - 70 ~ 75 seconds, 19 ~ 21 iterations
B - about 380 seconds, 17 interations

Not too many interations, so I'd like to leave fsstress running to end.

Thanks again for the intensive review!

Eryu
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>