On Tue, Jun 10, 2014 at 11:21:49AM +1000, Dave Chinner wrote:
> On Fri, Jun 06, 2014 at 09:14:43AM -0400, Brian Foster wrote:
> > Changes in the XFS logging code have lead to small leaks in the log
> > grant heads that consume log space slowly over time. Such problems have
> > gone undetected for an unnecessarily long time due to code complexity
> > and potential for very subtle problems. Losing only a few bytes per
> > logged item on a reasonably large enough fs (10s of GB) means only the
> > most continuously stressful workloads will cause a severe enough failure
> > (deadlock due to log reservation exhaustion) quickly enough to indicate
> > something is seriously wrong.
> >
> > Recent changes in XFS export the state of the various log heads through
> > sysfs to aid in userspace/runtime analysis of the log. This test runs a
> > workload against an XFS filesystem, quiesces the fs and verifies that
> > the log reserve and write grant heads have not leaked any space with
> > respect to the current head of the physical log.
> >
> > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > ---
> ....
> >
> > +# Determine the system device name for a particular block device. The
> > device
> > +# name is how the block dev is referenced under sysfs.
> > +_get_device_name()
> > +{
> > + devpath=$1
> > +
> > + # check for a symlink (i.e., device mapper)
> > + if [ -L $devpath ]
> > + then
> > + devpath=`readlink -f $devpath`
> > + fi
> > +
> > + # grab the major minor and convert from hex to decimal
> > + major=$((0x`stat -c %t $devpath`))
> > + minor=$((0x`stat -c %T $devpath`))
> > +
> > + # refer to sysfs by major minor
> > + basename `readlink /sys/dev/block/$major:$minor`
> > +}
>
> $ basename `readlink -f /dev/mapper/vg0-home`
> dm-2
> $ basename `readlink /sys/dev/block/253:2`
> dm-2
>
> Why is _short_dev() not sufficient?
>
That should work... I guess I missed it.
> > +# Use the information exported by XFS to sysfs to determine whether the
> > log has
> > +# active reservations after a filesystem freeze.
> > +_check_scratch_log_state()
> > +{
> > + devname=`_get_device_name $SCRATCH_DEV`
> > + attrpath="/sys/fs/xfs/$devname/log"
> > +
> > + # freeze the fs to ensure data is synced and the log is flushed. this
> > + # means no outstanding transactions, and thus no outstanding log
> > + # reservations, should exist
> > + xfs_freeze -f $SCRATCH_MNT
> > +
> > + # the log head is exported in basic blocks and the log grant heads in
> > + # bytes. convert the log head to bytes for precise comparison
> > + log_head_cycle=`cat $attrpath/log_head_lsn | awk -F : '{ print $1 }'`
> > + log_head_bytes=`cat $attrpath/log_head_lsn | awk -F : '{ print $2 }'`
>
> awk can read files directly:
>
> log_head_cycle=`awk -F : '{ print $1 }' $attrpath/log_head_lsn`
>
> > + log_head_bytes=$((log_head_bytes * 512))
> > +
> > + for attr in "reserve_grant_head" "write_grant_head"
> > + do
> > + cycle=`cat $attrpath/$attr | awk -F : '{ print $1 }'`
> > + bytes=`cat $attrpath/$attr | awk -F : '{ print $2 }'`
> > +
> > + if [ $cycle != $log_head_cycle ] ||
> > + [ $bytes != $log_head_bytes ]
> > + then
> > + echo "$attr ($cycle:$bytes) does not match" \
> > + "log_head_lsn
> > ($log_head_cycle:$log_head_bytes)," \
> > + "possible leak detected."
> > + else
> > + echo "$attr matches log_head_lsn"
> > + fi
> > + done
> > +
> > + xfs_freeze -u $SCRATCH_MNT
> > +}
> > +
> > +# real QA test starts here
> > +_supported_fs xfs
> > +_supported_os Linux
> > +
> > +_require_scratch
> > +_require_freeze
> > +
> > +if [ ! -e /sys/fs/xfs ]
> > +then
> > + _notrun "no kernel support for XFS sysfs attributes"
> > +fi
>
> _requires_xfs_sysfs
>
I was mulling this over as I think we'll probably end up in a situation
where a test that depends on sysfs bits will need to check for a
specific attribute file. E.g., some new test comes along using a new
attribute file. Checking for /sys/fs/xfs is not sufficient for that test
once we release a version that so far only exports the log bits.
I think we could handle that by supporting a parameter to
_requires_xfs_sysfs that specifies the sub-attribute that must exist
(similar to what we have for xfs_io commands). We don't need that at the
moment, but that's good enough for me to create the requires func.
> > +
> > +rm -f $seqres.full
> > +
> > +_scratch_mkfs_xfs | _filter_mkfs 2>> $seqres.full
> > +_scratch_mount
> > +
> > +_check_scratch_log_state
> > +
> > +$FSSTRESS_PROG -d $SCRATCH_MNT/fsstress -n 1000 -p 2 -S t \
> > + >> $seqres.full 2>&1
> > +
> > +_check_scratch_log_state
>
> wouldn't it be better to run fsstress as a background process and do
> several freeze/check/thaw cycles on a running workload?
>
Good idea, that might be a more effective test for catching future
issues. As it is, this test runs in a few seconds so I think we could
beef it up a little without losing the 'quick' categorization. I'll play
around with it.
> > +
> > +umount $SCRATCH_MNT
> > +_check_scratch_fs
> > +
> > +status=0
> > +exit
> > diff --git a/tests/xfs/011.out b/tests/xfs/011.out
> > new file mode 100644
> > index 0000000..a3f3805
> > --- /dev/null
> > +++ b/tests/xfs/011.out
> > @@ -0,0 +1,11 @@
> > +QA output created by 011
> > +meta-data=DDEV isize=XXX agcount=N, agsize=XXX blks
> > +data = bsize=XXX blocks=XXX, imaxpct=PCT
> > + = sunit=XXX swidth=XXX, unwritten=X
> > +naming =VERN bsize=XXX
> > +log =LDEV bsize=XXX blocks=XXX
> > +realtime =RDEV extsz=XXX blocks=XXX, rtextents=XXX
>
> Any particular reason for dumping the filtered mkfs information
> here? It won't ever cause a test failure unless we break
> _filter_mkfs...
>
Not really, I'll clean that up.
> > +reserve_grant_head matches log_head_lsn
> > +write_grant_head matches log_head_lsn
> > +reserve_grant_head matches log_head_lsn
> > +write_grant_head matches log_head_lsn
> > diff --git a/tests/xfs/group b/tests/xfs/group
> > index 19fd968..99bf0e1 100644
> > --- a/tests/xfs/group
> > +++ b/tests/xfs/group
> > @@ -8,6 +8,7 @@
> > 008 rw ioctl auto quick
> > 009 rw ioctl auto prealloc quick
> > 010 auto quick repair
> > +011 auto quick freeze
>
> log and metadata, too.
>
Ok. Thanks for the review.
Brian
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
|