[PATCH] xfstests: create a test for xfs log grant head leak detection
Dave Chinner
david at fromorbit.com
Mon Jun 9 20:21:49 CDT 2014
On Fri, Jun 06, 2014 at 09:14:43AM -0400, Brian Foster wrote:
> Changes in the XFS logging code have lead to small leaks in the log
> grant heads that consume log space slowly over time. Such problems have
> gone undetected for an unnecessarily long time due to code complexity
> and potential for very subtle problems. Losing only a few bytes per
> logged item on a reasonably large enough fs (10s of GB) means only the
> most continuously stressful workloads will cause a severe enough failure
> (deadlock due to log reservation exhaustion) quickly enough to indicate
> something is seriously wrong.
>
> Recent changes in XFS export the state of the various log heads through
> sysfs to aid in userspace/runtime analysis of the log. This test runs a
> workload against an XFS filesystem, quiesces the fs and verifies that
> the log reserve and write grant heads have not leaked any space with
> respect to the current head of the physical log.
>
> Signed-off-by: Brian Foster <bfoster at redhat.com>
> ---
....
>
> +# Determine the system device name for a particular block device. The device
> +# name is how the block dev is referenced under sysfs.
> +_get_device_name()
> +{
> + devpath=$1
> +
> + # check for a symlink (i.e., device mapper)
> + if [ -L $devpath ]
> + then
> + devpath=`readlink -f $devpath`
> + fi
> +
> + # grab the major minor and convert from hex to decimal
> + major=$((0x`stat -c %t $devpath`))
> + minor=$((0x`stat -c %T $devpath`))
> +
> + # refer to sysfs by major minor
> + basename `readlink /sys/dev/block/$major:$minor`
> +}
$ basename `readlink -f /dev/mapper/vg0-home`
dm-2
$ basename `readlink /sys/dev/block/253:2`
dm-2
Why is _short_dev() not sufficient?
> +# Use the information exported by XFS to sysfs to determine whether the log has
> +# active reservations after a filesystem freeze.
> +_check_scratch_log_state()
> +{
> + devname=`_get_device_name $SCRATCH_DEV`
> + attrpath="/sys/fs/xfs/$devname/log"
> +
> + # freeze the fs to ensure data is synced and the log is flushed. this
> + # means no outstanding transactions, and thus no outstanding log
> + # reservations, should exist
> + xfs_freeze -f $SCRATCH_MNT
> +
> + # the log head is exported in basic blocks and the log grant heads in
> + # bytes. convert the log head to bytes for precise comparison
> + log_head_cycle=`cat $attrpath/log_head_lsn | awk -F : '{ print $1 }'`
> + log_head_bytes=`cat $attrpath/log_head_lsn | awk -F : '{ print $2 }'`
awk can read files directly:
log_head_cycle=`awk -F : '{ print $1 }' $attrpath/log_head_lsn`
> + log_head_bytes=$((log_head_bytes * 512))
> +
> + for attr in "reserve_grant_head" "write_grant_head"
> + do
> + cycle=`cat $attrpath/$attr | awk -F : '{ print $1 }'`
> + bytes=`cat $attrpath/$attr | awk -F : '{ print $2 }'`
> +
> + if [ $cycle != $log_head_cycle ] ||
> + [ $bytes != $log_head_bytes ]
> + then
> + echo "$attr ($cycle:$bytes) does not match" \
> + "log_head_lsn ($log_head_cycle:$log_head_bytes)," \
> + "possible leak detected."
> + else
> + echo "$attr matches log_head_lsn"
> + fi
> + done
> +
> + xfs_freeze -u $SCRATCH_MNT
> +}
> +
> +# real QA test starts here
> +_supported_fs xfs
> +_supported_os Linux
> +
> +_require_scratch
> +_require_freeze
> +
> +if [ ! -e /sys/fs/xfs ]
> +then
> + _notrun "no kernel support for XFS sysfs attributes"
> +fi
_requires_xfs_sysfs
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs_xfs | _filter_mkfs 2>> $seqres.full
> +_scratch_mount
> +
> +_check_scratch_log_state
> +
> +$FSSTRESS_PROG -d $SCRATCH_MNT/fsstress -n 1000 -p 2 -S t \
> + >> $seqres.full 2>&1
> +
> +_check_scratch_log_state
wouldn't it be better to run fsstress as a background process and do
several freeze/check/thaw cycles on a running workload?
> +
> +umount $SCRATCH_MNT
> +_check_scratch_fs
> +
> +status=0
> +exit
> diff --git a/tests/xfs/011.out b/tests/xfs/011.out
> new file mode 100644
> index 0000000..a3f3805
> --- /dev/null
> +++ b/tests/xfs/011.out
> @@ -0,0 +1,11 @@
> +QA output created by 011
> +meta-data=DDEV isize=XXX agcount=N, agsize=XXX blks
> +data = bsize=XXX blocks=XXX, imaxpct=PCT
> + = sunit=XXX swidth=XXX, unwritten=X
> +naming =VERN bsize=XXX
> +log =LDEV bsize=XXX blocks=XXX
> +realtime =RDEV extsz=XXX blocks=XXX, rtextents=XXX
Any particular reason for dumping the filtered mkfs information
here? It won't ever cause a test failure unless we break
_filter_mkfs...
> +reserve_grant_head matches log_head_lsn
> +write_grant_head matches log_head_lsn
> +reserve_grant_head matches log_head_lsn
> +write_grant_head matches log_head_lsn
> diff --git a/tests/xfs/group b/tests/xfs/group
> index 19fd968..99bf0e1 100644
> --- a/tests/xfs/group
> +++ b/tests/xfs/group
> @@ -8,6 +8,7 @@
> 008 rw ioctl auto quick
> 009 rw ioctl auto prealloc quick
> 010 auto quick repair
> +011 auto quick freeze
log and metadata, too.
Cheers,
Dave.
--
Dave Chinner
david at fromorbit.com
More information about the xfs
mailing list