Re: [RFC] xfstests: define an INTENSITY level for testing

To: Alex Elder <aelder@xxxxxxx>
Subject: Re: [RFC] xfstests: define an INTENSITY level for testing
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Sat, 23 Jan 2010 22:59:55 +1100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <1AB9A794DBDDF54A8A81BE2296F7BDFE012A69B2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <1AB9A794DBDDF54A8A81BE2296F7BDFE012A69B2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.18 (2008-05-17)

On Thu, Jan 21, 2010 at 03:36:07PM -0600, Alex Elder wrote:
> I've often felt it would be nice if testing could be
> done to a specified level of intensity.  That way,
> for example, I could perform a full suite of tests
> but have them just do very basic stuff, so that I
> get coverage but without having to wait as long as
> is required for a hard-core test.  Similarly, before
> a release I'd like to run tests exhaustively, to
> make sure things get really exercised.

At first glance, this sounds like a good idea to control
the runtime of test runs.

However, after thinking about it for a while and reflecting on
the approach the QA group in ASG (long live ASG!) took for release
testing, I have a few concerns about using the concept in xfstests.

> Right now there is a "quick" group defined for xfstests,
> but what I'm talking about is more of a parameter applied
> to all tests so that certain functions could be lightly
> tested that might not otherwise be covered by one of the
> "quick" ones.  We might even be able to get rid of the
> "quick" group.  And an inherently long-running test
> might make itself not run if the intensity level was
> not high enough.

IIRC we introduced the "quick" group as a way to provide developers
sufficient coverage to flush out major bugs in patches quickly, not
provide complete test coverage. i.e. to speed up the development
process, not speed up or improve the QA process. Patches still need
to pass the "auto" group tests without regressions before being
posted for review....

> So I propose we define a global, set in common.rc, which
> defines an integer 0 < INTENSITY <= 100, which would
> define how hard each test should push.  INTENSITY of
> 100 would cause all tests to do their most exhaustive
> and/or demanding exercises.  INTENSITY of 1 would do very
> superficial testing.  Default might be 50.

How would you solve the problem that "intensity" is very dependent
on the system the tests are being run on? e.g. Something run on an
SSD is going to run far faster than the same test on a UML instance
on a slow laptop disk, even though they run at the same "intensity".

Another concern I have is that "intensity" might have different causes
on different systems.  e.g. on UML, it is forking new processes that
causes the massive slowdowns (300ms for a fork+exec on a 2GHz
Athlon64), not the amount of IO. Hence changing the number of files
or IOPS won't really change the runtime of tests significantly if
the problem is that the test runs "expr" 100,000 times. e.g:
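
A minimal sketch of that effect, not taken from any actual test
(the function names are hypothetical): both loops count to N, but
the first does a fork+exec of "expr" on every iteration while the
second uses shell arithmetic and never leaves the shell.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not part of xfstests.

count_with_expr() {
    local i=0 n=$1
    while [ "$i" -lt "$n" ]; do
        i=$(expr "$i" + 1)    # one fork+exec of expr per iteration
    done
    echo "$i"
}

count_with_builtin() {
    local i=0 n=$1
    while [ "$i" -lt "$n" ]; do
        i=$((i + 1))          # shell arithmetic, no new process
    done
    echo "$i"
}
```

Comparing "time count_with_expr 1000" against "time
count_with_builtin 1000" on a given machine should show how much
of the runtime is process creation rather than IO.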


> Tests can simply ignore the INTENSITY value, and initially
> that will be the case for most tests.  It may not even make
> sense for a given test to have its activity scaled by this
> setting.  Once we define it though, tests can be adapted
> to make use of it where possible.
> Below is a patch that shows how such feature might be
> used for tests 104 and 109.

/me looks at the changes

I think this is the wrong fix for decreasing test 104 runtime.
The fsstress processes only need to run while the grows are in
progress; once the grows are complete the fsstress processes
can be killed rather than waited for. Using kill then wait
would reduce the runtime without potentially compromising the
test - if the number of ops is too low then fsstress doesn't run
long enough to effectively load up the filesystem during the grow
process to trigger the deadlock conditions.
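
Roughly what I mean by kill then wait, as a sketch only. This is
not the code in test 104: "sleep 30" stands in for a long fsstress
run, "true" stands in for the grow sequence, and the function
names are made up.

```shell
#!/bin/bash
# Hypothetical sketch of the kill-then-wait pattern.
# "sleep 30" is a stand-in for a long-running fsstress process.

start_load() {
    pids=""
    local i
    for i in 1 2 3; do
        sleep 30 &
        pids="$pids $!"
    done
}

stop_load() {
    # Kill the load generators first, then reap them so that no
    # background process outlives the test.
    kill $pids 2>/dev/null || true
    wait $pids 2>/dev/null || true
}

start_load
true        # the grows would run here while the load is active
stop_load
```

The runtime is then bounded by how long the grows take, not by
how many ops the load generators were asked to do.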

For 109 I think changing the number of files compromises the initial
conditions required to trigger the deadlock on kernels <= 2.6.18.
It's an ENOSPC test on a 160MB filesystem and the number of files it
uses is for fragmenting free space sufficiently to trigger
out-of-order AG locking when ENOSPC in an AG occurs. Changing the
number of files results in different freespace fragmentation
patterns and hence may not trigger the deadlock condition....


Stepping back and looking at this from an overall QA coverage point
of view, it seems to me that you are trying to make xfstests be
something that it is not intended to be. You want "exhaustive" test
coverage before a release, but xfstests has never been a vehicle
for exhaustive testing. That is, xfstests is really designed to
provide maximal code coverage with some load and stress tests
thrown in, but it is not intended to be the only testing mechanism
for the filesystem.

It might be instructive to go back and look at what the old SGI ASG
(long live ASG!) test group were doing (I hope it was archived!).
They were running xfstests on multiple platforms (x86_64, PPC and
ia64) for code coverage but not stress. To improve coverage, every
second xfstest run used a different set of non-default mkfs and
mount options to exercise different code paths (e.g. blocksize <
pagesize, directory block size > page size, etc) which otherwise
would not be tested.
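
A sketch of how alternating runs could be driven, under stated
assumptions: select_options is a hypothetical helper, and the
option values are only examples for a machine with 4k pages, not
the set the test group actually used. MKFS_OPTIONS and
MOUNT_OPTIONS are the variables xfstests reads from its
environment.

```shell
#!/bin/bash
# Hypothetical helper: pick the option set from the run number.
# The non-default values below are assumed examples for a 4k-page
# machine (blocksize < pagesize, directory block size > page size).

select_options() {
    local run=$1
    if [ $((run % 2)) -eq 0 ]; then
        # even runs: defaults
        MKFS_OPTIONS=""
        MOUNT_OPTIONS=""
    else
        # odd runs: non-default geometry and mount options
        MKFS_OPTIONS="-b size=1024 -n size=8192"
        MOUNT_OPTIONS="-o noatime"
    fi
    export MKFS_OPTIONS MOUNT_OPTIONS
}
```

A wrapper would call "select_options $run" before each full pass
of the suite and bump the run number afterwards.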

There were separate test plans, procedures, processes and scripts to
execute long running stress and load tests. These were run as part
of the QA validation prior to major releases (the angle you appear
to be coming from, Alex) rather than day-to-day testing of the
current dev kernels.

More importantly, the load/stress tests weren't aimed at specific
XFS features (already handled by xfstests) - instead they were high
level tests aimed at trying to break the system.  e.g. one of the
stress tests was running tens of local processes creating and
destroying large and small files simultaneously with NFS clients
doing the same thing on the same filesystem whilst turning quotas on
and off randomly and running concurrent filesystem snapshots and
then mounting and running filesystem checks on the snapshots to
ensure they were consistent.  These tests would run for up to a week
at a time, so it takes dedicated resources to run this sort of

For load point tests, similar tests were run but the number of
processes creating load was varied over time so that the system
load ranged from almost idle to almost 100%, to ensure that
there weren't problems that only light or medium loads exposed.
Once again these were long running tests on multiple platforms.
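
One way to express "vary the load over time", purely as an
illustration (workers_for_step is a hypothetical name and the
triangle-wave schedule is my assumption, not what the original
scripts did): ramp the number of load processes from idle up to a
maximum and back down again.

```shell
#!/bin/bash
# Hypothetical schedule: number of load processes for a given time
# step, ramping 0 -> max -> 0 and repeating (a triangle wave).

workers_for_step() {
    local step=$1 max=$2
    local period=$((2 * max))
    local pos=$((step % period))
    if [ "$pos" -le "$max" ]; then
        echo "$pos"
    else
        echo $((period - pos))
    fi
}
```

A driver loop would start or kill load processes at each step
until the running count matches the value this prints.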


In my experience, exhaustive testing requires a combination of
testing from low level point tests (xfstests) all the way up to high
level system integration tests. The methods and test processes
for these are different because the focus of each kind of test is
different.

Hence I agree with your intent and reasoning behind intensity level
based stress testing, but I think that xfstests is not the right
sort of test suite to use for this type of testing. I think we'd
do better to try to recover some of the high level stress tests
and processes from the corpse of ASG than to try to use xfstests
for this....


Dave Chinner