Re: [PATCH 10/10] xfstests: add disk failure simulation test

To: Dmitry Monakhov <dmonakhov@xxxxxxxxxx>
Subject: Re: [PATCH 10/10] xfstests: add disk failure simulation test
From: Rich Johnston <rjohnston@xxxxxxx>
Date: Fri, 1 Mar 2013 14:11:15 -0600
Cc: <xfs@xxxxxxxxxxx>, <linux-fsdevel@xxxxxxxxxxxxxxx>, <linux-ext4@xxxxxxxxxxxxxxx>, <dchinner@xxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1361356935-29153-11-git-send-email-dmonakhov@xxxxxxxxxx>
References: <1361356935-29153-1-git-send-email-dmonakhov@xxxxxxxxxx> <1361356935-29153-11-git-send-email-dmonakhov@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux i686; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 02/20/2013 04:42 AM, Dmitry Monakhov wrote:
> There are many situations where disk may fail for example
> 1) brutal usb dongle unplug
> 2) iscsi (or any other netbdev) failure due to network issues
> In this situation filesystem which use this blockdevice is
> expected to fail(force RO remount, abort, etc) but whole system
> should still be operational. In other words:
> 1) Kernel should not panic
> 2) Memory should not leak
> 3) Data integrity operations (sync,fsync,fdatasync, directio) should fail
>     for affected filesystem
> 4) It should be possible to umount broken filesystem

> Later when disk becomes available again we expect(only for journaled
> 5) It will be possible to mount filesystem w/o explicit fsck (in order to caught

typo: s/caught/catch/g

>     issues like https://patchwork.kernel.org/patch/1983981/)
> 6) Filesystem should be operational
> 7) After mount/umount has being done all errors should be fixed so fsck should
>     not spot any issues.

> This test use fault enjection (CONFIG_FAIL_MAKE_REQUEST=y config option )
You may want to mention all of the kernel config options required,
e.g. CONFIG_FAULT_INJECTION=y ... are there others?
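
One way to make the requirement explicit would be to check the kernel config before the test starts. This is only a rough sketch: the helper name `check_fault_config` is hypothetical, and the inclusion of CONFIG_FAULT_INJECTION_DEBUG_FS is my assumption (the debugfs knobs need it), not something the patch states. The list may still be incomplete.

```shell
# Hypothetical helper: report which fault-injection options are enabled
# in the kernel config file given as $1. Returns non-zero if any is missing.
check_fault_config() {
    config_file=$1
    missing=0
    # Assumed option list; CONFIG_FAIL_MAKE_REQUEST is the one named in the patch.
    for opt in CONFIG_FAULT_INJECTION CONFIG_FAULT_INJECTION_DEBUG_FS CONFIG_FAIL_MAKE_REQUEST; do
        if grep -q "^${opt}=y" "$config_file"; then
            echo "$opt: set"
        else
            echo "$opt: missing"
            missing=1
        fi
    done
    return $missing
}
```

On systems that install the config under /boot, it could be called as `check_fault_config /boot/config-$(uname -r)`; that path is also an assumption and varies by distribution.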

> which force all new IO requests to fail for a given device. Xfs already has
  to force

> XFS_IOC_GOINGDOWN ioctl which provides similar behaviour, but it is fs speciffic

typos: s/behaviour/behavior/g  s/speciffic/specific/g
> and it does it in an easy way because it perform freeze_bdev() before actual
typo s/shotdown/shutdown/g

> Test run fsstress in background and then force disk failure.
> Once disk failed it check that (1)-(4) is true.
  Once the disk fails, check that (1)-(4) are true.

> Then makes disk available again and check that (5)-(7) is also true
  Then make the disk available again and check that (5)-(7) are also true.

> BE CAREFUL!! test known to cause memory corruption for XFS
> see: https://gist.github.com/dmonakhov/4953045
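
For readers following along, the fail/restore steps described above could look roughly like the sketch below. It is not taken from the patch: the helper names `disk_fail` and `disk_restore` and the `DEBUGFS`/`SYSFS` variables are hypothetical, and it assumes the standard fail_make_request debugfs knobs plus the per-device `make-it-fail` sysfs attribute on a whole-disk device. Given the warning above, it should only ever be pointed at a scratch device.

```shell
# Hypothetical helpers. DEBUGFS and SYSFS default to the usual mount points
# but can be overridden, e.g. pointed at a scratch directory for a dry run.
DEBUGFS=${DEBUGFS:-/sys/kernel/debug}
SYSFS=${SYSFS:-/sys}

# Make every new request to whole-disk device $1 (e.g. "sdb") fail.
disk_fail() {
    dev=$1
    echo 100 > "$DEBUGFS/fail_make_request/probability"  # fail 100% of requests
    echo -1  > "$DEBUGFS/fail_make_request/times"        # -1: no limit on failures
    echo 1   > "$SYSFS/block/$dev/make-it-fail"          # arm injection for this device
}

# Stop injecting failures so the device becomes usable again.
disk_restore() {
    dev=$1
    echo 0 > "$SYSFS/block/$dev/make-it-fail"
    echo 0 > "$DEBUGFS/fail_make_request/probability"
}
```

A test would call `disk_fail` while fsstress is running, check (1)-(4), then call `disk_restore` before attempting the mount and fsck checks in (5)-(7).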