
Re: [PATCH] xfstests: add disk failure simulation test

To: Dmitry Monakhov <dmonakhov@xxxxxxxxxx>
Subject: Re: [PATCH] xfstests: add disk failure simulation test
From: Greg Freemyer <greg.freemyer@xxxxxxxxx>
Date: Thu, 21 Feb 2013 22:27:22 -0500
Cc: xfs@xxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, linux-ext4@xxxxxxxxxxxxxxx, dchinner@xxxxxxxxxx
In-reply-to: <1360770097-6351-1-git-send-email-dmonakhov@xxxxxxxxxx>
References: <1360770097-6351-1-git-send-email-dmonakhov@xxxxxxxxxx>
On Wed, Feb 13, 2013 at 10:41 AM, Dmitry Monakhov <dmonakhov@xxxxxxxxxx> wrote:
> There are many situations where disk may fail for example
> 1) brutal usb dongle unplug
> 2) iscsi (or any other netbdev) failure due to network issues
> In this situation filesystem which use this blockdevice is
> expected to fail(force RO remount, abort, etc) but whole system
> should still be operational. In other words:
> 1) Kernel should not panic
> 2) Memory should not leak
> 3) Data integrity operations (sync,fsync,fdatasync, directio) should fail
>    for affected filesystem
> 4) It should be possible to umount broken filesystem

Out of curiosity, does xfstests also have fault injection at the sector level?

It may be a little too aggressive, but hdparm --make-bad-sector
nnnnnnn can use an ATA long_write to write out a sector with a
non-matching CRC.  When that sector is read afterwards, the drive
returns a media error.

At the end of the test, hdparm --repair-sector nnnnnnn will fix the
bad sector and store a valid CRC.

The reason I say it is aggressive is that matched pairs of
--make-bad-sector and --repair-sector should have no long-term effect
on the drive, but non-matched pairs will leave the drive with a media
error.  A normal write to that "bad" sector will force it to be
remapped to a spare sector.  I don't know of a simple way to undo that.
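
To make the "matched pair" point concrete, here is a rough sketch of how
a test harness might guarantee that every --make-bad-sector is followed
by its --repair-sector, even when the test body fails.  This is not part
of xfstests; the names bad_sector and hdparm_cmd are made up for
illustration, and the confirmation flag reflects my understanding that
hdparm refuses these options without it, so check it against your hdparm
version.

```python
import subprocess
from contextlib import contextmanager

# Assumption: hdparm rejects these destructive options unless this
# confirmation flag is also supplied; verify against your hdparm version.
CONFIRM = "--yes-i-know-what-i-am-doing"


def hdparm_cmd(option, sector, device):
    """Build the argv list for one hdparm sector operation."""
    return ["hdparm", CONFIRM, option, str(sector), device]


@contextmanager
def bad_sector(device, sector, run=subprocess.check_call):
    """Corrupt one sector for the duration of the block, then repair it.

    `run` is injectable so the pairing logic can be exercised without
    touching a real drive.
    """
    run(hdparm_cmd("--make-bad-sector", sector, device))
    try:
        yield
    finally:
        # Runs even if the test body raises, so the pair stays matched.
        run(hdparm_cmd("--repair-sector", sector, device))
```

A test would wrap its reads in "with bad_sector(device, sector):" on a
scratch drive.  The finally clause cannot help if the process is killed
outright or the machine loses power, so an unmatched pair is still
possible in those cases.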

