
Re: [PATCH] xfstests: test data integrity under disk failure

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH] xfstests: test data integrity under disk failure
From: Dmitry Monakhov <dmonakhov@xxxxxxxxxx>
Date: Sat, 18 May 2013 16:13:25 +0400
Cc: xfs@xxxxxxxxxxx, linux-ext4@xxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20130516233153.GI24635@dastard>
References: <1368706052-24391-1-git-send-email-dmonakhov@xxxxxxxxxx> <20130516233153.GI24635@dastard>
Sender: Dmitry Monakhov <rjevskiy@xxxxxxxxx>
User-agent: Notmuch/0.6.1 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-redhat-linux-gnu)
On Fri, 17 May 2013 09:31:53 +1000, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Thu, May 16, 2013 at 04:07:32PM +0400, Dmitry Monakhov wrote:
> > The Parallels team has a good old tool called hwflush-check, which is
> > a server/client application for testing data integrity under
> > system/disk failure conditions. Usually we run hwflush-check on two
> > different hosts and use a PMU to trigger a real power failure of the
> > client as a whole unit. This test may also be used for SSD checking
> > (some SSDs are known to have problems with hwflush). I hope it will
> > be good to share it with the community.
> > 
> > This test simulates just one disk failure, while the client system
> > should survive the failure. The test extends the idea of shared/305.
> > 1) Run the hwflush-check server and client on the same host as usual
> > 2) Simulate disk failure via the blkdev fault injection API aka 'make-it-fail'
> > 3) Unmount the failed device
> > 4) Make the disk operational again
> > 5) Mount the filesystem
> > 6) Check data integrity
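[For reference, the fault-injection steps above can be sketched in shell, assuming a kernel built with CONFIG_FAIL_MAKE_REQUEST, debugfs mounted at /sys/kernel/debug, and the usual xfstests SCRATCH_DEV/SCRATCH_MNT variables; the exact knob values are illustrative:]

```shell
# Sketch of steps 2)-5) above; requires CONFIG_FAIL_MAKE_REQUEST.
DEV=$(basename $SCRATCH_DEV)

# 2) Fail every request submitted to the device
echo 100 > /sys/kernel/debug/fail_make_request/probability
echo -1  > /sys/kernel/debug/fail_make_request/times   # -1 = no limit
echo 1   > /sys/block/$DEV/make-it-fail

# 3) Unmount the failed device
umount $SCRATCH_MNT

# 4) Make the disk operational again
echo 0 > /sys/block/$DEV/make-it-fail
echo 0 > /sys/kernel/debug/fail_make_request/probability

# 5) Remount; the integrity check then runs against this mount
mount $SCRATCH_DEV $SCRATCH_MNT
```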
> 
> So, for local disk failure, why do we need a client/server network
> architecture? That just complicates the code, and AFAICT all the
> client does is send report packets to the server, which contain
> an id number that is kept in memory. If on restart of the
> client after failure the ID in the report packet doesn't match what
> the server wants, then it fails the test.
> 
> So, why is the server needed here? Just dump the IDs the client
> writes to a file on a device not being tested, and either diff
> them against a golden image or run a check to see all the IDs are
> monotonically increasing. That removes all the networking code from
> the test, the need for a client/server architecture, etc, and makes
> the test far easier to review.
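[For illustration, the monotonic-ID check Dave describes could be sketched as below, assuming the client appends one write ID per line to a log file on an untested device; `ids.log` is a hypothetical name:]

```shell
# Fail if any ID is not strictly greater than its predecessor.
awk 'NR > 1 && $1 <= prev { print "ID regression at line " NR; bad = 1; exit 1 }
     { prev = $1 }
     END { if (!bad) print "IDs monotonically increasing"; exit bad }' ids.log
```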
In fact the reason is quite simple. Initially this tool was designed
for real disk cache testing under power failure conditions, and we want
to share it with the community. Of course it is possible to simplify
things for the 'one host' case, but the saving is not too big. Let's
review it as is and keep it simple but useful, not just for local
failures but also for real power failure tests.
To be fair, the initial idea was to add persistent state to FIO,
but the logic started getting too complex, so we wrote hwflush-check.

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
