Re: easily reproducible filesystem crash on rebuilding array

To: xfs@xxxxxxxxxxx
Subject: Re: easily reproducible filesystem crash on rebuilding array
From: Emmanuel Florac <eflorac@xxxxxxxxxxxxxx>
Date: Tue, 13 Jan 2015 12:21:08 +0100
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20141211123936.1f3d713d@xxxxxxxxxxxxxxxxxxxx>
Organization: Intellique
References: <20141211123936.1f3d713d@xxxxxxxxxxxxxxxxxxxx>
On Thu, 11 Dec 2014 12:39:36 +0100,
Emmanuel Florac <eflorac@xxxxxxxxxxxxxx> wrote:

> Here's the setup: hardware RAID controller (Adaptec 7xx5 series,
> latest firmware), RAID-6 array (problem occurred with different RAID
> widths, sizes, and disk configurations), and different kernels from
> 3.2.x to 3.16.x.
> What happens: while the array is rebuilding, simultaneously reading
> and writing is a sure way to break the filesystem and at times,
> corrupt data.
> If the array is NOT rebuilding, nothing ever happens. When using the
> array in read-only mode while it rebuilds, nothing ever happens.
> However, while the array is rebuilding, relatively heavy IO almost
> certainly brings up something as follows [snip]

So here's where I am at the moment:

* XFS v4 on a rebuilding Adaptec RAID fails under heavy IO, with or
  without LVM, with kernels 3.2.xx up to 3.17.7.

* Today I've run the same test with ext4: no problem whatsoever. I'm
  rechecking the md5 of all files to be sure, but it looks OK so far after
  testing several terabytes.
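
For illustration only, an md5 recheck of this kind could be sketched as
below in Python. This is not the script used for the test above; the
function names and the in-memory manifest format are hypothetical, and
it assumes the digests were recorded before the rebuild test started.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = md5_of_file(full)
    return manifest


def find_mismatches(root, manifest):
    """Return the sorted relative paths that are missing or whose digest changed."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or md5_of_file(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

Typical use would be to call build_manifest() on the test tree before
the rebuild, then find_mismatches() afterwards; an empty list means
every recorded file still matches. For the check to say anything about
the array, the reads have to come from the disks rather than the page
cache, for example by remounting the filesystem first.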

I don't understand how the RAID firmware could send back bad data
(corrupted metadata AND data) to XFS but not to ext4...

Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |   <eflorac@xxxxxxxxxxxxxx>
                    |   +33 1 78 94 84 02
