Re: easily reproducible filesystem crash on rebuilding array

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: easily reproducible filesystem crash on rebuilding array
From: Emmanuel Florac <eflorac@xxxxxxxxxxxxxx>
Date: Wed, 17 Dec 2014 12:21:59 +0100
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20141216195815.GB15665@dastard>
Organization: Intellique
References: <20141211123936.1f3d713d@xxxxxxxxxxxxxxxxxxxx> <20141215130715.4dfaaa8e@xxxxxxxxxxxxxxxxxxxx> <20141215132500.13210fdb@xxxxxxxxxxxxxxxxxxxx> <20141215201036.GQ24183@dastard> <20141216123405.111c7ac0@xxxxxxxxxxxxxxxxxxxx> <20141216195815.GB15665@dastard>
On Wed, 17 Dec 2014 06:58:15 +1100,
Dave Chinner <david@xxxxxxxxxxxxx> wrote:

> On Tue, Dec 16, 2014 at 12:34:05PM +0100, Emmanuel Florac wrote:
> > The RAID hardware is an Adaptec 71685 running the latest firmware
> > (32033). This is a 16-drive RAID-6 array of 4 TB HGST drives. The
> > problem occurs repeatedly with any combination of 7xx5 controllers
> > and 3 or 4 TB HGST drives in RAID-6 of various types, with XFS or
> > JFS (it never occurs with either ext4 or reiserfs).
> 
> Do you have systems with any other type of 3/4TB drives in them?

No, only HGST drives.
 
> > As I mentioned, when the disk drives' cache is on, the corruption is
> > serious. With the disk cache off, the corruption is minimal; however,
> > the filesystem shuts down.
> 
> That really sounds like a hardware problem - maybe with the disk
> drives themselves, not necessarily the controller.

Actually, the problem occurs without any error in the controller log: no
IO error, no disk timeout, no bad block, nothing. Until now I was pretty
confident that the Adaptec firmware was the culprit; I'm not so sure
now.


> > > I'd start with upgrading the firmware on your RAID controller and
> > > turning the XFS error level up to 11....
> > 
> > The firmware is the latest available. How do I turn logging up to
> > 11, please?
> 
> # echo 11 > /proc/sys/fs/xfs/error_level
> 

Thanks, done. I'm running the test again, but *without using LVM* this
time. I'm changing one parameter at a time...
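
For reference, the error-level change quoted above can be wrapped in a small
helper that shows the value before and after the write. This is only a
sketch, not something posted in the thread: the function name
set_xfs_error_level and the XFS_ERROR_LEVEL_FILE override are hypothetical,
and the 0-11 range check assumes the limits given in the kernel's XFS sysctl
documentation.

```shell
#!/bin/sh
# Sketch: read and set the XFS error level through procfs.
# XFS_ERROR_LEVEL_FILE is a hypothetical override so the helper can be
# tried against an ordinary file; by default it points at the real knob.
XFS_ERROR_LEVEL_FILE="${XFS_ERROR_LEVEL_FILE:-/proc/sys/fs/xfs/error_level}"

# set_xfs_error_level LEVEL (hypothetical helper name)
# Validates LEVEL, prints the old value, writes the new one, prints it back.
set_xfs_error_level() {
    level="$1"

    # Reject empty or non-numeric input.
    case "$level" in
        ''|*[!0-9]*)
            echo "error level must be an integer from 0 to 11" >&2
            return 1
            ;;
    esac

    # Assumed upper bound of 11 (the "turn it up to 11" value in the thread).
    if [ "$level" -gt 11 ]; then
        echo "error level must be an integer from 0 to 11" >&2
        return 1
    fi

    echo "previous level: $(cat "$XFS_ERROR_LEVEL_FILE")"
    echo "$level" > "$XFS_ERROR_LEVEL_FILE"
    echo "current level: $(cat "$XFS_ERROR_LEVEL_FILE")"
}
```

Run as root, `set_xfs_error_level 11` should have the same effect as the
echo command above. The setting is not persistent, so it has to be applied
again after a reboot and before the reproduction run starts.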

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |   <eflorac@xxxxxxxxxxxxxx>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------
