easily reproducible filesystem crash on rebuilding array
Emmanuel Florac
eflorac at intellique.com
Wed Dec 17 05:21:59 CST 2014
On Wed, 17 Dec 2014 06:58:15 +1100,
Dave Chinner <david at fromorbit.com> wrote:
> On Tue, Dec 16, 2014 at 12:34:05PM +0100, Emmanuel Florac wrote:
> > The RAID hardware is an Adaptec 71685 running the latest firmware
> > (32033). This is a 16-drive RAID-6 array of 4 TB HGST drives. The
> > problem occurs repeatedly with any combination of 7xx5 controllers
> > and 3 or 4 TB HGST drives in RAID-6 of various types, with XFS or
> > JFS (it never occurs with either ext4 or reiserfs).
>
> Do you have systems with any other type of 3/4TB drives in them?
No, only HGST drives.
> > As I mentioned, when the disk drives' cache is on, the corruption is
> > serious. With the disk cache off, the corruption is minimal; however,
> > the filesystem shuts down.
>
> That really sounds like a hardware problem - maybe with the disk
> drives themselves, not necessarily the controller.
Actually, the problem occurs without any error in the controller log: no
IO error, no disk timeout, no bad block, nothing. Until now I was pretty
confident that the Adaptec firmware was the culprit, but I'm not so sure
anymore.
> > > I'd start with upgrading the firmware on your RAID controller and
> > > turning the XFS error level up to 11....
> >
> > The firmware is the latest available. How do I turn logging to 11
> > please ?
>
> # echo 11 > /proc/sys/fs/xfs/error_level
>
Thanks, done. I'm running the test again, but *without using LVM* this
time. I'm changing one parameter at a time...
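
For anyone following along, here is a minimal sketch of setting the level
and checking that it took effect in one step. It assumes the 0-11 range
that the fs.xfs.error_level sysctl accepts. The function name and the
optional second argument (an alternate path, handy for a dry run) are
hypothetical and not part of any XFS tooling.

```shell
#!/bin/sh
# Sketch only: raise the XFS error reporting level and confirm the change.
# The sysctl file is the one quoted above; set_xfs_error_level and its
# optional path argument are illustrative names, not an existing tool.
set_xfs_error_level() {
    level="$1"
    path="${2:-/proc/sys/fs/xfs/error_level}"

    # Reject anything that is not a whole number between 0 and 11.
    case "$level" in
        ''|*[!0-9]*) echo "level must be an integer 0-11" >&2; return 1 ;;
    esac
    if [ "$level" -gt 11 ]; then
        echo "level must be an integer 0-11" >&2
        return 1
    fi

    # Remember the previous value so the change is visible in the output.
    old=$(cat "$path") || return 1
    echo "$level" > "$path" || return 1
    echo "error_level: $old -> $(cat "$path")"
}
```

Run as root, "set_xfs_error_level 11" should behave like the echo command
above, with the old and new values printed for the record.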
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac at intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------