
Re: easily reproducible filesystem crash on rebuilding array

To: Emmanuel Florac <eflorac@xxxxxxxxxxxxxx>
Subject: Re: easily reproducible filesystem crash on rebuilding array
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 16 Dec 2014 07:10:36 +1100
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20141215132500.13210fdb@xxxxxxxxxxxxxxxxxxxx>
References: <20141211123936.1f3d713d@xxxxxxxxxxxxxxxxxxxx> <20141215130715.4dfaaa8e@xxxxxxxxxxxxxxxxxxxx> <20141215132500.13210fdb@xxxxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Mon, Dec 15, 2014 at 01:25:00PM +0100, Emmanuel Florac wrote:
> On Mon, 15 Dec 2014 13:07:15 +0100,
> Emmanuel Florac <eflorac@xxxxxxxxxxxxxx> wrote:
> 
> > Dec 12 00:40:18 TEST-ADAPTEC kernel: XFS (dm-0): xfs_do_force_shutdown(0x1) called from line 383 of file fs/xfs/xfs_trans_buf.c.  Return address = 0xffffffff8125cc90
> > Dec 12 00:40:31 TEST-ADAPTEC kernel: XFS (dm-0): xfs_log_force: error 5 returned.
> > Dec 12 00:41:02 TEST-ADAPTEC kernel: XFS (dm-0): xfs_log_force: error 5 returned.
> > 
> 
> Reading the source I see that the error occurred in xfs_buf_read_map, I
> suppose it's when xfsbufd tries to scan dirty metadata?

a) we don't have an xfsbufd anymore, and b) the xfsbufd never
"scanned" or read metadata - it only wrote dirty buffers back to
disk.

> This is a read
> error, so it could very well be a simple IO starvation at the controller
> level (as the controller probably gives priority to whatever writes are
> pending over reads).

The controller is broken if it's returning EIO to reads when it
is busy.

> Maybe setting xfsbufd_centisecs to the max could help here?

Deprecated Sysctls
==================

  fs.xfs.xfsbufd_centisecs      (Min: 50  Default: 100  Max: 3000)
        Dirty metadata is now tracked by the log subsystem and
        flushing is driven by log space and idling demands. The
        xfsbufd no longer exists, so this sysctl does nothing.

        Due for removal in 3.14.

Seems like the removal patch is overdue....

> Trying
> right away... Any advice welcome.

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

I'd start with upgrading the firmware on your RAID controller and
turning the XFS error level up to 11....
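The "up to 11" quip maps onto a real knob: the fs.xfs.error_level sysctl, which the kernel's XFS documentation lists with a range of 0 to 11 (default 3). It controls how much detail XFS prints, including backtraces, when it hits an internal error or shuts down. Below is a minimal Python sketch of setting it through /proc/sys, assuming a standard procfs layout and root privileges. The helper names and the `root` parameter are hypothetical, added only so the code can be exercised against a scratch directory. The one-line shell equivalent is `sysctl -w fs.xfs.error_level=11`.

```python
import os

# Documented bounds for fs.xfs.error_level (Min: 0, Default: 3, Max: 11).
XFS_ERROR_LEVEL_MIN = 0
XFS_ERROR_LEVEL_MAX = 11


def sysctl_path(name: str, root: str = "/proc/sys") -> str:
    """Map a dotted sysctl name such as 'fs.xfs.error_level' to its procfs file."""
    return os.path.join(root, *name.split("."))


def set_xfs_error_level(level: int, root: str = "/proc/sys") -> int:
    """Write the XFS error level and return the value read back.

    Writing under /proc/sys normally requires root. `root` is a hypothetical
    parameter that lets the function be pointed at a scratch directory.
    """
    if not XFS_ERROR_LEVEL_MIN <= level <= XFS_ERROR_LEVEL_MAX:
        raise ValueError(
            f"error_level must be in {XFS_ERROR_LEVEL_MIN}..{XFS_ERROR_LEVEL_MAX}, got {level}"
        )
    path = sysctl_path("fs.xfs.error_level", root)
    with open(path, "w") as f:
        f.write(f"{level}\n")
    # Read the value back to confirm the kernel (or test file) accepted it.
    with open(path) as f:
        return int(f.read().strip())
```

On the affected box, calling set_xfs_error_level(11) as root before reproducing the rebuild should make the next shutdown report as verbose as XFS allows. The setting does not persist across reboots unless it is also added to the system's sysctl configuration.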

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
