
To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: RE: XFS issue xfs goes offline with various messages drive not recoverable without reboot
From: Simon Dray <sdray@xxxxxxxxxx>
Date: Thu, 2 Oct 2014 11:05:14 +0000
Cc: "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
In-reply-to: <20140925081254.GH4758@dastard>
References: <dd6d1d6e9fa7469584e72574347bb088@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20140925081254.GH4758@dastard>
Hi Dave,

I'm not sure whether I should send this to you directly or not. I now have 
further information on the issue, gathered as per the reporting-XFS-issues page.

I would appreciate any insight you can give on the issue. I believe it to be a 
hardware issue but need confirmation either way.


Best regards,
Simon

-----Original Message-----
From: Dave Chinner [mailto:david@xxxxxxxxxxxxx] 
Sent: 25 September 2014 09:13
To: Simon Dray
Cc: xfs@xxxxxxxxxxx
Subject: Re: XFS issue xfs goes offline with various messages drive not 
recoverable without reboot

On Thu, Sep 25, 2014 at 07:30:23AM +0000, Simon Dray wrote:
> Dear Sirs
> 
> I wonder if you can help with an issue we see recurring regularly on 
> one of our HP systems, which uses an HP 420 RAID controller

tl;dr: more information is needed about your system to make sense of the 
problem. See here:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
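
For convenience, something like the following covers most of what that page 
asks for (a sketch; the FAQ above is the authoritative list):

[root@ /]# uname -a
[root@ /]# xfs_repair -V
[root@ /]# xfs_info /your/mount/point
[root@ /]# cat /proc/mounts
[root@ /]# dmesg | tail -n 200

That gives the kernel version, the xfsprogs version, the filesystem geometry, 
the mount options in effect, and the recent kernel messages.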

> Action taken
> 
> We first saw the following:
> [root@ content]# ls
> ls: cannot open directory .: Input/output error

The filesystem has shut down due to a fatal error.
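
The shutdown itself should be visible in the kernel log; a quick way to 
confirm (a sketch):

[root@ /]# dmesg | grep -i 'xfs.*shutdown'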

> We then tried to run:
> [root@ /]# xfs_check /dev/md0
> xfs_check: /dev/md0 contains a mounted and writable filesystem
> fatal error -- couldn't initialize XFS library

FYI, xfs_check was deprecated quite a while ago. It no longer exists in 
current releases....
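
The equivalent check these days is xfs_repair in no-modify mode, and it 
likewise needs the filesystem unmounted, e.g.:

[root@ /]# umount /dev/md0
[root@ /]# xfs_repair -n /dev/md0

The -n flag makes xfs_repair report problems without modifying anything.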

> We also tried to umount /dev/md0 before running xfs_check, but no 
> luck. We received the error: device is in use

That can happen if the storage has gone bad and IOs have been lost.
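
To see what is actually pinning the mount when umount says the device is in 
use, fuser or lsof will usually tell you (the mount point below is an example; 
substitute your own):

[root@ /]# fuser -vm /content
[root@ /]# lsof +f -- /content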

> We use XFS for one of our large RAID filesystems and we are seeing 
> the filesystem go offline with the following messages in dmesg
> 
> messages-20140921:Sep 18 23:01: kernel: XFS (md0): Device md0: 
> metadata write error block 0x5e28623d8

What messages occurred before this? Something reported an IO error back to XFS, 
and so that something should have logged an error message...
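
Grepping the same log around the shutdown timestamp usually turns it up; for 
instance (the hpsa driver name is an assumption based on the HP controller, 
adjust the pattern to your setup):

[root@ /]# grep -iE 'hpsa|sd[a-z]|i/o error' /var/log/messages-20140921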

> messages-20140921:Sep 18 23:01:04 kernel: XFS (md0): I/O error occurred: meta-data dev md0 block 0x445cccc40 ("xlog_iodone") error 5 buf count 32768
> messages-20140921:Sep 18 23:01:04 kernel: XFS (md0): xfs_do_force_shutdown(0x2) called from line 891 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa2c428dc
> messages-20140921:Sep 18 23:01:04 kernel: XFS (md0): Log I/O Error Detected. Shutting down filesystem
> messages-20140921:Sep 18 23:01:04 kernel: XFS (md0): Please umount the filesystem and rectify the problem(s)
> messages-20140921:Sep 18 23:01:04 kernel: XFS (md0): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> messages-20140921:Sep 18 23:01:04 kernel: XFS (md0): xfs_iunlink_remove: xfs_itobp() returned error 5.

Yup, that kernel code is at least 2 years old, because we removed xfs_itobp 
in mid-2012... ;)

> In all occurrences the only way to recover from this is to reboot the 
> system and allow xfs_repair to run during boot; this clears the issue 
> until the next time
> 
> We have checked the RAID health and nothing seems to be amiss. If you 
> could help with this it would be much appreciated

That's par for the course when hardware RAID goes AWOL - such controllers 
almost never report that they had a problem when they hang (e.g. the firmware 
crashes, so it can't log an event to say it crashed).
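
If the HP management CLI is installed, the controller's own view may still be 
worth checking; on Smart Array controllers that is typically hpacucli (the 
tool name and slot number here are assumptions for illustration, check what 
your system actually ships with):

[root@ /]# hpacucli ctrl all show status
[root@ /]# hpacucli ctrl slot=0 show config detail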

But really, more information about your system and more complete logs are 
needed to be able to make any progress triaging the problem.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

Attachment: logs.txt
