
Re: [ceph-users] xfs corruption, data disaster!

To: ceph-users@xxxxxxxxxxxxxx, Linux fs XFS <xfs@xxxxxxxxxxx>
Subject: Re: [ceph-users] xfs corruption, data disaster!
From: Ric Wheeler <rwheeler@xxxxxxxxxx>
Date: Mon, 11 May 2015 17:47:59 +0300
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <loom.20150505T030824-422@xxxxxxxxxxxxxx>
References: <loom.20150504T085721-88@xxxxxxxxxxxxxx> <20150504161912.6ff8621b@xxxxxxxxxxxxxxxxxxxx> <loom.20150505T030824-422@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
On 05/05/2015 04:13 AM, Yujian Peng wrote:
Emmanuel Florac <eflorac@...> writes:

On Mon, 4 May 2015 07:00:32 +0000 (UTC)
Yujian Peng <pengyujian5201314 <at> 126.com> wrote:

I'm facing a data disaster. I have a ceph cluster with 145 OSDs.
The data center had a power problem yesterday, and all of the ceph
nodes went down. Now I find that 6 disks (xfs) in 4 nodes have
data corruption. Some disks cannot be mounted, and some disks show
IO errors in syslog:
        mount: Structure needs cleaning
        xfs_log_force: error 5 returned
I tried to repair one with xfs_repair -L /dev/sdx1, but the ceph-osd
reported a leveldb error:
        Error initializing leveldb: Corruption: checksum mismatch
I cannot start the 6 OSDs, and 22 PGs are down.
This is really a tragedy for me. Can you give me some ideas on how to
recover the xfs filesystems? Thanks very much!
For XFS problems, ask the XFS ML: xfs <at> oss.sgi.com

You didn't give enough details, by far. What version of kernel and
distro are you running? If there were errors, please post extensive
logs. If you have IO errors on some disks, you probably MUST replace
them before going any further.

Why did you run xfs_repair -L ? Did you try xfs_repair without options
first? Were you running the very very latest version of xfs_repair
(3.2.2) ?

The OS is ubuntu 12.04.5 with kernel 3.13.0
uname -a
Linux ceph19 3.13.0-32-generic #57~precise1-Ubuntu SMP Tue Jul 15 03:51:20
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/issue
Ubuntu 12.04.5 LTS \n \l
xfs_repair -V
xfs_repair version 3.1.7
I've tried xfs_repair without options, but it showed me some errors, so I
used the -L option.
Thanks for your reply!

Responding quickly to a couple of things:

* xfs_repair -L wipes out the XFS log, not normally a good thing to do

* replacing disks with IO errors is not a great idea if you still need that data. You might want to copy the data from that disk to a new disk (same or greater size) and then try to repair that new disk. A lot depends on the type of IO error you see - you might have cable issues, HBA issues, or fairly normal read issues (which are not worth replacing a disk for).
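
As a rough sketch of the copy-then-repair approach above: the function below images one device onto another with dd, continuing past read errors. The device names /dev/sdX1 and /dev/sdY1 and the function name copy_disk are placeholders for illustration, not anything from this thread, and the block size is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: copy a failing disk to a healthy one, then inspect the copy.
# copy_disk is a hypothetical helper; /dev/sdX1 and /dev/sdY1 are placeholders.

# Copy SRC to DST block by block.
#   conv=noerror  keep going after a read error instead of aborting
#   conv=sync     pad short or failed reads with zeros so that offsets in
#                 the copy stay aligned with the original
copy_disk() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=4096 conv=noerror,sync
}

# Example (as root, with the source unmounted and the target at least
# as large as the source):
#   copy_disk /dev/sdX1 /dev/sdY1
#   xfs_repair -n /dev/sdY1   # -n: no-modify mode, only reports problems
```

Running xfs_repair -n against the copy first shows what a real repair would change without touching anything, and the original disk stays untouched in case the repair needs to be retried differently.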

You should work with your vendor's support team if you have a support contract, or post to the XFS devel list (copied above) for help.

Good luck!

