On 05/05/2015 04:13 AM, Yujian Peng wrote:
Emmanuel Florac <eflorac@...> writes:
On Mon, 4 May 2015 07:00:32 +0000 (UTC)
Yujian Peng <pengyujian5201314 <at> 126.com> wrote:
I'm encountering a data disaster. I have a Ceph cluster with 145 OSDs.
The data center had a power problem yesterday, and all of the Ceph
nodes went down. Now I find that 6 disks (XFS) in 4 nodes have
data corruption. Some disks cannot be mounted, and some disks have
IO errors in syslog.
mount: Structure needs cleaning
xfs_log_force: error 5 returned
I tried to repair one with xfs_repair -L /dev/sdx1, but the ceph-osd
reported a leveldb error:
Error initializing leveldb: Corruption: checksum mismatch
I cannot start the 6 OSDs, and 22 PGs are down.
This is really a tragedy for me. Can you give me some ideas on how to
recover the XFS filesystems? Thanks very much!
For XFS problems, ask the XFS ML: xfs <at> oss.sgi.com
You didn't give enough details, by far. What version of kernel and
distro are you running? If there were errors, please post extensive
logs. If you have IO errors on some disks, you probably MUST replace
them before going any further.
Why did you run xfs_repair -L? Did you try xfs_repair without options
first? Were you running the very latest version of xfs_repair?
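The details asked for above (kernel, distro, xfs_repair version, logs) can be gathered in one pass. What follows is only a minimal sketch and not something posted in the thread: the function name collect_xfs_report is hypothetical, and it assumes a Linux host where uname is available. Reading dmesg may require root on some systems, so the sketch tolerates that failing.

```shell
#!/bin/sh
# Hypothetical helper: print the basic facts a filesystem mailing list
# usually asks for (kernel, distro, xfs_repair version, recent XFS messages).
collect_xfs_report() {
    echo "== kernel =="
    uname -a

    echo "== distro =="
    # /etc/issue is the file quoted later in the thread; tolerate its absence.
    cat /etc/issue 2>/dev/null || echo "(no /etc/issue)"

    echo "== xfs_repair version =="
    if command -v xfs_repair >/dev/null 2>&1; then
        xfs_repair -V
    else
        echo "(xfs_repair not installed)"
    fi

    echo "== recent XFS / I/O kernel messages =="
    # dmesg can be restricted to root; do not fail the whole report on that.
    dmesg 2>/dev/null | grep -i -E 'xfs|i/o error' | tail -n 50 || true
}

collect_xfs_report
```

Redirecting the output to a file (for example `collect_xfs_report > report.txt`) gives something that can be attached to a list post as-is.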
The OS is Ubuntu 12.04.5 with kernel 3.13.0:
Linux ceph19 3.13.0-32-generic #57~precise1-Ubuntu SMP Tue Jul 15 03:51:20
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 12.04.5 LTS \n \l
xfs_repair version 3.1.7
I've tried xfs_repair without options, but it showed me some errors, so I
used the -L option.
Thanks for your reply!
Responding quickly to a couple of things:
* xfs_repair -L zeroes the XFS log, discarding any metadata changes it still held; not normally a good thing to do
* replacing disks with IO errors is not a great idea if you still need that
data. You might want to copy the data from that disk to a new disk (same or
greater size) and then try to repair that new disk. A lot depends on the type
of IO error you see - you might have cable issues, HBA issues, or fairly normal
read issues (which are not worth replacing a disk for).
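The copy-first approach described above could look roughly like the sequence below. This is a sketch under stated assumptions, not a tested recovery procedure: it assumes GNU ddrescue is installed, /dev/sdy1 is a hypothetical spare partition at least as large as the damaged /dev/sdx1, and the mapfile path is made up. The helper plan_xfs_rescue is itself hypothetical and deliberately only prints the commands, so nothing touches a disk until each step has been reviewed.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a cautious recovery sequence for
# one failing XFS partition. SRC is the damaged partition, DST a spare
# partition of the same or greater size, MAP a ddrescue mapfile path.
plan_xfs_rescue() {
    src=$1; dst=$2; map=$3

    # 1. Copy everything readable first, skipping the slow scraping phase
    #    (-n); -f is needed because the output is a block device.
    echo "ddrescue -f -n $src $dst $map"

    # 2. Retry the bad areas up to three times, reusing the same mapfile.
    echo "ddrescue -f -r3 $src $dst $map"

    # 3. Inspect the copy without modifying it (-n = no-modify mode).
    echo "xfs_repair -n $dst"

    # 4. Only then repair the copy; the original disk stays untouched.
    echo "xfs_repair $dst"
}

plan_xfs_rescue /dev/sdx1 /dev/sdy1 /root/sdx1.map
```

If xfs_repair reports a dirty log on the copy, it asks for the filesystem to be mounted and unmounted so the log can be replayed first. Working on a copy means -L can stay a last resort, and the original disk can be imaged again if the repair goes wrong.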
You should work with your vendor's support team if you have a support contract,
or post to the XFS devel list (copied above) for help.