
Partially corrupted raid array beneath xfs

To: xfs@xxxxxxxxxxx
Subject: Partially corrupted raid array beneath xfs
From: Christopher Evans <christophere@xxxxxxxxxxxxxxxxx>
Date: Tue, 24 Jan 2012 09:59:09 -0800
I made a mistake by recreating a raid 6 array instead of taking the proper steps to rebuild it. Is there a way I can find out which directories and files are, or might be, corrupted if one 64k block of data out of every 21 is bad, for an unknown number of stripes? Unfortunately I've already mounted the raid array and have gotten xfs errors because of the corrupted data beneath it.

OS: Centos 5.5 64bit 2.6.18-194.el5
kmod-xfs: 0.4-2
xfsprogs: 2.9.4-1.el5.centos

I ran mdadm --create /dev/md0 --level=6 --raid-devices=23 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx, and then 5 minutes later set the drive I had replaced as faulty with mdadm --manage --set-faulty /dev/md0 /dev/sdm. This should result in one 64k chunk out of every 21 being random data, for the part of the array covered by the first 5 minutes of the raid rebuild (the full rebuild took ~20-30 hours).
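
To make the "one 64k chunk out of every 21" pattern concrete, here is a rough Python sketch that lists the byte ranges of /dev/md0 whose data chunk would have lived on the replaced drive. It rests on several assumptions that are not confirmed above: that md used its default left-symmetric raid 6 layout, that the chunk size is 64k, that data starts at offset 0 on each member (as with 0.90 metadata), and that /dev/sdm is slot 11 (0-based) because it is the 12th device in the --create command. The function name suspect_ranges and the n_stripes value are hypothetical; the real number of affected stripes is the unknown count and would have to be estimated from how far the rebuild got in 5 minutes.

```python
CHUNK = 64 * 1024   # assumed chunk size in bytes
N_DEVICES = 23      # from --raid-devices=23
BAD_SLOT = 11       # assumed 0-based position of /dev/sdm in the create list


def suspect_ranges(n_stripes, n_devices=N_DEVICES, bad_slot=BAD_SLOT, chunk=CHUNK):
    """Yield (start, end) byte ranges of the array whose data chunk sat on bad_slot.

    Assumes the left-symmetric raid 6 layout: P rotates backwards from the
    last device, Q follows P, and data chunks start right after Q.
    """
    data_disks = n_devices - 2
    for stripe in range(n_stripes):
        pd = n_devices - 1 - (stripe % n_devices)   # device holding P
        qd = (pd + 1) % n_devices                   # device holding Q
        if bad_slot in (pd, qd):
            continue  # the bad drive held parity here, so no data chunk is affected
        d = (bad_slot - pd - 2) % n_devices         # data chunk index within the stripe
        start = (stripe * data_disks + d) * chunk
        yield start, start + chunk


if __name__ == "__main__":
    # Print the first few suspect ranges as 512-byte sector numbers.
    for start, end in suspect_ranges(5):
        print(start // 512, (end // 512) - 1)
```

If the layout assumptions hold, these ranges could be compared against the block ranges that xfs_bmap -v reports for each file to narrow down which files overlap a suspect chunk. Stripes where the bad drive held P or Q contribute no suspect data, which is why the pattern is only approximately one chunk in 21.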

In my testing with a vm with 4 drives in raid 6, I believe I only corrupted the part of the array covered by the first 5 minutes of the raid rebuild. After creating a raid 6 array, I would dd if=/dev/zero of=/dev/md0. Then I would set a data drive to faulty and remove it. Running hexdump on it would show all zeros. To give it different data I would dd if=/dev/urandom of=/dev/removed_drive. When I recreated the array, it would recognize that three of the drives had already been in an array and ask if I wanted to continue. Since I said yes, it seems it would use the existing contents of the drives as the array's data. If I then set the drive with randomized data to faulty during the rebuild, it would seem to continue the rebuild as if the drive was failed/missing. When I added the drive back, it would rebuild the array again. The beginning of the corrupted drive would still show random data, but data further down the disk would show zeros.