| To: | Brian Foster <bfoster@xxxxxxxxxx>, stefanrin@xxxxxxxxx |
|---|---|
| Subject: | Re: XFS corrupt after RAID failure and resync |
| From: | David Raffelt <david.raffelt@xxxxxxxxxxxxx> |
| Date: | Wed, 7 Jan 2015 07:34:37 +1100 |
| Cc: | "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx> |
| Delivered-to: | xfs@xxxxxxxxxxx |
| In-reply-to: | <44b127de199c445fa12c3b832a05f108@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> |
| References: | <CAOFq7B5PaPCJdAxyYa6feCXgGbkz+1Qs+Gfb2WG=5af=A+WOQg@xxxxxxxxxxxxxx> <44b127de199c445fa12c3b832a05f108@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> |
|
Hi Brian and Stefan,

Thanks for your reply. I checked the status of the array after the rebuild (and before the reset):

```
md0 : active raid6 sdd1[8] sdc1[4] sda1[3] sdb1[7] sdi1[5] sde1[1]
      14650667520 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/6] [UUUUUU_]
```

However, given that I've never had any problems with mdadm rebuilds before, I did not think to check the data before rebooting. Note that the array is still in this state.

Before the reboot I tried to run a smartctl check on the failed drives and it could not read them. When I rebooted I did not actually replace any drives; I just power cycled to see if I could re-access the drives that were thrown out of the array. According to smartctl they are completely fine. I guess there is no way I can re-add the old drives and remove the newly synced drive? Even though I immediately kicked all users off the system when I got the mdadm alert, it's possible a small amount of data was written to the array during the resync.

It looks like the filesystem was not unmounted properly before reboot:

```
Jan 06 09:11:54 server systemd[1]: Failed unmounting /export/data.
Jan 06 09:11:54 server systemd[1]: Shutting down.
```

Here are the mount errors in the log after rebooting:

```
Jan 06 09:15:17 server kernel: XFS (md0): Mounting Filesystem
Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
Jan 06 09:15:17 server kernel: XFS (md0): metadata I/O error: block 0x400 ("xfs_trans_read_buf_map") error 117 numblks 16
Jan 06 09:15:17 server kernel: XFS (md0): xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
Jan 06 09:15:17 server kernel: XFS (md0): failed to read root inode
```

`xfs_repair -n -L` also complains about a bad magic number. Unfortunately this 15TB RAID was part of a 45TB GlusterFS distributed volume.
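On the question of re-adding the old drives: whether that is even plausible hinges on the per-member event counters mdadm keeps in each superblock. Below is a minimal sketch of how one might compare them before attempting anything; the device names and the `awk` parsing are assumptions for illustration, not something from this thread.

```shell
#!/bin/sh
# Pull the "Events" counter out of `mdadm --examine` style output.
# Parsing is kept separate from device access so it can be checked
# against canned text without real hardware.
parse_events() {
    awk -F: '/Events/ {gsub(/ /, "", $2); print $2}'
}

# Hypothetical helper: read the counter straight from a member device
# (requires root and the real device nodes, e.g. /dev/sde1).
events_of() {
    mdadm --examine "$1" | parse_events
}

# Usage sketch -- compare each kicked member against a healthy one:
#   events_of /dev/sde1
#   events_of /dev/sdd1
# Members whose counters lag far behind the rest hold stale data;
# forcing them back in with `mdadm --assemble --force` would roll the
# array contents back to that older state.
```

If the counters were only slightly behind, `mdadm --assemble --force` can sometimes revive kicked members, but after a completed resync onto a replacement the ejected drives are almost certainly stale.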
It was only ever meant to be a scratch drive for intermediate scientific results; however, inevitably most users used it to store lots of data. Oh well.

Thanks again,
Dave

On 6 January 2015 at 23:47, Brian Foster <bfoster@xxxxxxxxxx> wrote:
> On Tue, Jan 06, 2015 at 05:12:14PM +1100, David Raffelt wrote:

--
David Raffelt (PhD)
Postdoctoral Fellow
The Florey Institute of Neuroscience and Mental Health
Melbourne Brain Centre - Austin Campus
245 Burgundy Street
Heidelberg Vic 3084
Ph: +61 3 9035 7024