<div dir="ltr">Hi Brian and Stefan,<div>Thanks for your reply. I checked the status of the array after the rebuild (and before the reset). </div><div><br></div><div><div>md0 : active raid6 sdd1[8] sdc1[4] sda1[3] sdb1[7] sdi1[5] sde1[1]</div><div> 14650667520 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/6] [UUUUUU_]</div><div> </div><div>However given that I've never had any problems before with mdadm rebuilds I did not think to check the data before rebooting. Note that the array is still in this state. Before the reboot I tried to run a smartctl check on the failed drives and it could not read them. When I rebooted I did not actually replace any drives, I just power cycled to see if I could re-access the drives that were thrown out of the array. According to smartctl they are completely fine. </div><div><br></div><div>I guess there is no way I can re-add the old drives and remove the newly synced drive? Even though I immediately kicked all users off the system when I got the mdadm alert, it's possible a small amount of data was written to the array during the resync. </div><div><br></div><div>It looks like the filesystem was not unmounted properly before reboot:<br></div></div><div>Jan 06 09:11:54 server systemd[1]: Failed unmounting /export/data.<br></div><div>Jan 06 09:11:54 server systemd[1]: Shutting down.<br></div><div><br></div><div>Here is the mount errors in the log after rebooting:</div><div>Jan 06 09:15:17 server kernel: XFS (md0): Mounting Filesystem<br></div><div>Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair<br></div><div>Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair<br></div><div><div>Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair</div></div><div>Jan 06 09:15:17 server kernel: XFS (md0): metadata I/O error: block 0x400 ("xfs_trans_read_buf_map") error 117 numblks 16<br></div><div>Jan 06 09:15:17 server kernel: XFS (md0): xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.<br></div><div><div>Jan 06 09:15:17 server kernel: XFS (md0): failed to read root inode</div><div><br></div><div>xfs_repair -n -L also complains about a bad magic number. <br></div><div><br></div><div>Unfortunately this 15TB RAID was part of a 45TB GlusterFS distributed volume. It was only ever meant to be a scratch drive for intermediate scientific results, however inevitably most users used it to store lots of data. Oh well. </div><div><br></div><div>Thanks again,</div><div>Dave</div><div><br></div><div><br></div></div><div><br></div><div><br></div><div><br></div><div><div><br></div><div><br></div></div><div><br></div><div><br></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 6 January 2015 at 23:47, Brian Foster <span dir="ltr"><<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Tue, Jan 06, 2015 at 05:12:14PM +1100, David Raffelt wrote:<br>
On 6 January 2015 at 23:47, Brian Foster <bfoster@redhat.com> wrote:

> On Tue, Jan 06, 2015 at 05:12:14PM +1100, David Raffelt wrote:
> > Hi again,
> > Some more information.... the kernel log show the following errors were
> > occurring after the RAID recovery, but before I reset the server.
> >
>
> By after the raid recovery, you mean after the two drives had failed out
> and 1 hot spare was activated and resync completed? It certainly seems
> like something went wrong in this process. The output below looks like
> it's failing to read in some inodes. Is there any stack trace output
> that accompanies these error messages to confirm?
>
> I suppose I would try to verify that the array configuration looks sane,
> but after the hot spare resync and then one or two other drive
> replacements (was the hot spare ultimately replaced?), it's hard to say
> whether it might be recoverable.
>
> Brian
>
> > Jan 06 00:00:27 server kernel: XFS (md0): Corruption detected. Unmount and
> > run xfs_repair
> > Jan 06 00:00:27 server kernel: XFS (md0): Corruption detected. Unmount and
> > run xfs_repair
> > Jan 06 00:00:27 server kernel: XFS (md0): Corruption detected. Unmount and
> > run xfs_repair
> > Jan 06 00:00:27 server kernel: XFS (md0): metadata I/O error: block
> > 0x36b106c00 ("xfs_trans_read_buf_map") error 117 numblks 16
> > Jan 06 00:00:27 server kernel: XFS (md0): xfs_imap_to_bp:
> > xfs_trans_read_buf() returned error 117.
> >
> >
> > Thanks,
> > Dave
>
</span><div class="HOEnZb"><div class="h5">> _______________________________________________<br>
> xfs mailing list<br>
> <a href="mailto:xfs@oss.sgi.com">xfs@oss.sgi.com</a><br>
> <a href="http://oss.sgi.com/mailman/listinfo/xfs" target="_blank">http://oss.sgi.com/mailman/listinfo/xfs</a><br>
<br>
-- 
David Raffelt (PhD)
Postdoctoral Fellow

The Florey Institute of Neuroscience and Mental Health
Melbourne Brain Centre - Austin Campus
245 Burgundy Street
Heidelberg Vic 3084
Ph: +61 3 9035 7024
www.florey.edu.au