XFS corrupt after RAID failure and resync
David Raffelt
david.raffelt at florey.edu.au
Tue Jan 6 14:34:37 CST 2015
Hi Brian and Stefan,
Thanks for your reply. I checked the status of the array after the rebuild
(and before the reset).
md0 : active raid6 sdd1[8] sdc1[4] sda1[3] sdb1[7] sdi1[5] sde1[1]
14650667520 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/6]
[UUUUUU_]
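For reference, something like the following should show the full per-member
detail beyond what /proc/mdstat reports (just a sketch; the device names are
the ones from my mdstat above and may need adjusting):
  # array-level state: degraded/clean, failed and spare slots, member roles
  mdadm --detail /dev/md0
  # per-member superblock view (repeat for each member)
  mdadm --examine /dev/sdd1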
However, given that I have never had any problems with mdadm rebuilds before,
I did not think to check the data before rebooting. Note that the array is
still in this state. Before the reboot I tried to run a smartctl check on the
failed drives and smartctl could not read them. When I rebooted I did not
actually replace any drives; I just power-cycled to see if I could re-access
the drives that were thrown out of the array. According to smartctl they are
completely fine.
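For what it's worth, the smartctl checks I mean are roughly these (a sketch
only; /dev/sdX stands for each of the kicked drives):
  # overall health verdict plus the attribute table (reallocated/pending sectors etc.)
  smartctl -H -A /dev/sdX
  # the drive's own error log, in case anything was recorded around the failure
  smartctl -l error /dev/sdX
  # optionally a short self-test; results show up later under 'smartctl -l selftest'
  smartctl -t short /dev/sdX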
I guess there is no way I can re-add the old drives and remove the newly
synced drive? Even though I immediately kicked all users off the system
when I got the mdadm alert, it's possible a small amount of data was
written to the array during the resync.
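I assume the first thing to look at before even considering the old drives is
how far their event counters lag behind the current members, roughly as
follows (a sketch, not something I have run yet; adjust the device list):
  # compare event counts, update times and array state across all members,
  # including the two drives that were kicked out
  mdadm --examine /dev/sd[a-i]1 | egrep '/dev/|Update Time|Events|Array State'
  # any forced re-assembly with the old members (mdadm --assemble --force)
  # would only be attempted on copies/overlays of the drives, never the originals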
It looks like the filesystem was not unmounted properly before reboot:
Jan 06 09:11:54 server systemd[1]: Failed unmounting /export/data.
Jan 06 09:11:54 server systemd[1]: Shutting down.
Here are the mount errors from the log after rebooting:
Jan 06 09:15:17 server kernel: XFS (md0): Mounting Filesystem
Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
Jan 06 09:15:17 server kernel: XFS (md0): metadata I/O error: block 0x400 ("xfs_trans_read_buf_map") error 117 numblks 16
Jan 06 09:15:17 server kernel: XFS (md0): xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
Jan 06 09:15:17 server kernel: XFS (md0): failed to read root inode
xfs_repair -n -L also complains about a bad magic number.
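Error 117 appears to be EUCLEAN ("Structure needs cleaning"), which as far as
I can tell is what XFS returns for on-disk corruption (EFSCORRUPTED). Before
touching anything the order would presumably be roughly the following (paths
are placeholders and this is a sketch rather than something I have already
run):
  # capture a metadata-only image first so repairs can be rehearsed offline
  xfs_metadump -g /dev/md0 /other/disk/md0.metadump
  xfs_mdrestore /other/disk/md0.metadump /other/disk/md0.img
  # dry-run repair against the restored image before touching the real device
  xfs_repair -n /other/disk/md0.img
  # zeroing the log with -L throws away unreplayed transactions, so it stays a
  # last resort on the real device:
  #   xfs_repair -L /dev/md0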
Unfortunately this 15TB RAID was part of a 45TB GlusterFS distributed
volume. It was only ever meant to be scratch space for intermediate
scientific results, but inevitably most users used it to store lots of
data. Oh well.
Thanks again,
Dave
On 6 January 2015 at 23:47, Brian Foster <bfoster at redhat.com> wrote:
> On Tue, Jan 06, 2015 at 05:12:14PM +1100, David Raffelt wrote:
> > Hi again,
> > Some more information... the kernel log shows that the following errors
> > were occurring after the RAID recovery, but before I reset the server.
> >
>
> By 'after the RAID recovery', you mean after the two drives had failed out,
> one hot spare was activated, and the resync completed? It certainly seems
> like something went wrong in this process. The output below looks like
> it's failing to read in some inodes. Is there any stack trace output
> that accompanies these error messages to confirm?
>
> I suppose I would try to verify that the array configuration looks sane,
> but after the hot spare resync and then one or two other drive
> replacements (was the hot spare ultimately replaced?), it's hard to say
> whether it might be recoverable.
>
> Brian
>
> > Jan 06 00:00:27 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> > Jan 06 00:00:27 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> > Jan 06 00:00:27 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> > Jan 06 00:00:27 server kernel: XFS (md0): metadata I/O error: block 0x36b106c00 ("xfs_trans_read_buf_map") error 117 numblks 16
> > Jan 06 00:00:27 server kernel: XFS (md0): xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
> >
> >
> > Thanks,
> > Dave
>
--
*David Raffelt (PhD)*
Postdoctoral Fellow
The Florey Institute of Neuroscience and Mental Health
Melbourne Brain Centre - Austin Campus
245 Burgundy Street
Heidelberg Vic 3084
Ph: +61 3 9035 7024
www.florey.edu.au