XFS corrupt after RAID failure and resync

David Raffelt david.raffelt at florey.edu.au
Tue Jan 6 17:47:00 CST 2015


Hi Brian,
Below is the root inode data, as read via xfs_db. I'm currently running
xfs_metadump (sketched briefly after the dump below) and will send you a
link to the file.
Cheers!
David




xfs_db> sb
xfs_db> p rootino
rootino = 1024
xfs_db> inode 1024
xfs_db> p
core.magic = 0
core.mode = 0
core.version = 0
core.format = 0 (dev)
core.uid = 0
core.gid = 0
core.flushiter = 0
core.atime.sec = Thu Jan  1 10:00:00 1970
core.atime.nsec = 000000000
core.mtime.sec = Thu Jan  1 10:00:00 1970
core.mtime.nsec = 000000000
core.ctime.sec = Thu Jan  1 10:00:00 1970
core.ctime.nsec = 000000000
core.size = 0
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 0 (dev)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 0
next_unlinked = 0
u.dev = 0
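
As mentioned above, the metadump is being run against the md0 device; a
rough sketch of the steps (the -g flag and the output filename here are
just illustrative):

  xfs_metadump -g /dev/md0 md0.metadump   # copies metadata only, no file contents
  xz md0.metadump                         # compress before uploading

Since xfs_metadump obfuscates filenames by default, the dump should be
safe to share.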


On 7 January 2015 at 10:16, Brian Foster <bfoster at redhat.com> wrote:

> On Wed, Jan 07, 2015 at 07:34:37AM +1100, David Raffelt wrote:
> > Hi Brian and Stefan,
> > Thanks for your reply.  I checked the status of the array after the
> > rebuild (and before the reset).
> >
> > md0 : active raid6 sdd1[8] sdc1[4] sda1[3] sdb1[7] sdi1[5] sde1[1]
> >       14650667520 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/6]
> > [UUUUUU_]
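> >
> > (That is the /proc/mdstat summary; a fuller per-device view would be
> > something like "mdadm --detail /dev/md0", which for this array should
> > show 7 raid devices with one of them missing.)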
> >
> > However, given that I've never had any problems before with mdadm
> > rebuilds, I did not think to check the data before rebooting.  Note
> > that the array is
> > still in this state. Before the reboot I tried to run a smartctl check on
> > the failed drives and it could not read them. When I rebooted I did not
> > actually replace any drives; I just power cycled to see if I could
> > re-access the drives that were thrown out of the array. According to
> > smartctl they are completely fine.
> >
> > I guess there is no way I can re-add the old drives and remove the newly
> > synced drive?  Even though I immediately kicked all users off the system
> > when I got the mdadm alert, it's possible a small amount of data was
> > written to the array during the resync.
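> >
> > (If it helps, I could at least compare the md event counters on the
> > dropped drives against the current members, e.g. roughly the following,
> > where the device names are just placeholders:
> >
> >   mdadm --examine /dev/sdX1 /dev/sdY1 | grep -i -e 'update time' -e events
> >
> > before attempting anything with them.)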
> >
> > It looks like the filesystem was not unmounted properly before reboot:
> > Jan 06 09:11:54 server systemd[1]: Failed unmounting /export/data.
> > Jan 06 09:11:54 server systemd[1]: Shutting down.
> >
> > Here are the mount errors from the log after rebooting:
> > Jan 06 09:15:17 server kernel: XFS (md0): Mounting Filesystem
> > Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> > Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> > Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> > Jan 06 09:15:17 server kernel: XFS (md0): metadata I/O error: block 0x400 ("xfs_trans_read_buf_map") error 117 numblks 16
> > Jan 06 09:15:17 server kernel: XFS (md0): xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
> > Jan 06 09:15:17 server kernel: XFS (md0): failed to read root inode
> >
>
> So it fails to read the root inode. You could also try to read said
> inode via xfs_db (e.g., 'sb,' 'p rootino,' 'inode <ino#>,' 'p') and see
> what it shows.
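>
> (Non-interactively that would be roughly the following, with the device
> path taken from the md0 messages above:
>
>   xfs_db -r -c 'sb 0' -c 'p rootino' /dev/md0
>   xfs_db -r -c 'inode <ino#>' -c 'p' /dev/md0
>
> where <ino#> is whatever rootino reports.)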
>
> Are you able to run xfs_metadump against the fs? If so and you're
> willing/able to make the dump available somewhere (compressed), I'd be
> interested to take a look to see what might be causing the difference in
> behavior between repair and xfs_db.
>
> Brian
>
> > xfs_repair -n -L also complains about a bad magic number.
> >
> > Unfortunately this 15TB RAID was part of a 45TB GlusterFS distributed
> > volume. It was only ever meant to be a scratch drive for intermediate
> > scientific results; however, inevitably most users used it to store
> > lots of data. Oh well.
> >
> > Thanks again,
> > Dave
> >
> >
> > On 6 January 2015 at 23:47, Brian Foster <bfoster at redhat.com> wrote:
> >
> > > On Tue, Jan 06, 2015 at 05:12:14PM +1100, David Raffelt wrote:
> > > > Hi again,
> > > > Some more information... the kernel log shows the following errors
> > > > were occurring after the RAID recovery, but before I reset the server.
> > > >
> > >
> > > By after the raid recovery, you mean after the two drives had failed
> > > out and 1 hot spare was activated and resync completed? It certainly seems
> > > like something went wrong in this process. The output below looks like
> > > it's failing to read in some inodes. Is there any stack trace output
> > > that accompanies these error messages to confirm?
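> > > (I.e., whether there is a call trace in the dmesg / "journalctl -k"
> > > output around those timestamps.)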
> > >
> > > I suppose I would try to verify that the array configuration looks
> > > sane, but after the hot spare resync and then one or two other drive
> > > replacements (was the hot spare ultimately replaced?), it's hard to say
> > > whether it might be recoverable.
> > >
> > > Brian
> > >
> > > > Jan 06 00:00:27 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> > > > Jan 06 00:00:27 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> > > > Jan 06 00:00:27 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> > > > Jan 06 00:00:27 server kernel: XFS (md0): metadata I/O error: block 0x36b106c00 ("xfs_trans_read_buf_map") error 117 numblks 16
> > > > Jan 06 00:00:27 server kernel: XFS (md0): xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
> > > >
> > > >
> > > > Thanks,
> > > > Dave
> > >
> > >
> >
> >
> > --
> > *David Raffelt (PhD)*
> > Postdoctoral Fellow
> >
> > The Florey Institute of Neuroscience and Mental Health
> > Melbourne Brain Centre - Austin Campus
> > 245 Burgundy Street
> > Heidelberg Vic 3084
> > Ph: +61 3 9035 7024
> > www.florey.edu.au
>
>


-- 
*David Raffelt (PhD)*
Postdoctoral Fellow

The Florey Institute of Neuroscience and Mental Health
Melbourne Brain Centre - Austin Campus
245 Burgundy Street
Heidelberg Vic 3084
Ph: +61 3 9035 7024
www.florey.edu.au