
What to do when... xfs_repair hangs?

To: xfs@xxxxxxxxxxx, Sean Caron <scaron@xxxxxxxxx>
Subject: What to do when... xfs_repair hangs?
From: Sean Caron <scaron@xxxxxxxxx>
Date: Fri, 30 May 2014 15:49:13 -0400
Delivered-to: xfs@xxxxxxxxxxx
Hi all,

Long story short: we have a big array formatted as XFS, and the machine went down hard maybe a month or a month and a half ago. When it came back up, XFS faulted out when we attempted to mount the filesystem; it complained the log was bad or something. I did a dry run of xfs_repair (-n) and it looked pretty bad, so we mounted the filesystem read-only and ran a backup. I think we got pretty much everything out OK, except maybe files that were open at the time of the crash.

Now, with a backup in hand, we kicked off xfs_repair "for real". It ran for a while and did its thing, but now it appears to be stuck at this stage:

- agno = 436
rebuilding directory inode ...
rebuilding directory inode ...
rebuilding directory inode ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 1109099673,

and then it just stops. I don't know how long it's been sitting like that, but it hasn't moved in the last hour or two. I assume that's not good...

Interestingly, when we ran the dry run of xfs_repair (-n), it got all the way through; it never hung at any point. Not sure why it would start to hang once it gets run "for real".

This machine is in single-user mode. I have exactly 24 lines of console with no scrollback buffer, and no tty available other than the system console, which is the one I'm running xfs_repair on.

Running Linux kernel 3.4.61, Ubuntu 12.04 LTS 64-bit with whatever their current xfsprogs is.

This is a bit of an exceptional situation for me; I've never seen xfs_repair just hang outright. I hoped I could maybe get some feedback from the experts here... what should I do?

Try to Control-C out of the xfs_repair and ... re-run it?

Should I just quit wasting time at this point, wipe out the filesystem, reformat, then just start the long process of restoring from the backups?

The original plan was just to run xfs_repair, see what happened, and pull from backups as required to fix damage. Perhaps we should just cut to the chase, rebuild, and restore everything? Probably the filesystem would ultimately be healthier starting from scratch than with what xfs_repair leaves behind?

Any insight would be very much appreciated!

