On Fri, May 29, 2015 at 03:03:57PM +0100, Mike Grant wrote:
> We recently had a 180TB XFS filesystem go down after following some
> ill-considered advice from a Dell tech (re-onlining a maybe-failed disk,
> which one might think was ok..). It's not irreplaceable data, but
> xfs_repair segfaults when trying to fix up and I thought that might be
> of interest here to help fix the segfault. We're not expecting to
> recover the data, though it would be nice.
> Partial logs & backtraces of xfs_repair runs using the latest Centos-7
> xfsprogs package and also run with the xfs_repair built from the git
> master, copies of core dumps and a metadump are at:

Given that it is choking on directory corruption repair, I'd strongly
recommend trying the current git version (3.2.3-rc1) here:
> Maximum memory use was only about 1GB by the time of the crash, and
> there was 120GB+ of swap available, so I don't think that was an issue.
> The command was "xfs_repair -v /dev/md0 -t 60 -P".
> Run time is about 2 hours to a crash and we'll probably want to wipe and

Probably because you turned off prefetch, which makes it *slow*. :P
I'd build the new xfsprogs, restore the metadump to a file on a
different machine, and then run the new xfs_repair binary on the
restored metadump image. That will tell you pretty quickly if the
problem is solved. If it is solved, then you can run the new
xfs_repair on the real server.
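That workflow might look roughly like the following sketch (the
metadump filename and scratch image path are placeholders, and the
binary path assumes you're running from the xfsprogs source tree):

```shell
# Restore the metadata from the metadump into a sparse image file.
# "fs.metadump" and "/scratch/md0.img" are hypothetical names.
xfs_mdrestore fs.metadump /scratch/md0.img

# First pass with -n (no-modify mode) just reports what repair
# would do; drop -n to actually fix the restored image.
./repair/xfs_repair -n -v /scratch/md0.img
./repair/xfs_repair -v /scratch/md0.img
```

If the new binary gets through the image cleanly, it's a good sign
the segfault is fixed and it's safe to try on the real device.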

Just remember, though, that even once the FS has been repaired,
you'll still have to search for data corruption manually and deal
with whatever you find yourself.