
Re: What to do when... xfs_repair hangs?

To: Dave Chinner <david@xxxxxxxxxxxxx>, Sean Caron <scaron@xxxxxxxxx>
Subject: Re: What to do when... xfs_repair hangs?
From: Sean Caron <scaron@xxxxxxxxx>
Date: Mon, 2 Jun 2014 14:32:17 -0400
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140601224825.GP14410@dastard>
References: <CAA43vkVzWRTqNQh2VSi5yvFLtstmVOKRJUnYw_ZSkYJGsex8Uw@xxxxxxxxxxxxxx> <20140531000117.GM6677@dastard> <CAA43vkUBF3q-F6XYTPTVx43KXg_3_COgaK8wtHxRynshOT9smg@xxxxxxxxxxxxxx> <20140601224825.GP14410@dastard>
I tried re-running the version that came with Ubuntu 12.04 LTS and it very consistently segfaults at that point... so I went and pulled a copy of the most recent source from Git and I'm trying xfs_repair 3.2.0 now. I'll see how that goes (it'll probably take a day or two to run; 450 TB volume) and report back. Thanks everyone for the suggestions and feedback so far.



On Sun, Jun 1, 2014 at 6:48 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
On Sun, Jun 01, 2014 at 12:21:55PM -0400, Sean Caron wrote:
> Sorry, all, I was a little out-of-it on Friday afternoon, of course I had
> kicked off xfs_repair actually in the background with all output sent to a
> file, and I was just doing 'tail -f' on that file.
> So I kill the 'tail -f' and jump back to the command line, it appears that
> xfs_repair segfaulted and died.
> That line of text:
> disconnected inode 1109099673,
> was indeed the last thing that it printed before it crashed.
> If I look in dmesg, I just see -
> xfs_repair[6770]: segfault at 28 ip 000000000042307b sp 00007fffef61bad0
> error 4 in xfs_repair[400000+72000]
> and that's it.
> I checked with 'df' and there's plenty of space everywhere; I don't see why
> it would have faulted out trying to connect something to lost+found.
> Underlying storage should be good; this is basically a RAID 60 built on top
> of a bunch of JBODs with LSI SAS9200 cards. MD sees all strings as started
> and running OK; no problems getting the array assembled at all.
> Since Dave is saying it's OK to try re-running xfs_repair; it'll just pick
> up where it left off; let me give it another pass and see if it manages to
> complete, or if it segfaults out again. I guess if it poops out a second
> time, maybe we'll just want to consider rebuilding the filesystem and
> restoring from our copies?
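
The dmesg line quoted above carries a bit more information than it first appears to. A rough sketch of how it could be decoded, using only the numbers from that line: the "segfault at 28" with "error 4" is a user-mode read of an unmapped address at 0x28, which usually points at a field access through a NULL pointer, and the instruction pointer minus the mapping base gives the offset inside the binary. The addr2line step is an assumption: it only helps if debug symbols for the exact same xfs_repair build are available.

```shell
#!/bin/sh
# Values copied from the dmesg line quoted above.
ip=0x42307b      # faulting instruction pointer
base=0x400000    # start of the xfs_repair mapping
size=0x72000     # length of that mapping

# Offset of the faulting instruction inside the binary.
offset=$(( $ip - $base ))
printf 'offset into xfs_repair: 0x%x\n' "$offset"

# Sanity check: the offset should fall inside the mapping.
if [ "$offset" -lt $(( $size )) ]; then
    echo "ip lies inside the xfs_repair mapping"
fi

# Assumption: with debug symbols for this exact binary installed,
# the following would name the function and source line (not run here):
#   addr2line -f -e "$(command -v xfs_repair)" "$ip"
```

The offset by itself does not identify the bug, but it gives a fixed point to compare against a later stack trace.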

You should update to the latest version of xfs_repair first (3.2.0).
If that still crashes, running xfs_repair under gdb to get a stack
trace would be a good start, or sending me a metadump image so I can
reproduce the crash myself would be even better...
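
As a minimal sketch of those two steps, something like the following could be used. The device path and dump file name are hypothetical placeholders, and the script only prints the commands unless DRY_RUN=0 is set. It assumes gdb and xfsprogs are installed and that the filesystem is unmounted.

```shell
#!/bin/sh
# Hypothetical names: adjust DEV and DUMP for the real system.
DEV="${DEV:-/dev/md0}"              # assumed device holding the filesystem
DUMP="${DUMP:-/tmp/xfs.metadump}"   # assumed output path for the metadump
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Run xfs_repair under gdb; when it stops on the segfault,
#    gdb prints a backtrace and exits.
run gdb -batch -ex run -ex bt --args xfs_repair "$DEV"

# 2. Capture a metadata-only image of the filesystem (-g shows progress).
run xfs_metadump -g "$DEV" "$DUMP"
```

A metadump holds filesystem metadata only, not file contents, and file names are obfuscated by default, so the image is usually far smaller than the volume and compresses well before sending.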


Dave Chinner
