
Re: What to do when... xfs_repair hangs?

To: Sean Caron <scaron@xxxxxxxxx>
Subject: Re: What to do when... xfs_repair hangs?
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 2 Jun 2014 08:48:25 +1000
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CAA43vkUBF3q-F6XYTPTVx43KXg_3_COgaK8wtHxRynshOT9smg@xxxxxxxxxxxxxx>
References: <CAA43vkVzWRTqNQh2VSi5yvFLtstmVOKRJUnYw_ZSkYJGsex8Uw@xxxxxxxxxxxxxx> <20140531000117.GM6677@dastard> <CAA43vkUBF3q-F6XYTPTVx43KXg_3_COgaK8wtHxRynshOT9smg@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Sun, Jun 01, 2014 at 12:21:55PM -0400, Sean Caron wrote:
> Sorry, all, I was a little out-of-it on Friday afternoon, of course I had
> kicked off xfs_repair actually in the background with all output sent to a
> file, and I was just doing 'tail -f' on that file.
> So I kill the 'tail -f' and jump back to the command line, it appears that
> xfs_repair segfaulted and died.
> That line of text:
> disconnected inode 1109099673,
> was indeed the last thing that it printed before it crashed.
> If I look in dmesg, I just see -
> xfs_repair[6770]: segfault at 28 ip 000000000042307b sp 00007fffef61bad0
> error 4 in xfs_repair[400000+72000]
> and that's it.
> I checked with 'df' and there's plenty of space everywhere; I don't see why
> it would have faulted out trying to connect something to lost+found.
> Underlying storage should be good; this is basically a RAID 60 built on top
> of a bunch of JBODs with LSI SAS9200 cards. MD sees all strings as started
> and running OK; no problems getting the array assembled at all.
> Since Dave is saying it's OK to try re-running xfs_repair; it'll just pick
> up where it left off; let me give it another pass and see if it manages to
> complete, or if it segfaults out again. I guess if it poops out a second
> time, maybe we'll just want to consider rebuilding the filesystem and
> restoring from our copies?

You should update to the latest version of xfs_repair first (3.2.0).
If that still crashes, running xfs_repair under gdb to get a stack
trace would be a good start, or sending me a metadump image so I can
reproduce the crash myself would be even better...
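
Until a gdb trace is available, the dmesg line quoted above can already be
decoded a little. The following is only a rough sketch: it assumes the usual
x86 segfault message layout and the x86 page-fault error-code bits (bit 0 =
protection violation, bit 1 = write, bit 2 = user mode), and the names
decode_segfault and SEGFAULT_RE are made up for illustration. It is not a
substitute for a real stack trace.

```python
import re

# Assumed layout of the kernel's user-space segfault message on x86,
# matching the dmesg output quoted in this thread.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+)\s+"
    r"error (?P<err>[0-9a-f]+) in "
    r"(?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a segfault line into its fields (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": ip,
        # Offset of the faulting instruction inside the mapped object.
        "offset": ip - base,
        # x86 page-fault error-code bits (assumption: x86 kernel).
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


# The two quoted dmesg lines joined into one.
LINE = ("xfs_repair[6770]: segfault at 28 ip 000000000042307b "
        "sp 00007fffef61bad0 error 4 in xfs_repair[400000+72000]")

if __name__ == "__main__":
    info = decode_segfault(LINE)
    for key, value in info.items():
        shown = hex(value) if key in ("fault_addr", "ip", "offset") else value
        print(f"{key:10s} {shown}")
```

Read this way, the line describes a user-mode read of an unmapped page at
address 0x28, which is what reading a structure member through a NULL pointer
typically looks like. If the installed xfs_repair binary has debug symbols,
the ip value could be handed to a tool such as addr2line to find the source
line, but a gdb backtrace or a metadump is still the more useful report.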


Dave Chinner
