Re: Problem recovering XFS filesystem

To: Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx>
Subject: Re: Problem recovering XFS filesystem
From: Aaron Williams <aaron.w2@xxxxxxxxx>
Date: Fri, 27 Apr 2012 19:04:48 -0700
Cc: xfs@xxxxxxxxxxx
In-reply-to: <18818650.cGNyynGa9I@saturn>
References: <CAK6JqP3ze2pocsgoKUTZx_J6w-Zc9V=StqDnqG0Gx0v5hw=FGQ@xxxxxxxxxxxxxx> <18818650.cGNyynGa9I@saturn>

On Fri, Apr 27, 2012 at 2:31 PM, Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx> wrote:
> On Thursday, 26 April 2012, 13:00:06, Aaron Williams wrote:
> > I was able to recover the filesystem.
>
> So your RAID busted the filesystem. The devs might want an
> xfs_metadump of the FS from before your repair, so they can inspect it and
> improve xfs_repair.

Hi Michael,

It appears that way, or it may be that I had mounted with nobarrier and the contents of the battery-backed RAID cache were lost while I was recovering the RAID.

I have an Areca ARC-1210 controller. Earlier, another drive had failed, and the array had finished rebuilding onto a hot spare. I intended to remove the bad drive to replace it, but I disconnected the wrong drive by mistake. After I reconnected the good drive, the array started rebuilding again, so the controller was in the middle of a rebuild when I tried to shut down and reboot my Linux system. I had decided it might be safer to shut down Linux before replacing the drive, and I thought the RAID controller would pick up the rebuild where it left off.

Linux did not shut down all the way, however. I don't know whether it was waiting for the array to finish rebuilding or whether something else happened. Anyway, I eventually hit the reset button. The RAID BIOS then reported that it could not find the array, so I had to go about rebuilding it. I also ran a volume check, which found and repaired about 70,000 blocks. Needless to say, I was quite nervous.

Once that was done, Linux refused to mount the XFS partition, I think because of corruption in the log.

I made an image of the pre-repair filesystem with dd and can try to take a metadump from it. The filesystem is 1.9TB in size, with about 1.2TB in use.

It looks like I was able to recover everything after blowing away the log. I see a bunch of files recovered in lost+found, but those all appear to be things like cached web pages.

I also dumped the log to a file (128M).

So far it looks like the actual data loss is minimal (thankfully), and the whole episode was a good wake-up call to start doing more frequent backups.

I also upgraded xfsprogs from 3.1.6-2.1.2 to 3.1.8, which did a much better job of recovery than my previous attempt.

It would be nice if xfs_db would let me continue when the log is dirty instead of requiring me to mount the filesystem first. It would also be nice if xfs_logprint could try to identify the filenames of the inodes involved.

I understand that there are plans to update XFS to include the UUID in all of the on-disk structures. Any idea when this will happen?
