
Re: XFS-filesystem corrupted by defragmentation

To: Robert Brockway <robert@xxxxxxxxxxxxxxxxx>
Subject: Re: XFS-filesystem corrupted by defragmentation
From: Bernhard Gschaider <bgschaid_lists@xxxxxxxxx>
Date: Tue, 13 Apr 2010 17:24:04 +0200
Cc: xfs@xxxxxxxxxxx
In-reply-to: <alpine.DEB.1.10.1004131022140.32213@xxxxxxxxxxxxxxxxxxxx> (Robert Brockway's message of "Tue, 13 Apr 2010 10:58:22 -0400 (EDT)")
Organization: ICE Stroemungsforschung
References: <87r5mjpn8l.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <alpine.DEB.1.10.1004131022140.32213@xxxxxxxxxxxxxxxxxxxx>
User-agent: Gnus/5.1008 (Gnus v5.10.8) XEmacs/21.5-b29 (linux)
Thanks for the answer.

>>>>> On Tue, 13 Apr 2010 10:58:22 -0400 (EDT)
>>>>> "RB" == Robert Brockway <robert@xxxxxxxxxxxxxxxxx> wrote:

    RB> On Tue, 13 Apr 2010, Bernhard Gschaider wrote:
    >>> xfs_db -r /dev/mapper/VolGroup00-LogVol04
    >> xfs_db: unexpected XFS SB magic number 0x00000000 xfs_db: read
    >> failed: Invalid argument xfs_db: data size check failed
    >> cache_node_purge: refcount was 1, not zero (node=0x2a25c20)
    >> xfs_db: cannot read root inode (22)

    RB> Hi Bernhard.  Hmm that doesn't sound good.

http://oss.sgi.com/archives/xfs/2007-04/msg00580.html suggests a sync
for that kind of situation. Any thoughts on this? I know that there is
no definite answer to this, only guesses from people with more
experience than me.

    >> The file-system is still mounted and working and I don't dare
    >> to do anything about it (am in a mild state of panic) because I
    >> think it might not come back if I do.

    RB> I think your choice to sit back and evaluate your options
    RB> before acting is a wise one, especially since the filesystem
    RB> is apparently mounted and functioning.

    RB> Depending on how worried you are there are various options
    RB> available.  Eg you could declare an emergency on the server
    RB> and use xfs_freeze to freeze the filesystem while you take a
    RB> backup.  Note - I have never used xfs_freeze like this, it is
    RB> just a suggestion.  Naturally this will cause an outage and
    RB> problems for users.
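Robert's xfs_freeze suggestion might look roughly like the following. This is only a sketch of the idea, not something either of us has run; the device path and mountpoint are placeholders for the actual volume:

```shell
# Freeze the filesystem so no new writes land while the backup runs
# (mountpoint is a placeholder -- substitute the real one)
xfs_freeze -f /mnt/data

# Copy the frozen block device to an image on *another* disk,
# e.g. with dd (device path is a placeholder)
dd if=/dev/mapper/VolGroup00-LogVol04 of=/backup/logvol04.img bs=4M

# Unfreeze to let users continue working
xfs_freeze -u /mnt/data
```

The freeze blocks all writes until the `-u` call, so anything touching the mountpoint will hang for the duration of the copy.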

They'll have to live with that

    RB> Alternatively you could use xfsdump to capture an incremental
    RB> or full backup on the running system. (depending on whether
    RB> you already have a level 0 xfs dump file or not).  The
    RB> developers have confirmed (on this list) that xfsdump will
    RB> provide a consistent backup on a live filesystem.
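For the xfsdump route, the level mechanism Robert mentions could be used something like this (a sketch only; labels, destinations, and the mountpoint are made-up placeholders):

```shell
# Level 0 = full dump of the mounted filesystem to a file
# (-l level, -L session label, -M media label, -f destination)
xfsdump -l 0 -L "full-$(date +%F)" -M media0 \
        -f /backup/logvol04.level0 /mnt/data

# A later level 1 dump captures only what changed since the level 0
xfsdump -l 1 -L "incr-$(date +%F)" -M media0 \
        -f /backup/logvol04.level1 /mnt/data
```

Restoring would then apply the level 0 file first and the level 1 on top of it with xfsrestore.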

    RB> Please note that any heavy I/O (like a backup) has the
    RB> potential to cause problems on a sick filesystem.  In my
    RB> experience xfs is inclined to automatically remount read-only
    RB> if it detects problems.  While this can be catastrophic for
    RB> running processes it is helpful in protecting data so I'm
    RB> happy it works this way.

    RB> One last note.  I hope you have good backups already.  If you
    RB> don't then this is the time to start taking good backups.

I have weekly backups with amanda. The tapes verify OK, but I have
never tried a full-scale restore before.

    RB> These are the notes from my backup talk:

    RB> http://www.timetraveller.org/talks/backup_talk.pdf

    >> I swear to god: I did not do anything else with the
    >> xfs_*-commands than the stuff mentioned above

    RB> I defrag XFS filesystems from cron as recommended by SGI and
    RB> I've never had a problem.  Maybe defragmentation didn't cause
    RB> the problem - maybe it just revealed an underlying problem.

But couldn't it be related to the hole that xfs_bmap reported for
that file?
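For reference, the hole I mean shows up in xfs_bmap's verbose extent listing (the file path here is just a placeholder):

```shell
# -v prints one line per extent; unallocated ranges are marked "hole"
xfs_bmap -v /mnt/data/suspect-file
```

Note that a hole by itself is normal for a sparse file; it only worries me here because it appeared alongside the xfs_db errors.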

