
Re: XFS Mount Recovery Failed on Root File System After Power Outage

To: "Dave Chinner" <david@xxxxxxxxxxxxx>
Subject: Re: XFS Mount Recovery Failed on Root File System After Power Outage
From: "Chin Gim Leong" <CHIN_Gim_Leong@xxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 31 Aug 2012 12:24:21 +0800
Cc: xfs@xxxxxxxxxxx
Importance: Normal
In-reply-to: <20120830225826.GG15292@dastard>
References: <0e85aee5ff82e567e872230ef416766a.squirrel@xxxxxxxxxxxxxxxxxxxx> <20120830225826.GG15292@dastard>
Sender: "Chin Gim Leong" <CHIN_Gim_Leong@xxxxxxxxxxxxxxxxxxxxxxxxx>
User-agent: SquirrelMail/1.4.19
Thanks, Dave, for the reply.

>> I just had a power outage and my Acer notebook computer running openSUSE
>> 11.4 x86_64 with kernel 2.6.37 stopped as a result.
>
> What mount options? What is the output of /proc/mounts? What are the
> messages in dmesg when you mount a partition?
>

The dmesg messages after the mount failure are exactly as in
/var/log/messages:

XFS: Invalid block length (0xfffffffc) given for buffer
XFS: log mount/recovery failed: error 117

As for the mount options, I do not remember what I put in fstab for root,
but either way, whether through fstab or a manual mount, the mount fails.
The OS actually starts with a read-only mount of the root file system, and
the rest of the system refuses to start, of course.

I will post an update to this thread about the fstab mount options after I
repair the file system, maybe tomorrow.
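To answer the /proc/mounts question precisely, the root entry and its options
can be pulled out with a few lines of Python. This is a minimal sketch,
assuming the usual whitespace-separated /proc/mounts layout (device, mount
point, type, options, dump, pass) with octal escapes such as \040 for spaces;
the function names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class MountEntry(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    options: List[str]


def unescape(field: str) -> str:
    """Decode the octal escapes (e.g. \\040 for a space) used in /proc/mounts."""
    out, i = [], 0
    while i < len(field):
        chunk = field[i + 1:i + 4]
        if field[i] == "\\" and len(chunk) == 3 and all(c in "01234567" for c in chunk):
            out.append(chr(int(chunk, 8)))
            i += 4
        else:
            out.append(field[i])
            i += 1
    return "".join(out)


def parse_mounts(text: str) -> List[MountEntry]:
    """Parse /proc/mounts text: device, mount point, type, options, dump, pass."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, opts = fields[:4]
        entries.append(MountEntry(unescape(device), unescape(mountpoint),
                                  fstype, opts.split(",")))
    return entries


def root_mount(entries: List[MountEntry]) -> Optional[MountEntry]:
    """Return the last entry mounted on "/", i.e. the one that is visible."""
    matches = [e for e in entries if e.mountpoint == "/"]
    return matches[-1] if matches else None


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            print(root_mount(parse_mounts(f.read())))
    except OSError as err:
        print("could not read /proc/mounts:", err)
```

On a root file system mounted by hand as described below, "ro" should appear
among the root entry's options, and "norecovery" may appear as well, depending
on what the kernel reports.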

I can do a manual mount with the read-only and norecovery options, and I can
traverse the directories. In fact, /var/log/messages is still there and
readable, with entries up to 5 minutes before the power outage.

>
>> I would like to know the cause of this log recovery failure and if there
>
> You've got an old, unsupported kernel that was going through
> significant changes to the log code at the time, so it may not be
> possible to work out what the problem was. It seems likely that the
> power loss caused the disk not to write everything it should have to
> the log - laptops are not supposed to just lose power because
> they have a built in UPS (i.e. battery)....
>

My notebook is connected to the mains, and there is no battery installed.
The Acer user manual advises removing the battery when the notebook runs on
mains power, since repeated charging shortens the battery's life span.

I have had servers and workstations running SLES 10 SP2, SLED 10 SP2 and
SLED 11 go through numerous power outages in the labs (outages are a
recurring problem there), and never had a file system error on start-up.
The LSI internal RAID controllers disable the SAS drive caches, and the
Areca RAID controller has battery-backed DRAM with the SATA drive caches
disabled. But there is one PC with five 1 TB WD Caviar Black SATA drives on
the motherboard's AMD chipset, in an MD RAID 0 array with the drive caches
enabled. XFS write barriers are supposed to take care of the drive cache
issue, but I guess I was unlucky this time.

Maybe disabling the drives' write-back cache is the safe solution, but I
have tested numerous desktop SATA drives in the labs, and there is a very
large drop in sequential write performance when the cache is set to
write-through. That was with no file system, just direct writes to the
sectors. Hard drives seem to really need write-back caching for performance.

> The files in lost+found are numbered by their inode number. You need
> to look at the contents of them to determine where they came from.
>

Inode numbers are not informative, right?
Text files are readable, but for binaries I will have no clue, and if I
delete them, who knows whether I am deleting something really important?
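One rough way to triage lost+found without opening every file by hand is to
sort the entries by their type and leading bytes. The sketch below is a
heuristic, not a replacement for the standard file(1) utility: it assumes
that ELF, gzip and "#!" signatures plus a NUL-byte and UTF-8 check are
enough to separate text from binaries, and the function names are
hypothetical.

```python
import codecs
import os
import stat
import sys
from typing import Dict

# Leading-byte signatures checked in order (a deliberately short list).
MAGIC = [
    (b"\x7fELF", "ELF binary/library"),
    (b"\x1f\x8b", "gzip data"),
    (b"#!", "script"),
]


def classify(path: str, probe: int = 4096) -> str:
    """Guess what a lost+found entry is from its type and first bytes."""
    st = os.lstat(path)  # lstat: do not follow symlinks
    if stat.S_ISLNK(st.st_mode):
        return "symlink"
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    if not stat.S_ISREG(st.st_mode):
        return "special file"  # never open FIFOs or device nodes
    with open(path, "rb") as f:
        head = f.read(probe)
    if not head:
        return "empty"
    for magic, label in MAGIC:
        if head.startswith(magic):
            return label
    if b"\0" in head:
        return "binary data"
    try:
        # final=False tolerates a multibyte character cut off at the probe boundary
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return "binary data"
    return "text"


def triage(lost_found: str) -> Dict[str, str]:
    """Map each entry name (its inode number after xfs_repair) to a guessed type."""
    return {name: classify(os.path.join(lost_found, name))
            for name in sorted(os.listdir(lost_found))}


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, label in triage(sys.argv[1]).items():
        print(f"{name}\t{label}")
```

Passing the lost+found path as the only argument prints one line per entry;
anything reported as an ELF binary can then be compared against the
installed copies before it is deleted.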

Intuitively, the only files in the root file system (which, by the way, also
contains /boot) that were open for writing are the logs and the files in
/tmp and /var/tmp. I hope that is the case, so that I can safely discard the
files in lost+found.

I also hope that the inode link clean-ups done by xfs_repair do not
actually remove any files that are really there.


GL
