xfs_repair dies with a fatal error

To: sandeen@xxxxxxxxxxxxxxxxxxxxxxxx
Subject: xfs_repair dies with a fatal error
From: Jeff Snyder <jeff@xxxxxxxxxxxxxxxxx>
Date: Wed, 24 Mar 2004 05:30:25 +0000
Cc: linux-xfs@xxxxxxxxxxx
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: KMail/1.6
Hi, 

I've been having some problems getting xfs_repair to work on one of my XFS 
filesystems. After talking to Eric Sandeen on the #xfs IRC channel for a bit, 
he suggested that I post my findings to the list, so here goes (:

This case: http://oss.sgi.com/archives/linux-xfs/2002-08/msg00243.html is 
similar to mine, both in the error message and in the xfs_db output.

I'm getting the error "fatal error -- can't read block 0 for directory inode 
xxxxxxxx" when I try to run xfs_repair. The filesystem is on an EVMS volume, 
which is part of an LVM container that sits on a RAID1 array. The full output 
of xfs_repair is at:
http://caffeinated.me.uk/~jeff/xfs_repair_script

There were no hardware errors logged, and the output is fully repeatable.

I've taken gdb to xfs_repair, and the output is coming from phase6.c:2723:

                if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, &bp,
                                XFS_DATA_FORK)) {
                        do_error(_("can't read block %u for directory inode "
                                   "%llu\n"),

Digging deeper, libxfs_da_read_bufr is a wrapper around xfs_da_do_buf, and the 
error code is getting set in xfs_da_do_buf at xfs_da_btree.c:2082:

                error = mappedbno == -2 ? 0 : XFS_ERROR(EFSCORRUPTED);
                if (unlikely(error == EFSCORRUPTED)) {
                        if (xfs_error_level >= XFS_ERRLEVEL_LOW) {
                                /* error reporting code, didn't execute */
                        }
                }
                goto exit0;

So xfs_da_do_buf returns EFSCORRUPTED, and do_error gets called, which prints 
the message and aborts the repair. As far as I can tell, this path is hit when 
the block-mapping lookup doesn't produce a usable mapping for the requested 
block; callers that pass mappedbno == -2 can tolerate that, but 
libxfs_da_read_bufr passes -1 (the -1 in the phase6.c call above), so the 
missing mapping for directory block 0 is treated as corruption.

After forcing the error reporting code from the above snippet to execute, I 
got this extra output:
"xfs_da_do_buf: bno 0"
"dir: inode 8713174"


Well, I hope that's useful to someone. If it's fixable and my filesystem can 
be repaired, I'd love to know about it ASAP, so please CC me on all replies, 
as I'm not on the list (-:

Cheers, 

Jeff

Some background info on how this happened, for the bored:
I set up a 1-way RAID1 mirror with EVMS on a new hard drive, and put an LVM 
container on the RAID1. I then made various volumes in the LVM for my entire 
system (/, /usr, /home, ...), made XFS filesystems on all of them, and moved 
my files over. I then rebooted to a live CD and added the second partition to 
the 1-way RAID1 mirror with evms(n) to make it 2-way; the RAID synced, and I 
rebooted. As soon as / was mounted I got lots of kernel oopses that looked 
XFS-related; I didn't even see init start. Back on the live CD, the RAID 
wanted to sync again, so I let it, then ran xfs_check on all the partitions. 
At least four of them were significantly damaged, but all were repairable 
apart from /usr.

