Bug 631 - xfs_repair fails to recover filesystem (phase 6)
Status: RESOLVED FIXED
Product: XFS
Classification: Unclassified
Component: xfsprogs
Version: Current
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assigned To: XFS power people
Reported: 2006-04-14 09:21 CDT by Duncan Sands
Modified: 2006-08-03 20:22 CDT (History)
Description Duncan Sands 2006-04-14 09:21:40 CDT
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
rebuilding directory inode 128

fatal error -- can't read block 16777216 for directory inode 151153880

At first I thought that the hard-drive had failed, but further tests
showed that it was OK; also strace showed that xfs_repair managed to
successfully read all blocks it asked the kernel for.  I then noticed
that the block number is 0x1000000 in hex, which seemed awfully
suspicious, and began to suspect a bug in xfs_repair.  Here's a
backtrace at the failure point:

#0  libxfs_da_do_buf (trans=0x0, dp=0x877dcd0, bno=16777216, mappedbnop=0xbfb1d49c, bpp=0xbfb1d4bc,
    whichfork=16777216, caller=2, ra=0x8072b7a) at xfs_da_btree.c:2033
#1  0x08084ed4 in libxfs_da_read_bufr (trans=0x0, dp=0x0, bno=0, mappedbno=-1, bpp=0x0, whichfork=0) at util.c:667
#2  0x08072b7a in longform_dir2_check_node (mp=0xbfb1d820, ip=0x877dcd0, hashtab=0xbfb1d4c0, freetab=0x877e408)
    at phase6.c:2218
#3  0x08074dc5 in longform_dir2_entry_check (mp=0xbfb1d820, ino=151153880, ip=0x877dcd0, num_illegal=0xbfb1d744,
    need_dot=0xbfb1d748, stack=0xbfb1d7d0, irec=0x8177344, ino_offset=56) at phase6.c:2739
#4  0x08078ab8 in process_dirstack (mp=0xbfb1d820, stack=0xbfb1d7d0) at phase6.c:3560
#5  0x08079333 in phase6 (mp=0xbfb1d820) at phase6.c:3968
#6  0x0808041c in main (argc=0, argv=0xbfb1d820) at xfs_repair.c:509

The call to xfs_da_map_covers_blocks in xfs_da_do_buf returns zero, which
signals failure, because mapp->br_startblock == HOLESTARTBLOCK:

2032            if (!xfs_da_map_covers_blocks(nmap, mapp, bno, nfsb)) {

Here nmap==1, *mapp=={br_startoff = 16777216, br_startblock =
18446744073709551614, br_blockcount = 1, br_state = XFS_EXT_NORM},
bno==16777216, nfsb==1.

Because mappedbno==-1, xfs_da_do_buf then returns EFSCORRUPTED,
which causes longform_dir2_check_node to abort with a call to
do_error:

2216                    if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
2217                            break;
2218                    if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, &bp,
2219                                    XFS_DATA_FORK)) {
2220                            do_error(_("can't read block %u for directory inode "
2221                                       "%llu\n"),
2222                                    da_bno, ip->i_ino);

At this point I got stuck.  Given that the failure occurs for a block
number with such a distinctive bit pattern (a single set bit), I suspect
an incorrect bit manipulation that only misbehaves when the low bits are
all zero, or something like that.

Any ideas?  I would really like to recover this filesystem...

Best wishes,

Duncan.
Comment 1 Avuton Olrich 2006-06-20 10:10:00 CDT
I can confirm this bug, and it's being discussed on the linux-xfs list:

http://marc.theaimsgroup.com/?l=linux-xfs&m=115070339717508&w=2
Comment 2 Nicolas Kowalski 2006-07-04 03:32:40 CDT
For the record, I have just hit the same error, with the *same block
number*:

corbeau:~# xfs_repair /dev/sda3
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - clearing existing "lost+found" inode
        - deleting existing "lost+found" entry
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ... 
free block 16777216 for directory inode 179378000 bad nused
rebuilding directory inode 179378000
free block 16777216 for directory inode 44034189 bad nused
rebuilding directory inode 44034189
free block 16777216 for directory inode 81055936 bad nused
rebuilding directory inode 81055936
free block 16777216 for directory inode 91578785 bad nused
rebuilding directory inode 91578785
free block 16777216 for directory inode 80884649 bad nused
rebuilding directory inode 80884649
free block 16777216 for directory inode 80693163 bad nused
rebuilding directory inode 80693163
free block 16777216 for directory inode 6687312 bad nused
rebuilding directory inode 6687312
free block 16777216 for directory inode 230575923 bad nused
rebuilding directory inode 230575923
free block 16777216 for directory inode 6654076 bad nused
rebuilding directory inode 6654076
free block 16777216 for directory inode 24104399 bad nused
rebuilding directory inode 24104399
free block 16777216 for directory inode 176567235 bad nused
rebuilding directory inode 176567235
free block 16777216 for directory inode 23859213 bad nused
rebuilding directory inode 23859213
free block 16777216 for directory inode 229773361 bad nused
rebuilding directory inode 229773361
free block 16777216 for directory inode 79659586 bad nused
rebuilding directory inode 79659586
free block 16777216 for directory inode 89913240 bad nused
rebuilding directory inode 89913240

fatal error -- can't read block 16777216 for directory inode 23727780



This is a 2.6.15.7 kernel, with xfs_repair 2.6.20, on Debian Sarge.
The filesystem is on top of a SATA drive:

corbeau:~# xfs_info /dev/sda3
meta-data=/local                 isize=256    agcount=16, agsize=1037573 blks
         =                       sectsz=512  
data     =                       bsize=4096   blocks=16601168, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096  
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0

-- 
Nicolas
Comment 3 Barry Naujok 2006-08-03 18:22:37 CDT
xfsprogs 2.8.10 has fixed this problem.