Bugzilla – Bug 631
xfs_repair fails to recover filesystem (phase 6)
Last modified: 2006-08-03 20:22:37 CDT
    Phase 6 - check inode connectivity...
            - resetting contents of realtime bitmap and summary inodes
            - ensuring existence of lost+found directory
            - traversing filesystem starting at / ...
    rebuilding directory inode 128
    fatal error -- can't read block 16777216 for directory inode 151153880

At first I thought that the hard drive had failed, but further tests showed that it was OK; strace also showed that xfs_repair successfully read every block it asked the kernel for. I then noticed that the block number is 0x1000000 in hex, which seemed awfully suspicious, and began to suspect a bug in xfs_repair. Here's a backtrace at the failure point:

    #0 libxfs_da_do_buf (trans=0x0, dp=0x877dcd0, bno=16777216, mappedbnop=0xbfb1d49c, bpp=0xbfb1d4bc, whichfork=16777216, caller=2, ra=0x8072b7a) at xfs_da_btree.c:2033
    #1 0x08084ed4 in libxfs_da_read_bufr (trans=0x0, dp=0x0, bno=0, mappedbno=-1, bpp=0x0, whichfork=0) at util.c:667
    #2 0x08072b7a in longform_dir2_check_node (mp=0xbfb1d820, ip=0x877dcd0, hashtab=0xbfb1d4c0, freetab=0x877e408) at phase6.c:2218
    #3 0x08074dc5 in longform_dir2_entry_check (mp=0xbfb1d820, ino=151153880, ip=0x877dcd0, num_illegal=0xbfb1d744, need_dot=0xbfb1d748, stack=0xbfb1d7d0, irec=0x8177344, ino_offset=56) at phase6.c:2739
    #4 0x08078ab8 in process_dirstack (mp=0xbfb1d820, stack=0xbfb1d7d0) at phase6.c:3560
    #5 0x08079333 in phase6 (mp=0xbfb1d820) at phase6.c:3968
    #6 0x0808041c in main (argc=0, argv=0xbfb1d820) at xfs_repair.c:509

This call to xfs_da_map_covers_blocks in xfs_da_do_buf returns zero because mapp->br_startblock == HOLESTARTBLOCK (and returning zero is bad here):

    2032            if (!xfs_da_map_covers_blocks(nmap, mapp, bno, nfsb)) {

Here nmap==1, *mapp=={br_startoff = 16777216, br_startblock = 18446744073709551614, br_blockcount = 1, br_state = XFS_EXT_NORM}, bno==16777216, nfsb==1.
Because mappedbno==-1, xfs_da_do_buf then returns EFSCORRUPTED, which causes longform_dir2_check_node to abort with a call to do_error:

    2216            if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
    2217                    break;
    2218            if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, &bp,
    2219                            XFS_DATA_FORK)) {
    2220                    do_error(_("can't read block %u for directory inode "
    2221                            "%llu\n"),
    2222                            da_bno, ip->i_ino);

At this point I got stuck. Since the failure occurs for a block number with a very special bit pattern, I guess it could be due to an incorrect bit manipulation that only fails when given all zeros, or something like that. Any ideas? I would really like to recover this filesystem...

Best wishes,
Duncan.
I can confirm this bug; it's being discussed on the linux-xfs mailing list: http://marc.theaimsgroup.com/?l=linux-xfs&m=115070339717508&w=2
Just for the record, I have just encountered the same error, with the *same block number*:

    corbeau:~# xfs_repair /dev/sda3
    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - zero log...
            - scan filesystem freespace and inode maps...
            - found root inode chunk
    Phase 3 - for each AG...
            - scan and clear agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - agno = 8
            - agno = 9
            - agno = 10
            - agno = 11
            - agno = 12
            - agno = 13
            - agno = 14
            - agno = 15
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - clear lost+found (if it exists) ...
            - clearing existing "lost+found" inode
            - deleting existing "lost+found" entry
            - check for inodes claiming duplicate blocks...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - agno = 8
            - agno = 9
            - agno = 10
            - agno = 11
            - agno = 12
            - agno = 13
            - agno = 14
            - agno = 15
    Phase 5 - rebuild AG headers and trees...
            - reset superblock...
    Phase 6 - check inode connectivity...
            - resetting contents of realtime bitmap and summary inodes
            - ensuring existence of lost+found directory
            - traversing filesystem starting at / ...
    free block 16777216 for directory inode 179378000 bad nused
    rebuilding directory inode 179378000
    free block 16777216 for directory inode 44034189 bad nused
    rebuilding directory inode 44034189
    free block 16777216 for directory inode 81055936 bad nused
    rebuilding directory inode 81055936
    free block 16777216 for directory inode 91578785 bad nused
    rebuilding directory inode 91578785
    free block 16777216 for directory inode 80884649 bad nused
    rebuilding directory inode 80884649
    free block 16777216 for directory inode 80693163 bad nused
    rebuilding directory inode 80693163
    free block 16777216 for directory inode 6687312 bad nused
    rebuilding directory inode 6687312
    free block 16777216 for directory inode 230575923 bad nused
    rebuilding directory inode 230575923
    free block 16777216 for directory inode 6654076 bad nused
    rebuilding directory inode 6654076
    free block 16777216 for directory inode 24104399 bad nused
    rebuilding directory inode 24104399
    free block 16777216 for directory inode 176567235 bad nused
    rebuilding directory inode 176567235
    free block 16777216 for directory inode 23859213 bad nused
    rebuilding directory inode 23859213
    free block 16777216 for directory inode 229773361 bad nused
    rebuilding directory inode 229773361
    free block 16777216 for directory inode 79659586 bad nused
    rebuilding directory inode 79659586
    free block 16777216 for directory inode 89913240 bad nused
    rebuilding directory inode 89913240
    fatal error -- can't read block 16777216 for directory inode 23727780

This is a 2.6.15.7 kernel, with xfs_repair 2.6.20, on Debian Sarge. The filesystem is on top of a SATA drive:

    corbeau:~# xfs_info /dev/sda3
    meta-data=/local                 isize=256    agcount=16, agsize=1037573 blks
             =                       sectsz=512
    data     =                       bsize=4096   blocks=16601168, imaxpct=25
             =                       sunit=0      swidth=0 blks, unwritten=1
    naming   =version 2              bsize=4096
    log      =internal               bsize=4096   blocks=32768, version=1
             =                       sectsz=512   sunit=0 blks
    realtime =none                   extsz=65536  blocks=0, rtextents=0

-- Nicolas
xfsprogs 2.8.10 has fixed this problem.