Bugzilla – Bug 723
xfs_repair does not fix corruption detected by xfs_check
Last modified: 2006-11-01 14:01:35 CST
I have a filesystem which has suffered corruption from the 2.6.17 kernel bug, and I tried to repair it using xfsprogs-2.8.11 on an x86_64 system. Initially xfs_check showed these errors:

agi unlinked bucket 7 is 387655 in ag 0 (inode=387655)
bad free block nused 0 should be 5 for dir ino 11191120 block 16777216
agi unlinked bucket 36 is 275940 in ag 2 (inode=8664548)
agi unlinked bucket 32 is 730784 in ag 3 (inode=13313696)
agi unlinked bucket 21 is 1642709 in ag 4 (inode=18419925)
agi unlinked bucket 35 is 1642659 in ag 4 (inode=18419875)
agi unlinked bucket 44 is 419884 in ag 4 (inode=17197100)
agi unlinked bucket 52 is 419956 in ag 4 (inode=17197172)
agi unlinked bucket 1 is 381697 in ag 5 (inode=21353217)
agi unlinked bucket 2 is 12098 in ag 5 (inode=20983618)
agi unlinked bucket 4 is 381828 in ag 5 (inode=21353348)
agi unlinked bucket 8 is 381960 in ag 5 (inode=21353480)
agi unlinked bucket 12 is 381900 in ag 5 (inode=21353420)
agi unlinked bucket 16 is 381968 in ag 5 (inode=21353488)
agi unlinked bucket 21 is 381909 in ag 5 (inode=21353429)
agi unlinked bucket 22 is 381974 in ag 5 (inode=21353494)
agi unlinked bucket 26 is 381850 in ag 5 (inode=21353370)
agi unlinked bucket 47 is 381935 in ag 5 (inode=21353455)
agi unlinked bucket 58 is 381946 in ag 5 (inode=21353466)
agi unlinked bucket 61 is 381821 in ag 5 (inode=21353341)
link count mismatch for inode 387655 (name ?), nlink 0, counted 2
allocated inode 8664548 has 0 link count
link count mismatch for inode 13313696 (name ?), nlink 0, counted 2
allocated inode 18419875 has 0 link count
allocated inode 18419925 has 0 link count
link count mismatch for inode 17150516 (name ?), nlink 7, counted 8
link count mismatch for inode 17197100 (name ?), nlink 0, counted 1
allocated inode 17197172 has 0 link count
allocated inode 21353217 has 0 link count
allocated inode 21353341 has 0 link count
allocated inode 21353348 has 0 link count
allocated inode 21353370 has 0 link count
allocated inode 21353420 has 0 link count
allocated inode 21353429 has 0 link count
allocated inode 21353455 has 0 link count
allocated inode 21353466 has 0 link count
allocated inode 21353469 has 0 link count
allocated inode 21353480 has 0 link count
allocated inode 21353488 has 0 link count
allocated inode 21353494 has 0 link count
allocated inode 20983617 has 0 link count
allocated inode 20983618 has 0 link count
allocated inode 20983663 has 0 link count

I then ran xfs_repair and got this output:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
error following ag 5 unlinked list
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
free block 16777216 for directory inode 11191120 bad nused
rebuilding directory inode 11191120
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 387655, moving to lost+found
disconnected inode 8664548, moving to lost+found
disconnected dir inode 13313696, moving to lost+found
disconnected dir inode 17197100, moving to lost+found
disconnected inode 17197172, moving to lost+found
disconnected inode 18419875, moving to lost+found
disconnected inode 18419925, moving to lost+found
disconnected inode 20983617, moving to lost+found
disconnected inode 20983618, moving to lost+found
disconnected inode 20983663, moving to lost+found
disconnected inode 21353217, moving to lost+found
disconnected inode 21353341, moving to lost+found
disconnected inode 21353348, moving to lost+found
disconnected inode 21353370, moving to lost+found
disconnected inode 21353420, moving to lost+found
disconnected inode 21353429, moving to lost+found
disconnected inode 21353455, moving to lost+found
disconnected inode 21353466, moving to lost+found
disconnected inode 21353469, moving to lost+found
disconnected inode 21353480, moving to lost+found
disconnected inode 21353488, moving to lost+found
disconnected inode 21353494, moving to lost+found
Phase 7 - verify and correct link counts...
cache_purge: shake on cache 0x63b0a0 left 1 nodes!?
resetting inode 387655 nlinks from 0 to 2
resetting inode 13313696 nlinks from 0 to 2
resetting inode 17197100 nlinks from 0 to 2
cache_purge: shake on cache 0x63b0a0 left 1 nodes!?
cache_purge: shake on cache 0x63b0a0 left 1 nodes!?
done

Then I mounted the filesystem and looked at lost+found: there were some files in it, and 3 directories which looked empty (387655, 13313696, 17197100). I tried to delete one of these directories, but got the "Directory not empty" error and these kernel messages:

xfs_inotobp: xfs_imap() returned an error 22 on sda6. Returning error.
xfs_iunlink_remove: xfs_inotobp() returned an error 22 on sda6. Returning error.
xfs_inactive: xfs_ifree() returned an error = 22 on sda6
xfs_force_shutdown(sda6,0x1) called from line 1762 of file fs/xfs/xfs_vnodeops.c. Return address = 0xffffffff8811594b
Filesystem "sda6": I/O Error Detected. Shutting down filesystem: sda6
Please umount the filesystem, and rectify the problem(s)
xfs_force_shutdown(sda6,0x1) called from line 338 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff881195b8
xfs_force_shutdown(sda6,0x1) called from line 338 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff881195b8

Then I unmounted the filesystem and ran xfs_repair again, but it did not fix the problem. Now xfs_check finds these errors on the filesystem:

link count mismatch for inode 387655 (name ?), nlink 0, counted 2
link count mismatch for inode 13313696 (name ?), nlink 0, counted 2
link count mismatch for inode 17197100 (name ?), nlink 0, counted 2

xfs_repair seems to complete successfully:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - clearing existing "lost+found" inode
        - marking entry "lost+found" to be deleted
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
rebuilding directory inode 128
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

However, xfs_check still finds the same errors, and the broken directories cannot be deleted.

Here is some more information about the problematic filesystem:

# xfs_info /dev/sda6
meta-data=/dev/sda6              isize=256    agcount=8, agsize=251014 blks
         =                       sectsz=512   attr=1
data     =                       bsize=4096   blocks=2008112, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=4096, version=2
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0

These directories are broken (I renamed the lost+found directory to "Broken" so that xfs_repair would not rebuild it every time; this did not help):

# xfs_bmap -vvl Broken/387655
Broken/387655: no extents
# xfs_bmap -vvl Broken/13313696
Broken/13313696: no extents
# xfs_bmap -vvl Broken/17197100
Broken/17197100: no extents
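
For reference, the inode numbers in the xfs_check output can be reproduced from the geometry that xfs_info reports. This is a minimal sketch, assuming the usual XFS encoding (AG number in the high bits, AG-relative inode number in the low bits) and 64 unlinked buckets per AGI; the function names are made up for illustration and are not part of xfsprogs.

```shell
#!/usr/bin/env bash
# Geometry copied from the xfs_info output in the report.
agsize=251014   # blocks per allocation group
bsize=4096      # filesystem block size in bytes
isize=256       # inode size in bytes

# Smallest n such that 2^n >= value (ceiling of log2).
ceil_log2() {
    local value=$1 n=0
    while (( (1 << n) < value )); do
        n=$(( n + 1 ))
    done
    echo "$n"
}

agblklog=$(ceil_log2 "$agsize")               # bits for a block number within an AG
inopblog=$(ceil_log2 $(( bsize / isize )))    # bits for an inode index within a block
agino_bits=$(( agblklog + inopblog ))         # bits for an AG-relative inode number

# Hypothetical helper: absolute inode number from (AG number, AG-relative inode).
ino_from_ag() {
    echo $(( ($1 << agino_bits) | $2 ))
}

# Hypothetical helper: unlinked-list bucket for an AG-relative inode number
# (assumes 64 buckets per AGI).
unlinked_bucket() {
    echo $(( $1 % 64 ))
}

# "agi unlinked bucket 36 is 275940 in ag 2 (inode=8664548)"
ino_from_ag 2 275940       # prints 8664548
unlinked_bucket 275940     # prints 36
```

The same arithmetic gives 13313696 for agino 730784 in ag 3, which matches the corresponding line in the report.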
Can you run the following commands on your filesystem:

# xfs_db -c "inode 387655" -c "print" /dev/sda6
# xfs_db -c "inode 13313696" -c "print" /dev/sda6
# xfs_db -c "inode 17197100" -c "print" /dev/sda6

and post the output.
# xfs_db -c "inode 387655" -c "print" /dev/sda6
core.magic = 0x494e
core.mode = 040755
core.version = 1
core.format = 1 (local)
core.nlinkv1 = 0
core.uid = 0
core.gid = 0
core.flushiter = 131
core.atime.sec = Fri Dec 30 21:57:29 2005
core.atime.nsec = 517230500
core.mtime.sec = Sat Dec 31 22:06:17 2005
core.mtime.nsec = 464569000
core.ctime.sec = Sat Dec 31 22:06:17 2005
core.ctime.nsec = 464569000
core.size = 6
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.gen = 0
next_unlinked = null
u.sfdir2.hdr.count = 0
u.sfdir2.hdr.i8count = 0
u.sfdir2.hdr.parent.i4 = 135

# xfs_db -c "inode 13313696" -c "print" /dev/sda6
core.magic = 0x494e
core.mode = 040755
core.version = 1
core.format = 1 (local)
core.nlinkv1 = 0
core.uid = 0
core.gid = 0
core.flushiter = 128
core.atime.sec = Fri Dec 30 21:57:29 2005
core.atime.nsec = 561233250
core.mtime.sec = Sat Dec 31 22:06:17 2005
core.mtime.nsec = 380563750
core.ctime.sec = Sat Dec 31 22:06:17 2005
core.ctime.nsec = 380563750
core.size = 6
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.gen = 0
next_unlinked = null
u.sfdir2.hdr.count = 0
u.sfdir2.hdr.i8count = 0
u.sfdir2.hdr.parent.i4 = 135

# xfs_db -c "inode 17197100" -c "print" /dev/sda6
core.magic = 0x494e
core.mode = 040755
core.version = 1
core.format = 1 (local)
core.nlinkv1 = 0
core.uid = 0
core.gid = 0
core.flushiter = 128
core.atime.sec = Fri Dec 30 21:57:29 2005
core.atime.nsec = 561233250
core.mtime.sec = Sat Dec 31 22:06:17 2005
core.mtime.nsec = 380563750
core.ctime.sec = Sat Dec 31 22:06:17 2005
core.ctime.nsec = 380563750
core.size = 6
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.gen = 0
next_unlinked = null
u.sfdir2.hdr.count = 0
u.sfdir2.hdr.i8count = 0
u.sfdir2.hdr.parent.i4 = 135
Sergay,

Could you do one more xfs_db command:

# xfs_db -c "inode 387655" -c "type text" -c "print" /dev/sda6

Thanks.
# xfs_db -c "inode 387655" -c "type text" -c "print" /dev/sda6
00: 49 4e 41 ed 01 01 00 00 00 00 00 00 00 00 00 00 INA.............
10: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 83 ................
20: 43 b5 83 19 1e d4 4f a4 43 b6 d6 a9 1b b0 c2 a8 C.....O.C.......
30: 43 b6 d6 a9 1b b0 c2 a8 00 00 00 00 00 00 00 06 C...............
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
50: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 ................
60: ff ff ff ff 00 00 00 00 00 87 00 00 00 00 00 0b ................
70: d4 e0 00 01 20 20 30 3a 30 30 3a 30 30 0d 0a 76 ......0.00.00..v
80: 73 75 2f 4d 61 69 6c 31 2f 41 4c 54 4c 69 6e 75 su.Mail1.ALTLinu
90: 78 2f 73 69 73 79 70 68 75 73 2f 32 31 38 33 37 x.sisyphus.21837
a0: 0d 0a 20 20 20 20 20 20 20 20 33 30 39 32 20 31 ..........3092.1
b0: 30 30 25 20 20 20 20 30 2e 30 30 6b 42 2f 73 20 00.....0.00kB.s.
c0: 20 20 20 30 3a 30 30 3a 30 30 0d 20 20 20 20 20 ...0.00.00......
d0: 20 20 20 33 30 39 32 20 31 30 30 25 20 20 20 20 ...3092.100.....
e0: 30 2e 30 30 6b 42 2f 73 20 20 20 20 30 3a 30 30 0.00kB.s....0.00
f0: 3a 30 30 0d 0a 76 73 75 2f 4d 61 69 6c 31 2f 41 .00..vsu.Mail1.A
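
The first bytes of this dump can be decoded by hand to cross-check the xfs_db "print" output. The sketch below assumes the big-endian on-disk inode core layout of that era: magic at offset 0, mode at 2, version at 4, format at 5, a 16-bit link count at 6 (what xfs_db shows as core.nlinkv1) and a 32-bit link count at 0x10. The offsets and the field() helper are my reading of the format, not something stated in this report.

```shell
#!/usr/bin/env bash
# First 20 bytes (offsets 0x00-0x13) of inode 387655, copied from the dump.
hex="494e41ed01010000000000000000000000000002"

# Hypothetical helper: big-endian integer of <length> bytes at <offset>.
field() {
    local offset=$1 length=$2
    echo $(( 16#${hex:$(( offset * 2 )):$(( length * 2 ))} ))
}

printf 'magic   = 0x%x\n' "$(field 0 2)"    # "IN"
printf 'mode    = 0%o\n'  "$(field 2 2)"    # directory, rwxr-xr-x
printf 'version = %d\n'   "$(field 4 1)"
printf 'format  = %d\n'   "$(field 5 1)"
printf 'onlink  = %d\n'   "$(field 6 2)"    # 16-bit link count (assumed offset)
printf 'nlink   = %d\n'   "$(field 16 4)"   # 32-bit link count (assumed offset)
```

If the assumed offsets are right, the 16-bit field that a version 1 inode uses is 0, while the 32-bit field at 0x10 holds 2. Whether that is connected to the phase 7 problem is not stated in this report.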
A fix should be out in a few days. In the meantime, running xfs_repair with the -P command-line option will fix the nlink count properly.
Hmm, what does the -P option really do? It seems to be undocumented (at least in xfsprogs-2.8.11).
It turns off the prefetching in xfs_repair that was introduced in 2.8.11. The prefetching code is the source of the nlink bug in phase 7.
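
The suggested workaround could be scripted roughly as follows. This is only a sketch: the device name is the one from the report, the -P option is used as described above, and the run() wrapper, the DRY_RUN variable and the function name are hypothetical. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Unmount, repair with prefetching disabled, then verify with xfs_check.
repair_without_prefetch() {
    local dev=${1:-/dev/sda6}          # device name taken from the report
    run umount "$dev" || return
    run xfs_repair -P "$dev" || return
    run xfs_check "$dev"
}

repair_without_prefetch "$@"
```

xfs_check printing nothing afterwards would indicate that the link counts were written back.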
The xfsprogs-2.8.15 source tarball is now available on the oss FTP server.
Unfortunately, this version also does not fix the corruption:

# xfs_repair -V
xfs_repair version 2.8.15
# xfs_check /dev/sda6
link count mismatch for inode 387655 (name ?), nlink 0, counted 2
link count mismatch for inode 13313696 (name ?), nlink 0, counted 2
link count mismatch for inode 17197100 (name ?), nlink 0, counted 2
# xfs_repair /dev/sda6
        - creating 2 worker thread(s)
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - 13:12:42: scanning filesystem freespace - 8 of 8 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - 13:12:42: scanning agi unlinked lists - 8 of 8 allocation groups done
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - 13:13:22: process known inodes and inode discovery - 292544 of 292544 inodes done
        - process newly discovered inodes...
        - 13:13:22: process newly discovered inodes - 8 of 8 allocation groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - clearing existing "lost+found" inode
        - marking entry "lost+found" to be deleted
        - 13:13:22: setting up duplicate extent list - 8 of 8 allocation groups done
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - 13:13:22: check for inodes claiming duplicate blocks - 292544 of 292544 inodes done
Phase 5 - rebuild AG headers and trees...
        - 13:13:22: rebuild AG headers and trees - 8 of 8 allocation groups done
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
rebuilding directory inode 128
        - 13:13:57: traversing filesystem - 8 of 8 allocation groups done
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
resetting inode 387655 nlinks from 0 to 2
resetting inode 13313696 nlinks from 0 to 2
resetting inode 17197100 nlinks from 0 to 2
        - 13:14:09: verify and correct link counts - 292544 of 292544 inodes done
done
# xfs_check /dev/sda6
link count mismatch for inode 387655 (name ?), nlink 0, counted 2
link count mismatch for inode 13313696 (name ?), nlink 0, counted 2
link count mismatch for inode 17197100 (name ?), nlink 0, counted 2
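
The symptom here is that phase 7 reports resetting the link counts, yet xfs_check still reports the same inodes afterwards. A rough way to spot this mechanically is to intersect the two outputs. The sketch below assumes both outputs were saved to files; the file arguments and the still_broken name are hypothetical, and the patterns match only the message formats quoted in this report.

```shell
#!/usr/bin/env bash
# Print inode numbers that xfs_repair claims to have reset but that a
# later xfs_check run still reports as link count mismatches.
still_broken() {
    local repair_log=$1 check_log=$2
    comm -12 \
        <(sed -n 's/^[[:space:]]*resetting inode \([0-9]*\) nlinks.*/\1/p' "$repair_log" | sort) \
        <(sed -n 's/^link count mismatch for inode \([0-9]*\) .*/\1/p' "$check_log" | sort)
}

# Example with lines copied from the output above.
repair_log=$(mktemp)
check_log=$(mktemp)
cat > "$repair_log" <<'EOF'
Phase 7 - verify and correct link counts...
resetting inode 387655 nlinks from 0 to 2
resetting inode 13313696 nlinks from 0 to 2
resetting inode 17197100 nlinks from 0 to 2
done
EOF
cat > "$check_log" <<'EOF'
link count mismatch for inode 387655 (name ?), nlink 0, counted 2
link count mismatch for inode 13313696 (name ?), nlink 0, counted 2
link count mismatch for inode 17197100 (name ?), nlink 0, counted 2
EOF

still_broken "$repair_log" "$check_log"
rm -f "$repair_log" "$check_log"
```

An empty result would mean every inode that phase 7 reset is no longer reported by xfs_check.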
Argh, sorry about that. I have updated xfsprogs to 2.8.16.
Thanks - this version finally fixed the corruption. Now xfs_check is silent and I can remove these directories.