Bug 723 - xfs_repair does not fix corruption detected by xfs_check
Status: VERIFIED FIXED
Product: XFS
Component: xfsprogs
Version: unspecified
Platform: PC Linux
Importance: P2 normal
Target Milestone: ---
Reported: 2006-10-13 12:28 CST
Modified: 2006-11-01 14:01 CST


Description From 2006-10-13 12:28:20 CST
I have a filesystem that suffered corruption from the 2.6.17
kernel bug, and I tried to repair it using xfsprogs-2.8.11 on an
x86_64 system.

Initially, xfs_check reported these errors:

agi unlinked bucket 7 is 387655 in ag 0 (inode=387655)
bad free block nused 0 should be 5 for dir ino 11191120 block 16777216
agi unlinked bucket 36 is 275940 in ag 2 (inode=8664548)
agi unlinked bucket 32 is 730784 in ag 3 (inode=13313696)
agi unlinked bucket 21 is 1642709 in ag 4 (inode=18419925)
agi unlinked bucket 35 is 1642659 in ag 4 (inode=18419875)
agi unlinked bucket 44 is 419884 in ag 4 (inode=17197100)
agi unlinked bucket 52 is 419956 in ag 4 (inode=17197172)
agi unlinked bucket 1 is 381697 in ag 5 (inode=21353217)
agi unlinked bucket 2 is 12098 in ag 5 (inode=20983618)
agi unlinked bucket 4 is 381828 in ag 5 (inode=21353348)
agi unlinked bucket 8 is 381960 in ag 5 (inode=21353480)
agi unlinked bucket 12 is 381900 in ag 5 (inode=21353420)
agi unlinked bucket 16 is 381968 in ag 5 (inode=21353488)
agi unlinked bucket 21 is 381909 in ag 5 (inode=21353429)
agi unlinked bucket 22 is 381974 in ag 5 (inode=21353494)
agi unlinked bucket 26 is 381850 in ag 5 (inode=21353370)
agi unlinked bucket 47 is 381935 in ag 5 (inode=21353455)
agi unlinked bucket 58 is 381946 in ag 5 (inode=21353466)
agi unlinked bucket 61 is 381821 in ag 5 (inode=21353341)
link count mismatch for inode 387655 (name ?), nlink 0, counted 2
allocated inode 8664548 has 0 link count
link count mismatch for inode 13313696 (name ?), nlink 0, counted 2
allocated inode 18419875 has 0 link count
allocated inode 18419925 has 0 link count
link count mismatch for inode 17150516 (name ?), nlink 7, counted 8
link count mismatch for inode 17197100 (name ?), nlink 0, counted 1
allocated inode 17197172 has 0 link count
allocated inode 21353217 has 0 link count
allocated inode 21353341 has 0 link count
allocated inode 21353348 has 0 link count
allocated inode 21353370 has 0 link count
allocated inode 21353420 has 0 link count
allocated inode 21353429 has 0 link count
allocated inode 21353455 has 0 link count
allocated inode 21353466 has 0 link count
allocated inode 21353469 has 0 link count
allocated inode 21353480 has 0 link count
allocated inode 21353488 has 0 link count
allocated inode 21353494 has 0 link count
allocated inode 20983617 has 0 link count
allocated inode 20983618 has 0 link count
allocated inode 20983663 has 0 link count
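
The "agi unlinked bucket" lines above can be sanity-checked with a short script. This is only a sketch: it assumes each AGI has 64 unlinked buckets and that an inode is hashed into a bucket by its AG-relative inode number modulo 64. The sample lines are copied from the output above; the names `parse_unlinked` and `SAMPLE` are made up for illustration.

```python
import re

AGI_UNLINKED_BUCKETS = 64  # assumed number of unlinked hash buckets per AGI

# A few lines copied from the xfs_check output above.
SAMPLE = """\
agi unlinked bucket 7 is 387655 in ag 0 (inode=387655)
agi unlinked bucket 36 is 275940 in ag 2 (inode=8664548)
agi unlinked bucket 32 is 730784 in ag 3 (inode=13313696)
agi unlinked bucket 21 is 1642709 in ag 4 (inode=18419925)
agi unlinked bucket 61 is 381821 in ag 5 (inode=21353341)
"""

LINE_RE = re.compile(
    r"agi unlinked bucket (\d+) is (\d+) in ag (\d+) \(inode=(\d+)\)"
)

def parse_unlinked(text):
    """Return (bucket, agino, agno, inode) tuples for each matching line."""
    return [tuple(int(g) for g in m.groups()) for m in LINE_RE.finditer(text)]

entries = parse_unlinked(SAMPLE)
for bucket, agino, agno, inode in entries:
    # The bucket index should equal the AG-relative inode number mod 64.
    status = "ok" if agino % AGI_UNLINKED_BUCKETS == bucket else "MISMATCH"
    print(f"ag {agno} bucket {bucket:2d} agino {agino:7d} inode {inode}: {status}")
```

If every line reports "ok", the bucket heads themselves are at least self-consistent, and the problem lies in the inodes they point to rather than in the hashing.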

I then ran xfs_repair and got this output:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
error following ag 5 unlinked list
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ... 
free block 16777216 for directory inode 11191120 bad nused
rebuilding directory inode 11191120
        - traversal finished ... 
        - traversing all unattached subtrees ... 
        - traversals finished ... 
        - moving disconnected inodes to lost+found ... 
disconnected dir inode 387655, moving to lost+found
disconnected inode 8664548, moving to lost+found
disconnected dir inode 13313696, moving to lost+found
disconnected dir inode 17197100, moving to lost+found
disconnected inode 17197172, moving to lost+found
disconnected inode 18419875, moving to lost+found
disconnected inode 18419925, moving to lost+found
disconnected inode 20983617, moving to lost+found
disconnected inode 20983618, moving to lost+found
disconnected inode 20983663, moving to lost+found
disconnected inode 21353217, moving to lost+found
disconnected inode 21353341, moving to lost+found
disconnected inode 21353348, moving to lost+found
disconnected inode 21353370, moving to lost+found
disconnected inode 21353420, moving to lost+found
disconnected inode 21353429, moving to lost+found
disconnected inode 21353455, moving to lost+found
disconnected inode 21353466, moving to lost+found
disconnected inode 21353469, moving to lost+found
disconnected inode 21353480, moving to lost+found
disconnected inode 21353488, moving to lost+found
disconnected inode 21353494, moving to lost+found
Phase 7 - verify and correct link counts...
cache_purge: shake on cache 0x63b0a0 left 1 nodes!?
resetting inode 387655 nlinks from 0 to 2
resetting inode 13313696 nlinks from 0 to 2
resetting inode 17197100 nlinks from 0 to 2
cache_purge: shake on cache 0x63b0a0 left 1 nodes!?
cache_purge: shake on cache 0x63b0a0 left 1 nodes!?
done

Then I mounted the filesystem and looked at lost+found. It contained
some files and three directories that appeared to be empty: 387655,
13313696, 17197100.  I tried to delete one of these directories,
but got a "Directory not empty" error and these kernel messages:

xfs_inotobp: xfs_imap()  returned an error 22 on sda6.  Returning error.
xfs_iunlink_remove: xfs_inotobp()  returned an error 22 on sda6.  Returning error.
xfs_inactive:   xfs_ifree() returned an error = 22 on sda6
xfs_force_shutdown(sda6,0x1) called from line 1762 of file fs/xfs/xfs_vnodeops.c.  Return address = 0xffffffff8811594b
Filesystem "sda6": I/O Error Detected.  Shutting down filesystem: sda6
Please umount the filesystem, and rectify the problem(s)
xfs_force_shutdown(sda6,0x1) called from line 338 of file fs/xfs/xfs_rw.c.  Return address = 0xffffffff881195b8
xfs_force_shutdown(sda6,0x1) called from line 338 of file fs/xfs/xfs_rw.c.  Return address = 0xffffffff881195b8

Then I unmounted the filesystem and ran xfs_repair again,
but it did not fix the problem.

Now xfs_check finds some errors on the filesystem:

link count mismatch for inode 387655 (name ?), nlink 0, counted 2
link count mismatch for inode 13313696 (name ?), nlink 0, counted 2
link count mismatch for inode 17197100 (name ?), nlink 0, counted 2

xfs_repair seems to complete successfully:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - clearing existing "lost+found" inode
        - marking entry "lost+found" to be deleted
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ... 
rebuilding directory inode 128
        - traversal finished ... 
        - traversing all unattached subtrees ... 
        - traversals finished ... 
        - moving disconnected inodes to lost+found ... 
Phase 7 - verify and correct link counts...
done

However, xfs_check still finds the same errors, and the broken
directories cannot be deleted.
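
To compare xfs_check runs before and after a repair, the "link count mismatch" lines can be parsed into a small table. A minimal sketch, assuming the message format shown above; `parse_mismatches` and `AFTER_REPAIR` are hypothetical names, and the sample text is the post-repair xfs_check output quoted above.

```python
import re

# xfs_check output after the second xfs_repair run, copied from above.
AFTER_REPAIR = """\
link count mismatch for inode 387655 (name ?), nlink 0, counted 2
link count mismatch for inode 13313696 (name ?), nlink 0, counted 2
link count mismatch for inode 17197100 (name ?), nlink 0, counted 2
"""

MISMATCH_RE = re.compile(
    r"link count mismatch for inode (\d+) \(name .*?\), nlink (\d+), counted (\d+)"
)

def parse_mismatches(text):
    """Map inode number -> (on-disk nlink, nlink counted by xfs_check)."""
    return {
        int(ino): (int(nlink), int(counted))
        for ino, nlink, counted in MISMATCH_RE.findall(text)
    }

mismatches = parse_mismatches(AFTER_REPAIR)
for ino, (nlink, counted) in sorted(mismatches.items()):
    print(f"inode {ino}: on-disk nlink {nlink}, counted {counted}")
```

Running the same function on the output of two consecutive xfs_check runs and comparing the resulting dictionaries shows whether a repair pass changed anything.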

Here is some more information about the problematic filesystem:

# xfs_info /dev/sda6
meta-data=/dev/sda6              isize=256    agcount=8, agsize=251014 blks
         =                       sectsz=512   attr=1
data     =                       bsize=4096   blocks=2008112, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096  
log      =internal               bsize=4096   blocks=4096, version=2
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
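
With this geometry, the inode numbers reported by xfs_check can be split back into an AG number and an AG-relative inode number. The sketch below assumes the usual XFS encoding: the low bits hold the AG-relative inode number, and their width is log2 of the AG size in blocks (rounded up) plus log2 of the number of inodes per block. `split_inode` is a made-up helper name; the constants come from the xfs_info output above.

```python
AGSIZE_BLOCKS = 251014  # agsize from xfs_info
BLOCK_SIZE = 4096       # bsize from xfs_info
INODE_SIZE = 256        # isize from xfs_info

def split_inode(ino, agsize=AGSIZE_BLOCKS, bsize=BLOCK_SIZE, isize=INODE_SIZE):
    """Split an absolute inode number into (agno, agino)."""
    agblklog = (agsize - 1).bit_length()            # ceil(log2(agsize)) -> 18
    inopblog = (bsize // isize).bit_length() - 1    # log2(16 inodes/block) -> 4
    agino_bits = agblklog + inopblog                # 22 bits here
    return ino >> agino_bits, ino & ((1 << agino_bits) - 1)

# Inode numbers taken from the xfs_check output earlier in this report.
for ino in (387655, 8664548, 13313696, 18419875, 21353217):
    agno, agino = split_inode(ino)
    print(f"inode {ino}: ag {agno}, agino {agino}")
```

The results agree with the "agi unlinked bucket ... in ag N (inode=...)" lines, for example inode 8664548 maps to ag 2, agino 275940.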

These directories are broken (I have renamed the lost+found directory to
"Broken", so that xfs_repair won't rebuild it every time - this did not help):

# xfs_bmap -vvl Broken/387655   
Broken/387655: no extents
# xfs_bmap -vvl Broken/13313696
Broken/13313696: no extents
# xfs_bmap -vvl Broken/17197100
Broken/17197100: no extents
------- Comment #1 From 2006-10-15 20:05:45 CST -------
Can you run the following commands on your filesystem:

# xfs_db -c "inode 387655" -c "print" /dev/sda6
# xfs_db -c "inode 13313696" -c "print" /dev/sda6
# xfs_db -c "inode 17197100" -c "print" /dev/sda6

and post the output.
------- Comment #2 From 2006-10-16 11:23:12 CST -------
# xfs_db -c "inode 387655" -c "print" /dev/sda6
core.magic = 0x494e
core.mode = 040755
core.version = 1
core.format = 1 (local)
core.nlinkv1 = 0
core.uid = 0
core.gid = 0
core.flushiter = 131
core.atime.sec = Fri Dec 30 21:57:29 2005
core.atime.nsec = 517230500
core.mtime.sec = Sat Dec 31 22:06:17 2005
core.mtime.nsec = 464569000
core.ctime.sec = Sat Dec 31 22:06:17 2005
core.ctime.nsec = 464569000
core.size = 6
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.gen = 0
next_unlinked = null
u.sfdir2.hdr.count = 0
u.sfdir2.hdr.i8count = 0
u.sfdir2.hdr.parent.i4 = 135

# xfs_db -c "inode 13313696" -c "print" /dev/sda6
core.magic = 0x494e
core.mode = 040755
core.version = 1
core.format = 1 (local)
core.nlinkv1 = 0
core.uid = 0
core.gid = 0
core.flushiter = 128
core.atime.sec = Fri Dec 30 21:57:29 2005
core.atime.nsec = 561233250
core.mtime.sec = Sat Dec 31 22:06:17 2005
core.mtime.nsec = 380563750
core.ctime.sec = Sat Dec 31 22:06:17 2005
core.ctime.nsec = 380563750
core.size = 6
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.gen = 0
next_unlinked = null
u.sfdir2.hdr.count = 0
u.sfdir2.hdr.i8count = 0
u.sfdir2.hdr.parent.i4 = 135

# xfs_db -c "inode 17197100" -c "print" /dev/sda6
core.magic = 0x494e
core.mode = 040755
core.version = 1
core.format = 1 (local)
core.nlinkv1 = 0
core.uid = 0
core.gid = 0
core.flushiter = 128
core.atime.sec = Fri Dec 30 21:57:29 2005
core.atime.nsec = 561233250
core.mtime.sec = Sat Dec 31 22:06:17 2005
core.mtime.nsec = 380563750
core.ctime.sec = Sat Dec 31 22:06:17 2005
core.ctime.nsec = 380563750
core.size = 6
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.gen = 0
next_unlinked = null
u.sfdir2.hdr.count = 0
u.sfdir2.hdr.i8count = 0
u.sfdir2.hdr.parent.i4 = 135
------- Comment #3 From 2006-10-16 23:27:20 CST -------
Sergay,

Could you do one more xfs_db command:

# xfs_db -c "inode 387655" -c "type text" -c "print" /dev/sda6

Thanks.
------- Comment #4 From 2006-10-17 11:13:25 CST -------
# xfs_db -c "inode 387655" -c "type text" -c "print" /dev/sda6
00:  49 4e 41 ed 01 01 00 00 00 00 00 00 00 00 00 00  INA.............
10:  00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 83  ................
20:  43 b5 83 19 1e d4 4f a4 43 b6 d6 a9 1b b0 c2 a8  C.....O.C.......
30:  43 b6 d6 a9 1b b0 c2 a8 00 00 00 00 00 00 00 06  C...............
40:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
50:  00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00  ................
60:  ff ff ff ff 00 00 00 00 00 87 00 00 00 00 00 0b  ................
70:  d4 e0 00 01 20 20 30 3a 30 30 3a 30 30 0d 0a 76  ......0.00.00..v
80:  73 75 2f 4d 61 69 6c 31 2f 41 4c 54 4c 69 6e 75  su.Mail1.ALTLinu
90:  78 2f 73 69 73 79 70 68 75 73 2f 32 31 38 33 37  x.sisyphus.21837
a0:  0d 0a 20 20 20 20 20 20 20 20 33 30 39 32 20 31  ..........3092.1
b0:  30 30 25 20 20 20 20 30 2e 30 30 6b 42 2f 73 20  00.....0.00kB.s.
c0:  20 20 20 30 3a 30 30 3a 30 30 0d 20 20 20 20 20  ...0.00.00......
d0:  20 20 20 33 30 39 32 20 31 30 30 25 20 20 20 20  ...3092.100.....
e0:  30 2e 30 30 6b 42 2f 73 20 20 20 20 30 3a 30 30  0.00kB.s....0.00
f0:  3a 30 30 0d 0a 76 73 75 2f 4d 61 69 6c 31 2f 41  .00..vsu.Mail1.A
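
For reference, the first 0x6a bytes of this dump can be decoded by hand. The sketch below is not xfs_db code. It assumes the big-endian on-disk inode core layout used by version 1 and 2 inodes (16-bit onlink at offset 0x06, 32-bit nlink at 0x10, flushiter at 0x1e, size at 0x38, next_unlinked at 0x60, shortform directory header at 0x64). The bytes are transcribed from the dump above, and the variable names are illustrative.

```python
import struct

# Bytes 0x00-0x69 of inode 387655, transcribed from the hex dump above.
RAW = bytes.fromhex(
    "494e41ed010100000000000000000000"
    "00000002000000000000000000000083"
    "43b583191ed44fa443b6d6a91bb0c2a8"
    "43b6d6a91bb0c2a80000000000000006"
    "00000000000000000000000000000000"
    "00000002000000000000000000000000"
    "ffffffff000000000087"
)

# All on-disk XFS metadata is big-endian, hence the ">" prefix.
magic, mode, version, fmt, onlink = struct.unpack_from(">HHbbH", RAW, 0x00)
(nlink,) = struct.unpack_from(">I", RAW, 0x10)
(flushiter,) = struct.unpack_from(">H", RAW, 0x1E)
(size,) = struct.unpack_from(">q", RAW, 0x38)
(next_unlinked,) = struct.unpack_from(">I", RAW, 0x60)
sf_count, sf_i8count, sf_parent = struct.unpack_from(">BBI", RAW, 0x64)

print(f"magic=0x{magic:04x} mode=0{mode:o} version={version} format={fmt}")
print(f"onlink (v1 field)={onlink} nlink (v2 field)={nlink}")
print(f"flushiter={flushiter} size={size} next_unlinked=0x{next_unlinked:08x}")
print(f"sfdir2 count={sf_count} i8count={sf_i8count} parent={sf_parent}")
```

Under these assumptions the decoded values match the xfs_db "print" output in Comment #2 (mode 040755, version 1, flushiter 131, size 6, parent 135). In this dump the 16-bit field at 0x06 is 0 while the 32-bit field at 0x10 is 2. That is an observation about the bytes only, not a statement about the root cause.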
------- Comment #5 From 2006-10-17 17:22:48 CST -------
A fix should be out in a few days. In the meantime, running xfs_repair
with the -P command-line option will fix the nlink count properly.
------- Comment #6 From 2006-10-18 11:10:59 CST -------
Hmm, what does the -P option really do? It seems to be undocumented (at least in
xfsprogs-2.8.11).
------- Comment #7 From 2006-10-18 17:28:17 CST -------
It turns off prefetching in xfs_repair that was introduced in 2.8.11. 
The prefetching code is the source of the nlink bug in phase 7.
------- Comment #8 From 2006-10-26 19:31:20 CST -------
The xfsprogs-2.8.15 source tarball is now available on the oss FTP server.
------- Comment #9 From 2006-10-28 02:20:21 CST -------
Unfortunately, this version also does not fix the corruption:

# xfs_repair -V     
xfs_repair version 2.8.15

# xfs_check /dev/sda6
link count mismatch for inode 387655 (name ?), nlink 0, counted 2
link count mismatch for inode 13313696 (name ?), nlink 0, counted 2
link count mismatch for inode 17197100 (name ?), nlink 0, counted 2

# xfs_repair /dev/sda6
        - creating 2 worker thread(s)
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - 13:12:42: scanning filesystem freespace - 8 of 8 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - 13:12:42: scanning agi unlinked lists - 8 of 8 allocation groups done
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - 13:13:22: process known inodes and inode discovery - 292544 of 292544 inodes done
        - process newly discovered inodes...
        - 13:13:22: process newly discovered inodes - 8 of 8 allocation groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - clearing existing "lost+found" inode
        - marking entry "lost+found" to be deleted
        - 13:13:22: setting up duplicate extent list - 8 of 8 allocation groups done
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - 13:13:22: check for inodes claiming duplicate blocks - 292544 of 292544 inodes done
Phase 5 - rebuild AG headers and trees...
        - 13:13:22: rebuild AG headers and trees - 8 of 8 allocation groups done
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ... 
rebuilding directory inode 128
        - 13:13:57: traversing filesystem - 8 of 8 allocation groups done
        - traversal finished ... 
        - traversing all unattached subtrees ... 
        - traversals finished ... 
        - moving disconnected inodes to lost+found ... 
Phase 7 - verify and correct link counts...
resetting inode 387655 nlinks from 0 to 2
resetting inode 13313696 nlinks from 0 to 2
resetting inode 17197100 nlinks from 0 to 2
        - 13:14:09: verify and correct link counts - 292544 of 292544 inodes done
done

# xfs_check /dev/sda6
link count mismatch for inode 387655 (name ?), nlink 0, counted 2
link count mismatch for inode 13313696 (name ?), nlink 0, counted 2
link count mismatch for inode 17197100 (name ?), nlink 0, counted 2
------- Comment #10 From 2006-10-29 23:01:04 CST -------
Argh, sorry about that. I have updated xfsprogs to 2.8.16.
------- Comment #11 From 2006-11-01 12:01:35 CST -------
Thanks - this version finally fixed the corruption. xfs_check is now silent and
I can remove these directories.