Bugzilla – Bug 389
Untarring an archive corrupts XFS file system
Last modified: 2004-11-26 10:29:30 CST
I have a tar archive that, when I extract the files, results in a corrupt directory inode. This is repeatable on _every_ XFS file system I've tried (kernels 2.4.20 and 2.4.22 with XFS 1.3.X, and whatever version comes with 2.4.26). It also happens on IRIX (6.5.19m)!!

The tar archive contains one directory with about 500 files. If I extract the files, then move the extracted directory to somewhere else on the file system, I get an xfs_shutdown. When I do an xfs_repair I get:

    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - scan filesystem freespace and inode maps...
            - found root inode chunk
    Phase 3 - for each AG...
            - scan (but don't clear) agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
    bad directory leaf magic # 0x9a62 for directory inode 801 block 8388609
            - agno = 1
            - agno = 2
    ...
    Phase 6 - check inode connectivity...
            - traversing filesystem starting at / ...
    unknown magic number 0x9a62 for block 8388609 in directory inode 801
            - traversal finished ...
    ...

I think it has something to do with the order and 'nature' (name, size, whatever) of the files being added to the directory, as shown by the following tests:

    untar all but the last file in the archive
    umount
    xfs_repair - no problem
    mount
    untar just the last file in the archive
    umount
    xfs_repair - unknown magic number 0x9a62 ...

However, if I do:

    untar just the last file in the archive
    umount
    xfs_repair - no problem
    mount
    untar all but the last file in the archive
    umount
    xfs_repair - no problem

Or even:

    untar all but the last file in the archive
    umount
    xfs_repair - no problem
    mount
    create any new file in problem directory
    umount
    xfs_repair - no problem
    untar just the last file in the archive
    umount
    xfs_repair - no problem

This tar archive was made from a repaired corrupted directory on another XFS file system - the corruption was exactly the same.
I thought it strange that the tar archive 'kept' the corruption - very strange, as a tar archive doesn't contain anything XFS specific ... but it's not the tar archive, it's the nature and order of the file creation that triggers the bug. I believe tar creates archives with files in 'directory' order, i.e. the same order in which the files were created.

Unfortunately, the tar archive is 1.1 GB, so it is a bit difficult to make it available. It also contains production data, so I don't want to make it widely available; however, I will make it available to the XFS developers (how??)

James Pearson
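The "all but the last file" / "just the last file" tests above could be scripted roughly as follows. This is only a sketch: it assumes GNU tar (for the -T option) and member names without embedded newlines. The function name split_members and the file names head.list, last.list and archive.tar are hypothetical, chosen for illustration. Directory entries are dropped from the listing because naming a directory on extraction would pull in everything beneath it, including the last file.

```shell
#!/bin/sh
# Hypothetical helper: read a tar member listing (one name per line) on
# stdin, drop directory entries (names ending in "/"), then write every
# remaining name except the last to $1 and the last name alone to $2.
split_members() {
    head_list=$1
    last_list=$2
    listing=$(mktemp) || return 1
    grep -v '/$' > "$listing"            # keep only non-directory members
    sed '$d' "$listing" > "$head_list"   # all lines except the final one
    tail -n 1 "$listing" > "$last_list"  # only the final line
    rm -f "$listing"
}

# Intended use (assumes GNU tar; archive.tar is a placeholder name):
#   tar -tf archive.tar | split_members head.list last.list
#   tar -xf archive.tar -T head.list     # untar all but the last file
#   ... umount, xfs_repair, mount ...
#   tar -xf archive.tar -T last.list     # untar just the last file
```

Swapping the two tar -xf steps gives the second sequence described above, where no corruption was seen.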
Created attachment 148
List of filename that corrupt file system

Just realized that I can reproduce the problem by just touching empty files, so the following works without the need for the tar archive:

    cd /some/tmp/xfs/file/system
    mkdir shd
    for i in `cat corrupt.list`; do touch $i; done
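Since the problem seems to depend on creation order, a variant of the loop that reads the list line by line may be a little more robust: it keeps the order and also copes with names containing spaces. This is a sketch only, assuming the list holds one relative path per line. The function name touch_in_order is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create empty files in exactly the order listed.
#   $1 = list file (assumed: one relative path per line)
#   $2 = directory on the XFS file system to create them under
touch_in_order() {
    list=$1
    dest=$2
    while IFS= read -r name; do
        [ -n "$name" ] || continue            # skip blank lines
        mkdir -p "$dest/$(dirname "$name")"   # make sure the parent directory exists
        touch -- "$dest/$name" || return 1    # stop at the first failure
    done < "$list"
}

# Example (paths are placeholders):
#   touch_in_order corrupt.list /some/tmp/xfs/file/system
```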
Hi James,

Wow, a reproducible test case! I'm not having any luck reproducing it though - I'm following your second (touch) recipe but not seeing the problem yet. I also tried touching all but the last file in the list, then umount/mount, and then touching the last file (which kind of matches your earlier description, I think), but that didn't cause any problem either.

Could you try my recipe below and see if it fails for you? If not, is there a modified set of steps I can take to reproduce it?

thanks!

    bruce /home/fsgqa# mkfs.xfs -f /dev/sdb7
    meta-data=/dev/sdb7              isize=256    agcount=8, agsize=8031 blks
             =                       sectsz=512
    data     =                       bsize=4096   blocks=64248, imaxpct=25
             =                       sunit=0      swidth=0 blks, unwritten=1
    naming   =version 2              bsize=4096
    log      =internal log           bsize=4096   blocks=1200, version=1
             =                       sectsz=512   sunit=0 blks
    realtime =none                   extsz=65536  blocks=0, rtextents=0
    bruce /home/fsgqa# mount /dev/sdb7 /scratch/xfs1
    bruce /home/fsgqa# cd !$
    cd /scratch/xfs1
    bruce /scratch/xfs1# mkdir shd
    bruce /scratch/xfs1# sh
    sh-2.05b# for i in `cat /tmp/corrupt.list`; do touch $i; done
    sh-2.05b# exit
    bruce /scratch/xfs1# cd
    bruce /root# umount /scratch/xfs1
    bruce /root# ls /scratch/xfs
    bruce /root# umount /scratch/xfs1
    bruce /root# xfs_repair /dev/sdb7
    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - zero log...
            - scan filesystem freespace and inode maps...
            - found root inode chunk
    Phase 3 - for each AG...
            - scan and clear agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - clear lost+found (if it exists) ...
            - check for inodes claiming duplicate blocks...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
    Phase 5 - rebuild AG headers and trees...
            - reset superblock...
    Phase 6 - check inode connectivity...
            - resetting contents of realtime bitmap and summary inodes
            - ensuring existence of lost+found directory
            - traversing filesystem starting at / ...
            - traversal finished ...
            - traversing all unattached subtrees ...
            - traversals finished ...
            - moving disconnected inodes to lost+found ...
    Phase 7 - verify and correct link counts...
    done
    bruce /root#
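For repeated attempts, the mkfs / populate / umount / check cycle in the session above could be wrapped in a small script. What follows is a sketch, not the exact procedure used there: it swaps in xfs_repair -n (check only, no modifications) in place of the plain xfs_repair run shown, and the names run, repro_cycle and DRY_RUN are hypothetical. The device, mount point and list path are placeholders, and the list path should be absolute because the loop runs from inside the mount point. Note that mkfs.xfs -f destroys whatever is on the device.

```shell
#!/bin/sh
# With DRY_RUN=1 the commands are only printed; otherwise they are executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the recipe:
#   $1 = scratch block device (its contents are DESTROYED by mkfs.xfs -f)
#   $2 = mount point
#   $3 = absolute path to the list of file names
repro_cycle() {
    dev=$1
    mnt=$2
    list=$3
    run mkfs.xfs -f "$dev" || return 1
    run mount "$dev" "$mnt" || return 1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "(cd $mnt && touch each name in $list)"
    else
        # Same steps as the interactive session: mkdir shd, then touch in list order.
        ( cd "$mnt" && mkdir -p shd &&
          while IFS= read -r n; do touch "$n"; done < "$list" ) || return 1
    fi
    run umount "$mnt" || return 1
    run xfs_repair -n "$dev"    # -n: report problems without changing anything
}

# Example (placeholders; needs root and a scratch device):
#   DRY_RUN=1; repro_cycle /dev/sdb7 /scratch/xfs1 /tmp/corrupt.list
```

Running with DRY_RUN=1 first is a cheap way to confirm the device and mount point are the intended ones before anything is formatted.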
Sorry for missing any text when marking resolved - I'm used to a different bug tracking system.

I created the files with your list of names and, sure enough, on 6.5.19 I got:

    10:50 tes@boing 28# ./bug148
    Cannot access /mnt/test/shd/bake3_read_0001.topShape3.shd.0017.tex: Filesystem is corrupted

However, I then tried on top-of-tree 6.5.27 and it worked fine. It appears that this bug has been fixed. Looking at the SGI database, it is likely to have been fixed by pv#901151 - corruption in xfs dir2 "node" format directories - which was checked into IRIX in October 2003, 6.5.23. It was also checked into the Linux/XFS tree in October 2003.

--Tim
Thanks for the info. Just for completeness, it looks like the fix appeared in kernel 2.4.27 (I only tested up to 2.4.26 ...).

This looks like the problem:
http://marc.theaimsgroup.com/?l=linux-xfs&m=108213125827763&w=2

and the fix:
http://marc.theaimsgroup.com/?l=bk-commits-24&m=108514191810699&w=2

James Pearson