Bug 389 - Untarring an archive corrupts XFS file system
Status: RESOLVED FIXED
Product: XFS
Classification: Unclassified
Component: XFS kernel code
Version: 1.3.x
Hardware: All Linux
Severity: critical
Assigned To: XFS power people
Reported: 2004-11-25 07:47 CST by James Pearson
Modified: 2004-11-26 10:29 CST
Attachments
List of filename that corrupt file system (22.34 KB, text/plain)
2004-11-25 08:15 CST, James Pearson
Description James Pearson 2004-11-25 07:47:41 CST
I have a tar archive that, when I extract its files, results in a corrupt
directory inode.

This is repeatable on _every_ XFS file system I've tried (kernels 2.4.20 and
2.4.22 with XFS 1.3.x, and whatever version comes with 2.4.26).

It also happens on IRIX (6.5.19m) !!

The tar archive contains one directory with about 500 files.

If I extract the files, then move the extracted directory somewhere else on the
file system, I get an xfs_shutdown.

When I do an xfs_repair I get:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad directory leaf magic # 0x9a62 for directory inode 801 block 8388609
        - agno = 1
        - agno = 2
...
Phase 6 - check inode connectivity...
        - traversing filesystem starting at / ... 
unknown magic number 0x9a62 for block 8388609 in directory inode 801
        - traversal finished ... 
...
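
The two messages above are what separate a corrupted run from a clean one, so a scripted reproduction could key off them. A minimal sketch, assuming the xfs_repair output has been saved to a file; the function name `repair_log_is_corrupt` and the file and device names in the comments are illustrative placeholders, not part of xfsprogs:

```shell
# Hypothetical helper: exit 0 if the saved xfs_repair log in $1 contains
# either of the two directory-corruption messages quoted in this report,
# non-zero otherwise.
repair_log_is_corrupt() {
    grep -q -E 'bad directory leaf magic|unknown magic number' "$1"
}

# Possible use on an unmounted scratch device (placeholder name, needs root):
#   xfs_repair -n /dev/sdXN > repair.log 2>&1
#   repair_log_is_corrupt repair.log && echo "directory corruption reported"
```

The `-n` (no-modify) mode is used in the comment so that the check itself does not alter the file system.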


I think it has something to do with the order and 'nature' (name, size,
whatever) of the files being added to the directory - as shown by the following
tests:

untar all but the last file in the archive
umount
xfs_repair - no problem
mount
untar just the last file in the archive
umount
xfs_repair - unknown magic number 0x9a62 ...

However, if I do:

untar just the last file in the archive
umount
xfs_repair - no problem
mount
untar all but the last file in the archive
umount
xfs_repair - no problem

Or even:

untar all but the last file in the archive
umount
xfs_repair - no problem
mount
create any new file in problem directory
umount
xfs_repair - no problem
untar just the last file in the archive
umount
xfs_repair - no problem 
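
The sequences above all hinge on extracting the final archive member separately from the rest. A minimal sketch of that split, assuming GNU tar and a placeholder archive name `archive.tar`; `split_members` is a hypothetical helper, and the mount, umount and xfs_repair steps are left as comments because they need root and a scratch device:

```shell
# Hypothetical helper: read a member list (as printed by `tar -tf`) from $1,
# write every file entry except the last to $2 and the last file entry to $3.
# Directory entries (names ending in "/") are dropped so that naming a
# directory does not pull in everything beneath it on extraction.
split_members() {
    grep -v '/$' "$1" | sed '$d' > "$2"
    grep -v '/$' "$1" | tail -n 1 > "$3"
}

# Sketch of the first sequence (not run here; all names are placeholders):
#   tar -tf archive.tar > members.list
#   split_members members.list first.list last.list
#   tar -xf archive.tar -T first.list    # untar all but the last file
#   ... umount, xfs_repair, mount ...
#   tar -xf archive.tar -T last.list     # untar just the last file
#   ... umount, xfs_repair ...
```

Swapping the two `tar -xf` lines gives the second sequence, where the last file is created first.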

This tar archive was made from a repaired corrupted directory on another XFS
file system - the corruption was exactly the same.

I thought it strange that the tar archive 'kept' the corruption - very strange
as a tar archive doesn't contain anything XFS specific ... but it's not the tar
archive, it's the file creation nature and order that triggers the bug. I
believe tar creates archives with files in the 'directory' order i.e. the same
order that the files were created.

Unfortunately, the tar archive is 1.1 GB, so it is difficult to make
available. It also contains production data, so I don't want to distribute it
widely; however, I will make it available to the XFS developers (how?).

James Pearson
Comment 1 James Pearson 2004-11-25 08:15:42 CST
Created attachment 148
List of filename that corrupt file system

Just realized that I can reproduce the problem by just touching empty files -
so the following works without need for the tar archive:

cd /some/tmp/xfs/file/system
mkdir shd
for i in `cat corrupt.list`; do touch $i; done
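
The loop above relies on shell word splitting, which works as long as no name in the list contains whitespace. A slightly more defensive sketch, assuming one path per line in the list; `touch_from_list` is a hypothetical helper and was not used in the original report:

```shell
# Hypothetical helper: create an empty file for every non-blank line of the
# list in $1, in list order, creating parent directories as needed.
touch_from_list() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue          # skip blank lines
        mkdir -p "$(dirname "$name")"
        touch "$name"
    done < "$1"
}

# Possible use, mirroring the recipe above:
#   cd /some/tmp/xfs/file/system
#   mkdir shd
#   touch_from_list corrupt.list
```

Creation order is preserved, which matters here because the order of the names appears to be what triggers the corruption.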
Comment 2 Nathan Scott 2004-11-25 15:04:37 CST
Hi James,

Wow, a reproducible test case!  I'm not having any luck reproducing
it though - I'm following your second (touch) recipe but not seeing
the problem yet.  I also tried touching all but the last file in the
list, then umount/mount, and then touch the last file (which kind of
matches your earlier description I think) but that didn't cause any
problem either.

Could you try my recipe below and see if it fails for you?  If not,
is there a modified set of steps I can take to reproduce it?

thanks!


bruce /home/fsgqa# mkfs.xfs -f /dev/sdb7
meta-data=/dev/sdb7              isize=256    agcount=8, agsize=8031 blks
         =                       sectsz=512  
data     =                       bsize=4096   blocks=64248, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096  
log      =internal log           bsize=4096   blocks=1200, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
bruce /home/fsgqa# mount /dev/sdb7 /scratch/xfs1
bruce /home/fsgqa# cd !$
cd /scratch/xfs1
bruce /scratch/xfs1# mkdir shd
bruce /scratch/xfs1# sh
sh-2.05b# for i in `cat /tmp/corrupt.list`; do touch $i; done
sh-2.05b# exit
bruce /scratch/xfs1# cd
bruce /root# umount /scratch/xfs1 
bruce /root# ls /scratch/xfs
bruce /root# umount /scratch/xfs1 
bruce /root# xfs_repair /dev/sdb7
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ... 
        - traversal finished ... 
        - traversing all unattached subtrees ... 
        - traversals finished ... 
        - moving disconnected inodes to lost+found ... 
Phase 7 - verify and correct link counts...
done
bruce /root# 
Comment 3 Tim Shimmin 2004-11-25 16:20:10 CST
Sorry for missing any text when marking resolved - I'm used to a different
bug tracking system.
I created the files with your list of names and sure enough on
6.5.19 I got:
  10:50 tes@boing 28# ./bug148
  Cannot access /mnt/test/shd/bake3_read_0001.topShape3.shd.0017.tex: Filesystem
  is corrupted

However, I then tried on top-of-tree 6.5.27 and it worked fine.

It appears that this bug has been fixed.
Looking at the SGI database, it is likely to have been fixed by
  pv#901151 - corruption in xfs dir2 "node" format directories
which was checked into IRIX in October 2003 (6.5.23).
It was checked into the Linux/XFS tree in October 2003 as well.

--Tim
Comment 4 James Pearson 2004-11-26 08:29:30 CST
Thanks for the info.

Just for completeness, looks like the fix appeared in kernel 2.4.27 (I only
tested up to 2.4.26 ...).
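
As a rough aid for checking other machines, the version boundary mentioned above can be sketched as a shell test. This assumes a plain 2.4-series release string and takes 2.4.27 as the first fixed release, as observed above; vendor kernels with backported or separately patched XFS may not follow this rule, and `xfs_dir2_fix_status` is a hypothetical name:

```shell
# Hypothetical helper: for a 2.4-series release string such as "2.4.26" or
# "2.4.27-custom", print "likely-fixed" if the patch level is 27 or higher,
# "likely-affected" if it is lower, and "unknown" for anything else.
xfs_dir2_fix_status() {
    case "$1" in
        2.4.*)
            patch=${1#2.4.}            # strip the "2.4." prefix
            patch=${patch%%[!0-9]*}    # keep only the leading digits
            if [ -z "$patch" ]; then
                echo unknown
            elif [ "$patch" -ge 27 ]; then
                echo likely-fixed
            else
                echo likely-affected
            fi
            ;;
        *)
            echo unknown
            ;;
    esac
}

# Possible use:
#   xfs_dir2_fix_status "$(uname -r)"
```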

This looks like the problem:

http://marc.theaimsgroup.com/?l=linux-xfs&m=108213125827763&w=2

and the fix:

http://marc.theaimsgroup.com/?l=bk-commits-24&m=108514191810699&w=2

James Pearson