View Incident:
http://co-op.engr.sgi.com/BugWorks/code/bwxquery.cgi?search=Search&wlong=1&view_type=Bug&wi=787427
Submitter : mostek Submitter Domain : sgi.com
Assigned Engineer : cattelan Assigned Domain : engr
Assigned Group : xfs-linux Category : software
Customer Reported : F Priority : 1
Project : xfs-linux Status : open
Description :
I did a complete kernel build with page buf meta data on.
Then I unmounted and remounted the file system; when I tried to use
it again, it was corrupt. Here is the output from
xfs_repair -n:
[root@carlos /root]# xfs_repair -n /dev/hda1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- scan filesystem freespace and inode maps...
would zero unused portion of secondary superblock 0 sector
would zero unused portion of secondary superblock 1 sector
would zero unused portion of secondary superblock 2 sector
would zero unused portion of secondary superblock 3 sector
would zero unused portion of secondary superblock 4 sector
would zero unused portion of secondary superblock 5 sector
would zero unused portion of secondary superblock 6 sector
would zero unused portion of secondary superblock 7 sector
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
imap claims in-use inode 441017 is free, would correct imap
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem starting at / ...
- traversal finished ...
- traversing all unattached subtrees ...
- traversals finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 441017, would move to lost+found
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
[root@carlos /root]# xfs_db /dev/hda1
xfs_db: mode = NATIVE, sim mode = NATIVE, arch = 1, magic = 0x58465342
xfs_db: inode 441017
xfs_db: p
core.magic = 0x494e
core.mode = 0100600
core.version = 1
core.format = 2 (extents)
core.nlinkv1 = 1
core.uid = 0
core.gid = 0
core.atime.sec = Thu Apr 6 22:32:15 2000
core.atime.nsec = 050000000
core.mtime.sec = Thu Apr 6 22:32:15 2000
core.mtime.nsec = 050000000
core.ctime.sec = Thu Apr 6 22:32:15 2000
core.ctime.nsec = 050000000
core.size = 0
core.nblocks = 16
core.extsize = 0
core.nextents = 1
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.gen = 0
next_unlinked = null
u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,170100,16,0]
xfs_db: quit
We can see that this inode still has blocks (core.nblocks = 16, one
extent) even though its size is 0 and it should have been freed.
Steve claims that this corruption is due to unmount not flushing
out all the dirty buffers. I suspect that this inode had really been
freed earlier.
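
The check described above can be sketched as a small script. This is only an illustration, assuming the xfs_db print output is saved as text in the "name = value" form shown in the transcript; the function names parse_xfs_db_fields and still_holds_blocks are hypothetical and not part of any XFS tool.

```python
def parse_xfs_db_fields(text):
    """Parse 'name = value' lines from saved xfs_db print output into a dict.

    Values are kept as strings; lines starting with 'xfs_db:' (prompts and
    the startup banner) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("xfs_db:"):
            continue
        name, sep, value = line.partition(" = ")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def still_holds_blocks(fields):
    """Return True if the inode reports a non-zero block count."""
    return int(fields.get("core.nblocks", "0")) != 0


# A few fields copied from the xfs_db output for inode 441017.
SAMPLE = """core.size = 0
core.nblocks = 16
core.nextents = 1
u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,170100,16,0]
"""

fields = parse_xfs_db_fields(SAMPLE)
if still_holds_blocks(fields):
    print("inode still owns", fields["core.nblocks"], "blocks")
```

An inode that the inode map marks as free but that still reports blocks is the same condition that trips ASSERT(ip->i_d.di_nblocks == 0) in xfs_ialloc() when the inode is reused.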
Here is a mail exchange between Steve and me on this issue:
>
>
> I was running with page buf meta data turned on and XFSDEBUG and DEBUG.
>
> I did lots of stuff on an XFS file system including building the
> kernel: make clean, make dep, ...
>
> Then, I noticed with top that 113MB of the 128MB of memory were in use.
> I did an unmount and this went down to 30MB. I'm guessing that we had
> lots in cache.
>
> After the unmount, I did a mount and was going to run some performance
> numbers. The first file create resulted in the following ASSERT:
>
> linux/fs/xfs/xfs_inode.c:xfs_ialloc()
>
> ASSERT(ip->i_d.di_nblocks == 0);
>
> The newly allocated inode file has some blocks?
>
> You mentioned that there is a known problem where we don't fully release
> page bufs when unmounting. Could this be related? Should I open a PV for
> this?
>
> Thanks,
>
> Jim
>
Yep, unmount is where it happens - if you cat /proc/slabinfo after an
unmount there are sometimes things left behind in xfs structures.
Russell was going to look into this one.
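
The /proc/slabinfo check Steve describes could be scripted roughly as follows. This is a minimal sketch under stated assumptions: each data line is taken to be "name active_objs num_objs ...", and the relevant caches are assumed to have names beginning with "xfs". The cache names in the sample and the function name leftover_xfs_caches are hypothetical; the real cache names (including any used by page bufs) may differ, so the prefix is a parameter.

```python
def leftover_xfs_caches(slabinfo_text, prefix="xfs"):
    """Return {cache_name: active_objs} for caches that still hold objects.

    Assumes each data line looks like 'name active_objs num_objs ...'.
    Header lines and lines whose second column is not a number are ignored.
    """
    leftovers = {}
    for line in slabinfo_text.splitlines():
        parts = line.split()
        if len(parts) < 3 or not parts[0].startswith(prefix):
            continue
        try:
            active = int(parts[1])
        except ValueError:
            continue
        if active > 0:
            leftovers[parts[0]] = active
    return leftovers


# Hypothetical sample; the cache names are made up for illustration.
SAMPLE = """slabinfo - version: 1.1
kmem_cache           29     42
xfs_example_a         3     40
xfs_example_b         0     20
"""

print(leftover_xfs_caches(SAMPLE))
```

To use it on a live system, pass in the contents of /proc/slabinfo captured right after the unmount; any cache it reports still has active objects that the unmount left behind.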