BUG 787427 - file system corrupts with page buf meta data on after unmount/mount

To: cattelan@xxxxxxxxxxxxxxxxxxxx
Subject: BUG 787427 - file system corrupts with page buf meta data on after unmount/mount
From: pv@xxxxxxxxxxxxxxxxxxxxxx (mostek@xxxxxxx)
Date: Fri, 7 Apr 2000 12:38:20 -0700 (PDT)
Cc: linux-xfs@xxxxxxxxxxx
Reply-to: sgi.bugs.xfs@xxxxxxxxxxxxxxxxx
Sender: owner-linux-xfs@xxxxxxxxxxx
Webexec: webpvsubmit,PvProjectIncident
Webpv: sgigate.sgi.com
View Incident: 
http://co-op.engr.sgi.com/BugWorks/code/bwxquery.cgi?search=Search&wlong=1&view_type=Bug&wi=787427

Submitter : mostek                    Submitter Domain : sgi.com            
Assigned Engineer : cattelan          Assigned Domain : engr                
Assigned Group : xfs-linux            Category : software                   
Customer Reported : F                 Priority : 1                          
Project : xfs-linux                   Status : open                         
Description :
I did a complete kernel build with page buf meta data on.
Then I unmounted and remounted the file system, and when I tried
to use it again, it was corrupt. Here is the output from
xfs_repair -n:

[root@carlos /root]# xfs_repair -n /dev/hda1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
would zero unused portion of secondary superblock 0 sector
would zero unused portion of secondary superblock 1 sector
would zero unused portion of secondary superblock 2 sector
would zero unused portion of secondary superblock 3 sector
would zero unused portion of secondary superblock 4 sector
would zero unused portion of secondary superblock 5 sector
would zero unused portion of secondary superblock 6 sector
would zero unused portion of secondary superblock 7 sector
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
imap claims in-use inode 441017 is free, would correct imap
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem starting at / ... 
        - traversal finished ... 
        - traversing all unattached subtrees ... 
        - traversals finished ... 
        - moving disconnected inodes to lost+found ... 
disconnected inode 441017, would move to lost+found
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
[root@carlos /root]# xfs_repair -n /dev/hda1
[root@carlos /root]# xfs_db /dev/hda1
xfs_db: mode = NATIVE, sim mode = NATIVE, arch = 1, magic = 0x58465342
xfs_db: inode 441017
xfs_db: p
core.magic = 0x494e
core.mode = 0100600
core.version = 1
core.format = 2 (extents)
core.nlinkv1 = 1
core.uid = 0
core.gid = 0
core.atime.sec = Thu Apr  6 22:32:15 2000
core.atime.nsec = 050000000
core.mtime.sec = Thu Apr  6 22:32:15 2000
core.mtime.nsec = 050000000
core.ctime.sec = Thu Apr  6 22:32:15 2000
core.ctime.nsec = 050000000
core.size = 0
core.nblocks = 16
core.extsize = 0
core.nextents = 1
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.gen = 0
next_unlinked = null
u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,170100,16,0]
xfs_db: quit

We can see this inode still has blocks (core.nblocks = 16, one extent)
even though the imap says it is free. It should have been freed.
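
For anyone triaging similar reports, here is a minimal sketch (not part of
the original report) that pulls the inode numbers out of the "would ..."
lines of xfs_repair -n output like the run above. The function name
flagged_inodes and the sample text are illustrative assumptions; the
regular expression only covers the two message shapes seen in this report.

```python
import re

# Matches the inode number in messages such as
#   "imap claims in-use inode 441017 is free, would correct imap"
#   "disconnected inode 441017, would move to lost+found"
INODE_RE = re.compile(r"\binode (\d+)\b")


def flagged_inodes(repair_output):
    """Map inode number -> list of 'would ...' messages that mention it.

    Hypothetical helper: it only looks at lines containing 'would',
    which is how xfs_repair -n reports changes it did not make.
    """
    flagged = {}
    for line in repair_output.splitlines():
        line = line.strip()
        if "would" not in line:
            continue
        match = INODE_RE.search(line)
        if match:
            flagged.setdefault(int(match.group(1)), []).append(line)
    return flagged


# Excerpt copied from the xfs_repair -n run in this report.
REPAIR_SAMPLE = """\
        - agno = 0
imap claims in-use inode 441017 is free, would correct imap
        - agno = 1
would zero unused portion of secondary superblock 0 sector
disconnected inode 441017, would move to lost+found
"""

if __name__ == "__main__":
    for ino, messages in flagged_inodes(REPAIR_SAMPLE).items():
        print(ino, len(messages))
```

Each inode number it reports can then be inspected in xfs_db with
"inode <number>" followed by "p", as done for inode 441017 above.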

Steve claims that this corruption is due to unmount not flushing
out all the dirty buffers. I suspect that this inode was really freed
before.

Here is a mail exchange between Steve and me on this issue:

> 
> 
> I was running with page buf meta data turned on and XFSDEBUG and DEBUG.
> 
> I did lots of stuff on an XFS file system including building the
> kernel: make clean, make dep, ...
> 
> Then, I noticed with top that 113MB of the 128MB of memory were in use.
> I did an unmount and this went down to 30MB. I'm guessing that we had
> lots in cache.
> 
> After the unmount, I did a mount and was going to run some performance
> numbers. The first file create resulted in the following ASSERT:
> 
> linux/fs/xfs/xfs_inode.c:xfs_ialloc()
> 
>       ASSERT(ip->i_d.di_nblocks == 0);
> 
> The newly allocated inode already has some blocks?
> 
> You mentioned that there is a known problem where we don't fully release
> page bufs when unmounting. Could this be related? Should I open a PV for
> this?
> 
> Thanks,
> 
> Jim
> 


Yep, unmount is where it happens - if you cat /proc/slabinfo after an
unmount there are sometimes things left behind in xfs structures.

Russell was going to look into this one.
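
The /proc/slabinfo check described above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the
procedure anyone on this thread actually used: it assumes each data line
of /proc/slabinfo starts with the cache name followed by the
active-object count, and the name fragments it searches for ("xfs",
"pagebuf", "page_buf") and the cache names in the sample are guesses
that may not match a given kernel.

```python
def leftover_xfs_caches(slabinfo_text, needles=("xfs", "pagebuf", "page_buf")):
    """Return {cache_name: active_objects} for matching non-empty caches.

    Assumes each data line starts with '<name> <active_objs> ...'.
    Header lines are skipped because their second field is not a number.
    """
    leftovers = {}
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # header or malformed line
        name, active = fields[0], int(fields[1])
        if active > 0 and any(needle in name for needle in needles):
            leftovers[name] = active
    return leftovers


# Made-up sample; the cache names and counts are illustrative only.
SLABINFO_SAMPLE = """\
slabinfo - version: 1.1
kmem_cache 30 60
xfs_inode 12 40
xfs_buf_item 0 20
"""

if __name__ == "__main__":
    for name, active in leftover_xfs_caches(SLABINFO_SAMPLE).items():
        print(name, active)
```

To run it against a live system after an unmount, pass the contents of
/proc/slabinfo instead of the sample, for example
leftover_xfs_caches(open("/proc/slabinfo").read()). Any cache it lists
still holds active objects that the unmount did not release.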
