From: "Nathan Scott"
Message-Id: <10011300916.ZM156274@wobbly.melbourne.sgi.com>
Date: Thu, 30 Nov 2000 09:16:46 -0400
In-Reply-To: Thomas Graichen "Re: alpha again" (Nov 28, 10:25am)
References: <10011261353.ZM165451@wobbly.melbourne.sgi.com> <10011280910.ZM168487@wobbly.melbourne.sgi.com>
To: Thomas Graichen, thomas.graichen@innominate.de
Subject: Re: alpha again
Cc: linux-xfs@oss.sgi.com

hi,

On Nov 28, 10:25am, Thomas Graichen wrote:
> Subject: Re: alpha again
> "Nathan Scott" wrote:
> ...
> > heh - thats completely bogus.  so the problem is in the kernel
> > (xfs mount/umount code paths) after all.
> ...
> > my next best guess at the probable cause is that this may
> > be a blocksize related problem.  we know that the primary
> > superblock is pretty much intact (otherwise xfs_db would have
> > gone haywire) - but since its offset is at start of blk 0,
> > we're always likely to get that right no matter what the page
> > & blksizes are, I think.
> ...
> so looks like the umount code trashes things - this would also make
> clear why xfs survives the dbench 64 - the filesystem seems to be
> stable while operating and only gets trashed on umount ...
>

ok, i've read through the umount code and have a theory.
(debugging by proxy is fun!) ;-)

is there any chance that the device block size is being set
back to 1024 at the end of the umount?  i.e. at the end of
linvfs_put_super(), is the set_blocksize() call being passed
1024?  (throw a printk in there)

if so, is there a chance we are still doing IO at the end of
linvfs_put_super() -(Russell?)- in particular, is there any
chance we could still be writing out the superblock after
we've called set_blocksize() on the device?

i think this would produce the behavior you're seeing here -
if the underlying device blocksize was 1024 and we wrote out
the (512 byte) superblock thinking the blocksize was 512, well
we'd end up putting random junk in the AGF since that's the
next 512 bytes right after the superblock.

if the blocksize does prove to be reset to something other
than 512, Thomas, could you try commenting out everything
between "/* Reset device block size */" and the end of the
function (linvfs_put_super) - 3/4 lines - and see if you still
see repair needing to fix the AGF after umount?

>> root@cyan:/usr/src/xfs/linux# xfs_repair /dev/sdb1
>> Phase 1 - find and verify superblock...
>> Phase 2 - using internal log
>>         - zero log...
>>         - scan filesystem freespace and inode maps...
>> bad magic # 0x0 for agf 0
>> bad version # -1 for agf 0
>> bad length 0 for agf 0, should be 4142
>> flfirst -2147483648 in agf 0 too large (max = 128)
>> reset bad agf for ag 0
>> freeblk count 1 != flcount 1084270339 in ag 0
>> bad agbno 2966461184 for btbno root, agno 0
>> bad agbno 16580607 for btbcnt root, agno 0
>>         - found root inode chunk
>> Phase 3 - for each AG...
>>         - scan and clear agi unlinked lists...
>>         - process known inodes and perform inode discovery...
>>         - agno = 0
>>         - agno = 1
>>         - agno = 2
>>         - agno = 3
>> ...

thanks.

-- 
Nathan
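
a rough userspace illustration of the blocksize theory above, not
kernel code: it models the start of AG 0 as a byte array with the
superblock in the first 512 bytes and the AGF in the next 512, then
"writes the superblock" while the device moves whole blocks of either
512 or 1024 bytes. the function names (make_disk, write_superblock,
agf_magic) and the junk fill byte are made up for this sketch; in the
real case the trailing bytes would be whatever happened to sit in
memory after the superblock buffer.

```python
SECTOR = 512  # size of the XFS superblock and of the AGF header sector


def make_disk():
    """Fake start of AG 0: sector 0 = superblock, sector 1 = AGF."""
    disk = bytearray(4 * SECTOR)
    disk[0:4] = b"XFSB"                      # superblock magic
    disk[SECTOR:SECTOR + 4] = b"XAGF"        # AGF magic
    return disk


def write_superblock(disk, sb, device_blocksize, junk=0x00):
    """Write a 512-byte superblock at offset 0 of the fake disk.

    The "device" always transfers device_blocksize bytes, so anything
    beyond the superblock in that block is filled with junk
    (hypothetical stand-in for stale memory contents).
    """
    assert len(sb) == SECTOR
    assert device_blocksize >= SECTOR
    block = bytearray(sb)
    block.extend(bytes([junk]) * (device_blocksize - SECTOR))
    disk[0:device_blocksize] = block         # same length in and out


def agf_magic(disk):
    """Return the 4 magic bytes at the start of the AGF sector."""
    return bytes(disk[SECTOR:SECTOR + 4])


if __name__ == "__main__":
    sb = b"XFSB" + bytes(SECTOR - 4)
    for blocksize in (512, 1024):
        disk = make_disk()
        write_superblock(disk, sb, blocksize)
        print(blocksize, agf_magic(disk))
```

with a 512-byte device block the AGF sector is left alone; with 1024
the same superblock write also overwrites the AGF sector, which would
be consistent with repair finding a bad magic number for agf 0 while
the superblock itself still verifies.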