Nathan Scott wrote:
> hi,
>
> On Nov 28, 10:25am, Thomas Graichen wrote:
> > Subject: Re: alpha again
> > "Nathan Scott" <nathans@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > ...
> > > heh - that's completely bogus. so the problem is in the kernel
> > > (xfs mount/umount code paths) after all.
> > ...
> > > my next best guess at the probable cause is that this may
> > > be a blocksize related problem. we know that the primary
> > > superblock is pretty much intact (otherwise xfs_db would have
> > > gone haywire) - but since its offset is at the start of blk 0,
> > > we're always likely to get that right no matter what the page
> > > & blksizes are, I think.
> > ...
> > so it looks like the umount code trashes things - this would also
> > explain why xfs survives the dbench 64 - the filesystem seems to be
> > stable while operating and only gets trashed on umount ...
> >
>
> ok, i've read through the umount code and have a theory.
> (debugging by proxy is fun!) ;-)
>
> is there any chance that the device block size is being
> set back to 1024 at the end of the umount? i.e. at the
> end of linvfs_put_super(), is the set_blocksize() call
> being passed 1024? (throw a printk in there)
>
> if so, is there a chance we are still doing IO at the end
> of linvfs_put_super() -(Russell?)- in particular, is there
> any chance we could still be writing out the superblock
> after we've called set_blocksize() on the device?
Hmm, I doubt it. linvfs_put_super, through a series of vfs calls
(fs_dounmount -> xfs_unmount), ends up calling XFS_bdflush, aka
pagebuf_delwri_flush, which should force out any pending metadata
buffers. A few syncs have been done along the way which should clean
up any io hooked to inodes. Of course, all the inodes need to be gone
before the file system can be unbusy enough to unmount in the first place.
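For reference, the instrumentation Nathan is suggesting would look
something like this at the end of linvfs_put_super() (a sketch from
memory - the surrounding code in the tree may differ, only the printk
next to the set_blocksize call matters):

    /* just before the device blocksize is reset at the end of
     * linvfs_put_super(); BLOCK_SIZE is the kernel default of 1024 */
    printk("linvfs_put_super: set_blocksize(dev=0x%x, size=%d)\n",
           kdev_t_to_nr(sb->s_dev), BLOCK_SIZE);
    set_blocksize(sb->s_dev, BLOCK_SIZE);

If that prints 1024 while any IO is still outstanding, his theory is
at least possible.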
>
>
> i think this would produce the behavior you're seeing here
> - if the underlying device blocksize was 1024 and we wrote
> out the (512 byte) superblock thinking the blocksize was
> 512, well we'd end up putting random junk in the AGF since
> that's the next 512 bytes right after the superblock.
>
> if the blocksize does prove to be reset to something other
> than 512, Thomas, could you try commenting out everything
> between "/* Reset device block size */" and the end of the
> function (linvfs_put_super) - 3 or 4 lines - and see if you
> still see repair needing to fix the AGF after umount?
It is entirely possible the ag is getting trashed the first time the
superblock is written out. The file system will run just fine since
most of the ag info is kept in memory and not re-read from disk.
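To make the overlap concrete, the arithmetic looks like this (just an
illustration - the sector positions are the standard xfs layout, the
rest is made up for the example):

    #include <stdio.h>

    /* xfs puts the primary superblock in the first 512-byte sector
     * of ag 0 and the agf in the second.  if the device blocksize
     * got reset to 1024, a "write block 0" for the superblock
     * covers bytes 0..1023 - agf included. */
    int main(void)
    {
        int sector    = 512;         /* xfs daddr unit */
        int agf_start = 1 * sector;  /* agf 0: second sector, byte 512 */
        int blksize   = 1024;        /* suspected device blocksize */
        int wend      = blksize - 1; /* block-0 write ends at byte 1023 */

        printf("sb write covers bytes 0..%d\n", wend);
        printf("agf 0 starts at byte %d: %s\n", agf_start,
               agf_start <= wend ? "clobbered" : "safe");
        return 0;
    }

Which matches the repair output below: magic 0x0, version -1, garbage
counters - exactly what you'd expect if random junk landed on the agf.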
I suspect the partially valid page stuff is at fault. Try this:
mount a fresh file system, create a small file, then run
"xfs_db -r <device>" and give it the command "agf 0".
From the man page:

    The AGF block is the header for block allocation
    information; it is in the second 512-byte block
    of each allocation group. The following fields
    are defined:
        magicnum: AGF block magic number, 0x58414746 ('XAGF')
If the magic number is wrong, then we have trashed the block right off
the bat. Note: in thinking about this, you may want to wait 5 seconds
or so to make sure the updates to the superblock are written out.
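If you'd rather check the magic without xfs_db, a quick user-space
read does the same thing (a sketch - /dev/sdb1 is just the example
device from the repair output below):

    #define _XOPEN_SOURCE 500
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <arpa/inet.h>  /* ntohl: xfs on-disk fields are big-endian */

    int main(int argc, char **argv)
    {
        const char *dev = argc > 1 ? argv[1] : "/dev/sdb1";
        unsigned char buf[512];
        uint32_t magic;
        int fd = open(dev, O_RDONLY);

        if (fd < 0) { perror(dev); return 1; }
        /* agf 0 is the second 512-byte sector of the device */
        if (pread(fd, buf, sizeof(buf), 512) != (ssize_t)sizeof(buf)) {
            perror("pread");
            return 1;
        }
        memcpy(&magic, buf, sizeof(magic));
        magic = ntohl(magic);
        printf("agf 0 magicnum = 0x%x (%s)\n", magic,
               magic == 0x58414746 ? "XAGF, ok" : "trashed");
        close(fd);
        return 0;
    }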
>
>
> >> root@cyan:/usr/src/xfs/linux# xfs_repair /dev/sdb1
> >> Phase 1 - find and verify superblock...
> >> Phase 2 - using internal log
> >> - zero log...
> >> - scan filesystem freespace and inode maps...
> >> bad magic # 0x0 for agf 0
> >> bad version # -1 for agf 0
> >> bad length 0 for agf 0, should be 4142
> >> flfirst -2147483648 in agf 0 too large (max = 128)
> >> reset bad agf for ag 0
> >> freeblk count 1 != flcount 1084270339 in ag 0
> >> bad agbno 2966461184 for btbno root, agno 0
> >> bad agbno 16580607 for btbcnt root, agno 0
> >> - found root inode chunk
> >> Phase 3 - for each AG...
> >> - scan and clear agi unlinked lists...
> >> - process known inodes and perform inode discovery...
> >> - agno = 0
> >> - agno = 1
> >> - agno = 2
> >> - agno = 3
> >> ...
>
> thanks.
>
> --
> Nathan