
Re: Crashes in various ext2 functions while running xfstest/check

To: Chris Pascoe <c.pascoe@xxxxxxxxxxxxxx>
Subject: Re: Crashes in various ext2 functions while running xfstest/check
From: Steve Lord <lord@xxxxxxx>
Date: Thu, 24 May 2001 09:34:06 -0500
Cc: Steve Lord <lord@xxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: Message from Chris Pascoe <c.pascoe@csee.uq.edu.au> of "Thu, 24 May 2001 22:45:28 +1000." <Pine.GSO.4.33.0105242136050.21984-100000@mango.csee.uq.edu.au>
Sender: owner-linux-xfs@xxxxxxxxxxx

Hmm, thanks for the info. Both dumps come from lines where ext2 is
accessing the b_data field of buffers it has just got back from the cache
for the first time. It looks like the b_data field is NULL. That is a
legitimate state for b_data in general, but not for ext2 metadata buffers.
So a mix of ext2 and xfs running in parallel may be part of the problem
here, but xfs does not mess with b_data directly, and it does not use the
same buffer-head-based caching of metadata that ext2 does.

More to come on this one.

Steve

> 
> > This is a little scary, the xfs and ext2 filesystems are on different
> > devices, otherwise I would question the raid driver, or are they? Your
> > device numbers are a little different, where are the partitions you used
> > for the tests?
> 
> Both the sda and sdb devices are separate RAID volumes off the same
> controller - sda is a RAID1 (dual 9GB drives) and sdb is a RAID5 (six 73GB
> drives).
> 
> Start mounting filesystem: sd(8,17)
> Ending clean XFS mount for filesystem: sd(8,17)
> Start mounting filesystem: sd(8,18)
> Ending clean XFS mount for filesystem: sd(8,18)
> 
> Those device numbers are strange?  They correlate to /dev/sdb1 and
> /dev/sdb2.  I was using:
>       TEST_DEV="/dev/sdb1"
>       TEST_DIR="/tst1"
>       SCRATCH_DEV="/dev/sdb2"
>       SCRATCH_MNT="/tst2"
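
The sd(8,17) and sd(8,18) pairs in the mount messages above are (major, minor) device numbers. A minimal sketch of how they map to device names, assuming the conventional Linux SCSI disk numbering for major 8 (16 minors per disk, with minor 0 of each group being the whole disk). The function name sd_name is made up for illustration:

```python
import string

SD_MAJOR = 8            # first SCSI disk major (covers sda..sdp)
MINORS_PER_DISK = 16    # minor 0 of each group is the whole disk, 1..15 are partitions


def sd_name(major: int, minor: int) -> str:
    """Map a (major, minor) pair on SCSI disk major 8 to a /dev name."""
    if major != SD_MAJOR:
        raise ValueError("only major 8 is handled in this sketch")
    if not 0 <= minor < 256:
        raise ValueError("minor out of range for this sketch")
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "/dev/sd" + string.ascii_lowercase[disk]
    return name if part == 0 else f"{name}{part}"


print(sd_name(8, 17))  # /dev/sdb1
print(sd_name(8, 18))  # /dev/sdb2
```

Under that assumption the numbers in the log agree with the TEST_DEV and SCRATCH_DEV settings quoted above.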
> 
> > I have not seen anything like this before, it smells a little of a buffer
> > head getting stamped on. Would it be possible to add disassemblies of the
> > ext2 functions you have hit corruption in? i.e. just run gdb on vmlinux
> > and disassemble them, otherwise it is hard to figure out which memory
> > reference was at fault.
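
One way to script the "run gdb on vmlinux and disassemble" step is sketched below. It assumes gdb is installed and that vmlinux is the unstripped image matching the kernel that oopsed. The helper names, the vmlinux path and the function name ext2_example_function are placeholders, not names taken from the oops reports:

```python
import subprocess


def gdb_disassemble_cmd(vmlinux: str, function: str) -> list[str]:
    """Build a non-interactive gdb command line that disassembles one function."""
    # -batch exits after running the commands; -ex supplies one gdb command.
    return ["gdb", "-batch", "-ex", f"disassemble {function}", vmlinux]


def disassemble(vmlinux: str, function: str) -> str:
    """Run gdb and return its output; needs gdb and a matching vmlinux."""
    cmd = gdb_disassemble_cmd(vmlinux, function)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Both the path and the function name are placeholders.
    print(gdb_disassemble_cmd("vmlinux", "ext2_example_function"))
```

The faulting address from the oops can then be matched against the listing to see which memory reference was at fault.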
> 
> Sure, I am rerunning and will put up disassemblies as I catch them.
> Unfortunately I 'cvs update'd and recompiled without saving the old
> vmlinux.  I have two new bt's up now, combined with disassemble's of the
> ext2 functions that were running when the oopses occurred at
> http://www.csee.uq.edu.au/~chrisp/xfs/disasm-20010523.  I hope I've done
> the right thing!!
> 
> I'm compiling six new kernels on the machine at the moment, with a few
> combinations of highmem on/highmem off, Pentium II vs Pentium III
> selected, SMP vs Non-SMP, to see if changing any of these makes a
> difference.  I think I can possibly disable the RAID controller too,
> converting it back into a standard aic7xxx - will look into that hopefully
> tomorrow too.
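
The option combinations described above can be enumerated as a small test matrix. This is only a sketch: the labels are descriptive, not real kernel config symbols, and the full cross product is eight builds, while the message mentions six without saying which ones:

```python
from itertools import product

# Descriptive labels only; these are not kernel config option names.
OPTIONS = {
    "highmem": ("on", "off"),
    "cpu": ("Pentium II", "Pentium III"),
    "smp": ("SMP", "non-SMP"),
}


def build_matrix(options):
    """Return every combination of the given options as a list of dicts."""
    keys = list(options)
    return [dict(zip(keys, combo)) for combo in product(*(options[k] for k in keys))]


for cfg in build_matrix(OPTIONS):
    print(cfg)
```

Recording which combinations still oops makes it easier to see whether any single option correlates with the crashes.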
> 
> Chris


