On Sun, Jan 18, 2009 at 12:25:11AM +0100, Danny ter Haar wrote:
> Quoting Dave Chinner (david@xxxxxxxxxxxxx):
> > Sorry for not getting back to you sooner.
> No problem. I initially posted to LKML, got redirected by Christoph to this
> list. I'm so stupid that I didn't check the other messages from this list.
> > I think that Alexander tripped over this same problem during his bisect.
> > If you follow the thread from here:
> > http://oss.sgi.com/archives/xfs/2009-01/msg00496.html
> Yep! [cheer] I'm not alone! :-)
> But why only us two? There must be thousands of users out there using
> XFS. Why did it bite us? Large filesystem together with slow hardware?
No idea - I can't reproduce it either so there's some state
that your filesystem is getting into that trips over it.
> > You'll see that Alexander had the same problem and managed
> > to continue the bisect once he copied the xfs_btree_trace.h
> > header file from top-of-tree back into the broken commits.
> > I hope this helps (and I hope that the bisect lands on the
> > same commit that it did for Alexander).
> Do you still want me to try it?
> I think you already figured out where the culprit is?!
Yes, I think we have, but it wasn't totally conclusive. Can you
continue your bisect to see if it narrows down to the same commit
on your machine?
I'm still trying to reproduce it but I haven't worked out what the
initial state is. One thing that might be useful is to put a printk
into the kernel on the failure path that prints the inode number
out (e.g. at the label that the WANT_CORRUPTED_GOTO jumps to). Then
we can use xfs_db to find the file that is causing the problem and
then use xfs_db or xfs_bmap to look at the extent tree prior to
the corruption. That might help me set up the initial state needed
to trip the problem.
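
As a rough illustration of the debug print suggested above, here is a
userspace sketch of the pattern. It is not the kernel code: the macro
and function names (SKETCH_WANT_CORRUPTED_GOTO, sketch_check_record),
the error value, and the message text are all made up for the example,
and fprintf() stands in for the printk() you would add in the kernel at
the label the real macro jumps to.

```c
#include <stdio.h>

/* Hypothetical placeholder for the kernel's corruption error code. */
#define SKETCH_EFSCORRUPTED 117

/*
 * Userspace imitation of the WANT_CORRUPTED_GOTO pattern (assumed shape):
 * if the expected condition does not hold, set the error and jump to the
 * given failure label.
 */
#define SKETCH_WANT_CORRUPTED_GOTO(cond, label) \
    do { \
        if (!(cond)) { \
            error = SKETCH_EFSCORRUPTED; \
            goto label; \
        } \
    } while (0)

/*
 * Hypothetical check. `ino` is the number of the inode being operated on;
 * `record_ok` stands in for whatever consistency test the real code makes.
 */
static int sketch_check_record(unsigned long long ino, int record_ok)
{
    int error = 0;

    SKETCH_WANT_CORRUPTED_GOTO(record_ok, error0);
    return 0;

error0:
    /* This is where the extra debug print goes: report the inode number. */
    fprintf(stderr, "XFS debug: corruption check failed, inode %llu\n", ino);
    return error;
}
```

Calling sketch_check_record(ino, 0) takes the failure path and prints
the inode number; that number is what would then be looked up with
xfs_db.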