Re: problems showing up as XFS problems on kernels after 2.6.28-git2

To: Danny ter Haar <dth@xxxxxxx>
Subject: Re: problems showing up as XFS problems on kernels after 2.6.28-git2
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 9 Jan 2009 11:46:09 +1100
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20090108215602.GA24479@xxxxxxx>
Mail-followup-to: Danny ter Haar <dth@xxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
References: <20090107165218.GA11132@xxxxxxx> <20090107180246.GA15218@xxxxxxxxxxxxx> <20090107182415.GA12039@xxxxxxx> <20090107183115.GA6261@xxxxxxxxxxxxx> <20090107184420.GA15653@xxxxxxx> <20090107185628.GA19255@xxxxxxxxxxxxx> <20090108215602.GA24479@xxxxxxx>
User-agent: Mutt/1.5.18 (2008-05-17)
On Thu, Jan 08, 2009 at 10:56:02PM +0100, Danny ter Haar wrote:
> I needed the parallel port driver so i compiled 2.6.28-git3 with debug info.
> It barfed: http://www.dth.net/kernel/c3/netconsole_2.6.28-git3-d.txt

Looking at this, I think there are two possibilities in terms of the
problem being detected. We are modifying the inode BMBT here,
so that means we have XFS_BTREE_ROOT_IN_INODE set. The corruption
trigger has occurred because an xfs_btree_increment() call has
returned a zero status. This means we failed here:

        /* Fail if we just went off the right edge of the tree. */
        xfs_btree_get_sibling(cur, block, &ptr, XFS_BB_RIGHTSIB);
        if (xfs_btree_ptr_is_null(cur, &ptr))
                goto out0;

or here:

        /*
         * If we went off the root then we are either seriously
         * confused or have the tree root in an inode.
         */
        if (lev == cur->bc_nlevels) {
                if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
                        goto out0;
                ASSERT(0);

i.e. we either fell off the right edge of the tree or went over the top
of it.
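
To make the two zero-status paths concrete, here is a rough userspace
sketch. It is a toy model only, not the kernel code: every name in it
(toy_cursor, toy_level, toy_classify_increment and the TOY_* values) is
made up for illustration, each level is reduced to an index, a record
count and a has-right-sibling flag, and it only classifies what an
increment would do without moving the cursor or reading any blocks.

```c
#include <assert.h>

#define TOY_MAXLEVELS 8

/* One level of the cursor's path; level 0 is the leaf. Hypothetical names. */
struct toy_level {
	int ptr;          /* 1-based index of the current entry in the block */
	int numrecs;      /* number of entries in the block */
	int has_rightsib; /* 0 plays the role of a null right-sibling pointer */
};

struct toy_cursor {
	int nlevels;
	struct toy_level levels[TOY_MAXLEVELS];
};

enum toy_outcome {
	TOY_MOVED,          /* increment would succeed (status 1) */
	TOY_OFF_RIGHT_EDGE, /* status 0: no right sibling at this level */
	TOY_OFF_TOP         /* status 0: ran out of levels while climbing */
};

/* Classify what incrementing the cursor at 'level' would do (read-only). */
enum toy_outcome toy_classify_increment(const struct toy_cursor *cur, int level)
{
	int lev;

	assert(cur->nlevels >= 1 && cur->nlevels <= TOY_MAXLEVELS);
	assert(level >= 0 && level < cur->nlevels);

	/* Easy case: there is another entry in the current block. */
	if (cur->levels[level].ptr + 1 <= cur->levels[level].numrecs)
		return TOY_MOVED;

	/* First zero-status path: nothing to the right at this level. */
	if (!cur->levels[level].has_rightsib)
		return TOY_OFF_RIGHT_EDGE;

	/* Climb until some ancestor has a next entry to step to. */
	for (lev = level + 1; lev < cur->nlevels; lev++) {
		if (cur->levels[lev].ptr + 1 <= cur->levels[lev].numrecs)
			break;
	}

	/* Second zero-status path: we climbed past the root. */
	if (lev == cur->nlevels)
		return TOY_OFF_TOP;

	return TOY_MOVED;
}
```

In this toy model TOY_OFF_TOP is only reachable when a block claims a
right sibling but no ancestor on the cursor path has another entry to
step to, i.e. when the sibling link and the parent levels disagree.
Whether that maps onto the real failure is not something the sketch
can show.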

I can't really see how we've done either of those things unless the
tree has been corrupted by a prior operation.

Given that each time it is aptitude that is causing the problem, can you
prevent aptitude from running automatically on boot and run it manually?
If you can reproduce the problem manually then we can move on to the
next step....

> So (in my case) something while going from git2 -> git3 didn't go positive.

That would have been when Linus did the XFS pull...


Dave Chinner