Quoting Dave Chinner (david@xxxxxxxxxxxxx):
> Looking at this, I think there are two possibilities in terms of the
> problem being detected. We are modifying the inode BMBT here,
> so that means we have XFS_BTREE_ROOT_IN_INODE set. The corruption
> trigger has occurred because a xfs_btree_increment() call has
> returned a zero status. This means we failed here:
>
> 1324 /* Fail if we just went off the right edge of the tree. */
> 1325 xfs_btree_get_sibling(cur, block, &ptr, XFS_BB_RIGHTSIB);
> 1326 if (xfs_btree_ptr_is_null(cur, &ptr))
> 1327 goto out0;
>
> or here:
>
> 1351 /*
> 1352 * If we went off the root then we are either seriously
> 1353 * confused or have the tree root in an inode.
> 1354 */
> 1355 if (lev == cur->bc_nlevels) {
> 1356 if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> 1357 goto out0;
> 1358 ASSERT(0);
>
> i.e. we either fell off the right edge of the tree or went over the top
> of it.
> I can't really see how we've done either of those things unless the
> tree has been corrupted by a prior operation.
Sounds logical.
The first time it happened I moved the primary HD to the secondary IDE
connector, connected a separate hard drive as the new master, installed a
fresh Debian lenny on that drive, and ran xfs_repair on all XFS
filesystems: no errors.
> Given that each time it is aptitude that is causing the problem, can you
> prevent aptitude from running automatically on boot and run it manually?
> If you can reproduce the problem manually then we can move on to the
> next step....
I wasn't clear (obviously).
Besides being my NAS, this machine is also the apt-cacher-ng server for all
my other machines here at home. The easiest way to trigger the error is to
run a simple "aptitude update; aptitude -d dist-upgrade".
So when it barfed I ran the aptitude commands by hand.
Everything is checked against the cache at /var/cache/apt-cacher-ng,
which is on sda6 (root filesystem on XFS).
So it doesn't "barf" right at boot; it takes a few minutes or even hours:
filer1:~# last -20 reboot
reboot system boot 2.6.28-git2-d Thu Jan 8 12:00 - current(05:18)
reboot system boot 2.6.28-git3-d Thu Jan 8 11:31 - 11:59 (00:27)
reboot system boot 2.6.28-git3-d Thu Jan 8 10:56 - 11:59 (01:02)
reboot system boot 2.6.28-git3-d Thu Jan 8 10:44 - 10:54 (00:10)
reboot system boot 2.6.28-git3-d Thu Jan 8 10:30 - 10:43 (00:12)
reboot system boot 2.6.28-git2 Wed Jan 7 15:08 - 10:28 (19:19)
reboot system boot 2.6.28-git9-d Wed Jan 7 12:29 - 14:58 (02:29)
reboot system boot 2.6.28-git2 Wed Jan 7 10:08 - 12:27 (02:19)
reboot system boot 2.6.28-git9 Wed Jan 7 09:21 - 10:06 (00:45)
reboot system boot 2.6.28-git9 Wed Jan 7 08:42 - 10:06 (01:24)
reboot system boot 2.6.28-git2 Tue Jan 6 21:45 - 08:40 (10:55)
reboot system boot 2.6.28-git4 Tue Jan 6 21:27 - 08:40 (11:13)
reboot system boot 2.6.28-git4 Tue Jan 6 21:22 - 08:40 (11:18)
Sometimes the kernel barfs while accessing /dev/sdb1 or /dev/sdc1,
which are only accessed via Samba.
I can once more install the "other" Debian lenny hard drive, boot from there,
and then manually run xfs_repair on the XFS filesystems.
I can then boot a kernel that is known to barf and try to get it to barf again.
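For the record, that repair pass from the rescue install would look something
like this (device names as above; the filesystems must be unmounted, and the
-n flag makes it a read-only check first):

```shell
# Run from the rescue install with the suspect filesystems unmounted.
# "-n" is xfs_repair's no-modify check; drop it to actually repair.
xfs_repair -n /dev/sda6   # root fs, holds /var/cache/apt-cacher-ng
xfs_repair -n /dev/sdb1   # Samba-only disks
xfs_repair -n /dev/sdc1
```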
> > So (in my case) something while going from git2 -> git3 didn't go positive.
> That would have been when Linus did the XFS pull...
Do you want me to figure out which patch from git2 -> git3 is the culprit?
I'll have to compile/reboot for a while.
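If I do chase it, git bisect should narrow git2 -> git3 down to a single
commit in far fewer compile/reboot cycles than testing patches one by one.
In the real linux-2.6 tree I would mark each booted kernel good or bad by
hand; below is just a toy demonstration of the mechanics in a scratch repo
(the repo, identity, and the planted "bug" in commit 4 are all made up for
illustration):

```shell
#!/bin/sh
set -e

# Build a scratch repo with five commits; pretend commit 4 planted the bug.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email danny@example.com   # placeholder identity
git config user.name "danny"
for i in 1 2 3 4 5; do
    echo "$i" > file
    git add file
    git commit -q -m "commit $i"
done

# Oldest commit is known good, HEAD is known bad.
good=$(git rev-list --reverse HEAD | head -n 1)
git bisect start HEAD "$good"

# "git bisect run" marks a revision good when the command exits 0 and bad
# otherwise; here the "bug" is present once file reaches 4.
first_bad=$(git bisect run sh -c 'test "$(cat file)" -lt 4' \
            | sed -n 's/ is the first bad commit$//p')
git log -1 --format=%s "$first_bad"   # subject line of the culprit commit
git bisect reset >/dev/null
```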
Tell me what else I can do to resolve this.
Danny