On Fri, Mar 12, 2004 at 09:43:07PM -0600, Steve Lord wrote:
> Well, it's a little weird; it looks like something is broken compilation-wise
> in the recovery code, or we trampled on ourselves.
> We passed a null pointer into xfs_next_bit(), which is why it blew up;
> the caller does have a way of doing this:
> We definitely did not hit any of the expected cases in the switch
> statement and we passed null into xfs_next_bit().
> However, the only caller of this function does a check of the same
> data structure and will fail recovery if it does not get one of the
> recognized flags.
> So unless some of the code in between the two stamped on the memory
> in the meantime, it looks like a compilation bug.
> Is this repeatable? We could probably come up with some extra checks in
> here to narrow it down. It might also be worth doing an
> xfs_logprint -t -i -b /dev/xxx
> if this happens again and sending the output. This would at least
> determine if what was on the disk was correct.
> Nathan, any other ideas?
A compiler bug isn't out of the question, though I don't see signs of
it in the disassembly (doesn't mean it's not happening).
To reproduce it, I have to brew up some way to corrupt the fs and get it
to try to recover on mount, which doesn't happen for ordinary
pull-the-plug types of affairs. It did recur 3 times once I got it
corrupted properly, so it's mostly a question of doing that. I
think it's related to heavy I/O going on when the plug is pulled.
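
For reference, the failure mode Steve describes (a switch that matches none
of the expected cases and leaves the pointer null) and the sort of extra
check he suggests might look roughly like the sketch below. This is a
standalone illustration only: the types, flag values, and function names are
made up and are not the actual XFS recovery code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical item types; the real recovery code has its own flags. */
enum item_type { ITEM_INODE = 1, ITEM_BUFFER = 2 };

/* Hypothetical log item carrying a small dirty-region bitmap. */
struct log_item {
    int type;
    unsigned int map[2];
};

#define MAP_BITS (2 * sizeof(unsigned int) * CHAR_BIT)

/* Return the index of the first set bit at or after start, or -1 if none. */
static int next_bit(const unsigned int *map, size_t nbits, size_t start)
{
    const size_t word_bits = sizeof(unsigned int) * CHAR_BIT;

    if (map == NULL)
        return -1;              /* extra check: refuse a null bitmap */

    for (size_t i = start; i < nbits; i++)
        if (map[i / word_bits] & (1u << (i % word_bits)))
            return (int)i;
    return -1;
}

/* Pick the bitmap based on the item type, then scan it. */
static int first_dirty_region(const struct log_item *item)
{
    const unsigned int *map = NULL;

    switch (item->type) {
    case ITEM_INODE:
    case ITEM_BUFFER:
        map = item->map;
        break;
    default:
        /* No expected case matched: report it rather than scan NULL. */
        return -2;
    }

    assert(map != NULL);        /* would catch memory stamped on in between */
    return next_bit(map, MAP_BITS, 0);
}
```

With an item of type ITEM_BUFFER and map[0] set to 0x8,
first_dirty_region() returns 3; with an unrecognized type it returns -2
instead of handing a null pointer to the bit scanner.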