On Thu, 2002-09-19 at 01:12, Sidik Isani wrote:
> On Thu, Sep 19, 2002 at 02:22:02PM +1000, Nathan Scott wrote:
> > Hi,
> > On Wed, Sep 18, 2002 at 04:23:59PM -1000, Sidik Isani wrote:
> > >
> > > Anyway, the crash still happens right after "Starting XFS recovery".
> > > Please let me know what I should do next...
> > >
> > I seem to remember Steve fixing an endian problem in the log code
> > at one point several months ago - this may be what's biting you -
> > perhaps you have a little endian number somewhere in the on disk
> > log and the new kernel is expecting it as big endian.
> > If so, the right thing to do would be to run recovery using the
> > old kernel - so first mount, then unmount the filesystem. This
> > should complete fine with the old kernel. Then run xfs_repair
> > on the filesystem. This will zero out the log in phase 2, so
> > you can start afresh. Once you've done those things you should
> > be able to mount and use this filesystem with the latest kernel.
> Great. That makes perfect sense. I guess it is not reasonable
> to expect the kernel itself to repair damage from previous bugs.
> Could it help you to have a copy of my log anyway, to at least
> make the new kernel detect this corruption in a more elegant way
> than crashing? I do see checks in there masking bits for valid
> version numbers and all. I wonder why these are not finding it.
> Another RAID with totally different data on it does the same thing.
> Thanks for solving my problem (well ... I'll let you know soon ;-)
> Be seeing you,
> - Sidik
This is what I fixed:
date: 2002/08/29 21:26:29; author: lord; state: Exp; lines: +2 -1
when processing unlinked inodes and dealing with the di_next_unlinked
field, endian flip it.
However, this code is purely in recovery dealing with what is in the
log, not with how the log is laid down. An old kernel would have
crashed in the same way on this problem. Basically it would arise on
a crash where the log had a lot of unlinked files to remove. There
were some earlier fixes, too, but nothing which would make things
worse so far as I can see.
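
To illustrate what "endian flip it" means for a field like
di_next_unlinked, here is a minimal sketch. It is not the actual XFS
recovery code. It assumes the field is a 32-bit value stored big endian
on disk, and that an all-ones value marks the end of the unlinked list.
The names example_be32_to_cpu, example_is_end_of_list and
EXAMPLE_NULL_AGINO are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: all ones means "no next inode in the list". */
#define EXAMPLE_NULL_AGINO 0xFFFFFFFFu

/*
 * Decode a 32-bit big-endian value from raw on-disk bytes.
 * Building the value byte by byte gives the same result on
 * little-endian and big-endian hosts.
 */
static uint32_t example_be32_to_cpu(const unsigned char *disk_bytes)
{
    assert(disk_bytes != NULL);
    return ((uint32_t)disk_bytes[0] << 24) |
           ((uint32_t)disk_bytes[1] << 16) |
           ((uint32_t)disk_bytes[2] << 8)  |
            (uint32_t)disk_bytes[3];
}

/* Return 1 if the decoded "next unlinked" value terminates the list. */
static int example_is_end_of_list(const unsigned char *disk_bytes)
{
    return example_be32_to_cpu(disk_bytes) == EXAMPLE_NULL_AGINO;
}
```

Without the conversion, a little-endian host reading the raw bytes
00 00 01 02 would see 0x02010000 instead of 0x00000102 and would follow
the list to the wrong inode.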
I presume in your original message you are saying you have the
fs and the 2.4.16 kernel will mount it and the 2.4.19 one will not?
Is this something you have done more than once? If you are mounting
readonly, you must also be specifying norecovery, otherwise the first
mount will actually have done a recovery on the fs.
Probably too late for this, but if not, can you send a complete
backtrace and the xlog printout you have? You might want to get new
commands first; there have been changes in logprint, I think. We are at