On Thu, Sep 19, 2002 at 06:58:14AM -0500, Stephen Lord wrote:
> On Thu, 2002-09-19 at 01:12, Sidik Isani wrote:
> > On Thu, Sep 19, 2002 at 02:22:02PM +1000, Nathan Scott wrote:
> > > Hi,
> > >
> > > On Wed, Sep 18, 2002 at 04:23:59PM -1000, Sidik Isani wrote:
> > > >
> > > > Anyway, the crash still happens right after "Starting XFS recovery".
> > > > Please let me know what I should do next...
> > > >
> > >
> > > I seem to remember Steve fixing an endian problem in the log code
> > > at one point several months ago - this may be what's biting you -
> > > perhaps you have a little endian number somewhere in the on disk
> > > log and the new kernel is expecting it as big endian.
> > >
> > > If so, the right thing to do would be to run recovery using the
> > > old kernel - so first mount, then unmount the filesystem. This
> > > should complete fine with the old kernel. Then run xfs_repair
> > > on the filesystem. This will zero out the log in phase 2, so
> > > you can start afresh. Once you've done those things you should
> > > be able to mount and use this filesystem with the latest kernel.
> >
> > Great. That makes perfect sense. I guess it is not reasonable
> > to expect the kernel itself to repair damage from previous bugs.
> > Could it help you to have a copy of my log anyway, to at least
> > make the new kernel detect this corruption in a more elegant way
> > than crashing? I do see checks in there masking bits for valid
> > version numbers and all. I wonder why these are not finding it.
> > Another RAID with totally different data on it does the same thing.
> > Thanks for solving my problem (well ... I'll let you know soon ;-)
> >
> > Be seeing you,
> >
> > - Sidik
> >
>
> This is what I fixed:
>
> revision 1.240
> date: 2002/08/29 21:26:29; author: lord; state: Exp; lines: +2 -1
> modid: 2.4.x-xfs:slinx:126410a
> when processing unlinked inodes and dealing with the di_next_unlinked
> field, endian flip it.
>
> However, this code is purely in recovery dealing with what is in the
> log, not with how the log is laid down. An old kernel would have
> crashed in the same way on this problem. Basically it would arise on
> a crash where the log had a lot of unlinked files to remove. There
> were some earlier fixes, too, but nothing which would make things
> worse so far as I can see.
>
> I presume in your original message you are saying you have the
> fs and the 2.4.16 kernel will mount it and the 2.4.19 one will not?
> Is this something you have done more than once? If you are mounting
> readonly, you must also be specifying norecovery, otherwise the first
> mount will actually have done a recovery on the fs.
>
> Probably too late for this, but if not, can you send a complete
> backtrace and the xlog printout you have? You might want to get new
> commands first; there have been changes in logprint, I think. We are
> at 2.3.3 now.

Ok. Two attachments. One is the backtrace from the serial console
from an attempt to mount with 2.4.19 (same thing happens with CVS
kernel from last week.) The last thing that happened to the filesystem
was a clean "Ending recovery" message from 2.4.16. It rebooted cleanly.
The second attachment is xfs_logprint output.

Please let me know what I should do next...

Thanks,

- Sidik
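
For readers following the endian discussion quoted above, here is a minimal sketch of the kind of mistake the revision 1.240 note describes: reading the di_next_unlinked field without flipping it from on-disk (big endian) order. This is an illustration in Python, not the kernel code. The helper names, the buffer, and the example inode number are all hypothetical; the only facts assumed are that the field is a 32-bit value stored big endian on disk and that XFS marks the end of the unlinked chain with all ones (NULLAGINO).

```python
import struct

# End-of-chain marker for the unlinked list: all ones in 32 bits.
NULLAGINO = 0xFFFFFFFF


def read_next_unlinked_raw(buf, offset):
    """Buggy read: treat the 4 on-disk bytes as little-endian host order."""
    return struct.unpack_from("<I", buf, offset)[0]


def read_next_unlinked(buf, offset):
    """Correct read: convert the big-endian on-disk field to a host integer."""
    return struct.unpack_from(">I", buf, offset)[0]


# Hypothetical on-disk bytes for a next-unlinked inode number of 0x1234.
disk = struct.pack(">I", 0x1234)

good = read_next_unlinked(disk, 0)      # the inode number as written
bad = read_next_unlinked_raw(disk, 0)   # byte-swapped, points nowhere sensible
```

One consequence of the all-ones terminator is that it reads the same in either byte order, so an unflipped read only goes wrong when the chain actually contains entries. That would be consistent with the remark that the problem shows up after a crash with many unlinked files left to remove.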
minicom.cap
Description: Text document
xfs_logprint.out
Description: Text document