
Re: Kernel Oops RedHat 7.1 kernel-2.4.5 xfx-1.0.1



> (human oops). During bootup, the system oops'd when trying to mount the XFS
> partition we had grown. That disk remains unmountable, and will oops any of
> the three machines we have on the SAN when mounted (we had a backup of the
> data and sufficient extra, so we just left this partition of death for
> testing). The fourth oops just occurred today when I tried to xfsdump a
> partition. I've included the ksymoops output for each of the oops's in
> order.

I've seen this behaviour, too. The solution was to make sure the system
did not try to mount any XFS filesystem, even in single user mode, and
then to run xfs_repair on the filesystems.


-------- Original Message --------
Subject: Re: xfsdump estimates sometimes fail
Date: Sun, 29 Jul 2001 14:53:45 +0200
From: "Bernhard R. Erdmann" <be@berdmann.de>
To: "amanda-users@amanda.org" <amanda-users@amanda.org>
CC: jrj@cc.purdue.edu, Linux XFS Mailing List <linux-xfs@oss.sgi.com>
References: <200107232132.QAA03943@gandalf.cc.purdue.edu>
<3B5E6D9D.53541623@berdmann.de>

> > >on a particular host (Linux 2.4.6-prexy-xfs, RH 7.1 + LVM + XFS, Amanda
> > >2.4.2p2) sometimes sendsize fails to estimate sizes with xfsdump.
> 
> I recognized kernel oopses in syslog when Amanda tried to backup this
> host (maxdumps=3):
[...]
> Ok, I'll upgrade the kernel from 2.4.6-pre5-xfs to 2.4.7-xfs before any
> further investigation...

Boy, that was a story... 

- Upgraded to kernel 2.4.7-xfs and rebooted.
- Mounting of /var failed with an oops (sorry, no copy)
- Rebooted to single user mode
- "xfs_repair /dev/vg1/var" ran forever (> 30 min) with no output.
Instead, each RAID array on that Compaq ML 530 containing 13 hard disks
was read sequentially (that's what the LEDs suggested).
- xfs_repair of the other XFS filesystems showed the same behaviour
- no interruption possible, had to power off to reboot
- Commented out the XFS filesystems in /etc/fstab to keep RedHat's init
scripts from mounting them even in single user mode (/ was still on
ext2fs)
- Reboot (i.e., power off)
- xfs_repair of the /var filesystem produced very ugly output (sorry,
no copy)
- xfs_repair of other XFS filesystems was ok
- After editing /etc/fstab again I rebooted and went back to normal
operation
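
For anyone who has to do the /etc/fstab step on several machines, here is
a rough sketch of it in Python. It is not what I ran (I edited the file by
hand); the function name comment_out_xfs is made up for illustration, and
it assumes the usual whitespace-separated fstab layout with the filesystem
type in the third field.

```python
def comment_out_xfs(fstab_text):
    """Return fstab_text with every active XFS entry commented out."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Assumed fstab layout: device, mount point, fs type, options, dump, pass
        is_active = bool(fields) and not fields[0].startswith("#")
        if is_active and len(fields) >= 3 and fields[2] == "xfs":
            line = "#" + line  # disable the entry, keep it for later re-enabling
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative input only; the device names are examples, not my real fstab.
sample = (
    "/dev/sda1 / ext2 defaults 1 1\n"
    "/dev/vg1/var /var xfs defaults 1 2\n"
    "# comment\n"
)
result = comment_out_xfs(sample)
```

The function only transforms text, so it can be tried on a copy of
/etc/fstab first. Once the entries are disabled and the machine is
rebooted, xfs_repair can be run on each unmounted device, and the "#" is
removed again afterwards.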

Conclusions:
- Under rare circumstances it is possible to crash the kernel by trying
to mount a damaged XFS filesystem, leaving no chance to run xfs_repair
afterwards under that kernel

Many thanks to John R. Jackson for his help with driving Amanda's
sendsize by hand.